References:
[1]JOHNSON J,KRISHNA R,STARK M,et al. Image retrieval using scene graphs[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston,USA:IEEE,2015:3668-3678.
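[2]TIAN X,JI Y,GAO H Y,et al. Scene graph generation method based on external information guidance and residual scrambling[J]. Journal of Frontiers of Computer Science and Technology,2021,15(10):1958-1968. (in Chinese)
[3]HUANG Y T,YAN H. Scene graph generation model combining attention mechanism and feature fusion[J]. Computer Science,2020,47(6):133-137. (in Chinese)
[4]ZHUANG Z G,XU Q L. Scene graph generation model combining multi-scale feature maps and ring-type relationship reasoning[J]. Computer Science,2020,47(4):136-141. (in Chinese)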
[5]LI Y K,OUYANG W L,ZHOU B L,et al. Factorizable net:an efficient subgraph-based framework for scene graph generation[C]//Proceedings of the 2018 European Conference on Computer Vision(ECCV). Munich,Germany:Springer,2018:335-351.
[6]GIRSHICK R,DONAHUE J,DARRELL T,et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus,USA:IEEE,2014:580-587.
[7]GIRSHICK R. Fast R-CNN[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago,Chile:IEEE,2015:1440-1448.
[8]REN S Q,HE K M,GIRSHICK R,et al. Faster R-CNN:Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(6):1137-1149.
[9]REDMON J,DIVVALA S,GIRSHICK R,et al. You only look once:Unified,real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,USA:IEEE,2016:779-788.
[10]REDMON J,FARHADI A. YOLOv3:An incremental improvement[J]. arXiv Preprint arXiv:1804.02767,2018.
[11]LIU W,ANGUELOV D,ERHAN D,et al. SSD:Single shot multibox detector[C]//Proceedings of the 2016 European Conference on Computer Vision. Amsterdam,The Netherlands:Springer,2016:21-37.
[12]LIN T Y,GOYAL P,GIRSHICK R,et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2020,42(2):318-327.
[13]WU Z H,PAN S R,CHEN F W,et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems,2021,32(1):4-24.
[14]KIPF T N,WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv Preprint arXiv:1609.02907,2016.
[15]VELIČKOVIĆ P,CUCURULL G,CASANOVA A,et al. Graph attention networks[J]. arXiv Preprint arXiv:1710.10903,2017.
[16]HAMILTON W L,YING R,LESKOVEC J. Inductive representation learning on large graphs[J]. arXiv Preprint arXiv:1706.02216,2017.
[17]CHEN T S,YU W H,CHEN R Q,et al. Knowledge-embedded routing network for scene graph generation[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach,USA:IEEE,2019:6163-6171.
[18]NEUBECK A,VAN GOOL L. Efficient non-maximum suppression[C]//Proceedings of the 18th International Conference on Pattern Recognition(ICPR 2006). Hong Kong,China:IEEE,2006:850-855.
[19]KRISHNA R,ZHU Y K,GROTH O,et al. Visual genome:Connecting language and vision using crowdsourced dense image annotations[J]. International Journal of Computer Vision,2017,123:32-73.
[20]XU D F,ZHU Y K,CHOY C B,et al. Scene graph generation by iterative message passing[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,USA:IEEE,2017:3097-3106.
[21]LI Y K,OUYANG W L,ZHOU B L,et al. Scene graph generation from objects,phrases and region captions[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice,Italy:IEEE,2017:1270-1279.
[22]LU C W,KRISHNA R,BERNSTEIN M,et al. Visual relationship detection with language priors[C]//Proceedings of the 2016 European Conference on Computer Vision. Amsterdam,The Netherlands:Springer,2016:852-869.
[23]YANG J W,LU J S,LEE S,et al. Graph R-CNN for scene graph generation[C]//Proceedings of the 2018 European Conference on Computer Vision(ECCV). Munich,Germany:Springer,2018:670-685.
[24]LÜ J M,XIAO Q Z,ZHONG J J. AVR:Attention based salient visual relationship detection[J]. arXiv Preprint arXiv:2003.07012,2020.