
Scene Graph Generation Algorithm Based on Semantic Connected Graph


Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 22
Issue:: No. 02, 2022

Pages:: 48-55

Section:: Computer Science and Technology

Publication date:: 2022-06-30

Article Info

Title:: Scene Graph Generation Based on Semantic Connected Graph

Article ID:: 1672-1292(2022)02-0048-08

Author(s):: Jiang Youliang (姜有亮)¹; Zhang Fengjun (张锋军)²; Shen Peiyi (沈沛意)¹,³; Zhang Liang (张亮)¹,³ (1. School of Computer Science and Technology, Xidian University, Xi'an 710071, China) (2. China Electronics Technology Cyber Security Co., Ltd., Chengdu 610041, China) (3. Xi'an Key Laboratory of Intelligent Software Engineering, Xidian University, Xi'an 710071, China)

Keywords:: scene graph generation; graph convolutional network; object detection; visual relationship detection; scene semantic understanding

CLC number:: TP311

DOI:: 10.3969/j.issn.1672-1292.2022.02.008

Document code:: A

Abstract:: A scene graph generation algorithm based on a semantic connected graph is proposed. Relationship detection is split into two steps: relationship proposal and relationship reasoning. The candidate objects produced by an object detection algorithm are taken as the node set of a fully connected graph. The probability that a relationship exists between two objects is computed from their category information and relative spatial relationship. A threshold is applied to remove invalid connections from the graph, which yields a sparse semantic connected graph. A graph neural network then aggregates the features of the object nodes to fuse contextual information. Finally, following the connections of the semantic connected graph, a relationship feature is built for each edge from the updated subject and object features together with the feature of the union region of the two objects, and the relationship category of that edge is predicted.
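The relationship proposal and feature aggregation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, the threshold value, or the network architecture, so everything below is an assumption made for illustration: the `class_prior` table, the distance-based `spatial_score`, the threshold of 0.3, and the use of plain mean aggregation in place of a trained graph neural network are all hypothetical. The final relationship classification step is omitted.

```python
import math
from itertools import permutations


def box_center(box):
    """Return the centre (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def spatial_score(box_a, box_b, scale):
    """Hypothetical spatial term: decays with the distance between box centres."""
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    return math.exp(-math.hypot(ax - bx, ay - by) / scale)


def relation_probability(subj, obj, class_prior, scale=100.0):
    """Assumed score: category-pair prior multiplied by the spatial term."""
    prior = class_prior.get((subj["label"], obj["label"]), 0.0)
    return prior * spatial_score(subj["box"], obj["box"], scale)


def build_semantic_graph(objects, class_prior, threshold=0.3):
    """Start from all ordered object pairs (a fully connected graph) and
    keep only the edges whose score reaches the threshold."""
    edges = []
    for i, j in permutations(range(len(objects)), 2):
        if relation_probability(objects[i], objects[j], class_prior) >= threshold:
            edges.append((i, j))
    return edges


def aggregate(features, edges):
    """One round of mean aggregation over each node and its retained
    neighbours (a stand-in for a learned graph neural network layer)."""
    neighbours = {i: {i} for i in range(len(features))}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    dim = len(features[0])
    return [
        [sum(features[k][d] for k in neighbours[i]) / len(neighbours[i])
         for d in range(dim)]
        for i in range(len(features))
    ]


# Toy detections; labels, boxes, priors and features are made up.
objects = [
    {"label": "person", "box": (10, 10, 50, 90)},
    {"label": "horse", "box": (20, 40, 100, 100)},
    {"label": "tree", "box": (400, 0, 460, 100)},
]
class_prior = {
    ("person", "horse"): 0.9,
    ("horse", "person"): 0.6,
    ("person", "tree"): 0.5,
    ("tree", "person"): 0.5,
}
features = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]

edges = build_semantic_graph(objects, class_prior)
updated = aggregate(features, edges)
print(edges)
print(updated)
```

In this toy input the person and horse boxes are close and have a high category prior, so both directed edges between them survive pruning, while the distant tree is left isolated and keeps its own feature. Pruning before aggregation is what keeps the graph sparse: context is exchanged only between objects that are likely to be related.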


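Memo

Memo:: Received: 2021-08-31.
Funding: National Natural Science Foundation of China (62072358); National Key Research and Development Program of China (2020YFF0304900, 2019YFB1311600); Key Research and Development Program of Shaanxi Province (2018ZDXM-GY-036).
Corresponding author: Zhang Liang, professor, doctoral supervisor; research interests: scene perception and understanding, human-computer interaction, embedded systems. E-mail: liangzhang@xidian.edu.cn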
