Zhang Ming,Li Chenglong,Gao Xinyan,et al. Human Pose Estimation Based on Attention Mechanism and High-resolution Network[J]. Journal of Nanjing Normal University(Engineering and Technology),2024,24(04):46-56. [doi:10.3969/j.issn.1672-1292.2024.04.005]

Human Pose Estimation Based on Attention Mechanism and High-resolution Network

Journal of Nanjing Normal University(Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
24
Issue:
2024(04)
Pages:
46-56
Column:
Computer Science and Technology
Publication date:
2024-12-15

Article Info

Title:
Human Pose Estimation Based on Attention Mechanism and High-resolution Network
Article number:
1672-1292(2024)04-0046-11
Author(s):
Zhang Ming1, Li Chenglong1, Gao Xinyan2, Wang Pengfei3, Zhang Jinxiao1
(1.School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250000, China)
(2.Shandong Huayun 3D Technology Co., Ltd., Jinan 250000, China)
(3.The Second Construction Limited Company of China Construction Eighth Engineering Division, Jinan 250000, China)
Keywords:
human pose estimation; attention mechanism; high-resolution network; C2F-CBAM module; keypoint detection
CLC number:
O643; X703
DOI:
10.3969/j.issn.1672-1292.2024.04.005
Document code:
A
Abstract:
Human pose estimation aims to accurately identify keypoint positions and postures in images or videos, and is essential for behavior recognition, human-computer interaction and other applications. A high-resolution network can extract human keypoint features containing multi-scale information from an image, but it focuses mainly on feature information within local regions and has difficulty capturing long-range dependencies between joints, so it is easily affected by complex backgrounds, occlusion and other factors, which limits accuracy. To address these problems of high-resolution networks in human pose estimation, a deep learning module named C2F-CBAM is proposed that fuses an attention mechanism with the high-resolution network. By combining the strengths of the C2F module and the CBAM module, i.e. advanced feature extraction and an enhanced attention mechanism, C2F-CBAM significantly improves the model's accuracy in identifying keypoints. In addition, the C2F-CBAM module is embedded at key positions of the HRNet network, so that the method can better integrate feature information at different scales. This fusion strategy not only enhances the model's adaptability to various human postures and image resolutions, but also effectively handles complex backgrounds and occlusion. Experimental results show that the proposed model has clear advantages over other methods on the COCO2017 validation set, with average precision improved by 0.9 over the original HRNet, which verifies the effectiveness and superiority of the model.
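Note: the exact structure of the C2F-CBAM module is given in the full paper rather than on this page. The following PyTorch sketch only illustrates the idea summarized in the abstract: a CBAM block (channel attention followed by spatial attention) attached to a C2F-style split-and-concatenate convolution block, shaped so it could be dropped into an HRNet branch without changing the feature-map size. All class names, channel widths, bottleneck counts and the residual connection are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of a C2F-CBAM-style block (PyTorch).
# Layer sizes, split ratios and naming are assumptions for demonstration only.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """CBAM channel attention: pool over spatial dims, re-weight channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """CBAM spatial attention: re-weight locations with a conv over pooled maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))


class C2FCBAM(nn.Module):
    """C2F-style block (split, stacked convs, concat) followed by CBAM attention."""
    def __init__(self, channels, n_bottlenecks=2):
        super().__init__()
        hidden = channels // 2
        self.cv1 = nn.Conv2d(channels, 2 * hidden, 1, bias=False)
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(hidden, hidden, 3, padding=1, bias=False),
                nn.BatchNorm2d(hidden),
                nn.SiLU(inplace=True),
            )
            for _ in range(n_bottlenecks)
        ])
        self.cv2 = nn.Conv2d((2 + n_bottlenecks) * hidden, channels, 1, bias=False)
        self.attn = CBAM(channels)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))
        for block in self.blocks:
            y.append(block(y[-1]))
        out = self.attn(self.cv2(torch.cat(y, dim=1)))
        return out + x  # residual connection (an assumption) so the block slots into an HRNet branch


if __name__ == "__main__":
    # Shape check: a 32-channel HRNet branch feature map passes through unchanged in size.
    feat = torch.randn(1, 32, 64, 48)
    print(C2FCBAM(32)(feat).shape)  # torch.Size([1, 32, 64, 48])
```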

References:

[1]LI K,WANG S J,ZHANG X,et al. Pose recognition with cascade transformers[C]//IEEE Conference on Computer Vision and Pattern Recognition. Nashville,TN,USA,2021.
[2]CAO Z,SIMON T,WEI S,et al. Realtime multi-person 2D pose estimation using part affinity fields[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA,2017.
[3]KOCABAS M,KARAGOZ S,AKBAS E. MultiPoseNet:Fast multi-person pose estimation using pose residual network[C]//European Conference on Computer Vision. Munich,Germany,2018.
[4]PAPANDREOU G,ZHU T,GIDARIS S,et al. Personlab:Person pose estimation and instance segmentation with a bottom-up,part-based,geometric embedding model[C]//European Conference on Computer Vision. Munich,Germany,2018.
[5]NEWELL A,HUANG Z A,DENG J. Associative embedding:End-to-end learning for joint detection and grouping[C]//NeurIPS. Long Beach,CA,USA,2017.
[6]INSAFUTDINOV E,PISHCHULIN L,ANDRES B,et al. Deepercut:A deeper,stronger,and faster multi-person pose estimation model[C]//European Conference on Computer Vision. Amsterdam,The Netherlands,2016.
[7]KONG Y H,QIN Y F,ZHANG K. A review of deep learning-based two-dimensional human pose estimation methods[J]. Journal of Image and Graphics,2023,28(7):1965-1989.
[8]CHENG B W,XIAO B,et al. HigherHRNet:Scale-aware representation learning for bottom-up human pose estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[9]ZOU Y X,HE N,GUO Y X,et al. A survey of deep learning-based human pose estimation[C]//The 27th Annual Conference on New Network Technologies and Applications,Network Application Branch of China Computer Users Association. Zhenjiang,Jiangsu,China,2023.
[10]XIAO B,WU H P,WEI Y C. Simple baselines for human pose estimation and tracking[C]//European Conference on Computer Vision. Munich,Germany,2018.
[11]WANG J D,SUN K,CHENG T H,et al. Deep high-resolution representation learning for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2021,43(10):3349-3361.
[12]CHENG B W,WEI Y C,FERIS R,et al. Decoupled classification refinement:Hard false positive suppression for object detection[J]. arXiv Preprint arXiv:1810.04002,2018.
[13]CHENG B W,WEI Y C,SHI H H,et al. Revisiting rcnn:On awakening the classification power of faster rcnn[C]//European Conference on Computer Vision. Munich,Germany,2018.
[14]NEWELL A,YANG K,DENG J. Stacked hourglass networks for human pose estimation[C]//European Conference on Computer Vision. Amsterdam,The Netherlands,2016.
[15]CHU X,YANG W,OUYANG W,et al. Multi-context attention for human pose estimation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA,2017.
[16]SUN K,XIAO B,LIU D,et al. Deep high-resolution representation learning for human pose estimation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach,CA,USA,2019.
[17]CARREIRA J,AGRAWAL P,FRAGKIADAKI K. Human pose estimation with iterative error feedback[C]//IEEE Conference On Computer Vision And Pattern Recognition. Las Vegas,NV,USA,2016.
[18]TOSHEV A,SZEGEDY C. Deeppose:Human pose estimation via deep neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Columbus,OH,USA,2014.
[19]CHU X,OUYANG W,LI H,et al. Structured feature learning for pose estimation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,NV,USA,2016.
[20]YANG W,OUYANG W,LI H,et al. End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation[C]//IEEE Conference on Computer Vision And Pattern Recognition,Las Vegas,NV,USA,2016.
[21]TOMPSON J,GOROSHIN R,JAIN A,et al. Efficient object localization using convolutional networks[C]//IEEE Conference on Computer Vision and Pattern Recognition,Boston,MA,USA,2015.
[22]WOO S,PARK J,LEE J Y,et al. CBAM:Convolutional block attention module[C]//European Conference on Computer Vision. Munich,Germany,2018.
[23]HU J,SHEN L,SUN G. Squeeze-and-excitation networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA,2018.
[24]JADERBERG M,SIMONYAN K,ZISSERMAN A,et al. Spatial transformer networks[J]. Advances in Neural Information Processing Systems,2015,2:2017-2025.
[25]FANG H S,XIE S Q,TAI Y W,et al. RMPE:Regional multi-person pose estimation[C]//IEEE International Conference on Computer Vision. Venice,Italy,2017.
[26]YUAN Y,FU R,HUANG L,et al. High-resolution transformer for dense prediction[J]. arXiv Preprint arXiv:2110.09408,2021.
[27]MAJI D,NAGORI S,MATHEW M,et al. YOLO-Pose:Enhancing YOLO for multi person pose estimation using object keypoint similarity loss[C]//IEEE Conference on Computer Vision and Pattern Recognition Workshops. New Orleans,LA,USA,2022.
[28]ZHU X,LYU S,WANG X,et al. TPH-YOLOv5:Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios[C]//IEEE International Conference on Computer Vision Workshops. Montreal,Canada,2021.
[29]YUAN Y H,FU R,HUANG L,et al. HRFormer:High-resolution transformer for dense prediction[J]. arXiv Preprint arXiv:2110.09408,2021.
[30]HU J,SHEN L,SUN G. Squeeze-and-excitation networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA,2018.
[31]WANG Q,WU B,ZHU P,et al. ECA-Net:Efficient channel attention for deep convolutional neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA,2020.
[32]WANG X L,GIRSHICK R,GUPTA A,et al. Non-local neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City,UT,USA,2018.
[33]LOSHCHILOV I,HUTTER F. Decoupled weight decay regularization[J]. arXiv Preprint arXiv:1711.05101,2017.
[34]LIN T,MAIRE M,BELONGIE S,et al. Microsoft COCO:Common objects in context[C]//European Conference on Computer Vision. Zurich,Switzerland,2014.
[35]VARGHESE R,SAMBATH M. YOLOv8:A novel object detection algorithm with enhanced performance and robustness[C]//2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems. Chennai,India,2024.

Similar Literature/References:

[1]Wang Fei,Hu Ronglin,Jin Ying. Human Action Recognition Based on 3D-CBAM Attention Mechanism[J]. Journal of Nanjing Normal University(Engineering and Technology),2021,21(01):049. [doi:10.3969/j.issn.1672-1292.2021.01.008]
[2]Wang Likai,Qu Weiguang,Wei Tingxin,et al. Identification of Chinese Zero Pronouns Based on Deep Learning[J]. Journal of Nanjing Normal University(Engineering and Technology),2021,21(04):019. [doi:10.3969/j.issn.1672-1292.2021.04.004]
[3]Wang Xiaopeng,Zhu Feng,Li Lei,et al. Face Forgery Detection Based on Facial Micro-Movements[J]. Journal of Nanjing Normal University(Engineering and Technology),2024,24(04):028. [doi:10.3969/j.issn.1672-1292.2024.04.003]
[4]Chen Bin,Fan Feiyan,Lu Tianyi. Bone Dual-Stream Attention Enhancement Graph Convolving Human Posture Recognition[J]. Journal of Nanjing Normal University(Engineering and Technology),2024,24(04):057. [doi:10.3969/j.issn.1672-1292.2024.04.006]

Memo:
Received: 2024-05-12.
Funding: National Natural Science Foundation of China (62102235); Natural Science Foundation of Shandong Province (ZR2020QF029).
Corresponding author: Li Chenglong, PhD, associate professor. Research interests: computer vision, augmented reality, computer graphics. E-mail: lichenglong18@sdjzu.edu.cn
Last Update: 2024-12-15