Abstract

Info

Title:: Human Pose Estimation Based on Attention Mechanism and High-resolution Network

Author(s):: Zhang Ming¹; Li Chenglong¹; Gao Xinyan²; Wang Pengfei³; Zhang Jinxiao¹
(1. School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250000, China)
(2. Shandong Huayun 3D Technology Co., Ltd., Jinan 250000, China)
(3. The Second Construction Limited Company of China Construction Eighth Engineering Division, Jinan 250000, China)

Keywords:: human pose estimation; attention mechanism; high-resolution network; C2F-CBAM module; keypoint detection

PACS:: O643; X703

DOI:: 10.3969/j.issn.1672-1292.2024.04.005

Abstract:: Human pose estimation aims to accurately locate human keypoints and infer body posture from images or videos, and it is essential for tasks such as behavior recognition and human-computer interaction. A high-resolution network can extract keypoint features that carry multi-scale information, but it focuses mainly on local feature information and struggles to capture long-range dependencies between joints. As a result, it is sensitive to complex backgrounds, occlusion, and similar factors, which limits its accuracy. To address these problems, this paper proposes C2F-CBAM, a deep learning module that integrates an attention mechanism with a high-resolution network. The module combines the advantages of the C2F module and the CBAM module, pairing the feature extraction of the former with the attention mechanism of the latter to improve keypoint localization accuracy. The C2F-CBAM module is embedded at key positions in HRNet so that the network can better fuse feature information across scales, which improves the model's adaptability to varied human poses and image resolutions as well as its robustness to complex backgrounds and occlusion. Experiments on the COCO2017 validation set show that the proposed model outperforms the compared methods and improves average precision (AP) by 0.9 over the original HRNet, which supports the effectiveness of the approach.
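The abstract does not give the internal structure of C2F-CBAM, so the following is only a minimal sketch of the CBAM-style attention it builds on: channel attention followed by spatial attention, as described in the original CBAM paper. It is written in plain Python on nested lists for readability. The function names, the toy sizes, the random weights, and the 3x3 spatial kernel (CBAM commonly uses 7x7) are all illustrative assumptions and are not taken from the authors' model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, w1, w2):
    """Return one weight in (0, 1) per channel of a C x H x W feature map.

    w1 (C//r x C) and w2 (C x C//r) form a shared two-layer MLP that is
    applied to both the average-pooled and the max-pooled channel vector.
    """
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(feat, kernel):
    """Return an H x W map of weights in (0, 1).

    kernel has shape 2 x k x k and is convolved (zero padding, stride 1)
    over the channel-wise average map and the channel-wise max map.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    k = len(kernel[0])
    pad = k // 2
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            s = 0.0
            for m, plane in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # positions outside the map count as zero
                            s += kernel[m][di][dj] * plane[y][x]
            row.append(sigmoid(s))
        out.append(row)
    return out


def cbam(feat, w1, w2, kernel):
    """Reweight the feature map by channel attention, then by spatial attention."""
    ca = channel_attention(feat, w1, w2)
    feat = [[[v * ca[c] for v in row] for row in ch] for c, ch in enumerate(feat)]
    sa = spatial_attention(feat, kernel)
    return [[[v * sa[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in feat]


# Toy example with random (untrained) weights; all sizes are hypothetical.
random.seed(0)
C, H, W, r, k = 4, 5, 5, 2, 3
rand = lambda: random.uniform(-1.0, 1.0)
feat = [[[rand() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rand() for _ in range(C)] for _ in range(C // r)]
w2 = [[rand() for _ in range(C // r)] for _ in range(C)]
kernel = [[[rand() for _ in range(k)] for _ in range(k)] for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
```

The output has the same C x H x W shape as the input, and every value is the input value scaled by two weights in (0, 1). In a trained network the weights would be learned so that informative channels and locations, such as those around joints, are suppressed less than background.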


Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]
