
Human Action Recognition Based on 3D-CBAM Attention Mechanism

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2021, Issue 01
Page:
49-56
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
Human Action Recognition Based on 3D-CBAM Attention Mechanism
Author(s):
Wang Fei, Hu Ronglin, Jin Ying
School of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian 223003, China
Keywords:
machine vision; human movement recognition; 3D convolutional neural network; attention mechanism
PACS:
TP391.4
DOI:
10.3969/j.issn.1672-1292.2021.01.008
Abstract:
To address the insufficient feature extraction and low recognition rates of existing action recognition methods, this paper proposes a fusion model that combines the strengths of the two-stream network, the 3D convolutional neural network, and the convolutional LSTM (ConvLSTM) network. To better extract human motion features, the model uses the SSD object detection method to segment the human body; the segmented regions serve as local features and the original video as global features, the two are trained jointly, and late fusion is used for classification. The 3D convolutional block attention module (3D-CBAM) is integrated into the 3D convolutional neural network through a shortcut structure to strengthen the network's channel and spatial feature extraction. Replacing some of the network's 3D convolutional layers with ConvLSTM layers better captures the temporal relationships in the video. Experiments on the KTH dataset show that the proposed model achieves high accuracy in human action recognition.
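
The late-fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two streams are combined, so the weighted average of softmax scores used here is an assumption, as are the function names, the `global_weight` parameter, and the example logits. The six labels are the action classes of the KTH dataset.

```python
import math

# The six action classes of the KTH dataset (the abstract names only the dataset).
KTH_CLASSES = ["walking", "jogging", "running",
               "boxing", "handwaving", "handclapping"]


def softmax(logits):
    """Convert raw scores to probabilities, shifted by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(global_logits, local_logits, global_weight=0.5):
    """Fuse two streams at the score level and pick the best class.

    ASSUMPTION: a weighted average of per-stream softmax scores. The
    abstract only says "late fusion"; the weighting scheme is hypothetical.
    """
    if len(global_logits) != len(local_logits):
        raise ValueError("both streams must score the same set of classes")
    p_global = softmax(global_logits)
    p_local = softmax(local_logits)
    fused = [global_weight * g + (1.0 - global_weight) * loc
             for g, loc in zip(p_global, p_local)]
    # Index of the highest fused score; ties resolve to the lowest index.
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Made-up logits for a single clip, for illustration only.
global_logits = [2.0, 1.5, 0.3, -1.0, -0.5, -0.8]  # stream fed the original video
local_logits = [1.0, 2.5, 0.2, -1.2, -0.7, -0.9]   # stream fed the SSD-cropped person
label, scores = late_fusion(global_logits, local_logits)
print(KTH_CLASSES[label])
```

Fusing at the score level keeps the two streams independent until the final decision, so either stream could be retrained or reweighted without touching the other.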

Memo

Last Update: 2021-03-15