
A New Fine-grained Image Classification Algorithm Based on Channel-Space Fusion Attention and SwinT


Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1672-1292/CN:32-1684/T]

Volume:: 23
Issue:: 2023, No. 03

Pages:: 36-42

Column:: Computer Science and Technology

Publication date:: 2023-09-15

Info

Title:: A New Fine-grained Image Classification Algorithm Based on Channel-Space Fusion Attention and SwinT

Article number:: 1672-1292(2023)03-0036-07

Author(s):: Jiang Hao; Ling Ping; Chen Cunshengbao (School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China)

Keywords:: fine-grained image classification; Swin Transformer; channel-spatial fusion attention module; deep learning; weakly supervised learning

CLC number:: TP183

DOI:: 10.3969/j.issn.1672-1292.2023.03.005

Document code:: A

Abstract:: Fine-grained image classification is a major classification task in computer vision. Its difficulty lies in automatically locating discriminative regions using only category-level supervision. This paper proposes a novel channel-spatial fusion attention module and, based on it, designs a new Swin Transformer algorithm, SwinT-NCSA (a Swin Transformer based on a novel channel-spatial attention module). The module extracts features from the channel dimension and the spatial dimension simultaneously and is then integrated into the Swin Transformer model to improve the model's ability to extract multi-head attention information at small scales. SwinT-NCSA focuses on regions that are useful for classification while ignoring background regions that are not, and thereby achieves high accuracy on fine-grained image classification tasks. Experiments on three public datasets, FGVC Aircraft, CUB-200-2011 (Caltech-UCSD Birds-200-2011), and Stanford Cars, show that SwinT-NCSA achieves accuracies of 93.3%, 88.4%, and 94.7%, respectively, outperforming comparable algorithms.
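
The abstract does not describe the internals of the channel-spatial fusion attention module, so the following is only a minimal sketch of the general idea: compute channel weights and spatial weights from the same feature map in parallel, then apply both. It is not the authors' SwinT-NCSA implementation. The function names, the parameter-free sigmoid gating, and the multiplicative fusion are all assumptions made for illustration; a real module would typically learn these weights.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_weights(fmap):
    """One weight per channel: sigmoid of the channel's global average."""
    return [
        sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
        for ch in fmap
    ]


def spatial_weights(fmap):
    """One weight per (row, col) position: sigmoid of the mean across channels."""
    n_ch, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [sigmoid(sum(fmap[c][i][j] for c in range(n_ch)) / n_ch)
         for j in range(width)]
        for i in range(height)
    ]


def fused_attention(fmap):
    """Hypothetical fusion: scale each value by its channel and spatial weight.

    Both weight sets are computed from the same input, then multiplied in.
    """
    cw = channel_weights(fmap)
    sw = spatial_weights(fmap)
    return [
        [[fmap[c][i][j] * cw[c] * sw[i][j] for j in range(len(sw[0]))]
         for i in range(len(sw))]
        for c in range(len(fmap))
    ]


# Toy feature map laid out as [channel][row][col]; the values are made up.
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 0.0], [0.0, 0.0]],
]
attended = fused_attention(features)
```

In this toy input the only active position, channel 1 at (0, 0), receives both the larger channel weight and the larger spatial weight, which is the "focus on useful regions, suppress background" behaviour the abstract describes in qualitative terms.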

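Memo

Memo:: Received: 2023-02-21.
Funding: General Program of the National Natural Science Foundation of China (61872168); Jiangsu Normal University Postgraduate Research and Practice Innovation Program (2022XKT1534).
Corresponding author: Ling Ping, Ph.D., associate professor; research interests: computational intelligence, data mining, support vector machines. E-mail: 6020000012@jsnu.edu.cn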

Last Update: 2023-09-15