
Gait recognition method of infrared human body images based on improved ViT

YANG Yanchen, YUN Lijun, MEI Jianhua, LU Lin

Citation: YANG Yanchen, YUN Lijun, MEI Jianhua, LU Lin. Gait recognition method of infrared human body images based on improved ViT[J]. Journal of Applied Optics, 2023, 44(1): 71-78. doi: 10.5768/JAO202344.0102002


doi: 10.5768/JAO202344.0102002
Funding: Key Project of the Yunnan Applied Basic Research Program (2018FA033); Graduate Research and Innovation Fund of Yunnan Normal University (YJSJJ21-B77)
Article information
    Author biography:

    YANG Yanchen (1997—), male, M.S. candidate, mainly engaged in video image processing research. E-mail: 649228448@qq.com

    Corresponding author:

    YUN Lijun (1973—), male, Ph.D., professor, mainly engaged in research on Internet of Things technology and video image processing. E-mail: yunlijun@ynnu.edu.cn

  • CLC number: TN219; TP181


  • Abstract: To address the tendency of convolutional neural networks (CNNs) to saturate in accuracy on gait recognition tasks, and the low efficiency with which the Vision Transformer (ViT) fits gait datasets, a symmetric dual-attention mechanism model is proposed. It preserves the temporal order of walking postures and uses several independent feature subspaces to fit gait image patches in a targeted way; at the same time, a symmetric architecture is adopted to strengthen the role of the attention modules in fitting gait features, and heterogeneous transfer learning is used to further improve feature-fitting efficiency. The model was evaluated in repeated simulation experiments on the CASIA C infrared human gait database of the Chinese Academy of Sciences, reaching an average recognition accuracy of 96.8%. The results show that the proposed model outperforms the traditional ViT model and a CNN comparison model in stability, data-fitting speed, and recognition accuracy.
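    The abstract is the only architectural description on this page, so the following is a minimal, hypothetical PyTorch sketch of what a symmetric dual-attention block over temporally ordered gait-patch embeddings might look like: two parallel multi-head self-attention branches (each head fitting its own independent feature subspace) whose outputs are fused symmetrically before a standard ViT-style MLP. The class name DualSymmetricAttentionBlock, the average-based fusion, and all dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DualSymmetricAttentionBlock(nn.Module):
    """Hypothetical sketch of one symmetric dual-attention block.

    Two parallel multi-head self-attention branches (the symmetric
    pair) read the same sequence of gait-image patch embeddings;
    each attention head works in its own feature subspace. The two
    branch outputs are averaged and followed by a ViT-style MLP,
    with residual connections and layer normalization.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Two independent attention branches form the symmetric pair.
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim); the patch order follows the
        # temporal order of frames in the walking sequence.
        h = self.norm1(x)
        out_a, _ = self.attn_a(h, h, h)
        out_b, _ = self.attn_b(h, h, h)
        x = x + 0.5 * (out_a + out_b)  # symmetric fusion of the two branches
        x = x + self.mlp(self.norm2(x))
        return x


# Example: a batch of 2 sequences, each tokenized into 64 patch embeddings.
tokens = torch.randn(2, 64, 256)
print(DualSymmetricAttentionBlock()(tokens).shape)  # torch.Size([2, 64, 256])
```

    Since self-attention is itself permutation-invariant, preserving the temporal order mentioned in the abstract would in practice rely on positional embeddings added to the patch tokens before the first such block.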
  • Fig. 1  Examples of infrared gait in CASIA C database

    Fig. 2  Image preprocessing results of infrared gait

    Fig. 3  Feet gait cycle in backpack state

    Fig. 4  Periodogram of correlation coefficient in four different states

    Fig. 5  Gait recognition model of double-channel CNN

    Fig. 6  Dual symmetrical attention mechanism gait model

    Fig. 7  Comparison between proposed model and ViT of same size

    Fig. 8  Comparison between proposed model with transfer learning and other attempts in research process

    Fig. 9  Comparison between proposed model with transfer learning and CNN model

Publication history
  • Received: 2022-03-21
  • Revised: 2022-04-28
  • Published online: 2022-08-16
  • Issue published: 2023-01-17
