ZHU Simin, ZHAO Haitao. Depth estimation of monocular infrared images based on attention mechanism and graph convolutional neural network[J]. Journal of Applied Optics, 2021, 42(1): 49-56. DOI: 10.5768/JAO202142.0102001

Depth estimation of monocular infrared images based on attention mechanism and graph convolutional neural network

More Information
  • Received Date: September 03, 2020
  • Revised Date: October 15, 2020
  • Available Online: January 03, 2021
  • Estimating the depth of objects in a scene is a key problem in autonomous driving, and infrared images help solve it under poor lighting conditions. Because infrared images have unclear texture and insufficient edge information, a method combining an attention mechanism with a graph convolutional neural network (GCN) is proposed for depth estimation from monocular infrared images. First, the depth of each pixel is related not only to the depth of its neighboring pixels but also to the depth of pixels over a much larger range; the attention mechanism can effectively extract these pixel-level global depth associations. Second, the features obtained from these associations can be regarded as non-Euclidean data, so a GCN is used for further reasoning over them. Finally, in the training phase, the continuous depth regression problem is converted into a classification problem, which makes training more stable and reduces the learning difficulty of the network. Experimental results show that the proposed method performs well on the infrared data set NUST-SR. Under the threshold accuracy criterion of less than 1.25³, accuracy improves by 1.2%, an advantage over other methods.
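
Two ideas from the abstract can be illustrated with a minimal sketch: turning continuous depth regression into classification, and scoring predictions with a threshold accuracy. The abstract does not state how depth is discretized, so the log-spaced bins below are an assumption, and the function names (`depth_to_class`, `class_to_depth`, `threshold_accuracy`) and the depth range are hypothetical choices made only for illustration.

```python
import math


def depth_to_class(depth, d_min, d_max, num_bins):
    """Map a continuous depth to a bin index.

    Bins are equally spaced in log-depth (an assumed scheme, not
    necessarily the one used in the paper).
    """
    depth = min(max(depth, d_min), d_max)  # clamp into the valid range
    span = math.log(d_max) - math.log(d_min)
    frac = (math.log(depth) - math.log(d_min)) / span
    # frac == 1.0 would give num_bins, so cap at the last bin
    return min(int(frac * num_bins), num_bins - 1)


def class_to_depth(index, d_min, d_max, num_bins):
    """Return the representative depth of a bin (its center in log space)."""
    span = math.log(d_max) - math.log(d_min)
    frac = (index + 0.5) / num_bins
    return math.exp(math.log(d_min) + frac * span)


def threshold_accuracy(pred, truth, threshold):
    """Fraction of pixels with max(pred/truth, truth/pred) < threshold."""
    hits = sum(1 for p, t in zip(pred, truth) if max(p / t, t / p) < threshold)
    return hits / len(truth)


# Round trip: discretize ground-truth depths, decode them, and score the result.
truth = [2.0, 10.0, 40.0]
classes = [depth_to_class(d, 1.0, 80.0, 32) for d in truth]
pred = [class_to_depth(c, 1.0, 80.0, 32) for c in classes]
score = threshold_accuracy(pred, truth, 1.25)
```

In a training setup along these lines, the network would predict a class index per pixel and be trained with a classification loss; the decoded bin center then serves as the depth estimate. Finer bins reduce the quantization error at the cost of a harder classification problem.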