Depth estimation based on adaptive pixel-level attention model

CHEN Yuru; ZHAO Haitao

doi:10.5768/JAO202041.0302002

CHEN Yuru, ZHAO Haitao. Depth estimation based on adaptive pixel-level attention model[J]. Journal of Applied Optics, 2020, 41(3): 490-499. DOI: 10.5768/JAO202041.0302002

School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

Received Date: September 01, 2019
Revised Date: December 29, 2019
Available Online: May 29, 2020

Abstract

Depth estimation is a classic computer vision task that plays a vital role in understanding the geometry of 3D scenes. The main difficulty in monocular depth estimation is extracting long-range contextual dependencies from image features. To address this problem, an adaptive context aggregation network (ACANet) is proposed. ACANet is built on a supervised self-attention (SSA) model, which adaptively learns task-specific similarities between arbitrary pairs of pixels to model continuous context information; the learned attention weight distribution is then used to aggregate and extract image features. First, monocular depth estimation is formulated as a pixel-level multi-class classification problem. Next, an attention loss function is designed to reduce the semantic inconsistency between the RGB image and the depth map, and the position-indexed features are globally pooled using the generated pixel-level attention weights. Finally, a soft ordinal inference (SOI) algorithm is proposed, which makes full use of the network's predicted confidence to transform discrete depth labels into smooth, continuous depth maps, improving accuracy (RMSE decreases by 3%). Experimental results on the public monocular depth estimation benchmark NYU Depth V2 show an RMSE of 0.490 and a threshold accuracy of 82.8%, demonstrating the superiority of the proposed algorithm.
- depth estimation
- attention model
- context information
- soft inference
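
The abstract describes turning per-pixel class confidences into a smooth, continuous depth map. The abstract does not give the SOI formula, so the following is only a minimal sketch of one common way to do this kind of soft inference: a probability-weighted average of the depth-bin centers, taken in log-depth. The function names, the log-spaced binning, the depth range of 0.7 m to 10 m, and the 80 bins are all illustrative assumptions, not the authors' settings.

```python
import numpy as np

def log_spaced_bin_centers(d_min, d_max, num_bins):
    """Centers of depth bins spaced uniformly in log-depth (assumed discretization)."""
    edges = np.exp(np.linspace(np.log(d_min), np.log(d_max), num_bins + 1))
    # Geometric mean of the two edges is the midpoint of each bin in log space.
    return np.sqrt(edges[:-1] * edges[1:])

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def hard_inference(logits, centers):
    """Baseline: each pixel takes the center of its most confident depth class."""
    return centers[np.argmax(logits, axis=-1)]

def soft_inference(logits, centers):
    """Hypothetical soft inference: expected log-depth under the predicted
    class distribution, mapped back to metric depth."""
    probs = softmax(logits, axis=-1)          # shape (H, W, K)
    return np.exp(probs @ np.log(centers))    # shape (H, W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = log_spaced_bin_centers(0.7, 10.0, 80)   # assumed range and bin count
    logits = rng.normal(size=(4, 5, 80))              # stand-in for network output
    print(hard_inference(logits, centers).shape)
    print(soft_inference(logits, centers).shape)
```

When the network is very confident about a single class, the soft estimate coincides with the hard one; when confidence is split between neighboring classes, the soft estimate falls between their bin centers, which is what removes the staircase effect of discrete labels.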
