DETR-based binocular measurement system in road environment
-
Abstract: Detection transformer (DETR) is a Transformer-based object detection algorithm with fast inference speed and strong detection performance. This paper introduces a measurement system that combines the DETR algorithm with the principle of binocular vision to detect and range pedestrians, vehicles, bicycles, traffic lights and other targets in road environments. The principles of binocular ranging, camera calibration, object detection and target matching are analyzed, and the measurement system is constructed on this basis. The detection algorithm locates targets in the field of view, binocular vision is used to measure the distance to each detected target, and the sources of measurement error in the system are analyzed together with their influence on the results. The system was tested on the KITTI dataset and in real road environments. With a baseline of 45 cm, the detection rate for specified targets at 15 m~80 m is higher than 90.6%, the ranging error is less than 5.8%, and the system runs in real time on an RTX 2080Ti platform.
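For reference, range in such a system follows the standard rectified-stereo relation Z = fB/d, and a first-order error analysis gives ΔZ ≈ Z²/(fB)·Δd, so ranging error grows quadratically with distance. A minimal Python sketch of both relations; the focal length and the ±1 px matching error are illustrative assumptions, only the 45 cm baseline comes from the paper:

```python
# Rectified-stereo ranging relation Z = f*B/d and its first-order error
# sensitivity dZ = Z^2/(f*B) * dd. The focal length and the +/-1 px matching
# error are illustrative assumptions; only the 45 cm baseline is from the paper.

def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in metres for a rectified pinhole stereo pair (f in pixels)."""
    return f_px * baseline_m / disparity_px

def depth_error_m(f_px: float, baseline_m: float, z_m: float, dd_px: float = 1.0) -> float:
    """First-order depth error |dZ| = Z^2 / (f*B) * |dd|: quadratic in range."""
    return z_m ** 2 / (f_px * baseline_m) * dd_px

if __name__ == "__main__":
    f_px, baseline = 1400.0, 0.45   # assumed focal length (px); 45 cm baseline from the paper
    for z in (15.0, 40.0, 80.0):
        d = f_px * baseline / z     # disparity a target at range z would produce
        print(f"Z = {z:4.0f} m  disparity = {d:6.2f} px  "
              f"error at +/-1 px matching: {depth_error_m(f_px, baseline, z):5.2f} m")
```

The quadratic sensitivity is consistent with the rise in average ranging error beyond 40 m reported in Table 1.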
-
Key words:
- binocular vision
- target detection
- measurement system
- measurement error
-
Table 1 Detection results analysis of the binocular vision measurement system

| True target distance/m | Targets | Detected | Successfully ranged | Correctly ranged | Average ranging error/% |
|---|---|---|---|---|---|
| 5~15 | 7 308 | 7 108 | 6 845 | 6 257 | 7.9 |
| 15~25 | 7 464 | 7 014 | 6 727 | 6 219 | 5.1 |
| 25~40 | 7 836 | 7 175 | 6 934 | 6 626 | 4.8 |
| 40~80 | 4 214 | 3 820 | 3 674 | 3 591 | 5.8 |
| 80~120 | 2 127 | 1 825 | 1 751 | 1 693 | 6.4 |
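The rates implied by these counts can be reproduced directly; a small sketch (column grouping as read from the table), which also shows that the lowest per-bin detection rate within 15 m~80 m, 3 820/4 214 ≈ 90.65% for the 40 m~80 m bin, matches the 90.6% lower bound quoted in the abstract:

```python
# Recompute the per-bin rates implied by the Table 1 counts
# (counts copied from the table; column grouping as read from it).
rows = {
    "5~15":   (7308, 7108, 6845, 6257),
    "15~25":  (7464, 7014, 6727, 6219),
    "25~40":  (7836, 7175, 6934, 6626),
    "40~80":  (4214, 3820, 3674, 3591),
    "80~120": (2127, 1825, 1751, 1693),
}
for rng, (total, detected, ranged, correct) in rows.items():
    print(f"{rng:>7} m: detection {detected / total:6.1%}, "
          f"correct ranging {correct / total:6.1%}")
```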
Table 2 Measurement results for 15 m~80 m targets: proposed algorithm vs. other algorithms

| Algorithm | Detection rate/% | Ranging error/% | Frame rate/(frame/s) |
|---|---|---|---|
| Proposed | 92.3 | 5.2 | 21 |
| GC-Net | 93.1 | 5.9 | 1.1 |
| GANet | 93.5 | 5.2 | 10.2 |
| PSMNet | 93.9 | 4.8 | 2.4 |
| Monodepth | 92.9 | 6.9 | 15 |
Table 3 Test results of the binocular measurement system

| Target No. | True distance/m | Output probability | Measured distance/m | Ranging error/% |
|---|---|---|---|---|
| 1 | 18.94 | 0.99 | 18.25 | 3.64 |
| 2 | 19.81 | 0.98 | 18.95 | 4.34 |
| 3 | 21.18 | 1.00 | 20.82 | 1.70 |
| 4 | 22.56 | 0.99 | 21.92 | 2.84 |
-
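The ranging error column in Table 3 is consistent with the relative error |Z_measured - Z_true| / Z_true × 100%; a quick check:

```python
# Check the Table 3 error definition: relative error = |measured - true| / true.
pairs = [(18.94, 18.25), (19.81, 18.95), (21.18, 20.82), (22.56, 21.92)]
for true_z, meas_z in pairs:
    print(f"true {true_z:5.2f} m, measured {meas_z:5.2f} m -> "
          f"{abs(meas_z - true_z) / true_z:.2%}")
```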
References:
[1] NIE G Y, CHENG M M, LIU Y, et al. Multi-level context ultra-aggregation for stereo matching[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2020: 3278-3286.
[2] DENG H, LIAO Q M, LU Z Q, et al. Parallax contextual representations for stereo matching[C]//2021 IEEE International Conference on Image Processing (ICIP). Anchorage: IEEE, 2021: 3193-3197.
[3] ZBONTAR J, LECUN Y. Stereo matching by training a convolutional neural network to compare image patches[J]. Journal of Machine Learning Research, 2016, 17: 2287-2318.
[4] PANG J H, SUN W X, REN J S, et al. Cascade residual learning: a two-stage convolutional neural network for stereo matching[C]//2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Venice: IEEE, 2018: 878-886.
[5] KHAMIS S, FANELLO S, RHEMANN C, et al. StereoNet: guided hierarchical refinement for real-time edge-aware depth prediction[M]//Computer Vision - ECCV 2018. Munich: Springer International Publishing, 2018: 596-613.
[6] CHANG J R, CHEN Y S. Pyramid stereo matching network[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5410-5418.
[7] XU H F, ZHANG J Y. AANet: adaptive aggregation network for efficient stereo matching[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 1956-1965.
[8] LIANG Z F, FENG Y L, GUO Y L, et al. Learning for disparity estimation through feature constancy[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2811-2820.
[9] POGGI M, PALLOTTI D, TOSI F, et al. Guided stereo matching[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2020: 979-988.
[10] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//2017 Conference on Neural Information Processing Systems. Long Beach, 2017: 3058-3068.
[11] TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers and distillation through attention[EB/OL]. (2021-01-15)[2022-03-15]. https://arxiv.org/abs/2012.12877.
[12] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal: IEEE, 2022: 9992-10002.
[13] ZHANG Qi, HU Guangdi, LI Yusheng, et al. Binocular vision vehicle detection method based on improved Fast-RCNN[J]. Journal of Applied Optics, 2018, 39(6): 832-838.
[14] ZHANG Z Y. A flexible new technique for camera calibration[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(11): 1330-1334. doi: 10.1109/34.888718
[15] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[M]//Computer Vision - ECCV 2020. Berlin: Springer International Publishing, 2020: 213-229.
[16] CUI Enkun, TENG Yanqing, LIU Jiawei. Calibration error compensation technique of stereoscopic vision measurement system[J]. Journal of Applied Optics, 2020, 41(6): 1174-1180. doi: 10.5768/JAO202041.0601006
[17] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 3354-3361.