Night vision dense crowd counting based on mid-term fusion of thermal imaging features
Abstract: To improve the robustness of crowd counting models to scale variation and optical noise, a multimodal image fusion network is designed. A counting model for night-time crowds is proposed, together with a sub-network, Rgb-T-net, which fuses thermal imaging features with visible-light image features and strengthens the network's ability to discriminate thermal and night-time crowd characteristics. The model regresses density maps with an adaptive Gaussian kernel, and night-vision training and testing are carried out on the Rgb-T-CC dataset. On this benchmark the network achieves a mean absolute error of 18.16, a mean squared error of 32.14, and a target-detection recall of 97.65%, outperforming current state-of-the-art bimodal fusion methods in both counting and detection. The results show that the proposed multimodal feature fusion network can handle counting and detection in night-vision environments, and ablation experiments further verify the effectiveness of each component of the fusion model.
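The "adaptive Gaussian kernel" mentioned above refers to the common geometry-adaptive practice of scaling each head's kernel width with local crowd density when building ground-truth density maps. A minimal NumPy/SciPy sketch of this technique follows; the `beta` and `k` values and the fallback sigma for an isolated head are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import KDTree

def adaptive_density_map(points, shape, beta=0.3, k=3):
    """Build a ground-truth density map with geometry-adaptive
    Gaussian kernels: each head's sigma scales with the mean
    distance to its k nearest neighbours, so denser regions get
    narrower kernels.  Away from image borders, the map
    integrates to the head count."""
    density = np.zeros(shape, dtype=np.float64)
    if len(points) == 0:
        return density
    tree = KDTree(points)
    # the nearest "neighbour" of each point is itself (distance 0),
    # so query k+1 points and skip index 0 below
    dists, _ = tree.query(points, k=min(k + 1, len(points)))
    for (x, y), d in zip(points, np.atleast_2d(dists)):
        delta = np.zeros(shape, dtype=np.float64)
        delta[int(min(y, shape[0] - 1)), int(min(x, shape[1] - 1))] = 1.0
        # sigma from mean neighbour distance; fixed fallback
        # (assumed value) when there is only one head in the image
        sigma = beta * d[1:].mean() if len(points) > 1 else 4.0
        density += gaussian_filter(delta, sigma, mode="constant")
    return density
```

Summing the resulting map recovers the (approximate) head count, which is what the counting loss regresses against.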
Fig. 4 Visualization results on the ShanghaiTechPartA, ShanghaiTechPartB, UCF-QNRF and UCF_CC_50 datasets (from left to right: input image, ground-truth annotation, Bayesian result, and result of the proposed method)
Table 1 Parameter information of the Rgb-T-CC dataset
Dataset     Resolution   Data type   Images   Max   Min   Mean   Total     Modality
Rgb-T-CC    640×480      Rgb+T       4060     82    45    68     138,389   Rgb-T
Table 2 Comparison of different state-of-the-art methods on Rgb-T-CC dataset
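The MAE and MSE figures used throughout these comparisons are the standard crowd-counting metrics, computed over per-image predicted and ground-truth counts. A minimal sketch; note that crowd-counting papers conventionally report the root of the mean squared error under the name MSE:

```python
import numpy as np

def counting_metrics(pred_counts, gt_counts):
    """MAE and MSE as conventionally defined for crowd counting:
    MAE = mean |pred - gt|, and MSE = sqrt(mean (pred - gt)^2),
    both taken over the images of the test set."""
    pred = np.asarray(pred_counts, dtype=np.float64)
    gt = np.asarray(gt_counts, dtype=np.float64)
    mae = np.abs(pred - gt).mean()
    mse = np.sqrt(((pred - gt) ** 2).mean())
    return mae, mse
```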
Table 3 Comparison of different fusion methods on the Rgb-T-CC dataset
Fusion method (Rgb-T-CC dataset)       MAE     MSE
AGK                                    22.46   38.97
AGK+Rgb-T-net (early fusion)           18.01   31.49
AGK+Rgb-T-net (mid-term fusion)        18.16   32.14
AGK+Rgb-T-net (late fusion)            19.35   34.71
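The three fusion strategies compared in Table 3 differ only in where the RGB and thermal streams are merged: before any processing (early), after modality-specific shallow features (mid-term), or after two complete streams (late). A toy NumPy sketch of this distinction; the `conv_stage` function is a fixed random stand-in for learned convolutions, and nothing here reflects the paper's actual layers or dimensions:

```python
import numpy as np

def conv_stage(x, seed):
    """Stand-in for one convolutional stage: a fixed random linear
    map followed by ReLU.  A real network would use learned convs."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], x.shape[-1]))
    return np.maximum(x @ w, 0.0)

def early_fusion(rgb, t):
    # merge the raw inputs, then run one shared backbone
    return conv_stage(conv_stage(np.concatenate([rgb, t], -1), 0), 1)

def mid_fusion(rgb, t):
    # modality-specific shallow stages, merge mid-level features,
    # then shared deeper stages -- the strategy this paper adopts
    f = np.concatenate([conv_stage(rgb, 0), conv_stage(t, 1)], -1)
    return conv_stage(f, 2)

def late_fusion(rgb, t):
    # each modality runs a full independent stream;
    # only the final outputs are merged
    return np.concatenate([conv_stage(conv_stage(rgb, 0), 1),
                           conv_stage(conv_stage(t, 2), 3)], -1)
```

Mid-term fusion lets each modality learn its own low-level filters (thermal and visible textures differ strongly at night) while still sharing the deeper, semantic layers, which is consistent with it giving the best detection trade-off in Table 3.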