Abstract:
To improve the robustness of crowd counting models to scale variation and optical noise, a multimodal image fusion network was designed. A counting model for nighttime crowds was proposed, together with a sub-network, Rgb-T-net. The network fuses features from thermal and visible-light images, which strengthens its ability to discriminate thermal-imaging features and nighttime crowd features. The proposed model regresses density maps generated with an adaptive Gaussian kernel, and night-vision training and testing were carried out on the Rgb-T-CC data set. In validation, the network achieves a mean absolute error of 18.16, a mean square error of 32.14, and a target detection recall of 97.65%. Both its counting performance and its detection performance exceed those of current state-of-the-art bimodal fusion methods. The experimental results show that the proposed multimodal feature fusion network can address counting and detection in night-vision environments, and ablation experiments further confirm the effectiveness of the fusion model's parameters.
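
As a rough illustration of how counting errors of the kind reported above are commonly computed, the sketch below assumes one predicted count and one ground-truth count per test image. The function names and the sample values in the test are hypothetical and do not come from the paper. The abstract does not state whether its "mean square error" is the plain squared error or its square root, so both variants are shown.

```python
import math


def _check_inputs(predicted, actual):
    # Both sequences must be non-empty and pair up one-to-one.
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_error(predicted, actual):
    """Average of |predicted count - true count| over all images."""
    _check_inputs(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mean_squared_error(predicted, actual):
    """Average of (predicted count - true count)^2 over all images."""
    _check_inputs(predicted, actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def root_mean_squared_error(predicted, actual):
    """Square root of the mean squared error (assumed variant, see lead-in)."""
    return math.sqrt(mean_squared_error(predicted, actual))
```

The absolute error weights every image equally, while the squared variants penalize large per-image miscounts more heavily, which is why the two are usually reported together.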