Abstract:
To address the vulnerability of vision-only tracking algorithms to occlusion, an object detection and tracking algorithm based on audio-video information fusion is proposed. The framework comprises three modules: video detection and tracking, acoustic source localization, and audio-video information fusion tracking. The video detection and tracking module uses YOLOv5m as its visual detection framework and combines an unscented Kalman filter with the Hungarian algorithm to perform multi-object tracking and matching. The acoustic source localization module acquires audio with a cross-shaped microphone array and computes the direction of the acoustic source from the time delays between the signals received at the individual microphones. The audio-video information fusion tracking module constructs an audio-video likelihood function and an audio-video importance sampling function, and uses an importance particle filter as the fusion tracking algorithm. The algorithm was tested in a complex indoor environment. The experimental results show that the proposed algorithm reaches a tracking accuracy of 90.68%, outperforming the single-modality algorithms.
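
As an illustration of the delay-based localization step, the following minimal Python sketch estimates a far-field azimuth from the time delays measured along the two axes of a cross-shaped array. The four-microphone geometry (one pair on each axis with equal spacing), the plane-wave assumption, the plain cross-correlation delay estimator, and all function and parameter names are assumptions made for this example; they are not taken from the proposed algorithm.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def max_lag_samples(spacing_m, fs):
    """Largest physically possible delay (in samples) for a microphone pair."""
    return math.ceil(spacing_m / SPEED_OF_SOUND * fs)


def estimate_delay(sig_a, sig_b, fs, max_lag):
    """Return the delay in seconds of sig_a relative to sig_b.

    Searches lags in [-max_lag, max_lag] samples for the peak of the
    cross-correlation; a positive result means sig_a arrives later.
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(n, n - lag)
        score = sum(sig_a[i + lag] * sig_b[i] for i in range(lo, hi))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / fs


def azimuth_deg(tau_x, tau_y):
    """Far-field azimuth in degrees from the delays on the two array axes.

    tau_x: delay of the -x microphone relative to the +x microphone.
    tau_y: delay of the -y microphone relative to the +y microphone.
    With spacing d, tau_x = d*cos(theta)/c and tau_y = d*sin(theta)/c,
    so the spacing and the speed of sound cancel in the ratio.
    """
    return math.degrees(math.atan2(tau_y, tau_x))
```

In use, `estimate_delay` would be called once per axis pair with `max_lag` taken from `max_lag_samples`, and the two delays passed to `azimuth_deg`. A practical system would likely replace the plain cross-correlation with a more noise-robust estimator; the sketch only shows the geometry that links the delays to the source direction.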