Object detection and tracking algorithm based on audio-visual information fusion

Abstract: To address the vulnerability of vision-only tracking algorithms to occlusion, an object detection and tracking algorithm based on audio-visual information fusion is proposed. The framework comprises three modules: video detection and tracking, sound source localization, and audio-visual fusion tracking. The video detection and tracking module uses YOLOv5m as the visual detector and applies the unscented Kalman filter and the Hungarian algorithm to track and match multiple objects. The sound source localization module acquires audio with a cross-shaped microphone array and computes the source direction from the time delays between the signals received at the individual microphones. The audio-visual fusion tracking module constructs an audio-visual likelihood function and an audio-visual importance sampling function, and uses an importance particle filter as the fusion tracking algorithm. Tests in a complex indoor environment show that the algorithm reaches a tracking accuracy of 90.68%, outperforming the single-modality algorithms.
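The delay-based direction estimate mentioned for the cross-shaped array can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no array geometry, so the code assumes four microphones at equal distance from the centre on the x and y axes, a far-field (plane-wave) source, and a sound speed of 343 m/s. The function names `simulate_delays` and `azimuth_from_delays` and the 0.2 m spacing are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def simulate_delays(azimuth_deg, spacing=0.2, c=SPEED_OF_SOUND):
    """Far-field delays for an assumed cross-shaped array.

    Microphones are assumed at (+spacing/2, 0), (-spacing/2, 0),
    (0, +spacing/2) and (0, -spacing/2). Returns (tau_x, tau_y): how much
    later the wavefront reaches the negative-axis microphone than the
    positive-axis one, for each axis pair, in seconds.
    """
    theta = math.radians(azimuth_deg)
    tau_x = spacing * math.cos(theta) / c
    tau_y = spacing * math.sin(theta) / c
    return tau_x, tau_y


def azimuth_from_delays(tau_x, tau_y):
    """Recover the source azimuth in degrees from the two pair delays.

    Under the plane-wave assumption tau_x is proportional to cos(theta)
    and tau_y to sin(theta) with the same factor, so atan2 gives theta
    without needing the spacing or the sound speed.
    """
    return math.degrees(math.atan2(tau_y, tau_x))


if __name__ == "__main__":
    tau_x, tau_y = simulate_delays(60.0)
    print(azimuth_from_delays(tau_x, tau_y))
```

In practice the two delays would be estimated from the recorded signals, for example by cross-correlation, rather than simulated; that step is omitted here.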

     
