Abstract:
To build a re-detection module suitable for long-term tracking, an efficient deep network for end-to-end re-detection of a specific template target was proposed, inspired by the GlobalTrack method, which improves on the two-stage detection network. First, to fuse template features more efficiently on large-scale images, the depth-wise correlation method was improved by constructing a cross-information enhancement module, which encodes the search and template features with cross channel-attention information. In addition, the region proposal network (RPN) and region-based convolutional neural network (RCNN) structure of the traditional two-stage detection network was replaced with a dynamic instance interaction module, which guides the classification-and-regression stage of the detection network with template information and yields an end-to-end sparse re-detection structure. On the LaSOT and OxUvA long-term tracking datasets, the proposed method improves performance by 3% and the real-time frame rate by 173% compared with the original method. The experimental results show that the improved method re-detects template targets more accurately and quickly over the whole image.
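
For orientation, the plain depth-wise correlation that the abstract takes as its starting point can be sketched as follows. This is a minimal NumPy illustration of the baseline operation only, not the proposed cross-information enhancement module; the function name `depthwise_correlation` and the array shapes are assumptions made for this example.

```python
import numpy as np


def depthwise_correlation(search, template):
    """Correlate each search channel with the matching template channel.

    search:   array of shape (C, H, W), features of the full search image
    template: array of shape (C, h, w), features of the template target
    returns:  array of shape (C, H - h + 1, W - w + 1), one response map per channel
    (Hypothetical helper for illustration; shapes are assumptions.)
    """
    if search.shape[0] != template.shape[0]:
        raise ValueError("search and template must have the same channel count")
    h, w = template.shape[1:]
    # Every h-by-w window of every channel: shape (C, H-h+1, W-w+1, h, w).
    windows = np.lib.stride_tricks.sliding_window_view(search, (h, w), axis=(1, 2))
    # Per-channel inner product of each window with that channel's template.
    return np.einsum("cijkl,ckl->cij", windows, template)


# Usage: a 2-channel 4x4 search feature map and a 2-channel 2x2 template.
search = np.ones((2, 4, 4))
template = np.ones((2, 2, 2))
response = depthwise_correlation(search, template)
```

Because channels are never mixed, each response map reflects only one feature channel; the cross channel-attention described in the abstract is aimed at enriching this per-channel fusion.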