Multi-object tracking (MOT) is widely applied in video analysis and signal processing. A major challenge in MOT is associating noisy detections into long, continuous trajectories. In this letter, we address the association problem at the tracklet level and focus mainly on the appearance representation designed for tracklets. A multitask convolutional neural network is proposed to jointly learn discriminative features and spatial-temporal attentions. In particular, we decompose an object in a static image using spatial attentions, and then aggregate the multiple features in a tracklet according to temporal attentions. Appearance misalignment caused by occlusion and inaccurate bounding boxes is thereby mitigated through multi-feature aggregation. Experimental results on two challenging MOT benchmarks demonstrate the effectiveness of the proposed method and show a significant improvement in the quality of tracked identities.
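
The two attention stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' network: the function names, tensor shapes, softmax normalization, and the random arrays standing in for CNN outputs are all assumptions made for illustration. Spatial attention is modeled as K soft masks that pool one frame's feature map into K part features, and temporal attention as per-part weights, normalized over the frames of a tracklet, that aggregate those part features.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def spatial_attention_pool(feature_map, spatial_logits):
    """Pool one frame's feature map into K part features (assumed formulation).

    feature_map:    (H, W, C) convolutional features of one detection
    spatial_logits: (K, H, W) unnormalized attention scores, one map per part
    returns:        (K, C) attention-weighted part features
    """
    num_parts, height, width = spatial_logits.shape
    # Each mask is normalized over all spatial positions of the frame.
    masks = softmax(spatial_logits.reshape(num_parts, height * width), axis=1)
    flat_features = feature_map.reshape(height * width, -1)
    return masks @ flat_features


def aggregate_tracklet(part_features, temporal_logits):
    """Aggregate part features over a tracklet (assumed formulation).

    part_features:   (T, K, C) part features for T frames
    temporal_logits: (T, K) unnormalized per-frame, per-part scores
    returns:         (K, C) aggregated part features
    """
    # Weights are normalized over time, so a frame in which a part is
    # occluded can receive a low weight for that part only.
    weights = softmax(temporal_logits, axis=0)
    return (weights[..., None] * part_features).sum(axis=0)


# Hypothetical sizes; random arrays stand in for learned network outputs.
rng = np.random.default_rng(0)
T, H, W, C, K = 5, 8, 4, 16, 3
frames = rng.normal(size=(T, H, W, C))
spatial_logits = rng.normal(size=(T, K, H, W))
temporal_logits = rng.normal(size=(T, K))

parts = np.stack(
    [spatial_attention_pool(f, s) for f, s in zip(frames, spatial_logits)]
)
tracklet_descriptor = aggregate_tracklet(parts, temporal_logits).reshape(-1)
```

Under these assumptions, `tracklet_descriptor` is a single vector of length K * C per tracklet, and two tracklets could be compared with any vector distance. With all temporal scores equal, the aggregation reduces to a plain average over frames.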
               