Tracking multiple objects in a video sequence can be accomplished by identifying the objects appearing in the sequence and distinguishing between them. Therefore, many recent multi-object tracking (MOT) methods have… Click to show full abstract
Tracking multiple objects in a video sequence can be accomplished by identifying the objects appearing in the sequence and distinguishing between them. Therefore, many recent multi-object tracking (MOT) methods have utilized re-identification and distance metric learning to distinguish between objects by computing the similarity/dissimilarity scores. However, it is difficult to generalize such approaches for arbitrary video sequences, because some important information, such as the number of objects (classes) in a video, is not known in advance. Therefore, in this study, we applied a one-shot learning framework to the MOT problem. Our algorithm tracks objects by classifying newly observed objects into existing tracks, irrespective of the number of objects appearing in a video frame. The proposed method, called OneShotDA, exploits the one-shot learning framework based on an attention mechanism. Our neural network learns to classify unseen data samples using labels from a support set. Once the network has been trained, it predicts correct labels for newly received detection results based on the set of existing tracks. To analyze the effectiveness of our method, it was tested on the MOTchallenge benchmark datasets (MOT16 and MOT17 datasets). The results reveal that the performance of the proposed method was comparable with those of current state-of-the-art methods. In particular, it is noteworthy that the proposed method ranked first among the online trackers on the MOT17 benchmark.
               
Click one of the above tabs to view related content.