Visual tracking of multiple objects is an essential component of the perception system in autonomous driving vehicles. One favorable approach is the tracking-by-detection paradigm, which links current detection hypotheses to previously estimated object trajectories (also known as tracks) by searching for appearance or motion similarities between them. Because this search is usually restricted to a very limited spatial or temporal locality, the association can fail under motion noise or long-term occlusion. In this paper, we propose a novel tracking method that solves this problem by combining information from an enlarged structural and temporal domain. For efficiency without loss of optimality, the approach is decomposed into three stages, each dealing with only one constrained association task, and thus follows an alternating-optimization scheme. In our approach, detections are first assembled into small tracklets based on meta-measurements of object affinity. The tracklet-to-track association task is then solved using structural information derived from the motion pattern between them; here, we propose new rules that decouple the processing time from the tracklet length. Furthermore, constraints from the temporal domain are introduced to recover objects that have disappeared for a long time due to missed detections or long-term occlusion. By combining this heterogeneous domain information, our approach achieves improved state-of-the-art performance on standard benchmarks. Thanks to its relatively low processing time, our approach also permits online, real-time tracking.
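To make the core tracking-by-detection step concrete, the following is a minimal sketch of one association round: detections are linked to existing tracks by a motion-based affinity (plain Euclidean distance between track and detection centers here), matched greedily under a gating threshold. All names, the distance affinity, and the greedy matcher are illustrative assumptions, not the paper's actual method, which uses richer structural and temporal constraints.

```python
def associate(tracks, detections, max_dist=50.0):
    """Greedily match track positions to detection positions.

    tracks, detections: lists of (x, y) object centers.
    Returns (matches, unmatched_tracks, unmatched_dets), where
    matches is a list of (track_index, detection_index) pairs.
    """
    # Build candidate links, keeping only plausible ones (gating).
    pairs = []
    for t, (tx, ty) in enumerate(tracks):
        for d, (dx, dy) in enumerate(detections):
            dist = ((tx - dx) ** 2 + (ty - dy) ** 2) ** 0.5
            if dist <= max_dist:
                pairs.append((dist, t, d))
    # Greedy assignment: best (smallest-distance) links first,
    # each track and each detection used at most once.
    pairs.sort()
    matches, used_t, used_d = [], set(), set()
    for dist, t, d in pairs:
        if t not in used_t and d not in used_d:
            matches.append((t, d))
            used_t.add(t)
            used_d.add(d)
    unmatched_tracks = [t for t in range(len(tracks)) if t not in used_t]
    unmatched_dets = [d for d in range(len(detections)) if d not in used_d]
    return matches, unmatched_tracks, unmatched_dets
```

When a detection near a track exists, it extends that track; leftover tracks correspond to occluded or missed objects, and leftover detections spawn new tracklets. The failure mode the abstract highlights is visible here: once an object moves farther than the gate allows (motion noise) or stays unmatched over many frames (long-term occlusion), purely local matching cannot recover it, which motivates the enlarged structural and temporal domains.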