The next generation of intelligent traffic signal control systems needs multi-object tracking (MOT) algorithms that can track vehicles hundreds of meters away from traffic intersections. To facilitate the integration of long-range MOT into existing traffic infrastructure, the tracker must balance cost-effectiveness, accuracy, and efficiency. Although much progress has been made on deep-learning-based MOT for video, these approaches have limited applicability for edge deployment, since deep neural networks typically require power-hungry hardware accelerators to achieve real-time performance. Furthermore, the field of view of a traffic camera is limited to the area near the intersection. To address these shortcomings, we introduce a practical MOT framework that fuses tracks from a novel video MOT neural architecture designed for low-power edge devices with tracks from a commercially available traffic radar. The proposed neural architecture achieves high efficiency by using depthwise separable convolutions to jointly predict object detections alongside a dense grid of features at a single scale for spatiotemporal object re-identification. We also present a simple and effective late fusion strategy in which tracks of distant vehicles from the traffic radar are handed over to the video tracker within a region where the sensor fields of view overlap. Our video tracker is empirically validated on the UA-DETRAC video MOT benchmark for traffic intersections, and the multi-sensor tracker is evaluated on video and radar data collected and labeled by the authors at an instrumented traffic intersection.
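To illustrate why depthwise separable convolutions suit low-power edge devices, the following minimal sketch compares the weight count of a standard convolution with that of its depthwise separable counterpart. The function names and the layer sizes (a 3 x 3 kernel, 64 input channels, 128 output channels) are hypothetical values chosen for illustration and are not taken from the proposed architecture.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1
    pointwise convolution (bias terms omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(k: int, c_in: int, c_out: int) -> float:
    """Separable-to-standard weight ratio; equals 1/c_out + 1/k**2."""
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out = 3, 64, 128
    print("standard: ", standard_conv_params(k, c_in, c_out))
    print("separable:", depthwise_separable_params(k, c_in, c_out))
    print("ratio:    ", round(reduction_ratio(k, c_in, c_out), 4))
```

For these assumed sizes the separable layer needs roughly 12% of the weights of the standard layer, and the multiply-accumulate count per output location shrinks by the same factor.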