Vehicle and pedestrian detection is one of the critical tasks in autonomous driving. Because many heterogeneous techniques have been proposed, selecting a detection system with an appropriate balance among detection accuracy, speed, and memory consumption for a specific task has become very challenging. To address this issue and provide guidance for model selection, this paper analyzes several mainstream object detection architectures, including Faster R-CNN, R-FCN, and SSD, along with several typical feature extractors, such as ResNet50, ResNet101, MobileNet_V1, MobileNet_V2, Inception_V2, and Inception_ResNet_V2. Through extensive experiments on the KITTI benchmark, a commonly used street-scene dataset, we demonstrate that Faster R-CNN ResNet50 obtains the best average precision (AP) (58%) for vehicle and pedestrian detection, with a speed of 8.6 FPS. Faster R-CNN Inception_V2 performs best at detecting cars and pedestrians individually (74.5% and 47.3%, respectively). ResNet101 consumes the most memory (9907 MB) and has the largest number of parameters (64.42 million), and Inception_ResNet_V2 is the slowest model (3.05 FPS). SSD MobileNet_V2 is the fastest model (70 FPS), and SSD MobileNet_V1 is the lightest model in terms of memory usage (875 MB); both are suitable for applications on mobile and embedded devices.
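
The trade-off described above can be illustrated with a minimal sketch of constraint-based model selection. This is not the paper's procedure: the `Measurement` record and the `meets` and `select` helpers are hypothetical names introduced here, and the table holds only the figures quoted in the abstract, with `None` marking values the abstract does not report. A model whose value for a constrained metric is unreported is conservatively excluded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical record of one model's reported metrics."""
    name: str
    ap: Optional[float] = None         # average precision, percent
    fps: Optional[float] = None        # inference speed, frames per second
    memory_mb: Optional[float] = None  # memory consumption, MB


# Only figures quoted in the abstract; None means "not reported there".
# The last two entries are named by feature extractor only, as in the abstract.
REPORTED = [
    Measurement("Faster R-CNN ResNet50", ap=58.0, fps=8.6),
    Measurement("SSD MobileNet_V2", fps=70.0),
    Measurement("SSD MobileNet_V1", memory_mb=875.0),
    Measurement("ResNet101", memory_mb=9907.0),
    Measurement("Inception_ResNet_V2", fps=3.05),
]


def meets(value: Optional[float], bound: Optional[float], at_least: bool) -> bool:
    """Check one constraint; an unset bound always passes, an unknown value fails."""
    if bound is None:
        return True
    if value is None:
        return False
    return value >= bound if at_least else value <= bound


def select(models: List[Measurement],
           min_ap: Optional[float] = None,
           min_fps: Optional[float] = None,
           max_memory_mb: Optional[float] = None) -> List[str]:
    """Return the names of models that satisfy every constraint that was given."""
    return [
        m.name for m in models
        if meets(m.ap, min_ap, at_least=True)
        and meets(m.fps, min_fps, at_least=True)
        and meets(m.memory_mb, max_memory_mb, at_least=False)
    ]


if __name__ == "__main__":
    print(select(REPORTED, min_fps=30.0))          # real-time budget
    print(select(REPORTED, max_memory_mb=1000.0))  # embedded memory budget
```

With the full per-model measurements from the paper in place of the partial table, the same filter could be used to shortlist detectors for a given speed or memory budget.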
               