Conventional prediction approaches for traffic scenes primarily predict the future states of visible objects (i.e., not in blind spots) based on their current observations. This study focused on predicting the future states of objects in blind spots (e.g., those outside the field-of-view or in occluded regions) based on the current observations of other visible objects. We proposed a method that predicts the appearance of vehicles from a blind spot based on the behaviors of visible pedestrians who can observe the vehicles in the blind spot. Our proposed method utilizes a spatiotemporal 3D convolutional neural network and learns pedestrian behaviors for prediction. The method explicitly represents subtle motions and the surrounding environments of pedestrians using pose estimation and semantic segmentation. To conduct evaluation experiments, we built two datasets of videos capturing real traffic scenes, collected by cameras with and without ego-motion. Using these datasets, we conducted experiments not only on simpler configurations but also in realistic traffic environments. The experimental results support the following conclusions: (i) Our proposed method achieved performance comparable to that of humans on our prediction task and predicted the appearance of vehicles from blind spots more than 1.5 s before they actually appeared. (ii) Explicit representations of pose and semantic masks captured information complementary to RGB videos, and ensembling the representations improved the prediction performance. (iii) Fine-tuning the models on videos with ego-motion is important for accurate prediction on videos captured from moving vehicles.
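The abstract describes a multi-stream architecture (spatiotemporal 3D CNN over RGB frames plus explicit pose and semantic-mask representations, ensembled for the final prediction). Below is a minimal sketch of how such an ensemble could be structured, assuming PyTorch; the layer sizes, stream names, and the probability-averaging ensemble rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a three-stream spatiotemporal 3D CNN
# that scores RGB, pose-mask, and semantic-mask video clips and averages
# (ensembles) the per-stream probabilities that a vehicle will appear from
# a blind spot. All layer sizes and stream names are illustrative assumptions.
import torch
import torch.nn as nn


class Stream3DCNN(nn.Module):
    """One spatiotemporal 3D-CNN stream: clip (B, C, T, H, W) -> appearance logit."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # global spatiotemporal pooling
        )
        self.classifier = nn.Linear(32, 1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip).flatten(1)
        return self.classifier(x)


class BlindSpotEnsemble(nn.Module):
    """Averages sigmoid probabilities from RGB, pose, and semantic-mask streams."""

    def __init__(self):
        super().__init__()
        self.rgb = Stream3DCNN(in_channels=3)       # raw RGB frames
        self.pose = Stream3DCNN(in_channels=1)      # rendered pose-keypoint masks
        self.semantic = Stream3DCNN(in_channels=1)  # semantic segmentation masks

    def forward(self, rgb, pose, semantic):
        probs = torch.stack([
            torch.sigmoid(self.rgb(rgb)),
            torch.sigmoid(self.pose(pose)),
            torch.sigmoid(self.semantic(semantic)),
        ])
        return probs.mean(dim=0)  # ensemble by averaging stream probabilities


# Usage with dummy clips of 16 frames at 112x112 resolution:
model = BlindSpotEnsemble()
rgb = torch.randn(2, 3, 16, 112, 112)
pose = torch.randn(2, 1, 16, 112, 112)
semantic = torch.randn(2, 1, 16, 112, 112)
print(model(rgb, pose, semantic).shape)  # torch.Size([2, 1])
```

Averaging per-stream probabilities is one simple way to realize the ensembling the abstract mentions; feature-level fusion before the classifier would be an equally plausible reading.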