Predicting activity motion from video is of great importance, with multiple applications in computer vision. From self-driving cars to healthcare, the earlier the anticipation, the higher the probability of a successful classification. The main challenge of prediction is extracting accurate information about the object of interest in the frame, as opposed to the full frame, from a partial observation. To this end, we propose an end-to-end two-stage architecture that leverages pixel-level awareness of the spatiotemporal information of the object of interest. The first stage of our model is a classification block composed of three layers: a background subtraction layer that enables the model to focus on the subject of interest, followed by Deformable Convolution layers for feature extraction, and finally an additive softmax for the final classification. Learned information from the first stage is then transferred to the second stage, composed of Long Short-Term Memory (LSTM) layers and a final loss function for prediction. Extensive evaluation on the UT-Interaction, HMDB51, and UCF-Sports benchmarks shows that our model outperforms other solutions on the threshold probability difference and demonstrates early action prediction at a lower observation ratio.
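The abstract does not specify how the background subtraction layer is implemented. A common, minimal approach is median-background frame differencing, sketched below with NumPy; the `threshold` cutoff and the median-over-time background model are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def background_subtract(frames, threshold=30.0):
    """Mask out static background pixels so downstream layers can
    focus on the moving subject of interest.

    frames: array of shape (T, H, W), grayscale intensities.
    threshold: assumed intensity cutoff (hypothetical parameter).
    """
    frames = np.asarray(frames, dtype=np.float32)
    # Estimate a static background as the per-pixel median over time.
    background = np.median(frames, axis=0)
    # Keep only pixels that deviate noticeably from the background.
    mask = np.abs(frames - background) > threshold
    return frames * mask  # background pixels are zeroed out

# Toy clip: a flat background (intensity 10) with one bright moving pixel.
clip = np.full((4, 5, 5), 10.0)
for t in range(4):
    clip[t, t, t] = 200.0  # the "subject" moves along the diagonal

fg = background_subtract(clip)
```

In this toy clip, the moving bright pixel survives subtraction while the static background is zeroed, which is the behavior the first stage relies on before feature extraction.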