The tremendous progress of deep convolutional neural networks has produced promising results in classifying various sports activities. However, accurately localizing a particular sports event or activity in a continuous video stream remains a challenging problem. Accurate detection of sports actions enables objective comparison of different performances. In this work, we propose the DiveNet action localization module to detect springboard diving actions in an unconstrained environment. We use a Temporal Convolutional Network (TCN) over a backbone feature extractor to localize diving actions with low latency. Using the temporal demarcations provided by the action localization step, we estimate the diver's center of mass (COM) trajectory and the peak dive height via the projectile motion formula. In addition, we train a DiveNet pose regression network, which extends the Unipose architecture with direct estimation of physical parameters, i.e., the COM and 2D joint keypoints. We propose a new method for computing, for each dive, the homography between the diving motion plane and the image view. This enables physical parameters to be represented at metric scale without any calibration. We release the first publicly available diving sports video dataset, recorded at 60 Hz with a static camera setup for different springboard heights. DiveNet action localization achieves an accuracy of 95% with a single-frame latency (< 25 ms). The DiveNet pose regression model achieves competitive results of around 70% PCK on different diving pose datasets. We achieve a COM accuracy of 6 pixels, a dive peak height sensitivity of 20 cm, and mean joint angle errors of around 10 degrees.
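The abstract does not spell out the exact projectile-motion formulation, so the following is only a minimal sketch under stated assumptions. It assumes that the localization step yields take-off and water-entry frames, that the COM positions at those frames are already expressed in metres in the diving plane (for example via the per-dive homography), that the y axis points up, and that air drag is negligible. Flight time is taken as the frame difference divided by the 60 Hz frame rate. The names `Ballistic` and `fit_from_endpoints` are hypothetical and are not part of DiveNet.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant, no drag)


@dataclass
class Ballistic:
    """Hypothetical projectile-motion model of the COM in the metric diving plane."""
    x0: float  # horizontal COM position at take-off (m)
    y0: float  # vertical COM position at take-off (m), y axis points up
    vx: float  # constant horizontal velocity (m/s)
    vy: float  # initial vertical velocity (m/s)

    def position(self, t: float) -> tuple[float, float]:
        # x grows linearly; y follows a parabola under constant gravity
        return (self.x0 + self.vx * t,
                self.y0 + self.vy * t - 0.5 * G * t * t)

    def peak_height(self) -> float:
        # vertical velocity vanishes at t* = vy / g, giving y0 + vy^2 / (2 g)
        return self.y0 + self.vy ** 2 / (2.0 * G)


def fit_from_endpoints(takeoff, entry, t_takeoff, t_entry):
    """Solve for the initial velocities from two COM positions and the flight time.

    takeoff, entry: (x, y) COM positions in metres at the take-off and entry frames.
    t_takeoff, t_entry: timestamps in seconds (e.g. frame index / 60 for 60 Hz video).
    """
    T = t_entry - t_takeoff
    if T <= 0:
        raise ValueError("entry must come after take-off")
    (x0, y0), (x1, y1) = takeoff, entry
    vx = (x1 - x0) / T
    # From y1 = y0 + vy*T - g*T^2/2, solve for vy
    vy = (y1 - y0 + 0.5 * G * T * T) / T
    return Ballistic(x0, y0, vx, vy)


# Illustrative (made-up) numbers: COM 4.0 m above the water at take-off,
# entry 72 frames later at 60 Hz, 1.0 m further out.
model = fit_from_endpoints((0.0, 4.0), (1.0, 0.0), 0.0, 72 / 60)
print(f"peak COM height: {model.peak_height():.2f} m")
```

Fitting only the two endpoints keeps the sketch short. With per-frame COM estimates from a pose network, a least-squares fit of the same parabola over all airborne frames would likely be more robust to noise in individual frames.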