Although 3D hand pose estimation has made significant progress in recent years with the development of the deep neural network, most learning-based methods require a large amount of labeled data… Click to show full abstract
Although 3D hand pose estimation has made significant progress in recent years with the development of the deep neural network, most learning-based methods require a large amount of labeled data that is time-consuming to collect. In this paper, we propose a dual-branch self-boosting framework for self-supervised 3D hand pose estimation from depth images. First, we adopt a simple yet effective image-to-image translation technology to generate realistic depth images from synthetic data for network pre-training. Second, we propose a dual-branch network to perform 3D hand model estimation and pixel-wise pose estimation in a decoupled way. Through a part-aware model-fitting loss, the network can be updated according to the fine-grained differences between the hand model and the unlabeled real image. Through an inter-branch loss, the two complementary branches can boost each other continuously during self-supervised learning. Furthermore, we adopt a refinement stage to better utilize the prior structure information in the estimated hand model for a more accurate and robust estimation. Our method outperforms previous self-supervised methods by a large margin without using paired multi-view images and achieves comparable results to strongly supervised methods. Besides, by adopting our regenerated pose annotations, the performance of the skeleton-based gesture recognition is significantly improved.
               
Click one of the above tabs to view related content.