Most existing methods for human pose estimation focus on improving accuracy and usually ignore model size. Besides accuracy, however, inference speed and real-time capability are also important. In this paper, we present a new module, the Densely Connected Residual Module, which effectively decreases the number of parameters in our network. We introduce this module into the backbone of High-Resolution Net. In addition, we replace the direct addition fusion at the end of the network with pyramid fusion. Because our network does not require ImageNet pre-training, the total training time decreases sharply. We conduct experiments on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. Compared with High-Resolution Net, our network reduces the number of parameters by around 72% and the amount of computation by around 14%, making it more lightweight. At test time, our model processes an image in 25 ms, which is essentially real-time. The code is available at https://github.com/consistent1997/LDCRN .
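To put the reported figures in more familiar terms, the short sketch below converts the 25 ms per-image latency into frames per second and expresses the 72% and 14% reductions as the fraction of the baseline that remains. The helper names `fps_from_latency_ms` and `remaining_fraction` are illustrative assumptions and are not taken from the released code; the baseline is normalized to 1.0 because the abstract gives only relative numbers.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def remaining_fraction(reduction: float) -> float:
    """Fraction of a baseline quantity left after a fractional reduction.

    A reduction of 0.72 means the quantity is 72% smaller than the baseline.
    """
    return 1.0 - reduction


# Figures taken from the abstract; the baseline (High-Resolution Net) is
# normalized to 1.0 because only relative reductions are reported.
fps = fps_from_latency_ms(25.0)
params_left = remaining_fraction(0.72)
compute_left = remaining_fraction(0.14)

print(f"throughput: {fps:.0f} images per second")
print(f"parameters: {params_left:.0%} of the baseline")
print(f"computation: {compute_left:.0%} of the baseline")
```

Under these figures the model processes about 40 images per second, keeps roughly 28% of the baseline parameters, and keeps roughly 86% of the baseline computation.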
               