We study the task of single-person dense pose estimation: given a human-centric image, we learn to map all human pixels onto a 3D, surface-based human body model. Existing methods approach this problem by fitting deep convolutional networks to sparsely annotated points, where the two surface-coordinate components for each body part are treated as uncorrelated and regressed separately. In this work, we devise a novel, unified loss function that explicitly characterizes the correlation between the surface-coordinate components, achieving significant improvements in both accuracy and efficiency. Furthermore, based on the observation that the image-to-surface correspondence is intrinsically preserved under geometric transformations of the input image, we propose to enforce a geometric equivariance consistency on the target mapping, which enables reliable supervision on large amounts of unlabeled pixels. We conduct comprehensive studies of the effectiveness of our approach using a simple network architecture. Extensive experiments on the DensePose-COCO dataset show that our model outperforms previous state-of-the-art methods at much lower computational cost. We hope that our work will serve as a solid baseline for future study in this field. The code will be available at https://github.com/Johnqczhang/densepose.pytorch.
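
To make the two ideas concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: a joint loss over the (u, v) surface coordinates that couples the two components through a Euclidean distance rather than two independent per-coordinate terms, and a flip-based equivariance consistency term for unlabeled pixels. The `model` interface, the (B, 2, H, W) output shape, and the choice of horizontal flipping as the geometric transform are all assumptions; a real DensePose head also predicts per-part charts and must swap left/right part labels under flipping, which this sketch omits.

    import torch
    import torch.nn.functional as F

    def correlated_uv_loss(pred_uv, gt_uv):
        # pred_uv, gt_uv: (N, 2) tensors of (u, v) coordinates at the N
        # sparsely annotated pixels. Penalizing the Euclidean distance in
        # the (u, v) plane couples the two coordinate components, instead
        # of regressing u and v with two independent per-coordinate losses.
        return torch.linalg.norm(pred_uv - gt_uv, dim=-1).mean()

    def equivariance_consistency_loss(model, images):
        # Self-supervision on unlabeled pixels: the predicted mapping f
        # should commute with a geometric transform T, i.e. f(T(x)) = T(f(x)).
        # Here T is a horizontal flip (hypothetical choice; left/right part
        # swapping is ignored for brevity).
        with torch.no_grad():
            uv_target = torch.flip(model(images), dims=[-1])   # T(f(x)), held fixed
        uv_pred = model(torch.flip(images, dims=[-1]))          # f(T(x))
        return F.l1_loss(uv_pred, uv_target)                    # consistency penalty

Under these assumptions, a training objective would combine the supervised term on annotated pixels with the consistency term on all pixels, e.g. loss = correlated_uv_loss(pred_uv, gt_uv) + lambda_eq * equivariance_consistency_loss(model, images).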