"Overcoming Data Deficiency for Multi-Person Pose Estimation."

Building multi-person pose estimation (MPPE) models that can handle complex foreground and uncommon scenes is an important challenge in computer vision. Aside from designing novel models, strengthening training data is a promising direction but remains largely unexploited for the MPPE task. In this article, we systematically identify the key deficiencies of existing pose datasets that prevent the power of well-designed models from being fully exploited and propose the corresponding solutions. Specifically, we find that the traditional data augmentation techniques are inadequate in addressing the two key deficiencies, imbalanced instance complexity (IC) (evaluated by our new metric IC) and insufficient realistic scenes. To overcome these deficiencies, we propose a model-agnostic full-view data generation (Full-DG) method to enrich the training data from the perspectives of both poses and scenes. By hallucinating images with more balanced pose complexity and richer real-world scenes, Full-DG can help improve pose estimators' robustness and generalizability. In addition, we introduce a plug-and-play adaptive category-aware loss (AC-loss) to alleviate the severe pixel-level imbalance between keypoints and backgrounds (i.e., around 1:600). Full-DG together with AC-loss can be readily applied to both the bottom-up and top-down models to improve their accuracy. Notably, plugging into the representative estimators HigherHRNet and HRNet, our method achieves substantial performance gains of 1.0%-2.9% AP on the COCO benchmark, and 1.0%-5.1% AP on the CrowdPose benchmark.

Keywords: pose estimation; person pose; multi person; overcoming data

Journal Title: IEEE transactions on neural networks and learning systems
Year Published: 2023

Link to full text (if available)

Share on Social Media: Sign Up to like & get
recommendations!
2

LAUSR

You are not signed in:

Sign Up!

Related content

More Information News Social Media Video Recommended