6D object pose estimation is a longstanding computer vision problem. Existing deep learning-based methods have achieved inspiring results on this task. However, their performance depends on large-scale annotated training data, and acquiring real 6D object pose annotations is labor-intensive and time-consuming. To overcome this drawback, we propose a semi-supervised pose estimation method that uses labeled synthetic data and unlabeled real data. For the unlabeled real data, we form a self-supervised pipeline that minimizes the distance between the observed input point cloud, which implicitly reflects the ground-truth pose, and the model points transformed by the predicted pose. The labeled synthetic data supervises the network so that it converges correctly, and a feature mapping is employed to reduce the domain gap between real and synthetic features, further enhancing the network's performance. Moreover, we propose an attention-based pose estimation network that concentrates on discriminative features, thereby improving the accuracy of pose estimation. Experiments show that our proposed semi-supervised method achieves good performance without real annotations and outperforms other methods that rely on synthetic data or self-supervision strategies, indicating that the proposed method is effective.
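The self-supervised objective for unlabeled real data, as described above, can be illustrated with a minimal sketch: the object model points are transformed by the predicted pose and compared against the observed real point cloud with a symmetric Chamfer-style distance, so no pose annotation is needed. This is not the authors' implementation; the function names, the (R, t) parameterization, and the use of a Chamfer distance are assumptions for illustration.

```python
# Minimal sketch (assumed, not the paper's code) of a self-supervised pose loss:
# model points under the predicted pose vs. the observed real point cloud.
import torch


def transform_points(model_points: torch.Tensor,
                     rotation: torch.Tensor,
                     translation: torch.Tensor) -> torch.Tensor:
    """Apply a predicted rigid transform to the object model points.

    model_points: (N, 3) object model points in the object frame.
    rotation:     (3, 3) predicted rotation matrix.
    translation:  (3,)   predicted translation vector.
    """
    return model_points @ rotation.T + translation


def chamfer_loss(pred_points: torch.Tensor,
                 observed_points: torch.Tensor) -> torch.Tensor:
    """Symmetric nearest-neighbour distance between two point sets.

    pred_points:     (N, 3) model points under the predicted pose.
    observed_points: (M, 3) real point cloud observed by the sensor.
    """
    # Pairwise squared distances, shape (N, M).
    dists = torch.cdist(pred_points, observed_points) ** 2
    # Predicted -> observed and observed -> predicted nearest neighbours.
    return dists.min(dim=1).values.mean() + dists.min(dim=0).values.mean()


if __name__ == "__main__":
    # Placeholder tensors standing in for the CAD model, a predicted pose,
    # and a segmented real point cloud from an unlabeled frame.
    model_pts = torch.rand(500, 3)
    R = torch.eye(3)
    t = torch.tensor([0.0, 0.0, 0.5])
    observed = torch.rand(400, 3)
    loss = chamfer_loss(transform_points(model_pts, R, t), observed)
    print(float(loss))
```

In a training loop, the predicted rotation and translation would come from the pose network, so gradients of this loss flow back into the network without any real pose labels.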