Deep learning methods for 6D object pose estimation based on RGB images and depth (RGB-D) have been successfully applied to robot grasping. Fusing RGB and depth features is one of the main difficulties. Most previous works simply concatenate the two types of features without considering their different contributions to pose estimation. We propose a selective embedding with gated fusion structure, called SEGate, which adaptively adjusts the weights of the RGB and depth features. Furthermore, we aggregate local point-cloud features according to the distance between points: nearby points contribute more to a local feature, while distant points contribute less. Experiments show that our approach achieves state-of-the-art performance on both the LineMOD and YCB-Video datasets. Our approach is also more robust when estimating the pose of occluded objects.
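The two ideas in the abstract, a gate that weights RGB against depth features and a distance-dependent aggregation of neighboring points, can be illustrated with a minimal sketch. This is not the SEGate architecture. The function names, the per-channel sigmoid gate, and the exponential distance weighting are all assumptions chosen for illustration, and a real network would learn the gate parameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_feat, depth_feat, w_rgb, w_depth, bias):
    """Fuse two equal-length feature vectors with a per-channel gate.

    Hypothetical form: g = sigmoid(w_rgb*r + w_depth*d + bias),
    fused = g*r + (1-g)*d. The weights stand in for learned parameters.
    """
    fused = []
    for r, d, wr, wd, b in zip(rgb_feat, depth_feat, w_rgb, w_depth, bias):
        g = sigmoid(wr * r + wd * d + b)    # gate value in (0, 1)
        fused.append(g * r + (1.0 - g) * d)  # convex mix of RGB and depth
    return fused


def distance_weighted_aggregate(center, neighbors, neighbor_feats):
    """Combine neighbor features so that closer points weigh more.

    Assumed weighting: exp(-distance), normalized to sum to 1.
    """
    dists = [math.dist(center, p) for p in neighbors]
    raw = [math.exp(-d) for d in dists]
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(neighbor_feats[0])
    agg = [
        sum(w * f[i] for w, f in zip(weights, neighbor_feats))
        for i in range(dim)
    ]
    return agg, weights
```

With all gate parameters set to zero the gate is 0.5 and the fusion reduces to a plain average; non-zero parameters shift the balance toward one modality per channel, which is the behavior that simple concatenation cannot express.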
               