Estimating the pose of an object is essential for robot manipulation. In many applications, the spatial and geometric relations between the object and other parts of the scene, e.g. the relation between the object and its supporting plane, are known a priori or can be assumed with a certain accuracy. This information can be leveraged for pose estimation. In this work, we show how it can be formulated as a multimodal prior and probabilistically fused with the pose information that a CNN extracts from an image. For this purpose, the CNN pipeline from prior work is utilized. When the prior matches the ground truth, the approach is able to raise monocular results to the level of binocular / depth-based methods. Importantly, when no prior fits, the pose estimate is not negatively affected. The proposed method was evaluated on the T-Less dataset and used in a sample robotic application.
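To illustrate the kind of probabilistic fusion the abstract describes, the sketch below combines a multimodal orientation prior (e.g. the object's stable resting orientations on a known supporting plane) with CNN-derived evidence over discretized pose hypotheses. This is a minimal, assumed illustration only: the reduction to a single rotation angle, the von Mises mixture prior, the function name `fuse_pose_prior`, and the synthetic CNN scores are all simplifications not taken from the paper, whose actual pipeline and fusion model may differ.

```python
import numpy as np


def fuse_pose_prior(cnn_log_likelihood, prior_modes, prior_weights, angles, kappa=8.0):
    """Fuse a multimodal orientation prior with CNN evidence over discrete angle bins.

    cnn_log_likelihood: (N,) log-scores from the CNN for each candidate angle.
    prior_modes:        angles (rad) of the prior modes, e.g. orientations consistent
                        with the assumed object-supporting-plane relation.
    prior_weights:      mixture weight per mode.
    angles:             (N,) discretized candidate angles (rad).
    kappa:              concentration of each von Mises mixture component.
    """
    # Multimodal prior: mixture of von Mises components, one per plausible orientation.
    prior = np.zeros_like(angles)
    for mu, w in zip(prior_modes, prior_weights):
        prior += w * np.exp(kappa * np.cos(angles - mu))
    prior /= prior.sum()

    # Bayesian fusion: posterior proportional to prior times likelihood (log space for stability).
    log_post = np.log(prior + 1e-12) + cnn_log_likelihood
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()


if __name__ == "__main__":
    angles = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)
    # Hypothetical CNN scores: a broad estimate peaking slightly off 90 degrees.
    cnn_ll = 3.0 * np.cos(angles - np.deg2rad(95))
    # Assumed prior: the object rests either on its base (0 deg) or on its side (90 deg).
    posterior = fuse_pose_prior(cnn_ll, prior_modes=[0.0, np.pi / 2],
                                prior_weights=[0.5, 0.5], angles=angles)
    print("MAP angle (deg):", np.degrees(angles[np.argmax(posterior)]))
```

Because the prior enters multiplicatively and is renormalized, a prior mode that agrees with the CNN evidence sharpens the posterior, while a uniform or non-matching prior leaves the CNN estimate essentially unchanged, mirroring the robustness property claimed in the abstract.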