Robots operating in real-world settings often need to plan interactions with surrounding scene elements; it is therefore crucial for them to understand their workspace at the level of individual objects. In this spirit, this work presents a novel approach to progressively build instance-level, dense 3D maps from color and depth cues acquired by either a moving RGB-D sensor or a camera-LiDAR setup whose pose is being tracked. The proposed framework processes each input RGB image with a semantic instance segmentation neural network and uses depth information to extract a set of per-frame, semantically labeled 3D instance segments, which are then matched to object instances already identified in previous views. Following integration of these newly detected instance segments into a global volumetric map, an efficient label diffusion scheme that considers multi-view instance predictions together with the reconstructed scene geometry is used to refine 3D segmentation boundaries. Experiments on benchmark indoor RGB-D sequences show that the proposed system achieves state-of-the-art 3D segmentation accuracy while reducing the per-frame computational cost. Furthermore, the applicability of the system to challenging domains beyond traditional office scenes is demonstrated by testing it on a robotic excavator equipped with a calibrated camera-LiDAR setup, with the goal of segmenting individual boulders in a highly cluttered construction scenario.
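To illustrate the per-frame pipeline the abstract outlines (2D instance segmentation, depth-based lifting to 3D segments, matching against already-mapped instances, volumetric integration, and geometry-aware label diffusion), the sketch below shows one possible structure of the processing loop. It is a minimal illustration, not the authors' implementation: every helper function (run_instance_segmentation, backproject_masks, match_to_map, integrate, diffuse_labels) is a hypothetical placeholder standing in for the corresponding stage described in the paper.

```python
# Minimal, hypothetical sketch of the per-frame mapping loop described above.
# All helpers are placeholders; the actual system and its data structures differ.

def run_instance_segmentation(rgb):
    """Placeholder: 2D instance segmentation network returning (mask, label) pairs."""
    return []

def backproject_masks(masks, depth, intrinsics, pose):
    """Placeholder: lift each 2D mask to a semantically labeled 3D segment using depth."""
    return []

def match_to_map(segments_3d, instance_map):
    """Placeholder: associate new 3D segments with instances already in the map
    (e.g. by geometric overlap); unmatched segments spawn new instance IDs."""
    return [(seg, idx) for idx, seg in enumerate(segments_3d)]

def integrate(instance_map, matched_segments, pose):
    """Placeholder: fuse the labeled segments into the global volumetric map."""
    pass

def diffuse_labels(instance_map):
    """Placeholder: refine 3D segmentation boundaries by propagating multi-view
    instance predictions over the reconstructed scene geometry."""
    pass

def process_frame(rgb, depth, pose, intrinsics, instance_map):
    """One iteration of the incremental, instance-level mapping pipeline."""
    masks = run_instance_segmentation(rgb)
    segments_3d = backproject_masks(masks, depth, intrinsics, pose)
    matched = match_to_map(segments_3d, instance_map)
    integrate(instance_map, matched, pose)
    diffuse_labels(instance_map)
    return instance_map
```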