We introduce a novel framework for 3D scene reconstruction with simultaneous object annotation, using a pre-trained 2D convolutional neural network (CNN), incremental data streaming, and remote exploration, with a virtual… Click to show full abstract
We introduce a novel framework for 3D scene reconstruction with simultaneous object annotation, using a pre-trained 2D convolutional neural network (CNN), incremental data streaming, and remote exploration, with a virtual reality setup. It enables versatile integration of any 2D box detection or segmentation network. We integrate new approaches to (i) asynchronously perform dense 3D-reconstruction and object annotation at interactive frame rates, (ii) efficiently optimize CNN results in terms of object prediction and spatial accuracy, and (iii) generate computationally-efficient colliders in large triangulated 3D-reconstructions at run-time for 3D scene interaction. Our method is novel in combining CNNs with long and varying inference time with live 3D-reconstruction from RGB-D camera input. We further propose a lightweight data structure to store the 3D-reconstruction data and object annotations to enable fast incremental data transmission for real-time exploration with a remote client, which has not been presented before. Our framework achieves update rates of 22 fps (SSD Mobile Net) and 19 fps (Mask RCNN) for indoor environments up to 800 m 3 . We evaluated the accuracy of 3D-object detection. Our work provides a versatile foundation for semantic scene understanding of large streamed 3D-reconstructions, while being independent from the CNN’s processing time. Source code is available for non-commercial use.
               
Click one of the above tabs to view related content.