Multi-view stereo estimates the depth maps of multiple perspective images in a scene and then fuses them to generate a 3D point cloud of the scene, which is an essential… Click to show full abstract
Multi-view stereo estimates the depth maps of multiple perspective images in a scene and then fuses them to generate a 3D point cloud of the scene, which is an essential technology of 3D reconstruction. In this paper, we propose a deep learning method GDINet, applying probabilistic methods to the pyramid framework, which can significantly improve reconstruction quality. In detail, we first establish a Gaussian distribution for each image’s pixel and iterate it in the pyramid framework. The mean value is the estimated depth, and the variance represents the depth estimation error. In addition, we design a novel loss function with excellent convergence to train our network. Finally, we present an initialization module to generate the coarse Gaussian distribution, controlling the parameters in a reasonable range. Our results rank $2nd$ on both DTU and Tanks & Temples datasets, showing that our network has high accuracy, completeness, and robustness. We also make a visualization comparison on the BlendedMVS dataset (containing many aerial scene images) to demonstrate the generalization ability of our model.
               
Click one of the above tabs to view related content.