We propose a cost volume-based neural network for depth inference from multi-view images. We demonstrate that building a cost volume pyramid in a coarse-to-fine manner instead of constructing a cost… Click to show full abstract
We propose a cost volume-based neural network for depth inference from multi-view images. We demonstrate that building a cost volume pyramid in a coarse-to-fine manner instead of constructing a cost volume at a fixed resolution leads to a compact, lightweight network and allows us inferring high resolution depth maps to achieve better reconstruction results. To this end, we first build a cost volume based on uniform sampling of fronto-parallel planes across the entire depth range at the coarsest resolution of an image. Then, given current depth estimate, we construct new cost volumes iteratively to perform depth map refinement. We show that working on cost volume pyramid can lead to a more compact, yet efficient network structure compared with the Point-MVSNet on 3D points. We further show that the (residual) depth sampling can be fully determined by analytical geometric derivation, which serves as a principle for building compact cost volume pyramid. To demonstrate the effectiveness of our proposed framework, we extend our cost volume pyramid structure to the unsupervised depth inference scenario. Experimental results on benchmark datasets show that our model can perform 6x faster with similar performance as state-of-the-art methods for supervised scenario and demonstrates superior performance on unsupervised scenario.
               
Click one of the above tabs to view related content.