Deep convolutional neural networks for dense prediction tasks are commonly optimized using synthetic data, because generating pixel-wise annotations for real-world data is laborious. However, synthetically trained models do not generalize well to real-world environments. We address this poor "synthetic to real" (S2R) generalization through the lens of shortcut learning. We demonstrate that the learning of feature representations in deep convolutional networks is heavily influenced by synthetic data artifacts (shortcut attributes). To mitigate this issue, we propose an Information-Theoretic Shortcut Avoidance (ITSA) approach that automatically restricts shortcut-related information from being encoded into the feature representations. Specifically, our method minimizes the sensitivity of latent features to input variations, which regularizes synthetically trained models toward robust, shortcut-invariant features. To avoid the prohibitive computational cost of directly optimizing input sensitivity, we propose a practical and efficient algorithm that achieves this robustness. Our results show that the proposed method effectively improves S2R generalization across multiple distinct dense prediction tasks, including stereo matching, optical flow, and semantic segmentation. Importantly, the proposed method enhances the robustness of synthetically trained networks and outperforms their counterparts fine-tuned on real data in challenging out-of-domain applications.
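To make the core idea concrete, the sketch below illustrates one generic way to penalize the sensitivity of latent features to small input perturbations, using a finite-difference estimate on a toy encoder. This is only an illustration of the general principle the abstract describes; the encoder, perturbation scheme, and hyperparameters here are hypothetical and are not the actual ITSA algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer encoder standing in for a deep feature extractor.
# The weight shape and scale are illustrative choices, not from the paper.
W = rng.normal(size=(8, 16)) * 0.1

def encode(x):
    """Map an input vector to a latent feature vector."""
    return np.tanh(W @ x)

def sensitivity_penalty(x, eps=1e-3, n_dirs=4):
    """Finite-difference estimate of how strongly the latent features
    react to small random input perturbations (larger = more sensitive).
    Averages squared feature change over a few random unit directions."""
    z = encode(x)
    total = 0.0
    for _ in range(n_dirs):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)           # unit-norm perturbation direction
        z_pert = encode(x + eps * d)
        total += np.sum((z_pert - z) ** 2) / eps**2
    return total / n_dirs

x = rng.normal(size=16)
penalty = sensitivity_penalty(x)

# Adding a weighted penalty to the task loss discourages features that
# latch onto small, shortcut-like input artifacts.
lam = 0.1                                # illustrative regularization weight
task_loss = 1.0                          # placeholder for a dense-prediction loss
total_loss = task_loss + lam * penalty
```

In practice a direct Jacobian-based sensitivity term like this is expensive for deep networks, which is precisely the computational cost the paper's proposed algorithm is designed to avoid.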