Recent deep neural networks (DNNs) with many layers of feature representations rely on some form of skip connection to simultaneously circumvent optimization problems and improve generalization performance. However, the operation of these models is still not clearly understood, especially in comparison with DNNs without skip connections, referred to as plain networks (PlainNets), which become untrainable beyond a certain depth. The purpose of this article is therefore a theoretical analysis of the role of skip connections in training very deep DNNs, using concepts from linear algebra and random matrix theory. In comparison with PlainNets, the results of our investigation directly explain: 1) why DNNs with skip connections are easier to optimize and 2) why DNNs with skip connections exhibit improved generalization. Our results concretely show that the hidden representations of PlainNets progressively suffer information loss through singularity problems as depth increases, making their optimization difficult. In contrast, as model depth increases, the hidden representations of DNNs with skip connections avoid singularity problems and retain full information, which is reflected in improved optimization and generalization. For the theoretical analysis, this article studies, in relation to PlainNets, two popular skip-connection-based DNNs: residual networks (ResNets) and residual networks with aggregated features (ResNeXt).
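The singularity argument can be illustrated with a minimal numerical sketch, not the paper's exact analysis: below, each layer of a deep linear map is either a random Gaussian matrix (plain) or an identity plus a small random perturbation (residual-style skip connection). The width, depths, residual scale of 0.1, and the 1/width weight variance are illustrative assumptions; the point is only that the plain product drifts toward singularity with depth while the skip-connection product stays well conditioned.

```python
import numpy as np

# Illustrative sketch: conditioning of deep plain vs. residual linear maps.
# Weights are i.i.d. Gaussian with variance 1/width (a common random-matrix
# assumption); all sizes and scales here are arbitrary choices for illustration.

rng = np.random.default_rng(0)
width, depths = 64, [1, 5, 10, 20, 40]

def end_to_end_map(depth, residual, scale=0.1):
    """Compose per-layer linear maps: W_l (plain) or I + scale * W_l (residual)."""
    M = np.eye(width)
    for _ in range(depth):
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        layer = np.eye(width) + scale * W if residual else W
        M = layer @ M
    return M

for d in depths:
    for residual in (False, True):
        s = np.linalg.svd(end_to_end_map(d, residual), compute_uv=False)
        kind = "ResNet-like" if residual else "plain"
        print(f"depth={d:3d}  {kind:11s}  "
              f"smallest singular value={s[-1]:.2e}  condition number={s[0]/s[-1]:.2e}")
```

Running this, the smallest singular value of the plain composition shrinks rapidly and the condition number explodes as depth grows, whereas the residual composition stays close to the identity, mirroring the abstract's claim that skip connections let hidden representations retain full information at depth.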