The loss function of a deep neural network is high dimensional, nonconvex, and complex, and the geometric properties of its loss surface are still not well understood. Unlike most theoretical studies of the loss surface, this article explores it experimentally, examining the trajectories of various adaptive optimization algorithms, the Hessian matrix of the loss function, and the curvature of the loss surface along those trajectories. It is found that the gradient direction of the adaptive optimization algorithms is almost perpendicular to the direction of maximum curvature of the loss surface, whereas the gradient directions of the stochastic gradient descent (SGD) algorithm exhibit no such pattern. The Hessian matrix of the loss surface along the optimization trajectory is degenerate, which is inconsistent with the nonsingular-Hessian assumption made in many theoretical studies of deep learning. In addition, this article proposes a new ensemble learning method for neural networks based on the scaling invariance of ReLU networks and mode connectivity.
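The kind of measurement the abstract describes can be reproduced with standard tools. The following is a minimal sketch (not the authors' code; the tiny model, data, and iteration count are illustrative assumptions): it estimates the dominant Hessian eigenvector of the loss via power iteration on Hessian-vector products, then reports the cosine of the angle between that direction and the gradient, where a value near zero indicates near-perpendicularity.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and batch; the paper's actual networks and datasets differ.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
params = list(model.parameters())

# Gradient with create_graph=True so we can differentiate through it again.
loss = nn.functional.cross_entropy(model(x), y)
g = torch.cat([p.reshape(-1)
               for p in torch.autograd.grad(loss, params, create_graph=True)])

def hvp(v):
    """Hessian-vector product via double backprop: H v = d/dtheta <g, v>."""
    hv = torch.autograd.grad(torch.dot(g, v), params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

# Power iteration converges to the eigenvector of largest |eigenvalue|,
# which for typical network losses approximates the maximum-curvature direction.
v = torch.randn_like(g)
v /= v.norm()
for _ in range(50):
    hv = hvp(v)
    v = hv / hv.norm()

# |cos| near 0 means the gradient is nearly perpendicular to the top-curvature direction.
cos = torch.dot(g.detach() / g.detach().norm(), v).abs()
print(f"|cos(grad, top Hessian eigvec)| = {cos.item():.4f}")
```

Running this along an optimizer's trajectory (one measurement per step) is what a comparison between adaptive methods and SGD, as in the abstract, would amount to.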
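The scaling invariance the proposed ensemble method builds on can also be stated concretely. The sketch below (an illustrative assumption, not the paper's construction) checks the basic identity for a two-layer ReLU network: rescaling the first layer by a > 0 and the second by 1/a leaves the function unchanged, because relu(a z) = a relu(z) for a > 0, so the same function corresponds to many distinct weight vectors.

```python
import torch

torch.manual_seed(0)
W1, b1 = torch.randn(8, 4), torch.randn(8)   # first linear layer
W2, b2 = torch.randn(3, 8), torch.randn(3)   # second linear layer
x = torch.randn(4)
a = 2.5  # arbitrary positive scale

y_orig   = W2 @ torch.relu(W1 @ x + b1) + b2
y_scaled = (W2 / a) @ torch.relu((a * W1) @ x + a * b1) + b2

# True: different weights, identical function.
print(torch.allclose(y_orig, y_scaled, atol=1e-6))
```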