Distributed second-order optimization, as an effective strategy for training large-scale machine learning systems, has been widely investigated for its low communication complexity. However, existing distributed second-order optimization algorithms, including distributed approximate Newton (DANE), accelerated inexact DANE (AIDE), and statistically preconditioned accelerated gradient (SPAG), all require solving an expensive subproblem to the target precision, which burdens them with high computation costs and hinders their development. In this article, we design a novel distributed second-order algorithm, the accelerated distributed approximate Newton (ADAN) method, to overcome the high computation costs of existing approaches. Whereas DANE, AIDE, and SPAG are built on relative-smoothness theory, ADAN's theoretical foundation is inexact Newton theory. This different foundation allows ADAN to handle the expensive subproblem efficiently: the number of steps required to solve the subproblem is independent of the target precision. At the same time, ADAN employs acceleration and effectively exploits the objective function's curvature information, giving it low communication complexity. Thus, ADAN achieves both communication and computation efficiency, while DANE, AIDE, and SPAG achieve only communication efficiency. Our empirical study also validates the advantages of ADAN over extant distributed second-order algorithms.
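The abstract's central point is that an inexact Newton method need only solve the Newton system approximately, with an inner-iteration budget that does not grow with the target precision. The sketch below illustrates that generic idea on a toy quadratic, using a fixed number of conjugate-gradient iterations as the inner solver. It is a minimal illustration under stated assumptions, not ADAN itself: the function names, the CG inner solver, and the step without line search are all assumptions for exposition.

```python
import numpy as np

def cg(hvp, g, iters):
    """Approximately solve H d = g using a FIXED number of CG iterations.

    `hvp` is a Hessian-vector product callback; the iteration budget is
    independent of any target precision, mirroring the inexact Newton idea.
    """
    d = np.zeros_like(g)
    r = g.copy()        # residual g - H d (d = 0 initially)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def inexact_newton_step(grad, hvp, x, cg_iters=5):
    """One inexact Newton update x - d, where d approximates H^{-1} grad(x)."""
    d = cg(hvp, grad(x), cg_iters)
    return x - d

# Toy strongly convex quadratic f(x) = 0.5 x^T A x - b^T x (illustrative).
rng = np.random.default_rng(0)
M = rng.standard_normal((10, 10))
A = M @ M.T + 10 * np.eye(10)   # symmetric positive definite Hessian
b = rng.standard_normal(10)

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
hvp = lambda v: A @ v

x = np.zeros(10)
x_new = inexact_newton_step(grad, hvp, x, cg_iters=5)
assert f(x_new) < f(x)  # even an inexact step decreases the objective
```

In a distributed setting such as the one the abstract describes, the Hessian-vector products would come from a local (preconditioning) dataset, so the fixed inner budget translates directly into bounded per-round computation.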