Abstract Parallelization frameworks have become a necessity for speeding up the training of deep neural networks (DNNs). In the typical parallelization framework, called MA-DNN, the parameters of local models are periodically averaged to obtain a global model. However, since a DNN is a highly non-convex model, averaging parameters cannot ensure that such a global model performs better than the local models. To tackle this problem, we introduce a new parallelization framework, called EC-DNN. In this framework, we propose to aggregate the local models by the simple ensemble, i.e., averaging the outputs of the local models instead of their parameters. As most prevalent loss functions are convex with respect to the output of the DNN, the performance of the global model produced by the simple ensemble is guaranteed to be at least as good as the average performance of the local models. To obtain further improvement, we extend the simple ensemble to the generalized ensemble, which produces the global model by a weighted sum, rather than the average, of the outputs of the local models. However, the model size would explode, since each round of ensemble multiplies the size of the global model. Thus, we carry out model compression after each ensemble to reduce the global model to the same size as the local ones. Our experimental results show that EC-DNN achieves better speedup than MA-DNN without loss of accuracy, and sometimes even improves accuracy.
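The aggregation step described above can be illustrated with a minimal sketch (the function names and toy data below are illustrative assumptions, not code from the paper): MA-DNN averages parameters element-wise, while EC-DNN's simple ensemble averages the outputs of the local models, and the generalized ensemble replaces the average with a weighted sum.

```python
import numpy as np

def ma_style_global(local_param_sets):
    """MA-DNN style aggregation: element-wise average of local model parameters.

    Each entry in local_param_sets is assumed to be a dict mapping parameter
    names to arrays of identical shape across the local models."""
    keys = local_param_sets[0].keys()
    return {k: np.mean([p[k] for p in local_param_sets], axis=0) for k in keys}

def ec_simple_ensemble(local_outputs):
    """EC-DNN simple ensemble: average the *outputs* of the local models."""
    return np.mean(local_outputs, axis=0)

def ec_generalized_ensemble(local_outputs, weights):
    """Generalized ensemble: weighted sum of the local outputs."""
    weights = np.asarray(weights).reshape(-1, 1)
    return np.sum(weights * np.asarray(local_outputs), axis=0)

# Toy usage: 3 local models producing 4-class scores for the same input.
outs = [np.array([0.1, 0.2, 0.3, 0.4]),
        np.array([0.2, 0.2, 0.2, 0.4]),
        np.array([0.0, 0.3, 0.3, 0.4])]
print(ec_simple_ensemble(outs))                        # plain average of outputs
print(ec_generalized_ensemble(outs, [0.5, 0.3, 0.2]))  # weighted sum of outputs
```

The guarantee cited in the abstract follows from Jensen's inequality: a loss that is convex in the model output, evaluated at the averaged output, is no larger than the average of the local models' losses. No analogous guarantee holds for averaging parameters, since the loss is non-convex in the parameters of a DNN.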