Federated Learning (FL) enables privacy-preserving collaborative learning among multiple isolated parties while keeping their private data local. Cross-device and cross-silo FL have achieved great success in cross-domain applications, in which the scarce communication resource is the primary bottleneck. Driven by the need to combine heterogeneous machines from different parties into a shared data center, we identify intra-domain FL, a new type of FL in which isolated parties collaborate within a shared data center and strong computational heterogeneity becomes the primary bottleneck. To mitigate the training inefficiency caused by stragglers, this paper proposes an efficient synchronization algorithm, ESync, which allows parties to run different numbers of local training iterations under the coordination of a novel scheduler called the State Server. We derive bounds on the weight divergence and the optimality gap of ESync, and analyze the trade-off between convergence accuracy and communication efficiency. Extensive experiments compare ESync with SSGD, ASGD, DC-ASGD, FedAvg, FedAsync, TiFL, and FedDrop under strong computational heterogeneity. Numerical results show that ESync achieves significant speedup without loss of accuracy, demonstrating its effectiveness in both training efficiency and converged accuracy.
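To make the core idea concrete, below is a minimal Python sketch of ESync-style coordination under strong computational heterogeneity. All names here (`StateServer`, `esync_round`, `budget`) are illustrative assumptions rather than the paper's actual API: fast workers keep running local SGD iterations while stragglers finish their pass, and the state server triggers a single synchronous average once every party reports ready, so stragglers no longer stall each round.

```python
import numpy as np

# Hypothetical sketch of ESync-style coordination (names are illustrative,
# not taken from the paper): a state server tracks readiness, fast workers
# run extra local iterations instead of idling, and aggregation happens
# synchronously once all workers are ready.

class StateServer:
    """Tracks which workers have finished their current local pass."""
    def __init__(self, num_workers):
        self.ready = [False] * num_workers

    def report_ready(self, worker_id):
        self.ready[worker_id] = True

    def all_ready(self):
        return all(self.ready)

    def reset(self):
        self.ready = [False] * len(self.ready)


def local_sgd_step(weights, grad_fn, lr):
    """One local SGD step; grad_fn returns the local stochastic gradient."""
    return weights - lr * grad_fn(weights)


def esync_round(weights, workers, state_server, lr=0.05):
    """One global round: each worker runs its own number of local
    iterations ('budget', modeling its speed), then all local models
    are averaged synchronously once every worker is ready."""
    local_models = [weights.copy() for _ in workers]
    steps_done = [0] * len(workers)
    while not state_server.all_ready():
        for i, w in enumerate(workers):
            if steps_done[i] < w['budget']:
                local_models[i] = local_sgd_step(local_models[i], w['grad_fn'], lr)
                steps_done[i] += 1
            else:
                state_server.report_ready(i)
    state_server.reset()
    # Synchronous aggregation via a simple average, as in SSGD/FedAvg.
    return np.mean(local_models, axis=0)


if __name__ == "__main__":
    # Toy run: one slow straggler and one fast worker on 1-D quadratics
    # with local optima at +1 and -1 respectively.
    workers = [
        {'budget': 1, 'grad_fn': lambda w: 2.0 * (w - 1.0)},  # straggler
        {'budget': 5, 'grad_fn': lambda w: 2.0 * (w + 1.0)},  # fast worker
    ]
    server = StateServer(num_workers=2)
    w = np.zeros(1)
    for _ in range(100):
        w = esync_round(w, workers, server)
    # The result lands between the two local optima, weighted toward the
    # faster party: this illustrates the weight divergence the paper bounds.
    print("final weight:", w)
```

The toy run also illustrates the trade-off the abstract mentions: letting fast parties take more local steps removes straggler idle time, but skews the averaged model toward those parties, which is exactly why a bound on the weight divergence matters.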