This paper focuses on mitigating the impact of stragglers in distributed learning system. Unlike the existing results designated for a fixed number of stragglers, we develop a new scheme called… Click to show full abstract
This paper focuses on mitigating the impact of stragglers in distributed learning system. Unlike the existing results designated for a fixed number of stragglers, we develop a new scheme called Adaptive Gradient Coding (AGC) with flexible communication cost for varying number of stragglers. Our scheme gives an optimal tradeoff between computation load, straggler tolerance and communication cost by allowing workers to send multiple signals sequentially to the master. In particular, it can minimize the communication cost according to the unknown real-time number of stragglers in practical environments. In addition, we present a Group AGC (G-AGC) by combining the group idea with AGC to resist more stragglers in some situations. The numerical and simulation results demonstrate that our adaptive schemes can achieve the smallest average running time.
               
Click one of the above tabs to view related content.