Coflow scheduling is critical to data-parallel applications in data centers. While schemes like Varys can achieve optimal performance, they require a priori information about coflows, which is hard to obtain in practice. Existing non-clairvoyant solutions like Aalo generalize the least attained service (LAS) scheduling discipline to address this issue. However, they fail to identify the bottleneck flows in a coflow and tend to allocate excessive bandwidth to the non-bottleneck flows, which wastes bandwidth and degrades overall performance. To this end, we present Fai, which strives to improve overall coflow performance by accelerating the bottleneck flows without a priori knowledge. Fai employs bottleneck-aware scheduling. It adopts loose coordination to update coflow priorities and flow rates based on total bytes sent. In addition, Fai detects bottleneck flows based on each flow's rate and bytes sent, and de-allocates bandwidth from the other flows so that they match the bottleneck rate without affecting the coflow completion time (CCT). The saved bandwidth is then distributed among coflows according to their priority to improve overall performance. Testbed evaluation on a 40-node cluster shows that Fai improves the average (P95) CCT by 1.73× (3.43×) compared to Aalo. Large-scale trace-driven simulations also show that Fai outperforms Aalo substantially.
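The de-allocation step described above can be illustrated with a minimal sketch. It is not Fai's actual algorithm: the abstract does not give the detection rule, so the heuristic used here (the bottleneck is taken to be the slowest flow, with ties broken by fewest bytes sent) is an assumption, and the names `Flow` and `cap_to_bottleneck` are hypothetical. The sketch only shows the idea that capping non-bottleneck flows at the bottleneck rate frees bandwidth.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    flow_id: str
    rate: float      # current sending rate (units are arbitrary, e.g. Mbps)
    bytes_sent: int  # bytes sent so far; flow sizes are NOT known in advance


def cap_to_bottleneck(flows):
    """Return (bottleneck_id, new_rates, saved_bandwidth) for one coflow.

    Assumed heuristic (not taken from the paper): the bottleneck is the
    flow with the lowest current rate, ties broken by fewest bytes sent.
    """
    if not flows:
        return None, {}, 0.0
    bottleneck = min(flows, key=lambda f: (f.rate, f.bytes_sent))
    # Cap every flow at the bottleneck rate; the bottleneck itself is unchanged.
    new_rates = {f.flow_id: min(f.rate, bottleneck.rate) for f in flows}
    # Bandwidth freed by the caps, available for redistribution.
    saved = sum(f.rate - new_rates[f.flow_id] for f in flows)
    return bottleneck.flow_id, new_rates, saved


flows = [Flow("a", 100.0, 500), Flow("b", 40.0, 200), Flow("c", 70.0, 900)]
bottleneck_id, new_rates, saved = cap_to_bottleneck(flows)
```

In this toy input, flow `b` is the slowest, so the other two flows are capped at its rate. The freed bandwidth is what the abstract describes as being redistributed among coflows by priority; that redistribution is not modeled here.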