Exploiting Stragglers in Distributed Computing Systems With Task Grouping

We consider the problem of stragglers in distributed computing systems. Stragglers, which are compute nodes that unpredictably slow down, often increase the completion times of tasks. One common approach to mitigating stragglers is work replication, where only the first completion among replicated tasks is accepted, discarding the others. However, discarding work leads to resource wastage. In this article, we propose a method for exploiting the work completed by stragglers rather than discarding it. The idea is to increase the granularity of the assigned work, and to increase the frequency of worker updates. We show that the proposed method reduces the completion time of tasks via experiments performed on a simulated cluster as well as on Amazon EC2 with Apache Hadoop.
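
To make the contrast in the abstract concrete, the following is a minimal toy model, not the article's algorithm or its experimental setup. It assumes fixed worker speeds, a shared queue of equal-sized work chunks, and that a worker reports each chunk as soon as it finishes. The function names `replication_time` and `grouped_time`, the speeds, and the work sizes are all hypothetical choices made for illustration.

```python
import heapq


def replication_time(total_work, speeds, replicas):
    """Toy baseline: each task is replicated and only the first finisher counts.

    Workers are split into consecutive groups of `replicas`; every group runs
    the same task, and the work of the slower replicas is discarded.
    """
    groups = [speeds[i:i + replicas] for i in range(0, len(speeds), replicas)]
    task_size = total_work / len(groups)
    # A task ends when its fastest replica ends; the job ends with its slowest task.
    return max(task_size / max(group) for group in groups)


def grouped_time(total_work, speeds, chunk):
    """Toy alternative: workers pull chunks from a shared queue and report each one.

    Every finished chunk counts, including those finished by slow workers.
    """
    remaining = total_work
    # Heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    while remaining > 0:
        now, i = heapq.heappop(free_at)
        size = min(chunk, remaining)
        remaining -= size
        done = now + size / speeds[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


# Hypothetical cluster: three normal workers and one straggler at quarter speed.
speeds = [1.0, 1.0, 1.0, 0.25]
print(replication_time(120, speeds, replicas=2))  # 60.0
print(grouped_time(120, speeds, chunk=1))         # 37.0
print(grouped_time(120, speeds, chunk=30))        # 120.0
```

In this toy setting, small chunks let the straggler contribute useful work (37 time units versus 60 with two-way replication), while large chunks without replication leave the job waiting on the straggler (120 time units). The numbers illustrate the intuition only and say nothing about the gains reported in the article.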

Keywords: stragglers; distributed computing; computing systems; exploiting stragglers; task grouping

Journal Title: IEEE Transactions on Services Computing
Year Published: 2024
