Low-latency stream processing on large clusters of hundreds to thousands of servers is an increasingly important challenge. A crucial barrier to tackling this challenge is stragglers, i.e., tasks that lag significantly behind others in processing the stream data. However, prior straggler mitigation solutions have significant limitations: they balance streaming workloads among tasks but may incur imbalanced backlogs when the workloads exhibit variance, which also causes stragglers. Fortunately, we observe that carefully scheduling the outgoing tuples of different tasks can help balance backlogs and thus avoid stragglers. To this end, we present Hone, a tuple scheduler that aims to minimize the maximum queue backlog of all tasks over time. Hone leverages an online Largest-Backlog-First (LBF) algorithm with a provably good competitive ratio to perform efficient tuple scheduling. We have implemented Hone on Apache Storm and evaluated it extensively via both simulations and testbed experiments. Our results show that under the same workload balancing strategy, shuffle grouping, Hone significantly outperforms the original Storm, reducing the end-to-end tuple processing latency by 78.7 percent on average.
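
The abstract does not spell out the details of LBF, so the following is only a toy sketch of the general largest-backlog-first idea, not Hone's implementation and not a Storm API. It assumes a simplified model in which a single scheduler drains several queues in discrete time slots; the function `simulate`, the `arrivals` layout, the `capacity` parameter, and the round-robin baseline are all hypothetical choices made for illustration.

```python
from typing import List


def simulate(arrivals: List[List[int]], capacity: int, policy: str) -> int:
    """Toy model: return the peak per-queue backlog seen at the end of any slot.

    arrivals[t][q] is the number of tuples arriving at queue q in slot t.
    capacity is how many tuples the scheduler can serve per slot.
    policy is "lbf" (largest backlog first) or "rr" (round robin).
    """
    n = len(arrivals[0])
    backlog = [0] * n
    peak = 0
    rr = 0  # round-robin pointer, used only by the "rr" policy
    for slot in arrivals:
        # New tuples join their queues at the start of the slot.
        for q, count in enumerate(slot):
            backlog[q] += count
        # Serve up to `capacity` tuples, one at a time.
        for _ in range(capacity):
            if sum(backlog) == 0:
                break
            if policy == "lbf":
                # Pick the queue with the largest current backlog
                # (ties go to the lowest index).
                q = max(range(n), key=lambda i: backlog[i])
            else:
                # Round robin, skipping empty queues.
                while backlog[rr % n] == 0:
                    rr += 1
                q = rr % n
                rr += 1
            backlog[q] -= 1
        peak = max(peak, max(backlog))
    return peak


# A bursty queue 0 next to two lightly loaded queues (made-up numbers).
arrivals = [[4, 0, 0], [0, 1, 1], [4, 0, 0], [0, 1, 1]]
print("LBF peak backlog:", simulate(arrivals, capacity=2, policy="lbf"))
print("RR  peak backlog:", simulate(arrivals, capacity=2, policy="rr"))
```

In this toy workload, always serving the longest queue keeps the bursty queue from accumulating a backlog, whereas round robin spends service slots on short queues while the burst piles up. The numbers are illustrative only and say nothing about the paper's reported results.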