
Efficient Time-Evolving Stream Processing at Scale



Time-evolving stream datasets are ubiquitous in real-world applications, and their inherent hot keys often evolve over time. Nevertheless, few existing solutions provide efficient load balancing on these time-evolving datasets while preserving low memory overhead. In this paper, we present a novel load balancing mechanism, named FISH, which provides efficient time-evolving stream processing at scale through recent hot key identification and worker assignment. The key insight of this work is that the keys of time-evolving stream data can have a skewed distribution within a bounded time interval. This makes it possible to accurately identify the recent hot keys for real-time load balancing within a bounded scope. We therefore propose an epoch-based recent hot key identification with specialized intra-epoch frequency counting (for maintaining low memory overhead) and inter-epoch hotness decaying (for suppressing superfluous computation). We also propose to heuristically infer accurate information about remote workers through computation rather than communication for cost-efficient worker assignment. We have integrated our approach into Apache Storm. Our results on a cluster of 128 nodes for both synthetic and real-world stream datasets show that FISH significantly outperforms state-of-the-art approaches, reducing the average and 99th percentile latency by 87.12 and 76.34 percent, respectively (versus W-Choices), and memory overhead by 96.66 percent (versus Shuffle Grouping).
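
The epoch-based idea in the abstract (count keys within an epoch, then decay older hotness when the epoch closes) can be illustrated with a minimal Python sketch. This is not the FISH implementation: the class name, the parameters (`epoch_size`, `decay`, `hot_threshold`, `min_hotness`), and the scoring rule are all assumptions made for illustration, and an exact `Counter` stands in for the paper's specialized low-memory intra-epoch counting.

```python
from collections import Counter


class EpochHotKeyTracker:
    """Illustrative sketch only; names and scoring rule are hypothetical."""

    def __init__(self, epoch_size=1000, decay=0.5,
                 hot_threshold=0.1, min_hotness=0.01):
        self.epoch_size = epoch_size        # tuples per epoch (assumed count-based)
        self.decay = decay                  # factor applied to old hotness per epoch
        self.hot_threshold = hot_threshold  # score at or above which a key is "hot"
        self.min_hotness = min_hotness      # scores below this are forgotten
        self.epoch_counts = Counter()       # intra-epoch frequency counts
        self.hotness = {}                   # decayed score carried across epochs
        self.seen = 0                       # tuples observed in the current epoch

    def observe(self, key):
        """Count one tuple; close the epoch once it is full."""
        self.epoch_counts[key] += 1
        self.seen += 1
        if self.seen >= self.epoch_size:
            self._close_epoch()

    def _close_epoch(self):
        # Inter-epoch decay: shrink every previously recorded score.
        updated = {k: v * self.decay for k, v in self.hotness.items()}
        # Add this epoch's relative frequency of each key.
        for key, count in self.epoch_counts.items():
            updated[key] = updated.get(key, 0.0) + count / self.seen
        # Forget keys whose score has decayed to a negligible value.
        self.hotness = {k: v for k, v in updated.items()
                        if v >= self.min_hotness}
        self.epoch_counts.clear()
        self.seen = 0

    def hot_keys(self):
        """Keys whose decayed score currently meets the threshold."""
        return {k for k, v in self.hotness.items()
                if v >= self.hot_threshold}
```

In this sketch a key that dominated an earlier epoch stays hot only while its decayed score remains above `hot_threshold`, so keys that stop appearing drop out after a few epochs. A load balancer could consult `hot_keys()` to decide which keys to spread across several workers; how FISH itself makes that assignment is not specified in the abstract.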

Keywords: time; time evolving; efficient time; stream processing; evolving stream

Journal Title: IEEE Transactions on Parallel and Distributed Systems
Year Published: 2019


