LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Optimizing Network Transfers for Data Analytic Jobs Across Geo-Distributed Datacenters

Photo from wikipedia

It has become a recent trend that large volumes of data are generated, stored, and processed across geographically distributed datacenters. When popular data parallel frameworks, such as MapReduce and Spark,… Click to show full abstract

It has become a recent trend that large volumes of data are generated, stored, and processed across geographically distributed datacenters. When popular data parallel frameworks, such as MapReduce and Spark, are employed to process such geo-distributed data, optimizing the network transfer in communication stages becomes increasingly crucial to application performance, as the inter-datacenter links have much lower bandwidth than intra-datacenter links. In this article, we focus on exploiting the flexibility of multi-path routing for inter-datacenter flows of data analytic jobs, with the hope of better utilizing inter-datacenter links and thus improve job performance. We design an optimal multi-path routing and scheduling strategy to achieve the best possible network performance for all concurrent jobs, based on our formulation of an optimization problem that can be transformed into an equivalent linear programming (LP) problem to be efficiently solved. As a highlight of this article, we have implemented our proposed algorithm in the controller of an application-layer software-defined inter-datacenter overlay testbed, designed to provide transfer optimization service for Spark jobs. With extensive evaluations of our real-world implementation on Google Cloud, we have shown convincing evidence that our optimal multi-path routing and scheduling strategies have achieved significant improvements in terms of job performance.

Keywords: network; geo distributed; distributed datacenters; data analytic; inter datacenter; optimizing network

Journal Title: IEEE Transactions on Parallel and Distributed Systems
Year Published: 2022

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.