LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Trading Cost and Throughput in Geo-Distributed Analytics With A Two Time Scale Approach

Photo by austindistel from unsplash

In the era of global-scale services, analytical queries are performed on datasets that span multiple data centers (DCs). Such geo-distributed queries generate a large amount of inter-DC data transfers at… Click to show full abstract

In the era of global-scale services, analytical queries are performed on datasets that span multiple data centers (DCs). Such geo-distributed queries generate a large amount of inter-DC data transfers at run time. Due to the expensive inter-DC bandwidth, various methods have been proposed to reduce the traffic cost in geo-distributed data analytics. However, current methods do not attempt to address the throughput issue in geo-distributed analytics. In this article, we target at characterizing and optimizing a cost-throughput tradeoff problem in geo-distributed data analytics. Our objectives are two-fold: (1) we minimize the inter-DC traffic cost when serving geo-distributed analytics with uncertain query demand, and (2) we maximize the system throughput, in terms of the number of query requests that can be successfully served with guaranteed queuing delay. Specifically, we formulate a stochastic optimization problem that seamlessly combines these two objectives. To solve this problem, we take advantage of Lyapunov optimization techniques to design and analyze a two-timescale online control framework. Without prior knowledge of future query requests, this framework makes online decisions on input data placement and admission control of query requests. Rigorous theoretical analyses show that our framework can achieve a near-optimal solution and maintain system stability and robustness as well. Extensive trace-driven simulation results further demonstrate that our framework is capable of reducing inter-DC traffic cost, improving system throughput, and guaranteeing a maximum delay for each query request.

Keywords: distributed analytics; cost; cost throughput; inter; geo distributed

Journal Title: IEEE Transactions on Cloud Computing
Year Published: 2022

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.