The cost-minimization problem for streaming workflow (SW) has become increasingly important, and even critical, in stream big data processing, particularly for geographically distributed datacenters, because of its huge demand on computing and communication resources. Existing virtual machine (VM) allocation algorithms in cloud computing have been widely applied to batch-processing models; however, none of them can be successfully applied to SW because: 1) they fail to adapt to the continuous execution characteristic of SW; and 2) most of them assume that the prices of traffic and VMs are uniform across datacenters. In this paper, we propose a transformation-based SW allocation algorithm that minimizes cost for stream big data processing in geographically distributed datacenters, taking into account both the characteristics of SW and the price heterogeneity among datacenters. We first propose a cost-aware workflow transformation framework, based on eight well-designed and verified cost-reducing transformation rules, that adapts to the continuous execution characteristic of SW. We then formulate the joint VM-traffic optimization problem and show that it is NP-hard. To produce the optimal solution in polynomial time, we then transform the SW allocation problem into the minimum-cost maximum-flow problem, considering the price heterogeneity of both traffic and VMs. Finally, our experimental results validate the high cost efficiency of our approach, which lowers computing and communication costs by optimizing the workflow specification and jointly optimizing VM and traffic costs.
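To illustrate the kind of minimum-cost maximum-flow formulation the abstract refers to, the sketch below solves a deliberately simplified toy instance. It is not the construction from the paper: it assumes each operator needs exactly one VM, each datacenter has a fixed VM capacity and a per-VM price, and traffic costs are ignored. The names `MinCostFlow` and `allocate` and all parameters are hypothetical.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths with a queue-based Bellman-Ford search."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # node -> list of edge indices
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets an even index, its residual twin the next odd one.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.graph[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t):
        flow = total = 0
        while True:
            dist = [float("inf")] * self.n
            prev_edge = [-1] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:  # cheapest residual path from s
                u = queue.popleft()
                in_queue[u] = False
                for e in self.graph[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev_edge[v] = e
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev_edge[t] == -1:  # sink unreachable: flow is maximal
                return flow, total
            push, v = float("inf"), t
            while v != s:  # bottleneck capacity along the path
                e = prev_edge[v]
                push = min(push, self.cap[e])
                v = self.to[e ^ 1]
            v = t
            while v != s:  # augment along the path
                e = prev_edge[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total += push * dist[t]


def allocate(num_ops, vm_price, capacity):
    """Toy model (assumption): one VM per operator, per-datacenter VM price."""
    num_dcs = len(vm_price)
    source, sink = 0, 1 + num_ops + num_dcs
    net = MinCostFlow(sink + 1)
    for op in range(num_ops):
        net.add_edge(source, 1 + op, 1, 0)
        for dc in range(num_dcs):
            net.add_edge(1 + op, 1 + num_ops + dc, 1, vm_price[dc])
    for dc in range(num_dcs):
        net.add_edge(1 + num_ops + dc, sink, capacity[dc], 0)
    return net.solve(source, sink)  # (operators placed, total VM cost)
```

For example, `allocate(3, [2, 5], [2, 2])` places all three operators at a total cost of 9: two VMs in the cheaper datacenter (2 each) and one in the more expensive one (5). Modeling heterogeneous traffic prices would require additional edges and costs that this sketch does not attempt.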