This paper considers the design of heterogeneous multi-cloud systems for big data storage and computing in the presence of cloud collusion and failures. A fundamental concept of such a system is the secrecy capacity, which represents the maximum amount of information that can be stored per unit of storage space under the requirements of secure distributed computing. A capacity-achieving code is designed for matrix multiplication, a computing subroutine widely used in machine learning applications. The code allows fast parallel decoding and unequal data allocation across the clouds. This flexibility leads naturally to the idea of optimizing the data allocation to minimize the computing time. Given any feasible storage budget, the optimal solution is derived, characterizing explicitly the fundamental tradeoff between storage and computing. Furthermore, it is shown via majorization theory that the whole tradeoff curve improves if the cloud computing rates are more even. Experiments on Amazon EC2 clusters are conducted, corroborating the theoretical observations and confirming that the decoding overhead is negligible.
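
To make the notion of "more even" computing rates concrete, the following is a minimal sketch of the standard majorization test: a vector x majorizes y when both have the same sum and every prefix sum of x, sorted in decreasing order, is at least the corresponding prefix sum of y. The function name `majorizes` and the example rate vectors are hypothetical illustrations, not taken from the paper, and the sketch only checks the ordering; it does not reproduce the paper's tradeoff analysis.

```python
def majorizes(x, y, tol=1e-12):
    """Return True if x majorizes y, i.e. y is at least as 'even' as x.

    Standard definition: equal totals, and each prefix sum of the
    decreasingly sorted x is >= the matching prefix sum of sorted y.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    # Majorization is only defined between vectors with equal totals.
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    prefix_x = prefix_y = 0.0
    for a, b in zip(xs, ys):
        prefix_x += a
        prefix_y += b
        if prefix_x < prefix_y - tol:
            return False
    return True


# Hypothetical computing rates for three clouds with the same total rate.
uneven_rates = [6.0, 3.0, 1.0]
even_rates = [4.0, 3.0, 3.0]

print(majorizes(uneven_rates, even_rates))  # uneven majorizes even
print(majorizes(even_rates, uneven_rates))  # the reverse does not hold
```

In this illustration `even_rates` is majorized by `uneven_rates`, so it is the "more even" of the two rate profiles in the sense used by majorization theory.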