In a distributed storage system, proactive fault tolerance provides an extra layer of data protection on top of traditional reactive fault tolerance. In a replication-based system, the placement of replicas across storage servers affects both rebuild times and the vulnerability of data to device failures, and therefore influences system reliability. The effects of proactive fault tolerance and data placement on the reliability of distributed storage systems are intricate and understudied. This article proposes reliability equations for predicting the number of data loss events, over a period of time, in proactive replication-based systems that use random and copyset placement schemes. The equations capture the effect of proactive fault tolerance, disk-operation failures, server-operation failures, rebuild bandwidths, and replica placement schemes on system reliability, and use time-based Weibull distributions to model failure and repair processes. Moreover, a Monte Carlo method is designed to simulate the operation of proactive distributed storage systems. The results of the reliability equations agree well with those of the simulations, which verifies the correctness of the equations. The proposed equations can help system designers readily optimize tradeoffs and compare schemes, facilitating the design of distributed storage systems.
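To make the idea of a Monte Carlo reliability simulation with Weibull failure and repair times concrete, the following is a minimal sketch and not the article's simulator. It makes several simplifying assumptions that do not come from the article: a single replica group of `r` disks is modeled, server failures and rebuild bandwidth are ignored, data loss means all `r` replicas are down at the same time, and a correctly predicted failure (probability `predict_prob`) causes no downtime because the data is assumed to be migrated beforehand. All function and parameter names are hypothetical.

```python
import random


def simulate_group(r, horizon, fail_shape, fail_scale,
                   repair_shape, repair_scale, predict_prob, rng):
    """Return True if all r replicas are down simultaneously before `horizon`."""
    events = []  # (time, +1) when a disk goes down, (time, -1) when it is rebuilt
    for _ in range(r):
        t = 0.0
        while True:
            # random.weibullvariate(alpha, beta): alpha is scale, beta is shape
            t += rng.weibullvariate(fail_scale, fail_shape)
            if t >= horizon:
                break
            if rng.random() < predict_prob:
                # Assumption: a predicted failure is handled proactively,
                # so the replica never becomes unavailable.
                continue
            repair = rng.weibullvariate(repair_scale, repair_shape)
            events.append((t, 1))
            events.append((min(t + repair, horizon), -1))
            t += repair
    # Sweep over events in time order; at equal times process rebuilds first.
    events.sort(key=lambda e: (e[0], e[1]))
    down = 0
    for _, delta in events:
        down += delta
        if down == r:
            return True
    return False


def estimate_loss_probability(trials, seed=0, **params):
    """Fraction of simulated trials in which the replica group loses data."""
    rng = random.Random(seed)
    losses = sum(simulate_group(rng=rng, **params) for _ in range(trials))
    return losses / trials


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the article.
    common = dict(r=3, horizon=87600.0, fail_shape=1.2, fail_scale=50000.0,
                  repair_shape=2.0, repair_scale=24.0)
    for p in (0.0, 0.5, 0.9):
        est = estimate_loss_probability(20000, predict_prob=p, **common)
        print(f"predict_prob={p}: estimated loss probability {est}")
```

Each down interval of a disk is disjoint from that disk's other down intervals, so a running count of `r` in the sweep means every replica in the group is unavailable at that moment. A fuller model along the lines the abstract describes would also need server failures, placement-dependent rebuild times, and many replica groups sharing the same devices.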
               