Fault tolerance and load balancing are two key considerations in resource allocation against failures. This paper proposes a primary and backup resource allocation model with preventive recovery priority setting to minimize a weighted value of unavailable probability (W-UP) against multiple failures. W-UP considers the probability of unsuccessful recovery and the maximum unavailable probability after recovery among physical nodes. We consider that each node fails with a workload-dependent failure probability and that each failure pattern occurs with a given probability. The workload-dependent failure probability is a non-decreasing function that captures an empirical relationship between the workload and the failure probability for each physical node. We introduce a recovery strategy to handle workload variation; the strategy is determined at the operation start time and can be applied to each failure pattern. Once a failure pattern occurs, the recoveries are performed according to the priority setting to promptly recover the functions hosted by failed nodes. We also discuss an approach to obtain the unsuccessful recovery probability by considering the maximum number of arbitrary functions that a set of available nodes can recover without the priority setting. We formulate the optimization problem as a mixed integer linear programming (MILP) problem. We develop a heuristic algorithm to solve larger problems in practical time. The developed heuristic algorithm is approximately 729 times faster than the MILP approach, with a 1.6% performance penalty on W-UP. The numerical results show that the proposed model reduces W-UP compared with baselines.
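To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the paper's formulation. It assumes a linear non-decreasing failure function, independent node failures, and a convex-combination form for W-UP; the abstract specifies none of these, and the names `failure_probability`, `failure_patterns`, `weighted_unavailable_probability`, `base`, `slope`, and `weight` are hypothetical.

```python
from itertools import product


def failure_probability(workload, base=0.01, slope=0.05):
    """Hypothetical non-decreasing map from a workload in [0, 1] to a failure probability."""
    clamped = min(max(workload, 0.0), 1.0)
    return base + slope * clamped


def failure_patterns(workloads):
    """Yield (pattern, probability) pairs; pattern[i] is True when node i fails.

    Assumption: nodes fail independently, so a pattern's probability is a product.
    """
    probs = [failure_probability(w) for w in workloads]
    for pattern in product([False, True], repeat=len(probs)):
        p = 1.0
        for failed, q in zip(pattern, probs):
            p *= q if failed else 1.0 - q
        yield pattern, p


def weighted_unavailable_probability(p_unrecovered, max_node_unavailable, weight=0.5):
    """Assumed W-UP form: a convex combination of the two terms the abstract names."""
    return weight * p_unrecovered + (1.0 - weight) * max_node_unavailable


workloads = [0.2, 0.5, 0.9]
patterns = list(failure_patterns(workloads))

# Toy rule (an assumption): recovery is unsuccessful when more than one node fails.
p_unrecovered = sum(p for pattern, p in patterns if sum(pattern) > 1)
# Toy stand-in for the post-recovery maximum: uses the initial workloads unchanged.
max_unavailable = max(failure_probability(w) for w in workloads)
w_up = weighted_unavailable_probability(p_unrecovered, max_unavailable)
```

Enumerating all failure patterns grows exponentially with the number of nodes, which is consistent with the abstract's motivation for a heuristic on larger problems.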
               