Network systems (e.g., cloud computing systems) are increasingly used to integrate, share, and efficiently utilize various resources. A parallel task in a network system consists of multiple subtasks that can be executed in parallel on different servers. However, the failure of any single subtask prevents the entire task from completing. To avoid this, the network system can create several copies of a subtask and run them on different servers simultaneously. This redundant parallel execution is an efficient way to improve performance and guarantee reliability, but it also complicates modeling, evaluation, and optimization. For example, a link failure makes some servers inaccessible, and a server failure prevents every subtask hosted on that server from completing. Such failure correlations are complicated and cannot be ignored in modeling and evaluation. This paper first presents a reliability-performance correlation model for a redundant parallel task in a network system. To improve fidelity, the model captures precedence constraints among subtasks, multiple types of failures, and complicated failure correlations. The paper also designs an algorithm that combines graph theory and Bayes' theorem to evaluate a performability metric, which quantifies the correlation between reliability and performance. Finally, a heuristic algorithm is designed to search for an optimal task execution strategy that maximizes the performability metric. Illustrative examples are presented.
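The effect of failure correlation described above can be illustrated with a small sketch. It is not the model or algorithm of this paper: it assumes a hypothetical setting in which each server is independently "available" (working and reachable over its link) with a given probability, a subtask succeeds if at least one of its copies runs on an available server, and the task succeeds only if every subtask succeeds. The names `task_reliability`, `naive_reliability`, `placement`, and `p_up` are illustrative assumptions. The first function enumerates all server states exactly; the second treats subtasks as independent, which is inaccurate when copies of different subtasks share a server.

```python
from itertools import product


def task_reliability(placement, p_up):
    """Exact probability that every subtask has at least one copy on an
    available server, by enumerating all server up/down states.

    placement: list of subtasks, each a list of server names hosting a copy.
    p_up: dict mapping server name to its availability probability
          (assumed here to combine server and link reliability).
    """
    servers = sorted(p_up)
    total = 0.0
    for states in product([True, False], repeat=len(servers)):
        up = dict(zip(servers, states))
        # Probability of this particular combination of server states.
        prob = 1.0
        for s in servers:
            prob *= p_up[s] if up[s] else 1.0 - p_up[s]
        # The task completes only if each subtask has a surviving copy.
        if all(any(up[s] for s in copies) for copies in placement):
            total += prob
    return total


def naive_reliability(placement, p_up):
    """Product of per-subtask success probabilities, ignoring the fact
    that subtasks sharing a server fail together."""
    result = 1.0
    for copies in placement:
        all_copies_fail = 1.0
        for s in copies:
            all_copies_fail *= 1.0 - p_up[s]
        result *= 1.0 - all_copies_fail
    return result


if __name__ == "__main__":
    p_up = {"A": 0.9, "B": 0.9}
    # Two subtasks, each replicated on the same two servers.
    placement = [["A", "B"], ["A", "B"]]
    print(task_reliability(placement, p_up))
    print(naive_reliability(placement, p_up))
```

With two subtasks that are both replicated on servers A and B (availability 0.9 each), the exact value is about 0.99, while the independence assumption gives about 0.9801. The gap comes entirely from the shared servers, which is the kind of correlation the abstract says cannot be ignored. Exhaustive enumeration is exponential in the number of servers and is used here only to keep the sketch short.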
               