The widespread adoption of public clouds has fueled the growth of big data, the Internet of Things, and artificial intelligence (AI), and has greatly expanded the capacity available to applications. However, operating these applications efficiently is challenging because of infrastructure heterogeneity, computing hierarchy, scale distribution, and stochastic user behavior. Among these challenges, reliability has received intense interest in the cloud community. Most existing research on cloud reliability focuses on fault discovery or fault tolerance. A public cloud, however, is subject to a wider range of failures, which makes it difficult to analyze the association between reliability and other indicators. More accurate reliability modeling methods and more advanced optimization approaches are needed to ensure the reliable operation of a public cloud. To address these challenges, this article models the reliability of cloud computing from the service perspective and divides cloud services into a request processing phase and a request execution phase to account for multiple users and service types. Moreover, this article combines a variety of AI-based methods, such as autonomous scheduling mechanisms, anomaly detection, and autonomous learning improvement, which can adapt to the dynamic and complex cloud service environment and ensure the service reliability of a public cloud. Numerical results indicate that the proposed service reliability model and autonomous optimizations are effective in recovering the reliability of cloud computing systems, in terms of both multiple AI-based failure prognostics and intelligent learning ability.
               