In modern optical networks, alarm analysis plays a pivotal role in detecting faults and guiding operators toward timely interventions. Timely and accurate root cause localization (RCL) can effectively reduce downtime, prevent cascading failures, and enhance overall system performance. However, the increasing scale and complexity of networks make RCL harder because of the huge volume of alarms generated when faults occur. Automatic RCL methods help reduce the time costs and human errors associated with manual analysis. Most existing RCL methods, however, suffer from poor explainability, low adaptability to network changes, high learning costs, and a lack of interactivity, which reduce their credibility and usability in production environments. This article introduces a graph structure-enhanced large language model (LLM) for optical network fault detection, capable of performing explainable alarm RCL with improved adaptability and interactivity. Graph structures provide an intuitive means of expressing the relationships between topology and alarms and of clearly visualizing alarm propagation paths, which aids explainable RCL; LLMs exhibit strong capabilities in semantic comprehension and language generation, which improve both explainability and interactivity. When integrated with algorithms, they hold promise for reducing operators’ learning costs, adding interactivity, and introducing new possibilities for the visualization and efficiency of complex tasks. We conducted evaluations and validations across multiple metrics using real alarm data collected from an optical transport network (OTN). The results show that the proposed algorithm achieves an accuracy of over 86.8% across various complex scenarios; compared to the base model, the fine-tuned model exhibits an accuracy improvement of over 87%.
Other results indicate that the proposed method enhances the explainability, adaptability, and interactivity of fault detection in optical networks, showcasing significant potential for automated and intelligent network operations.
               