Background
For patients with stage T1-T2 esophageal squamous cell carcinoma (ESCC), accurately predicting lymph node metastasis (LNM) remains challenging. We aimed to investigate the performance of machine learning (ML) models for predicting LNM in patients with stage T1-T2 ESCC.

Methods
This retrospective study included patients with T1-T2 ESCC treated at three centers between January 2014 and December 2019, who were divided into a training set and an external test set. All patients underwent esophagectomy, and LNM status was determined by pathological examination. Thirty-six ML models were developed by combining six modeling algorithms with six feature selection techniques. The optimal model was selected using the bootstrap method, and the external test set was used to further assess its generalizability and effectiveness. Predictive performance was evaluated using the area under the receiver operating characteristic curve (AUC).

Results
Of the 1097 included patients, 294 (26.8%) had LNM. The ML models based on clinical features showed good predictive performance for LNM status, with a median bootstrapped AUC of 0.659 (range: 0.592 to 0.715). The optimal model, which used the naive Bayes algorithm with feature selection by the coefficient of determination, had the highest AUC, 0.715 (95% CI: 0.671, 0.763). In the external test set, the optimal ML model achieved an AUC of 0.752 (95% CI: 0.674, 0.829), which was superior to that of T stage (AUC 0.624, 95% CI: 0.547, 0.701).

Conclusions
ML models showed good value for predicting LNM in patients with stage T1-T2 ESCC, and the naive Bayes algorithm with feature selection by the coefficient of determination performed best.
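To make the evaluation metric concrete, the following is a minimal sketch, not the authors' code, of how an AUC and a bootstrap percentile interval can be computed from a set of binary LNM labels (1 = LNM, 0 = no LNM) and model risk scores. It assumes the AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half), and that the interval is taken from percentiles of AUCs over resampled patients. The function names `auc` and `bootstrap_auc` and the parameters `n_boot` and `seed` are hypothetical. The abstract does not specify the exact bootstrap procedure (for example, whether models were refit on each resample), so this sketch resamples only a fixed set of predictions.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,  # hypothetical number of resamples
    seed: int = 0,       # fixed seed for reproducibility
) -> Tuple[float, float, float]:
    """Return (median, 2.5th percentile, 97.5th percentile) of AUC over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_boot:
        # Draw n patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples containing only one class, where AUC is undefined.
        if 0 < sum(ys) < n:
            aucs.append(auc(ys, [scores[i] for i in idx]))
    aucs.sort()

    def pct(q: float) -> float:
        # Simple nearest-rank percentile on the sorted bootstrap AUCs.
        return aucs[min(int(q * n_boot), n_boot - 1)]

    return pct(0.5), pct(0.025), pct(0.975)


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the usage line, three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75. The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of about 1,100 but could be replaced by a rank-based computation for larger data.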