Objectives The diagnosis of leukemia relies very much on the results of bone marrow examinations, which is never generally performed in routine physical examination. In many rural areas even community… Click to show full abstract
Objectives The diagnosis of leukemia relies very much on the results of bone marrow examinations, which is never generally performed in routine physical examination. In many rural areas even community hospitals and primary care clinics, the lack of hematological specialist and facility does not allow a definite diagnosis of leukemia. Thus, there will be a significant benefit if machine learning (ML) models could help early predict leukemia using preliminary blood test data in a routine physical examination in community hospitals to save time before a definite diagnosis. Methods We collected the routine physical examination data of 1230 newly diagnosed leukemia patients and 1300 healthy people. We trained and tested 3 machine learning (ML) models including linear support vector machine (LSVM), random forest (RF), and XGboost models. We not only examined the accordance between model results and statistical analysis of the input data but also examined the consistency of model accuracy scores and relative importance order of model factors with regard to different input data sets and different model arguments to check the applicability of both the models and the input data. Results Generally, the RF and XGboost models give more identical, consistent, and robust relative importance order of factors that is also accordant with the statistical analysis, while the LSVM gives much different and nonsense orders for different inputs. Results of the RF and XGboost models show that (1) generally, the models achieve accuracy scores above 0.9, indicating effective identification of leukemia, and (2) the top three factors that contribute most to the identification of leukemia include red blood cell (RBC), hematocrit (HCT), and white blood cell (WBC), while the other factors contribute relatively less. Conclusions This study shows a feasible case example for early identification of leukemia using routine physical examination data with the assistance of ML models, which can be conveniently, cheaply, and widely applied in community hospitals or primary care clinics to save time before definite diagnosis; however, more studies are still needed to validate the applicability of more ML models to a larger variety of input data sets.
               
Click one of the above tabs to view related content.