72 Background: Identifying cancer cases within electronic health record (EHR) or claims data can be challenging because diagnosis codes are often entered into patient records during routine screenings or as "rule out" codes when a patient is referred for a procedure. To improve the accuracy of prostate cancer (PCa) case ascertainment among Veterans, we compared algorithms based on diagnosis codes with natural language processing (NLP) tools applied to clinical notes and pathology reports.
Methods: This retrospective observational cohort study used VA EHR data to identify Veterans diagnosed with PCa between 2000 and 2020. We identified Veterans who may have had PCa using International Classification of Diseases (ICD-9-CM or ICD-10-CM) diagnosis and procedure codes. We then deployed validated NLP tools to detect mentions of Gleason score, metastatic PCa (mPCa), and castration sensitivity as evidence of PCa within the notes. We conducted a descriptive analysis comparing the results of algorithms that relied exclusively on diagnosis codes with those of algorithms that also used NLP tools.
Results: From 2000 through 2020, 1,031,296 Veterans had one or more PCa diagnosis codes. This number decreased by 11% for each additional PCa diagnosis code required. When 4 or more PCa diagnosis codes were required, only 746,350 Veterans met the case definition. When we deployed NLP tools to identify mentions of a Gleason score or an indicator of mPCa, only 685,847 Veterans had these indicators of PCa, a 35% decrease from the number of Veterans with at least one PCa diagnosis code. Chart review of patients whose first PCa diagnosis code was recorded in 2019 and who had 4 or more codes in their records showed no evidence of a Gleason score or mPCa in their EHR. Their pathology reports revealed prostatic intraepithelial neoplasia or atypical small acinar proliferation; these patients had not yet developed prostate cancer.
Conclusions: Accurate ascertainment of PCa using EHR and claims data requires applying NLP tools to clinical notes in combination with structured data sources such as diagnosis codes. Relying on ICD diagnosis codes alone overestimates the burden of PCa by up to 30%.
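The Methods contrast two ascertainment rules: a threshold on the number of PCa diagnosis codes per patient, optionally combined with NLP-derived evidence from the notes. A minimal Python sketch of that logic follows. It is not the study's implementation; it assumes a hypothetical flat list of (patient, diagnosis code) records and a precomputed set of patients whose notes an upstream NLP step flagged for a Gleason score or mPCa mention. The code set, record layout, and function names are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative PCa diagnosis codes (assumption): ICD-9-CM 185 and ICD-10-CM C61,
# both "malignant neoplasm of prostate". The study's actual code list is not given.
PCA_DX_CODES = {"185", "C61"}


@dataclass(frozen=True)
class DxRecord:
    """One diagnosis code recorded for one patient (hypothetical record layout)."""
    patient_id: str
    dx_code: str


def count_pca_codes(records):
    """Count how many PCa diagnosis codes each patient has."""
    counts = Counter()
    for rec in records:
        if rec.dx_code in PCA_DX_CODES:
            counts[rec.patient_id] += 1
    return counts


def ascertain_cases(records, nlp_positive, min_codes=1, require_nlp=False):
    """Return IDs of patients with at least min_codes PCa codes.

    If require_nlp is True, also require that the patient appears in
    nlp_positive, a set of IDs assumed to come from an upstream NLP step that
    found a Gleason score or mPCa mention in the patient's notes.
    """
    counts = count_pca_codes(records)
    cases = {pid for pid, n in counts.items() if n >= min_codes}
    if require_nlp:
        cases &= nlp_positive
    return cases


# Toy data: A has codes and NLP evidence; B has a single code; C has four codes
# but no NLP evidence (e.g., a precursor lesion); D has only a screening code.
records = (
    [DxRecord("A", "C61")] * 4
    + [DxRecord("B", "C61")]
    + [DxRecord("C", "185")] * 4
    + [DxRecord("D", "Z12.5")]  # ICD-10-CM prostate cancer screening code, not a diagnosis
)
nlp_positive = {"A"}

print(sorted(ascertain_cases(records, nlp_positive)))                 # ['A', 'B', 'C']
print(sorted(ascertain_cases(records, nlp_positive, min_codes=4)))    # ['A', 'C']
print(sorted(ascertain_cases(records, nlp_positive, min_codes=4, require_nlp=True)))  # ['A']
```

Raising `min_codes` drops patients like B, who have only a single code, while `require_nlp` drops patients like C, who mirror the chart-review finding that patients with repeated codes can have precursor lesions rather than cancer.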
               
Click one of the above tabs to view related content.