Machine learning techniques and algorithm-based approaches are becoming more and more vital to support clinical decision-making. In the medical area, natural language processing (NLP) techniques have shown the ability to… Click to show full abstract
Machine learning techniques and algorithm-based approaches are becoming more and more vital to support clinical decision-making. In the medical area, natural language processing (NLP) techniques have shown the ability to extract useful information from electronic health records. On the one hand, statistic, semantic, and contextualized word embedding-based models and on the other hand preprocessing approaches are the keys to a better representation of a document. Using narratives from the Intensive Care Unit, we elaborated a comparison of the most used methods and preprocessing approaches to tackle an outcome prediction problem and guide researchers into NLP pipelines in the medical area. We used real data from Medical Information Mart for Intensive Care-III (MIMIC-III). We selected all notes related to patients with pneumonia. We conducted a deep analysis on text preprocessing tasks producing three datasets: raw data with minor preprocessing, meticulous preprocessing, and extreme preprocessing filtering only medical-related terminologies using Named Entity Recognition algorithms. We then used these three sets in five models, of which two are based on the traditional noncontextual word embedding techniques and three use contextualized word embedding based on a transformer. We demonstrated that transformer-based models outperform other word embedding models and a profound preprocessing yielded an accuracy of 98.2 F1-score. These results show the highly competitive ability of NLP predictive models against other models that use medical data. With an appropriate NLP pipeline, the information contained in medical narratives can be used to draw up a patient profile, and admission notes can help to ascertain a mortality risk of a patient admitted to the Intensive Care Unit.
               
Click one of the above tabs to view related content.