Temporal medical data are increasingly integrated into the development of data-driven methods to deliver better healthcare. Searching such data for patterns can improve the detection of disease cases and facilitate… Click to show full abstract
Temporal medical data are increasingly integrated into the development of data-driven methods to deliver better healthcare. Searching such data for patterns can improve the detection of disease cases and facilitate the design of preemptive interventions. For example, specific temporal patterns could be used to recognize low-prevalence diseases, which are often under-diagnosed. However, searching these patterns in temporal medical data is challenging, as the data are often noisy, complex, and large in scale. In this work, we propose an effective and efficient solution to search for patients who exhibit conditions that resemble the input query. In our solution, we propose a similarity notion based on the Longest Common Subsequence (LCSS), which is used to measure the similarity between the query and the patient's temporal medical data and to ensure robustness against noise in the data. Our solution adopts locality sensitive hashing techniques to address the high dimensionality of medical data, by embedding the recorded clinical events (e.g., medications and diagnosis codes) into compact signatures. To perform pattern search in large EHR datasets, we propose a filtering approach based on tandem patterns, which effectively identifies candidate matches while discarding irrelevant data. The evaluations conducted using a real-world dataset demonstrate that our solution is highly accurate while significantly accelerating the similarity search.
               
Click one of the above tabs to view related content.