Noise-tolerant similarity search in temporal medical data
Published in: | Journal of Biomedical Informatics 2021-01, Vol.113, p.103667-103667, Article 103667 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Temporal medical data are increasingly integrated into the development of data-driven methods to deliver better healthcare. Searching such data for patterns can improve the detection of disease cases and facilitate the design of preemptive interventions. For example, specific temporal patterns could be used to recognize low-prevalence diseases, which are often under-diagnosed. However, searching for these patterns in temporal medical data is challenging, as the data are often noisy, complex, and large in scale. In this work, we propose an effective and efficient solution to search for patients who exhibit conditions that resemble the input query. In our solution, we propose a similarity notion based on the Longest Common Subsequence (LCSS), which is used to measure the similarity between the query and the patient's temporal medical data and to ensure robustness against noise in the data. Our solution adopts locality-sensitive hashing techniques to address the high dimensionality of medical data, by embedding the recorded clinical events (e.g., medications and diagnosis codes) into compact signatures. To perform pattern search in large electronic health record (EHR) datasets, we propose a filtering approach based on tandem patterns, which effectively identifies candidate matches while discarding irrelevant data. The evaluations conducted using a real-world dataset demonstrate that our solution is highly accurate while significantly accelerating the similarity search.
• We discuss that the use of pattern search methods could help clinical researchers in the detection of under-diagnosed conditions (e.g., low-prevalence diseases).
• We propose an efficient and robust approach for pattern similarity search.
• Our solution uses probabilistic methods to transform high-dimensional medical data into compact signatures that retain the original similarity.
• We develop a filtering step to efficiently reduce the complexity of the similarity search.
• Our evaluations demonstrate that our method achieves significant speedup compared to traditional similarity search methods. |
---|---|
ISSN: | 1532-0464, 1532-0480 |
DOI: | 10.1016/j.jbi.2020.103667 |
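
The abstract names two building blocks: an LCSS-based similarity that tolerates noise, and locality-sensitive hashing that compresses sets of clinical events into compact signatures. The following is a minimal sketch of how these two ideas might fit together, not the article's implementation. Everything in it is an assumption made for illustration: MinHash is used as one common locality-sensitive scheme for sets, two time steps are treated as matching when their (estimated) Jaccard similarity reaches a threshold, the LCSS length is normalized by the query length, and the event codes (`DX_A`, `RX_B`, and so on) are made up. The tandem-pattern filtering step is not covered.

```python
import hashlib
from typing import Callable, FrozenSet, List, Sequence

Visit = FrozenSet[str]  # hypothetical: event codes recorded at one time step


def jaccard(a: Visit, b: Visit) -> float:
    """Exact Jaccard similarity between two sets of event codes."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed: int, event: str) -> int:
    """Deterministic 64-bit hash of an event code under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{event}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(events: Visit, num_hashes: int = 64) -> List[int]:
    """MinHash signature: the minimum hash value per seeded hash function."""
    return [min((_hash(seed, e) for e in events), default=0)
            for seed in range(num_hashes)]


def estimated_similarity(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """Fraction of agreeing positions, which estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lcss_length(query: Sequence, record: Sequence,
                matches: Callable[[object, object], bool]) -> int:
    """Dynamic-programming LCSS length under a user-supplied match predicate."""
    m, n = len(query), len(record)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if matches(query[i - 1], record[j - 1]):
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def lcss_similarity(query: Sequence, record: Sequence,
                    matches: Callable[[object, object], bool]) -> float:
    """LCSS length divided by query length (an assumed normalization)."""
    if not query:
        return 0.0
    return lcss_length(query, record, matches) / len(query)


# Made-up example data: a three-step query and two patient records.
query = [frozenset({"DX_A"}), frozenset({"RX_B"}), frozenset({"DX_D"})]
noisy_record = [frozenset({"DX_A"}), frozenset({"LAB_X"}),
                frozenset({"RX_B", "RX_C"}), frozenset({"LAB_Y"}),
                frozenset({"DX_D"})]
reversed_record = [frozenset({"DX_D"}), frozenset({"RX_B"}), frozenset({"DX_A"})]


def exact_match(q: Visit, r: Visit) -> bool:
    """Two time steps match when their exact Jaccard similarity is >= 0.5."""
    return jaccard(q, r) >= 0.5


def signature_match(q: Sequence[int], r: Sequence[int]) -> bool:
    """Same idea, but compared on compact signatures instead of raw sets."""
    return estimated_similarity(q, r) >= 0.5


print(lcss_similarity(query, noisy_record, exact_match))
print(lcss_similarity(query, reversed_record, exact_match))

# Signature-based variant: hash each time step once, then compare signatures.
query_sigs = [minhash_signature(v) for v in query]
record_sigs = [minhash_signature(v) for v in noisy_record]
print(lcss_similarity(query_sigs, record_sigs, signature_match))
```

In this sketch the unrelated time steps (`LAB_X`, `LAB_Y`) in the noisy record are simply skipped by the subsequence alignment, which is the sense in which an LCSS-style measure is tolerant to noise, while the reversed record scores low because the order of events matters. The signature-based result is an estimate and may differ from the exact one near the threshold.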