Evaluating semantic similarity methods for comparison of text-derived phenotype profiles

Full description

Bibliographic details
Published in:	BMC Medical Informatics and Decision Making, 2022-02, Vol. 22 (1), p. 33, Article 33
Main authors:	Slater, Luke T; Russell, Sophie; Makepeace, Silver; Carberry, Alexander; Karwath, Andreas; Williams, John A; Fanning, Hilary; Ball, Simon; Hoehndorf, Robert; Gkoutos, Georgios V
Format:	Article
Language:	English
Subjects:	Analysis; Annotations; Benchmarks; Collaboration; Computational linguistics; Configuration management; Diagnosis; Differential diagnosis; Disease; Embedding; Genotype & phenotype; Health informatics; Identification and classification; Language processing; Medical diagnosis; Methods; MIMIC-III; Natural language; Natural language interfaces; Ontology; Patients; Phenotype; Phenotypes; Proteins; Rare diseases; Semantic networks; Semantic similarity; Semantic web; Semantics; Similarity; Similarity measures
Online access:	Full text

Description
Abstract:	Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, semantic similarity measures have the capacity to enable and enhance 'patient-like me' analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work explores the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area. We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the Medical Information Mart for Intensive Care (MIMIC-III). We evaluated 300 semantic similarity configurations, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations, when measured by area under the receiver operating characteristic curve and Top Ten Accuracy, used term-specificity and annotation-frequency measures. We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.
ISSN:	1472-6947
DOI:	10.1186/s12911-022-01770-4
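
Illustrative sketch (not part of the catalog record or the article): the abstract describes ranking patients by the semantic similarity of their phenotype profiles and scoring the ranking with a top-k accuracy. The Python below is a minimal sketch of one such configuration, assuming an annotation-frequency information content, Resnik pairwise similarity, and best-match-average aggregation. The abstract does not name the measures the authors used, so these choices are assumptions, and the toy ontology, patient profiles, diagnoses, and all function names are hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its parents (not real HPO identifiers).
PARENTS = {
    "root": [],
    "cardiac": ["root"],
    "respiratory": ["root"],
    "arrhythmia": ["cardiac"],
    "murmur": ["cardiac"],
    "cough": ["respiratory"],
    "wheeze": ["respiratory"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(profiles):
    """Annotation-frequency IC: -log of the share of profiles implying each term."""
    counts = {t: 0 for t in PARENTS}
    for terms in profiles.values():
        implied = set().union(*(ancestors(t) for t in terms))
        for t in implied:
            counts[t] += 1
    n = len(profiles)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def resnik(a, b, ic):
    """IC of the most informative ancestor shared by terms a and b."""
    return max(ic.get(t, 0.0) for t in ancestors(a) & ancestors(b))


def best_match_average(p, q, ic):
    """Average, in both directions, of each term's best match in the other profile."""
    def one_way(x, y):
        return sum(max(resnik(a, b, ic) for b in y) for a in x) / len(x)
    return (one_way(p, q) + one_way(q, p)) / 2


def top_k_accuracy(profiles, diagnoses, ic, k):
    """Share of patients with a same-diagnosis patient among their k most similar."""
    hits = 0
    for query in profiles:
        others = [o for o in profiles if o != query]
        others.sort(
            key=lambda o: best_match_average(profiles[query], profiles[o], ic),
            reverse=True,
        )
        if any(diagnoses[o] == diagnoses[query] for o in others[:k]):
            hits += 1
    return hits / len(profiles)


# Hypothetical patients and primary diagnoses, for illustration only.
PROFILES = {
    "p1": {"arrhythmia", "murmur"},
    "p2": {"arrhythmia"},
    "p3": {"cough", "wheeze"},
    "p4": {"cough"},
}
DIAGNOSES = {"p1": "heart", "p2": "heart", "p3": "lung", "p4": "lung"}

IC = information_content(PROFILES)
ACCURACY = top_k_accuracy(PROFILES, DIAGNOSES, IC, k=1)
```

Swapping the information content function, the pairwise measure, or the aggregation step yields a different configuration, which is presumably the kind of variation the 300 evaluated configurations cover.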