Automated extraction of potential migraine biomarkers using a semantic graph

Bibliographic details
Published in: Journal of Biomedical Informatics, 2017-07, Vol. 71, p. 178-189
Main authors: Vlietstra, Wytze J., Zielman, Ronald, van Dongen, Robin M., Schultes, Erik A., Wiesman, Floris, Vos, Rein, van Mulligen, Erik M., Kors, Jan A.
Format: Article
Language: English
Online access: Full text
Description
Summary:
• Potential biomarkers are found using knowledge mined from literature and databases.
• Results are compared with a manually performed literature review.
• Ranking on number of connecting concepts works best, with ROC-AUCs of 95–97%.
• It is questionable whether concepts’ current granularity is always necessary.

Biomedical literature and databases contain important clues for the identification of potential disease biomarkers. However, searching these enormous knowledge reservoirs and integrating findings across heterogeneous sources is costly and difficult. Here we demonstrate how semantically integrated knowledge, extracted from biomedical literature and structured databases, can be used to automatically identify potential migraine biomarkers. We used a knowledge graph containing more than 3.5 million biomedical concepts and 68.4 million relationships. Biochemical compound concepts were filtered and ranked by their potential as biomarkers based on their connections to a subgraph of migraine-related concepts. The ranked results were evaluated against the results of a systematic literature review that was performed manually by migraine researchers. Weight points were assigned to these reference compounds to indicate their relative importance. Ranked results automatically generated by the knowledge graph were highly consistent with results from the manual literature review. Out of 222 reference compounds, 163 (73%) ranked in the top 2000, accounting for 547 of the 644 (85%) weight points assigned to the reference compounds. For reference compounds that did not rank near the top of the list, an extensive error analysis was performed. When evaluating the overall performance, we obtained a ROC-AUC of 0.974. Semantic knowledge graphs composed of information integrated from multiple and varying sources can assist researchers in identifying potential disease biomarkers.
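
The ranking and evaluation steps described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "number of connecting concepts" means the number of distinct migraine-subgraph concepts a compound is directly linked to, and it computes ROC-AUC with the standard pairwise (Mann-Whitney) formulation. The function names, the toy graph, and the reference set are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rank_by_connecting_concepts(edges, candidates, disease_concepts):
    """Rank candidate compounds by how many distinct disease-subgraph
    concepts they are directly linked to (assumed scoring rule)."""
    neighbours = defaultdict(set)
    for a, b in edges:  # treat relationships as undirected
        neighbours[a].add(b)
        neighbours[b].add(a)
    scores = {c: len(neighbours[c] & disease_concepts) for c in candidates}
    # Highest score first; ties broken alphabetically for determinism.
    ranking = sorted(candidates, key=lambda c: (-scores[c], c))
    return ranking, scores


def roc_auc(scores, positives):
    """ROC-AUC as the probability that a random reference compound
    scores higher than a random non-reference one (ties count 0.5)."""
    pos = [s for c, s in scores.items() if c in positives]
    neg = [s for c, s in scores.items() if c not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the paper's knowledge graph).
edges = [
    ("serotonin", "migraine"), ("serotonin", "aura"),
    ("serotonin", "headache"), ("CGRP", "migraine"),
    ("CGRP", "headache"), ("glucose", "headache"),
    ("caffeine", "coffee"),
]
disease_concepts = {"migraine", "aura", "headache"}
candidates = ["serotonin", "CGRP", "glucose", "caffeine"]
reference = {"serotonin", "glucose"}  # stand-in for the manual review

ranking, scores = rank_by_connecting_concepts(edges, candidates, disease_concepts)
auc = roc_auc(scores, reference)
print(ranking)
print(auc)
```

On real data the candidate list would be the filtered biochemical compound concepts and the reference set would be the compounds from the manual literature review. The pairwise AUC used here is quadratic in the number of compounds, so a rank-based computation would be preferable at the scale reported in the summary.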
ISSN: 1532-0464, 1532-0480
DOI:10.1016/j.jbi.2017.05.018