Creating Reference Datasets for Systems Biology Applications Using Text Mining

Bibliographic details
Published in: Annals of the New York Academy of Sciences 2009-03, Vol.1158 (1), p.14-28
Main authors: Krallinger, Martin; Rojas, Ana María; Valencia, Alfonso
Format: Article
Language: English
Description
Abstract: High‐throughput experimental techniques are generating large data collections with the aim of identifying novel entities involved in fundamental cellular processes as well as drawing a systematic picture of the relationships between individual components. Determining the accuracy of the resulting data and the selection of a subset of targets for more careful characterizations often requires relying on information provided by manually annotated data repositories. These repositories are incomplete and cover only a small fraction of the knowledge contained in the literature. We propose in this paper the use of text‐mining technologies to extract, organize, and present information relevant for a particular biological topic. The aims of the resulting approach are (1) to enable topic‐centric biological literature navigation, (2) to assist in the construction of manually revised data repositories, (3) to provide prioritization of biological entities for experimental studies, and (4) to enable human interpretation of large‐scale experiments by providing direct links of bio‐entities to relevant descriptions in the literature.
ISSN: 0077-8923, 1749-6632
DOI: 10.1111/j.1749-6632.2008.03750.x