Creating Reference Datasets for Systems Biology Applications Using Text Mining
High‐throughput experimental techniques are generating large data collections with the aim of identifying novel entities involved in fundamental cellular processes as well as drawing a systematic picture of the relationships between individual components. Determining the accuracy of the resulting da...
Gespeichert in:
Veröffentlicht in: | Annals of the New York Academy of Sciences 2009-03, Vol.1158 (1), p.14-28 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | High‐throughput experimental techniques are generating large data collections with the aim of identifying novel entities involved in fundamental cellular processes as well as drawing a systematic picture of the relationships between individual components. Determining the accuracy of the resulting data and the selection of a subset of targets for more careful characterizations often requires relying on information provided by manually annotated data repositories. These repositories are incomplete and cover only a small fraction of the knowledge contained in the literature. We propose in this paper the use of text‐mining technologies to extract, organize, and present information relevant for a particular biological topic. The aims of the resulting approach are (1) to enable topic‐centric biological literature navigation, (2) to assist in the construction of manually revised data repositories, (3) to provide prioritization of biological entities for experimental studies, and (4) to enable human interpretation of large‐scale experiments by providing direct links of bio‐entities to relevant descriptions in the literature. |
---|---|
ISSN: | 0077-8923 1749-6632 |
DOI: | 10.1111/j.1749-6632.2008.03750.x |