Literature mining for the biologist: from information retrieval to biological discovery
Published in: | Nature Reviews Genetics, 2006-02, Vol. 7 (2), pp. 119-129 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Recent advances in tools for extracting facts from the scientific literature will soon enable the automatic annotation and analysis of the growing number of system-wide experimental data sets. Mining the literature is also rapidly becoming useful for both hypothesis generation and biological discovery.
Key Points
Literature-mining tools are becoming essential to researchers because of the growth of the scientific literature and the shift from studying individual genes and proteins to entire systems.
Currently, information-retrieval tools such as PubMed are by far the most commonly used literature-mining methods among biologists.
Methods for identifying the genes, proteins and other entities that are mentioned in the literature — known as entity recognition — are key components of most complex literature-mining systems.
Recently, methods for extracting biomedical facts from text have improved considerably. Such methods will probably soon become mainstream tools for the annotation and analysis of large-scale experimental data sets.
By combining facts that have been extracted from several papers, text-mining methods can both discover global trends and generate new hypotheses that are based on the existing literature.
To realize the full discovery potential of literature mining, it should be integrated with other data types. Protein networks are well suited for unifying large-scale experimental data with knowledge that has been extracted from the biomedical literature.
Data-integration methods have also been developed for ranking candidate genes for inherited diseases and for associating genes with phenotypic characteristics.
Bridging the gap between biologists and computational linguists will be crucial to the success of approaches that integrate literature mining with high-throughput experimental data. We hope that this review will inspire more biologists to become actively involved in the development of literature-mining tools.
For the average biologist, hands-on literature mining currently means a keyword search in PubMed. However, methods for extracting biomedical facts from the scientific literature have improved considerably, and the associated tools will probably soon be used in many laboratories to automatically annotate and analyse the growing number of system-wide experimental data sets. Owing to the increasing body of text and the open-access policies of many journals, literature mining is also becoming useful for both hypothesis generation and bio |
---|---|
ISSN: | 1471-0056, 1471-0064 |
DOI: | 10.1038/nrg1768 |