Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation

Bibliographic details
Published in:	Bioinformatics 2003-07, Vol.19 (10), p.1275-1283
Main authors:	LORD, P. W, STEVENS, R. D, BRASS, A, GOBLE, C. A
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Biological and medical sciences; Databases, Factual; Databases, Genetic; Documentation; Fundamental and applied biological sciences. Psychology; Gene Expression Profiling - methods; General aspects; Humans; Information Storage and Retrieval - methods; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Natural Language Processing; Ontology; Phylogeny; Proteins - chemistry; Proteins - classification; Proteins - genetics; Reproducibility of Results; Semantics; Sensitivity and Specificity; Sequence Alignment; Sequence Analysis, Protein - methods; Statistics as Topic; Terminology as Topic
Online access:	Full text

Description
Abstract:	Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or 'semantic similarity' between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. Software available from http://www.russet.org.uk.
ISSN:	1367-4803, 1367-4811, 1460-2059
DOI:	10.1093/bioinformatics/btg153
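
Illustration: The abstract describes measuring 'semantic similarity' between entries through their ontological annotation, but this record does not say which measure is used. As a rough illustration only, the sketch below assumes one common information-content approach (Resnik-style), in which two terms are scored by the information content of their most informative shared ancestor. The toy ontology, the term names, the annotation counts, and the function names are all hypothetical and are not taken from the paper or from the Gene Ontology.

```python
import math

# Hypothetical toy ontology (assumption): term -> set of direct parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical counts (assumption): entries annotated directly with each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 4,
    "dna_binding": 3,
    "rna_binding": 1,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities():
    """Probability of each term; an annotation also counts for every ancestor."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            totals[anc] += count
    grand_total = sum(DIRECT_COUNTS.values())
    return {term: totals[term] / grand_total for term in PARENTS}


def resnik_similarity(term_a, term_b):
    """Information content of the most informative ancestor shared by both terms."""
    probs = term_probabilities()
    shared = ancestors(term_a) & ancestors(term_b)
    # Terms that are never used (probability 0) are skipped to avoid dividing by zero.
    return max(
        (math.log(1.0 / probs[t]) for t in shared if probs[t] > 0),
        default=0.0,
    )


if __name__ == "__main__":
    print(resnik_similarity("dna_binding", "rna_binding"))
    print(resnik_similarity("dna_binding", "catalysis"))
```

In this toy setup, two terms that share only the root score zero, because the root applies to every annotated entry and therefore carries no information, while terms that share a more specific ancestor score higher.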