Using programmatic motifs and genetic programming to classify protein sequences as to cellular location

As newly sequenced proteins are deposited into the world's ever-growing archives, they are typically immediately tested by various algorithms for clues as to their biological structure and function. One question about a new protein involves its cellular location — that is, where the protein res...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Koza, John R., Bennett, Forrest H., Andre, David
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:As newly sequenced proteins are deposited into the world's ever-growing archives, they are typically immediately tested by various algorithms for clues as to their biological structure and function. One question about a new protein involves its cellular location — that is, where the protein resides in a living organism (e.g., extracellular, membrane, nuclear). A human-created five-way algorithm for cellular location using statistical techniques with 76% accuracy was recently reported. This paper describes a two-way algorithm that was evolved using genetic programming with 83% accuracy for determining whether a protein is an extracellular protein, 84% for nuclear proteins, 89% for membrane proteins, and 83% for anchored membrane proteins. Unlike the statistical calculation, the genetically evolved programs employ a large and varied arsenal of computational capabilities, including arithmetic functions, conditional operations, subroutines, iterations, named memory, indexed memory, setcreating operations, and look-ahead. The genetically evolved classification program can be viewed as an extension (which we call a programmatic motif) of the conventional notion of a protein motif.
ISSN:0302-9743
1611-3349
DOI:10.1007/BFb0040796