K-Local hyperplane distance nearest neighbor algorithm and protein fold recognition

Bibliographic details
Published in: Pattern recognition and image analysis 2007-12, Vol. 17 (4), p. 621-630
Main author: Okun, O G
Format: Article
Language: English
Description
Summary: This paper deals with protein structure analysis, which is useful for understanding the function of proteins and therefore their evolutionary relationships, since for proteins, function follows from form (shape). One of the basic approaches to structure analysis is protein fold recognition (a protein fold is a 3D pattern), which is applied when there is no significant sequence similarity between structurally similar proteins. It does not rely on sequence similarity and can be achieved with relevant features extracted from protein sequences. Given (numerical) features, one of the existing machine learning techniques can then be applied to learn and classify proteins represented by these features. In this paper, we experiment with the K-local hyperplane distance nearest neighbor algorithm (HKNN) [12] applied to protein fold recognition. The goal is to compare it with other methods tested on a real-world dataset [3]. Two tasks are considered: (1) classification into four structural classes of proteins and (2) classification into the 27 most populated protein folds composing these structural classes. Preliminary results demonstrate that HKNN can successfully compete with other methods (in both speed and accuracy) and thus encourage its further exploration in bioinformatics.
ISSN: 1054-6618, 1555-6212
DOI: 10.1134/S1054661807040232
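
The summary names HKNN but does not describe how it works. As a rough illustration only, the sketch below follows the commonly cited formulation of the K-local hyperplane distance nearest neighbor rule: for each class, take the K training points of that class nearest to the query, fit a local hyperplane through them, and assign the query to the class whose hyperplane is closest. The function name `hknn_predict`, the parameters `k` and `lam` (a ridge penalty that keeps the hyperplane coefficients bounded), and the toy data are assumptions made for this example. They are not taken from the paper, whose protein features and settings are not reproduced here.

```python
import numpy as np


def hknn_predict(X_train, y_train, x, k=3, lam=1.0):
    """Assign x to the class with the nearest local hyperplane (illustrative sketch).

    Assumes lam > 0 so that the small linear system below is always solvable.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)

    best_label, best_sq_dist = None, np.inf
    for label in np.unique(y_train):
        Xc = X_train[y_train == label]
        # K nearest neighbors of x inside this class (Euclidean distance).
        order = np.argsort(np.linalg.norm(Xc - x, axis=1))[:k]
        neighbors = Xc[order]
        centroid = neighbors.mean(axis=0)
        # Columns of V are the directions spanning the local hyperplane.
        V = (neighbors - centroid).T  # shape: (n_features, n_neighbors)
        # Penalized least squares:
        #   minimize ||x - centroid - V a||^2 + lam * ||a||^2
        A = V.T @ V + lam * np.eye(V.shape[1])
        alpha = np.linalg.solve(A, V.T @ (x - centroid))
        residual = x - centroid - V @ alpha
        sq_dist = residual @ residual + lam * (alpha @ alpha)
        if sq_dist < best_sq_dist:
            best_label, best_sq_dist = label, sq_dist
    return best_label


# Toy two-class data (hypothetical, for demonstration only).
X_demo = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y_demo = np.array([0, 0, 0, 1, 1, 1])
print(hknn_predict(X_demo, y_demo, [0.2, 0.1], k=2))
```

With `lam` set very small the penalized distance approaches the plain distance to the affine hull of the neighbors; larger values pull the rule back toward a distance to the neighbors' centroid.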