A new method to forecast of Escherichia coli promoter gene sequences: Integrating feature selection and Fuzzy-AIRS classifier system
We have investigated the real-world task of recognizing biological concepts in DNA sequences in this work. Recognizing promoters in strings that represent nucleotides (one of A, G, T, or C) has been performed using a novel approach based on feature selection (FS) and Artificial Immune Recognition Sy...
Gespeichert in:
Veröffentlicht in: | Expert systems with applications 2009, Vol.36 (1), p.57-64 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We have investigated the real-world task of recognizing biological concepts in DNA sequences in this work. Recognizing promoters in strings that represent nucleotides (one of A, G, T, or C) has been performed using a novel approach based on feature selection (FS) and Artificial Immune Recognition System (AIRS) with Fuzzy resource allocation mechanism (Fuzzy-AIRS), which is first proposed by us. The aim of this study is to improve the prediction accuracy of
Escherichia coli promoter gene sequences using a novel system based on FS and Fuzzy-AIRS. The
E. coli promoter gene sequences dataset has 57 attributes and 106 samples including 53 promoters and 53 non-promoters. The proposed system consists of two parts. Firstly, we have reduced the dimension of
E. coli promoter gene sequences dataset from 57 attributes to 4 attributes by means of FS process. Second, Fuzzy-AIRS classifier algorithm has been run to predict the
E. coli promoter gene sequences. The robustness of the proposed method is examined using prediction accuracy, sensitivity and specificity analysis,
k-fold cross-validation method and confusion matrix. Whilst only Fuzzy-AIRS classifier has obtained 50% prediction accuracy using 10-fold cross-validation, the proposed system has obtained 90% prediction accuracy in the same conditions. These obtained results have indicated that the proposed system obtain the success rate in recognizing promoters in strings that represent nucleotides. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2007.09.010 |