iPro2L-PSTKNC: A Two-Layer Predictor for Discovering Various Types of Promoters by Position Specific of Nucleotide Composition


Bibliographic details
Published in: IEEE Journal of Biomedical and Health Informatics, 2021-06, Vol. 25 (6), pp. 2329-2337
Authors: Lyu, Yinuo; He, Wenying; Li, Shuhao; Zou, Quan; Guo, Fei
Format: Article
Language: English
Description
Abstract: Promoters are DNA regulatory elements located near the transcription start site that are responsible for initiating the transcription of specific genes. In Escherichia coli, promoters are recognized by $\sigma$ factors, which fall into multiple families based on distinct function and structure, such as $\sigma^{24}$, $\sigma^{28}$, $\sigma^{32}$, $\sigma^{38}$, $\sigma^{54}$ and $\sigma^{70}$. At present, these promoters are mainly identified by biological experiments. However, because such experiments are time-consuming and material-consuming, computational algorithms have emerged as a more effective way to predict promoter classes. In this study, we develop a novel two-layer seamless predictor called iPro2L-PSTKNC to identify promoters in the E. coli genome. It is based on a newly proposed feature extraction model named the position specific tendencies of k-mer nucleotide composition (PSTKNC). The first layer performs binary classification, predicting whether a sequence is a promoter or not. The second layer performs multi-class classification, identifying which type the identified promoter belongs to. Among the algorithms compared, the ensemble SVM classifier performs best, achieving a promising accuracy of 90.05% and a Matthews correlation coefficient (MCC) of 80.13%. Our data and code are available at https://github.com/lyuyinuo/iPro2L-PSTKNC
ISSN: 2168-2194, 2168-2208
DOI: 10.1109/JBHI.2020.3026735
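The abstract does not state the PSTKNC formula, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes promoter and non-promoter training sequences are aligned and of equal length. It also assumes each position's k-mer "tendency" is the k-mer's frequency among positives at that position minus its frequency among negatives, a PSTNP-style scheme chosen here for illustration and not the paper's exact definition. The two-layer routing mirrors the abstract's first layer (promoter vs. non-promoter) and second layer (promoter type). The threshold rule and the `"sigma70"` label are hypothetical stand-ins for the paper's trained SVM ensemble.

```python
from collections import Counter

# Hypothetical PSTNP-style position-specific k-mer tendency encoding.
# NOT the paper's exact PSTKNC definition; sequences must be aligned, equal length.

def tendency_table(pos_seqs, neg_seqs, k):
    """For each start position i, map k-mer w -> freq_pos(w, i) - freq_neg(w, i)."""
    length = len(pos_seqs[0])
    table = []
    for i in range(length - k + 1):
        pos_counts = Counter(s[i:i + k] for s in pos_seqs)
        neg_counts = Counter(s[i:i + k] for s in neg_seqs)
        table.append({
            w: pos_counts[w] / len(pos_seqs) - neg_counts[w] / len(neg_seqs)
            for w in set(pos_counts) | set(neg_counts)
        })
    return table

def encode(seq, table, k):
    """Feature vector: tendency of the k-mer at each position (0.0 if unseen)."""
    return [table[i].get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1)]

def two_layer_predict(features, is_promoter, promoter_type):
    """Layer 1 decides promoter vs. non-promoter; layer 2 assigns a type."""
    if not is_promoter(features):
        return "non-promoter"
    return promoter_type(features)

# Toy usage with made-up sequences (not data from the paper).
table = tendency_table(["ACGT", "ACGA"], ["TTTT", "ACTT"], k=2)
is_promoter = lambda f: sum(f) > 0          # placeholder for a trained binary classifier
promoter_type = lambda f: "sigma70"         # placeholder for a trained multi-class classifier

print(encode("ACGT", table, 2))             # [0.5, 1.0, 0.5]
print(two_layer_predict(encode("TTTT", table, 2), is_promoter, promoter_type))
```

In practice each layer would be a classifier trained on the encoded vectors; the lambdas only make the cascade's control flow concrete.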