Chemometrics for QSAR with low sequence homology: Mycobacterial promoter sequences recognition with 2D-RNA entropies
Published in: | Chemometrics and intelligent laboratory systems 2007-01, Vol.85 (1), p.20-26 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Abstract: | Predicting mycobacterial promoter sequences for protein synthesis is important in the study of the regulation of protein metabolism. This goal is, however, considered a challenging computational biology task because of the low homology between sequences. Consequently, a previous work based only on DNA sequence had to use a large set of input parameters and a multilayered feed-forward ANN architecture, trained with the error back-propagation algorithm, to reach an overall accuracy of up to 97% [Kalate et al. 2003. Comput. Biol. Chem. 27, 555–564]. One could therefore expect that a notably simpler model may be derived using parameters based on non-linear structural information. In the present work, a method based on molecular folding negentropies ($\Theta_k$) is introduced to predict, for the first time, mycobacterial promoter sequences (mps) from the corresponding RNA secondary structure. The best QSAR equation found was the classification function $\mathrm{mps} = 4.921 \times {}^{0}\Theta_{M} - 1.205$, which recognised 126/135 mps (93.3%) and 100% of 245 control sequences (cs). The model showed a very high Matthews correlation coefficient, $C = 0.949$. Both the average overall accuracy and the predictability were 97.6%. Additionally, several machine learning algorithms were applied to reaffirm the validity of the LDA model from the chemometrics point of view. This linear model, with only one parameter (${}^{0}\Theta_{M}$), may be considered by far the simplest reported to date, with no loss of accuracy (97%) with respect to the model of Kalate et al. |
---|---|
ISSN: | 0169-7439 1873-3239 |
DOI: | 10.1016/j.chemolab.2006.03.005 |
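
The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is not the authors' code. It evaluates the reported one-parameter classification function and recomputes sensitivity, specificity, overall accuracy and the Matthews coefficient from the reported counts (126 of 135 mps and all 245 cs recognised). The function names and the zero decision cutoff are assumptions made for illustration; the abstract does not state the decision rule.

```python
import math

# Coefficients of the classification function quoted in the abstract.
SLOPE = 4.921
INTERCEPT = -1.205


def mps_score(theta_0m: float) -> float:
    """Evaluate mps = 4.921 * theta_0m - 1.205 for one sequence."""
    return SLOPE * theta_0m + INTERCEPT


def is_promoter(theta_0m: float, cutoff: float = 0.0) -> bool:
    # ASSUMPTION: a score above zero is read as "promoter". The abstract
    # does not give the cutoff, so this default is illustrative only.
    return mps_score(theta_0m) > cutoff


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return standard two-class metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    # Matthews correlation coefficient; defined as 0.0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "mcc": mcc,
    }


# Counts taken from the abstract: 126/135 mps and 245/245 cs recognised.
reported = confusion_metrics(tp=126, fn=9, tn=245, fp=0)

if __name__ == "__main__":
    for name, value in reported.items():
        print(f"{name}: {value:.3f}")
```

Under these counts the recomputed accuracy and Matthews coefficient round to the 97.6% and 0.949 given in the abstract, which suggests that C is the usual Matthews correlation coefficient computed on the full set of 380 sequences.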