PATSIM: Prediction and analysis of protein sequences using hybrid Knuth-Morris Pratt (KMP) and Boyer-Moore (BM) algorithm

Bibliographic details
Published in: Gene 2018-05, Vol. 657, p. 50-59
Main authors: Manikandan, P., Ramyachitra, D.
Format: Article
Language: English
Description
Summary: In phylogenomic profiling, genomic-context-based methods rest on the observation that two or more proteins with the same pattern of presence or absence across many diverse genomes most likely have a functional link. In this research work, a tool (PATSIM) has been developed to predict protein patterns based on the SOPM tool. The secondary structure of CATH database protein sequences, as predicted by the SOPM (Self Optimized Prediction Method) server, is passed as input to meet four objectives: (i) predict the amino acid pattern using the proposed hybrid KMP and BM algorithm; (ii) predict the physicochemical properties of the protein sequence, namely the Hydrophobic Non-Polar Alkyl, Hydrophobic Non-Polar Aromatic, Hydrophilic Polar Neutral, Hydrophilic Polar Acidic and Hydrophilic Polar Basic amino acid groups; (iii) predict the secondary structure of a protein whose structure is unknown; and (iv) analyze the similarity of a protein sequence (structure unknown) against the CATH database. The results indicate that the tool effectively predicts the similarity between sequences and identifies protein patterns for four secondary structural classes: Alpha Helix (h), Beta Sheet (e), Turn (t) and Coil (c). The experimental results also indicate that the tool identifies the physicochemical properties of the protein sequence effectively. The source code and documentation for the PATSIM tool are freely available in the GitHub public repository (https://github.com/manimkn89/Protein-Sequence-Analysis).
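The summary names KMP and BM as the basis of the pattern search but does not say how PATSIM combines them. As a rough illustration only, the Python sketch below shows the two standard building blocks separately: a Knuth-Morris-Pratt search driven by a failure table, and a simplified Boyer-Moore search that uses only the bad-character rule. The function names and the example structure string are hypothetical and are not taken from the PATSIM source code.

```python
def kmp_failure(pattern):
    """For each prefix of pattern, length of its longest proper prefix
    that is also a suffix (the KMP failure table)."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return all start indices of pattern in text (overlaps included)."""
    if not pattern:
        return []
    fail = kmp_failure(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


def bm_bad_char_search(text, pattern):
    """Boyer-Moore restricted to the bad-character rule (assumption:
    the good-suffix rule is omitted to keep the sketch short)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    last = {ch: i for i, ch in enumerate(pattern)}  # rightmost position of each symbol
    hits, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1  # compare right to left
        if j < 0:
            hits.append(s)
            s += 1  # conservative shift after a full match
        else:
            s += max(1, j - last.get(text[s + j], -1))
    return hits


# Hypothetical secondary-structure string over the classes h, e, t, c.
structure = "hhhhcctteeeecchhhh"
print(kmp_search(structure, "hhh"))
print(bm_bad_char_search(structure, "cc"))
```

Both functions return the same match positions for any input; they differ only in how far they can skip ahead after a mismatch, which is presumably the property a hybrid would try to exploit.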
•To propose a hybrid algorithm to predict the amino acid patterns from protein sequences.
•To predict the physicochemical properties of protein sequences.
•To predict the secondary structure of a protein whose structure is unknown.
•To perform similarity analysis of a protein sequence (structure unknown) against the CATH database.
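The five physicochemical groups named in the summary can be illustrated with a small lookup table. The Python sketch below uses one common textbook assignment of the 20 standard residues to those groups; the assignment PATSIM actually uses is not stated in this record, so the table, the short group labels and the function name are assumptions made for illustration.

```python
from collections import Counter

# Assumed textbook grouping of one-letter residue codes; sources differ
# on borderline residues such as P, M, Y and H.
GROUPS = {
    "hydrophobic_alkyl": "GAVLIP",
    "hydrophobic_aromatic": "FWY",
    "polar_neutral": "STCMNQ",
    "polar_acidic": "DE",
    "polar_basic": "KRH",
}
RESIDUE_TO_GROUP = {aa: group for group, residues in GROUPS.items() for aa in residues}


def group_composition(sequence):
    """Return the fraction of residues in each group; letters outside
    the 20 standard residues are counted as 'unknown'."""
    counts = Counter(RESIDUE_TO_GROUP.get(aa, "unknown") for aa in sequence.upper())
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}


# Hypothetical six-residue sequence.
print(group_composition("GAVDEK"))
```

Reporting fractions rather than raw counts makes sequences of different lengths directly comparable.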
ISSN: 0378-1119, 1879-0038
DOI: 10.1016/j.gene.2018.02.069