A customized program for the identification of conserved protein sequence motifs

Bibliographic details
Published in: BioTechniques 2020-01, Vol.68 (1), p.45-47
Main authors: Mian, Mohammad, Talada, Jeffrey, Klobas, Anthony, Torres, Stephanie, Rasheed, Yusuf, Javed, Hibah, Lughmani, Zainab, Forough, Reza
Format: Article
Language: English
Online access: Full text
Description
Abstract: We searched for viral protein sequences that could be important for tissue tropism. To achieve this goal, human pathogenic viruses were classified according to the tissue they infect (e.g., pulmonary), irrespective of whether they were enveloped or non-enveloped RNA or DNA viruses. Next, we developed an amino acid sequence alignment program and identified the conserved amino acid motif, VAIVLGG, in alphaviruses. The VAIVLGG sequence is located on the structural capsid protein of the chikungunya virus, a mosquito-borne arthrogenic member of the alphaviruses. Capsid protein translocation onto the host cell membrane is a required step for virion budding. The VAIVLGG consensus sequence we identified could potentially be used to develop a pan-vaccine effective against alphaviruses. Viral protein sequences are fed into a battery of rolling hashes over amino acid subsequences of length 6–14, with a time complexity of O(n). The hashes are the keys in a HashMap whose values are the sequence ID and index; the space complexity is O(n). A normal alignment is then run on the length-14 matches to discover longer matches. The upper bound on the time complexity for the alignment is O(n^2 * m), where n is the number of viruses containing a matching sequence and m is the length of the longest matching sequence.
ISSN: 0736-6205, 1940-9818
DOI: 10.2144/btn-2019-0039
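
The method summary in the abstract describes rolling hashes over fixed-length amino acid windows, stored as keys in a hash map whose values record the sequence ID and index. The following is a minimal sketch of that general idea, not the authors' program. The polynomial hash, the constants BASE and MOD, the function names, and the toy sequences are all assumptions made for illustration. The requirement that a motif appear in every input sequence is also a simplification, and the subsequent alignment step that extends length-14 matches is not shown.

```python
from collections import defaultdict

BASE = 31                 # assumed polynomial base (illustrative choice)
MOD = (1 << 61) - 1       # assumed large prime modulus (illustrative choice)


def rolling_hashes(seq, k):
    """Yield (start_index, hash) for each length-k window of seq in O(len(seq))."""
    if len(seq) < k:
        return
    high = pow(BASE, k - 1, MOD)  # weight of the residue that leaves the window
    h = 0
    for ch in seq[:k]:
        h = (h * BASE + ord(ch)) % MOD
    yield 0, h
    for i in range(k, len(seq)):
        h = (h - ord(seq[i - k]) * high) % MOD   # drop the leftmost residue
        h = (h * BASE + ord(seq[i])) % MOD       # append the new residue
        yield i - k + 1, h


def build_index(sequences, k):
    """Map each window hash to a list of (sequence_id, start_index) pairs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for start, h in rolling_hashes(seq, k):
            index[h].append((seq_id, start))
    return index


def shared_motifs(sequences, k):
    """Return {motif: set of sequence ids} for length-k motifs found in every sequence."""
    shared = {}
    for hits in build_index(sequences, k).values():
        # Regroup by the actual substring so hash collisions cannot merge motifs.
        by_text = defaultdict(set)
        for seq_id, start in hits:
            by_text[sequences[seq_id][start:start + k]].add(seq_id)
        for text, ids in by_text.items():
            if len(ids) == len(sequences):
                shared[text] = ids
    return shared


# Toy sequences invented for this example; they are not real viral proteins.
toy = {
    "virus_a": "MKTVAIVLGGAPE",
    "virus_b": "QQVAIVLGGNRT",
    "virus_c": "VAIVLGGKKLM",
}

if __name__ == "__main__":
    print(sorted(shared_motifs(toy, 7)))  # ['VAIVLGG']
```

To mimic the battery of window lengths mentioned in the abstract, the same function could be called for each k in range(6, 15). Each pass touches every residue a constant number of times, which is consistent with the linear time and space bounds stated above.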