Rapid multiple protein sequence search by parallel and heterogeneous computation

Abstract Motivation Protein sequence database search and multiple sequence alignment generation is a fundamental task in many bioinformatics analyses. As the data volume of sequences continues to grow rapidly, there is an increasing need for efficient and scalable multiple sequence query algorithms...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Bioinformatics (Oxford, England) England), 2024-03, Vol.40 (4)
Hauptverfasser: Li, Jiefu, Wang, Ziyuan, Fan, Xuwei, Yao, Ruijie, Zhang, Guoqing, Fan, Rui, Wang, Zefeng
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Abstract Motivation Protein sequence database search and multiple sequence alignment generation is a fundamental task in many bioinformatics analyses. As the data volume of sequences continues to grow rapidly, there is an increasing need for efficient and scalable multiple sequence query algorithms for super-large databases without expensive time and computational costs. Results We introduce Chorus, a novel protein sequence query system that leverages parallel model and heterogeneous computation architecture to enable users to query thousands of protein sequences concurrently against large protein databases on a desktop workstation. Chorus achieves over 100× speedup over BLASTP without sacrificing sensitivity. We demonstrate the utility of Chorus through a case study of analyzing a ∼1.5-TB large-scale metagenomic datasets for novel CRISPR-Cas protein discovery within 30 min. Availability and implementation Chorus is open-source and its code repository is available at https://github.com/Bio-Acc/Chorus.
ISSN:1367-4811
1367-4803
1367-4811
DOI:10.1093/bioinformatics/btae151