LinearAlifold: Linear-time consensus structure prediction for RNA alignments


Full description

Bibliographic details
Published in: Journal of Molecular Biology, 2024-09, Vol. 436 (17), p. 168694, Article 168694
Main authors: Malik, Apoorv; Zhang, Liang; Gautam, Milan; Dai, Ning; Li, Sizhen; Zhang, He; Mathews, David H.; Huang, Liang
Format: Article
Language: English
Description
Summary:
• Predicting the consensus structure for aligned RNA homologs has wide applications.
• But current tools for consensus structure prediction are rather slow.
• Our LinearAlifold is fast, scaling linearly with sequence length and sequence count.
• It outperforms previous tools in accuracy when compared to known structures.
• We also built a web server with rich visualizations of the output structure.

Predicting the consensus structure of a set of aligned RNA homologs is a convenient way to find conserved structures in an RNA genome, with many applications including viral diagnostics and therapeutics. However, the most commonly used tool for this task, RNAalifold, is prohibitively slow for long sequences because its runtime scales cubically with sequence length: it takes over a day on 400 SARS-CoV-2 and SARS-related genomes (∼30,000 nt). We present LinearAlifold, a much faster alternative that scales linearly with both the sequence length and the number of sequences, building on our earlier work LinearFold, which folds a single RNA in linear time. Our work is orders of magnitude faster than RNAalifold (0.7 h on the above 400 genomes, a ∼36× speedup) and achieves higher accuracy when compared to a database of known structures. More interestingly, LinearAlifold's prediction on SARS-CoV-2 correlates well with experimentally determined structures, substantially outperforming RNAalifold. Finally, LinearAlifold supports two energy models (Vienna and BL*) and four modes: minimum free energy (MFE), maximum expected accuracy (MEA), ThreshKnot, and stochastic sampling, each of which takes under an hour for hundreds of SARS-CoV variants. The code is available at https://github.com/LinearFold/LinearAlifold and the web server at http://linearfold.org/linear-alifold.
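The runtime figures in the summary above can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical and are not part of LinearAlifold or RNAalifold. It uses just the two figures stated in the summary (0.7 h and a ∼36× speedup), plus the general meaning of cubic versus linear scaling in sequence length.

```python
# Figures quoted in the summary (not measured here).
LINEAR_ALIFOLD_HOURS = 0.7   # reported runtime on the 400 genomes
REPORTED_SPEEDUP = 36        # reported speedup over RNAalifold (approximate)


def implied_baseline_hours(fast_hours: float, speedup: float) -> float:
    """Runtime of the slower tool implied by a runtime and a speedup factor."""
    return fast_hours * speedup


def growth_factor(length_ratio: float, exponent: int) -> float:
    """How much runtime grows when sequence length grows by length_ratio,
    for an algorithm whose runtime is proportional to length**exponent."""
    return length_ratio ** exponent


baseline = implied_baseline_hours(LINEAR_ALIFOLD_HOURS, REPORTED_SPEEDUP)
print(f"Implied RNAalifold runtime: about {baseline:.1f} h")

# Doubling the sequence length: cubic scaling vs. linear scaling.
print("Cubic growth on doubling length: ", growth_factor(2, 3))
print("Linear growth on doubling length:", growth_factor(2, 1))
```

The implied baseline of roughly 25 hours is consistent with the statement that RNAalifold takes over a day on this data set. The second part shows why the gap widens with longer genomes: under these idealized assumptions, doubling the length multiplies a cubic-time runtime by 8 but a linear-time runtime by only 2.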
ISSN: 0022-2836, 1089-8638
DOI: 10.1016/j.jmb.2024.168694