Movi: A fast and cache-efficient full-text pangenome index
Published in: | iScience 2024-12, Vol. 27 (12), p. 111464, Article 111464 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | Pangenome indexes are promising tools for many applications, including classification of nanopore sequencing reads. The move structure is a compressed-index data structure based on the Burrows-Wheeler Transform (BWT). It offers O(1)-time queries and O(r) space simultaneously, where r is the number of BWT runs (maximal sequences of consecutive identical characters). We developed Movi, a tool based on the move structure for indexing and querying pangenomes. Movi scales well to repetitive text because its size grows only with r. Movi computes sophisticated matching queries used for classification, such as pseudo-matching lengths and backward search, up to 30 times faster than existing methods by minimizing the number of cache misses and by using memory prefetching to achieve a degree of latency hiding. Movi’s fast, constant-time query loop makes it well suited to real-time applications such as adaptive sampling for nanopore sequencing, where decisions must be made within a small and predictable time interval. |
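The abstract's central idea, that LF steps and backward search can run over a table with one row per BWT run (r rows) instead of over all n text positions, can be illustrated with a small Python sketch. This is a simplified toy under stated assumptions, not Movi's implementation: the BWT is built naively by sorting rotations, runs are not split (so the "fast-forward" loop is not constant-bounded as in the real move structure), pseudo-matching lengths and prefetching are omitted, and the names `MoveStructure`, `lf` and `count` are hypothetical.

```python
from bisect import bisect_right


def bwt(text: str) -> str:
    """Naive BWT by sorting all rotations (toy sizes only).

    Assumes text ends with a unique '$', which sorts before A, C, G, T.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def lf_array(L: str) -> list:
    """Full LF mapping, used only while building the run table."""
    counts = {}
    for ch in L:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0  # C[c] = number of BWT characters smaller than c
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    seen, lf = {}, []
    for ch in L:
        lf.append(C[ch] + seen.get(ch, 0))  # C[c] + rank_c(L, j)
        seen[ch] = seen.get(ch, 0) + 1
    return lf


class MoveStructure:
    """Hypothetical toy move table: one row per BWT run, r rows in total."""

    def __init__(self, text: str):
        L = bwt(text)
        full_lf = lf_array(L)
        self.n = len(L)
        self.heads, self.starts, self.lengths, self.pi = [], [], [], []
        for j, ch in enumerate(L):
            if j == 0 or ch != L[j - 1]:      # a new run begins at j
                self.heads.append(ch)         # run character
                self.starts.append(j)         # first BWT position of the run
                self.lengths.append(1)        # run length
                self.pi.append(full_lf[j])    # LF of the run's first position
            else:
                self.lengths[-1] += 1
        self.r = len(self.starts)
        # xi[i] = index of the run that contains pi[i]
        self.xi = [bisect_right(self.starts, p) - 1 for p in self.pi]

    def lf(self, run: int, offset: int) -> tuple:
        """LF step on a (run, offset) pair using only the run table."""
        pos = self.pi[run] + offset  # LF maps a run to a contiguous range
        nxt = self.xi[run]
        # Fast-forward to the run containing pos. The real move structure
        # splits runs so this loop is O(1)-bounded; this toy does not.
        while nxt + 1 < self.r and self.starts[nxt + 1] <= pos:
            nxt += 1
        return nxt, pos - self.starts[nxt]

    def count(self, pattern: str) -> int:
        """Backward search: number of occurrences of pattern in the text."""
        s_run, s_off = 0, 0
        e_run, e_off = self.r - 1, self.lengths[-1] - 1
        for ch in reversed(pattern):
            # Move the top of the range down to the first run headed by ch.
            while s_run <= e_run and self.heads[s_run] != ch:
                s_run, s_off = s_run + 1, 0
            if s_run > e_run:
                return 0
            # Move the bottom of the range up to the last run headed by ch.
            while self.heads[e_run] != ch:
                e_run -= 1
                e_off = self.lengths[e_run] - 1
            s_run, s_off = self.lf(s_run, s_off)
            e_run, e_off = self.lf(e_run, e_off)
        return (self.starts[e_run] + e_off) - (self.starts[s_run] + s_off) + 1
```

For example, `MoveStructure("GATTACAGATTACAGATTACA$").count("GATTACA")` should return 3, matching a naive scan of the text. Once the table is built, queries touch only the `heads`, `starts`, `lengths`, `pi` and `xi` rows, whose number is r rather than n.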
• Movi is a very fast and cache-efficient index for pangenomes
• The size of Movi’s index scales with the non-redundant content in the pangenome
• A single Movi thread can handle the output of 26,890 nanopores
• Movi builds on the move structure, a full-text compressed index that uses the BWT
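The claim that index size tracks non-redundant content follows from the O(r) space bound: near-identical genomes add few new BWT runs. The toy experiment below is an illustrative assumption, not an analysis from the paper. It compares r for one random 200-bp sequence with r for a hypothetical "pangenome" of eight copies of that sequence, seven of which carry one random SNP each. The BWT is computed naively, and the helper names (`bwt`, `count_runs`, `mutate`) are hypothetical.

```python
import random


def bwt(text: str) -> str:
    """Naive BWT via suffix sorting; assumes a unique trailing '$' (smallest)."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)  # i == 0 wraps to the final '$'


def count_runs(s: str) -> int:
    """r = number of maximal runs of identical consecutive characters."""
    return sum(1 for j in range(len(s)) if j == 0 or s[j] != s[j - 1])


def mutate(seq: str, n_snps: int, rng: random.Random) -> str:
    """Hypothetical haplotype: a copy of seq with n_snps random substitutions."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_snps):
        chars[pos] = rng.choice([b for b in "ACGT" if b != chars[pos]])
    return "".join(chars)


rng = random.Random(42)
reference = "".join(rng.choice("ACGT") for _ in range(200))
haplotypes = [reference] + [mutate(reference, 1, rng) for _ in range(7)]

single = reference + "$"
pangenome = "".join(haplotypes) + "$"
r_single = count_runs(bwt(single))
r_pan = count_runs(bwt(pangenome))
print(f"1 genome:  n={len(single)}, r={r_single}")
print(f"8 genomes: n={len(pangenome)}, r={r_pan}")
```

For a random sequence, r is a large fraction of n. For the eight-copy text, r would be expected to stay well below eight times the single-genome value, although exact numbers depend on the seed and on where the SNPs fall.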
Subjects: | Biocomputational method; Classification of bioinformatical subject; Genomic analysis |
---|---|
ISSN: | 2589-0042 |
DOI: | 10.1016/j.isci.2024.111464 |