Pitfalls of genotyping microbial communities with rapidly growing genome collections

Bibliographic details
Published in: Cell Systems, 2023-02, Vol. 14 (2), p. 160-176.e3
Main authors: Zhao, Chunyu; Shi, Zhou Jason; Pollard, Katherine S.
Format: Article
Language: English
Online access: Full text
Description
Summary: Detecting genetic variants in metagenomic data is a priority for understanding the evolution, ecology, and functional characteristics of microbial communities. Many tools that perform this metagenotyping rely on aligning reads of unknown origin to a database of sequences from many species before calling variants. In this synthesis, we investigate how databases of increasingly diverse and closely related species have pushed the limits of current alignment algorithms, thereby degrading the performance of metagenotyping tools. We identify multi-mapping reads as a prevalent source of errors and illustrate a trade-off between retaining correct alignments versus limiting incorrect alignments, many of which map reads to the wrong species. Then we evaluate several actionable mitigation strategies and review emerging methods showing promise to further improve metagenotyping in response to the rapid growth in genome collections. Our results have implications beyond metagenotyping to the many tools in microbial genomics that depend upon accurate read mapping.

• Genetic variants can be detected in metagenomics data by aligning reads to genomes
• Closely related species are now prevalent in microbial genome databases
• Closely related species reduce alignment uniqueness and increase alignment errors
• Post-alignment filters using read pairs and database customization mitigate errors

Closely related species are now common in rapidly growing microbial genome databases, making it difficult to correctly align metagenomic sequencing reads. Zhao et al. quantitatively investigate these alignment errors and their effects on tools for genotyping microbial communities. They identify actionable mitigation strategies and areas where new methodology is needed.
ISSN: 2405-4712, 2405-4720
DOI: 10.1016/j.cels.2022.12.007