CSMD: a computational subtraction-based microbiome discovery pipeline for species-level characterization of clinical metagenomic samples

Abstract Motivation Microbiome analyses of clinical samples with low microbial biomass are challenging because of the very small quantities of microbial DNA relative to the human host, ubiquitous contaminating DNA in sequencing experiments and the large and rapidly growing microbial reference databa...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Bioinformatics 2020-03, Vol.36 (5), p.1577-1583
Hauptverfasser: Liu, Yu, Bible, Paul W, Zou, Bin, Liang, Qiaoxing, Dong, Cong, Wen, Xiaofeng, Li, Yan, Ge, Xiaofei, Li, Xifang, Deng, Xiuli, Ma, Rong, Guo, Shixin, Liang, Juanran, Chen, Tingting, Pan, Wenliang, Liu, Lixin, Chen, Wei, Wang, Xueqin, Wei, Lai
Format: Artikel
Sprache:eng
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Abstract Motivation Microbiome analyses of clinical samples with low microbial biomass are challenging because of the very small quantities of microbial DNA relative to the human host, ubiquitous contaminating DNA in sequencing experiments and the large and rapidly growing microbial reference databases. Results We present computational subtraction-based microbiome discovery (CSMD), a bioinformatics pipeline specifically developed to generate accurate species-level microbiome profiles for clinical samples with low microbial loads. CSMD applies strategies for the maximal elimination of host sequences with minimal loss of microbial signal and effectively detects microorganisms present in the sample with minimal false positives using a stepwise convergent solution. CSMD was benchmarked in a comparative evaluation with other classic tools on previously published well-characterized datasets. It showed higher sensitivity and specificity in host sequence removal and higher specificity in microbial identification, which led to more accurate abundance estimation. All these features are integrated into a free and easy-to-use tool. Additionally, CSMD applied to cell-free plasma DNA showed that microbial diversity within these samples is substantially broader than previously believed. Availability and implementation CSMD is freely available at https://github.com/liuyu8721/csmd. Supplementary information Supplementary data are available at Bioinformatics online.
ISSN:1367-4803
1460-2059
1367-4811
DOI:10.1093/bioinformatics/btz790