A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments

Metagenomic binning is the step in building metagenome-assembled genomes (MAGs) when sequences predicted to originate from the same genome are automatically grouped together. The most widely-used methods for binning are reference-independent, operating de novo and enable the recovery of genomes from...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Nature communications 2022-04, Vol.13 (1), p.2326-2326, Article 2326
Hauptverfasser: Pan, Shaojun, Zhu, Chengkai, Zhao, Xing-Ming, Coelho, Luis Pedro
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Metagenomic binning is the step in building metagenome-assembled genomes (MAGs) when sequences predicted to originate from the same genome are automatically grouped together. The most widely-used methods for binning are reference-independent, operating de novo and enable the recovery of genomes from previously unsampled clades. However, they do not leverage the knowledge in existing databases. Here, we introduce SemiBin, an open source tool that uses deep siamese neural networks to implement a semi-supervised approach, i.e. SemiBin exploits the information in reference genomes, while retaining the capability of reconstructing high-quality bins that are outside the reference dataset. Using simulated and real microbiome datasets from several different habitats from GMGCv1 (Global Microbial Gene Catalog), including the human gut, non-human guts, and environmental habitats (ocean and soil), we show that SemiBin outperforms existing state-of-the-art binning methods. In particular, compared to other methods, SemiBin returns more high-quality bins with larger taxonomic diversity, including more distinct genera and species. Here, the authors present SemiBin, a siamese deep neural network framework that incorporates information from reference genomes, able to extract better metagenome-assembled genomes (MAGs) in several host-associated and environmental habitats.
ISSN:2041-1723
2041-1723
DOI:10.1038/s41467-022-29843-y