Data mining of metagenomes to find novel enzymes: a non-computationally intensive method

Bibliographic details
Published in:	3 Biotech 2020-02, Vol.10 (2), p.78-78, Article 78
Main authors:	Góngora-Castillo, Elsa; López-Ochoa, Luisa A.; Apolinar-Hernández, Max M.; Caamal-Pech, Aldo M.; Contreras-de la Rosa, Perla A.; Quiroz-Moreno, Adriana; Ramírez-Prado, Jorge H.; O’Connor-Sánchez, Aileen
Format:	Article
Language:	English
Subjects:	Agriculture; Amino acids; Bioinformatics; Biomaterials; Biotechnology; Biotechnology & Applied Microbiology; Cancer Research; Chemistry; Chemistry and Materials Science; Cyanobacteria; Data mining; Enzymes; Life Sciences & Biomedicine; Metagenomics; Next-generation sequencing; Pattern search; Protocols and Methods; Robustness; Science & Technology; Stem Cells

Description
Abstract:	Currently, there is a need for non-computationally-intensive bioinformatics tools to cope with the growing number of large datasets produced by Next Generation Sequencing technologies. We present a simple and robust bioinformatics pipeline to search for novel enzymes in metagenomic sequences. The strategy is based on pattern searching, using conserved motifs encoded as regular expressions as the reference. As a case study, we applied this scheme to search for novel S8A proteases in a publicly available metagenome. Briefly, (1) the metagenome was assembled and translated into amino acids; (2) patterns were matched using regular expressions; (3) retrieved sequences were annotated; and (4) diversity analyses were conducted. Following this pipeline, we were able to identify nine sequences containing an S8 catalytic triad, starting from a metagenome containing 9,921,136 Illumina reads. The identity of these nine sequences was confirmed by BLASTp against databases at NCBI and MEROPS. Identities ranged from 62 to 89% to their respective nearest ortholog, which belonged to the phyla Proteobacteria, Actinobacteria, Planctomycetes, Bacteroidetes, and Cyanobacteria, consistent with the most abundant phyla reported for this metagenome. Together, these results support the idea that all nine are novel S8 sequences and strongly suggest that our methodology is robust and suitable for detecting novel enzymes.
ISSN:	2190-572X 2190-5738
DOI:	10.1007/s13205-019-2044-6
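
Step (2) of the pipeline described in the abstract, matching conserved motifs encoded as regular expressions against translated sequences, can be illustrated with a minimal Python sketch. The three motifs below are simplified placeholders chosen for illustration only; they are assumptions, not the patterns used in the article, and the function names `find_triad` and `screen` are hypothetical. The sketch assumes the sequences are already translated into amino acids and requires the Asp, His, and Ser motifs to occur in that order.

```python
import re

# Illustrative, simplified motifs around the Asp, His and Ser residues.
# These are assumptions for demonstration, NOT the patterns from the article.
MOTIFS = [
    ("Asp", re.compile(r"D[DSTA]G")),
    ("His", re.compile(r"HG[STM]")),
    ("Ser", re.compile(r"GTS.[SA].P")),
]


def find_triad(protein):
    """Return motif start positions if all motifs occur in order, else None."""
    positions = {}
    start = 0
    for name, pattern in MOTIFS:
        # Search only downstream of the previous motif to enforce the order.
        match = pattern.search(protein, start)
        if match is None:
            return None
        positions[name] = match.start()
        start = match.end()
    return positions


def screen(records):
    """Keep only the sequences (id -> protein string) that contain all motifs."""
    hits = {}
    for seq_id, protein in records.items():
        found = find_triad(protein)
        if found is not None:
            hits[seq_id] = found
    return hits


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    toy = {
        "contig_1": "MKAADDGAAAHGTAAAGTSMAAPAA",  # all three motifs, in order
        "contig_2": "MKHGTAAADDGAAA",             # His motif precedes Asp motif
    }
    print(screen(toy))
```

Because each sequence is scanned with a few compiled regular expressions, the cost grows roughly linearly with the total sequence length, which is in keeping with the low computational cost the abstract emphasizes. Candidate hits from such a screen would still need the annotation and BLASTp confirmation described in steps (3) and (4).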