Rapid annotation of nifH gene sequences using classification and regression trees facilitates environmental functional gene analysis

Bibliographic details
Published in: Environmental Microbiology Reports, 2016-10, Vol. 8 (5), pp. 905-916
Authors: Frank, Ildiko E., Turk-Kubo, Kendra A., Zehr, Jonathan P.
Format: Article
Language: English
Description
Abstract: The nifH gene is a widely used molecular proxy for studying nitrogen fixation. Phylogenetic classification of nifH gene sequences is an essential step in diazotroph community analysis, and it requires a fast, automated solution because environmental sequence libraries keep growing and high‐throughput technologies yield ever more nifH sequences. We present a novel approach that rapidly classifies nifH amino acid sequences into well‐defined phylogenetic clusters, providing a common platform for comparative analysis across studies. Phylogenetic group membership can be accurately predicted with decision tree‐type statistical models that identify and utilize signature residues in the amino acid sequences. Our classification models were trained and evaluated with a publicly available, manually curated nifH gene database containing cluster annotations. Model‐independent sequence sets from diverse ecosystems were used for further assessment of the models' prediction accuracy. The utility of this novel sequence binning approach was demonstrated in a comparative study where joint treatment of diazotroph assemblages from a wide range of habitats identified habitat‐specific and widely distributed diazotrophs and revealed a marine–terrestrial distinction in community composition. Our rapid and automated phylogenetic cluster assignment circumvents extensive phylogenetic analysis of nifH sequences; hence, it saves substantial time and resources in nitrogen fixation studies.
ISSN: 1758-2229
DOI: 10.1111/1758-2229.12455
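The abstract describes decision tree-type models that predict cluster membership from signature residues in nifH amino acid sequences. As an illustration of that general idea only, here is a minimal sketch of a CART-style classification tree in plain Python. Each node tests whether one aligned position carries one residue and picks the test that most reduces Gini impurity. Several things here are assumptions and are not taken from the paper: the sequences are pre-aligned to equal length, splits use Gini impurity, the depth limit is arbitrary, and the toy sequences and labels "A", "B", "C" are invented. This is not the authors' models, training database, or feature set. A library implementation such as scikit-learn's DecisionTreeClassifier on encoded positions could be used instead.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(seqs, labels):
    """Return (impurity, pos, residue) for the test 'seq[pos] == residue'
    that minimises weighted Gini impurity, or None if no split separates
    the sequences. Assumes all sequences are aligned to equal length."""
    n = len(labels)
    best = None
    for pos in range(len(seqs[0])):
        for res in sorted({s[pos] for s in seqs}):
            left = [y for s, y in zip(seqs, labels) if s[pos] == res]
            right = [y for s, y in zip(seqs, labels) if s[pos] != res]
            if not left or not right:
                continue  # test does not partition the node
            imp = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or imp < best[0]:
                best = (imp, pos, res)
    return best


def build_tree(seqs, labels, depth=0, max_depth=3):
    """Recursively grow the tree; leaves hold the majority label.
    max_depth=3 is an arbitrary illustrative choice."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0 or depth == max_depth:
        return majority
    split = best_split(seqs, labels)
    if split is None:
        return majority
    _, pos, res = split
    match = [(s, y) for s, y in zip(seqs, labels) if s[pos] == res]
    other = [(s, y) for s, y in zip(seqs, labels) if s[pos] != res]
    return {
        "pos": pos,
        "res": res,
        "yes": build_tree([s for s, _ in match], [y for _, y in match], depth + 1, max_depth),
        "no": build_tree([s for s, _ in other], [y for _, y in other], depth + 1, max_depth),
    }


def predict(tree, seq):
    """Walk the tree, checking one candidate signature residue per node."""
    while isinstance(tree, dict):
        tree = tree["yes"] if seq[tree["pos"]] == tree["res"] else tree["no"]
    return tree


# Hypothetical toy alignment and cluster labels; NOT real nifH data.
train_seqs = ["MKQAV", "MKQAI", "MRQGV", "MRQGI", "LKEGV", "LKEGI"]
train_labels = ["A", "A", "B", "B", "C", "C"]
tree = build_tree(train_seqs, train_labels)
```

On this toy input, the root tests for "L" at position 0, which isolates class "C". A second test for "K" at position 1 then separates "A" from "B". Each internal node therefore names a single position and residue, which is what makes tree models convenient for reading off candidate signature residues.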