Filtering and ranking techniques for automated selection of high-quality 16S rRNA gene sequences
Published in: | Systematic and Applied Microbiology, December 2013, Vol. 36 (8), pp. 549-559 |
---|---|
Authors: | |
Format: | Article |
Language: | English |
Abstract: | StrainInfo has augmented its type strain and species/subspecies passports with a recommendation for a high-quality 16S rRNA gene sequence available from the public sequence databases. These recommendations are generated by an automated pipeline that collects all candidate 16S rRNA gene sequences for a prokaryotic type strain, filters out low-quality sequences and retains a high-quality sequence from the remaining pool. Because the pipeline is fully automated, recommendations can be renewed daily using the latest updates of the public sequence databases and the latest species descriptions. We discuss the quality criteria constructed to filter and rank available 16S rRNA gene sequences, and show how a partially ordered set (poset) ranking algorithm can be applied to solve the multi-criteria ranking problem of selecting the best candidate sequence. The recommender system is validated as a proof of concept by comparing the results of automated selection with the expert selection made in the All-Species Living Tree Project. Based on these validation results, the pipeline may reliably be applied to non-type strains and developed further for the automated selection of housekeeping genes. |
---|---|
ISSN: | 0723-2020, 1618-0984 |
DOI: | 10.1016/j.syapm.2013.09.001 |
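The abstract describes a two-stage selection: hard filters discard low-quality candidates, and a poset ranking then resolves the multi-criteria comparison among the survivors. The following is a minimal Python sketch of that idea under stated assumptions, not the paper's implementation. The two criteria (sequence length and number of ambiguous bases), the thresholds `MIN_LENGTH` and `MAX_AMBIGUITY_FRACTION`, the `Candidate` type and the accessions are all hypothetical. For ranking it uses one standard poset-ranking technique: each candidate's average position over all linear extensions of the Pareto-dominance partial order. The paper's actual criteria and algorithm may differ. Linear extensions are enumerated by brute force, which is exponential, so this only suits small pools.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    accession: str    # hypothetical identifier, assumed unique within a pool
    length: int       # sequence length in bases (higher is better)
    ambiguities: int  # number of ambiguous base calls such as N (lower is better)


# Hypothetical hard-filter thresholds, chosen only for illustration.
MIN_LENGTH = 1200
MAX_AMBIGUITY_FRACTION = 0.01


def passes_filters(c: Candidate) -> bool:
    """Stage 1: discard clearly low-quality sequences."""
    return c.length >= MIN_LENGTH and c.ambiguities <= MAX_AMBIGUITY_FRACTION * c.length


def dominates(a: Candidate, b: Candidate) -> bool:
    """Strict Pareto dominance: a is at least as good on every criterion
    and strictly better on at least one."""
    no_worse = a.length >= b.length and a.ambiguities <= b.ambiguities
    better = a.length > b.length or a.ambiguities < b.ambiguities
    return no_worse and better


def average_ranks(pool: List[Candidate]) -> Dict[str, float]:
    """Stage 2: average position (1 = best) of each candidate over all
    linear extensions of the dominance poset. Brute force, small pools only."""
    n = len(pool)
    edges = [(i, j) for i in range(n) for j in range(n) if dominates(pool[i], pool[j])]
    totals = [0] * n
    extensions = 0
    for order in permutations(range(n)):
        pos = {idx: p for p, idx in enumerate(order)}
        # Keep only orderings that place every dominating candidate first.
        if all(pos[i] < pos[j] for i, j in edges):
            extensions += 1
            for idx, p in pos.items():
                totals[idx] += p + 1
    return {pool[i].accession: totals[i] / extensions for i in range(n)}


def recommend(candidates: List[Candidate]) -> Optional[Candidate]:
    """Filter, rank, and return the best-ranked candidate (ties broken by accession)."""
    pool = [c for c in candidates if passes_filters(c)]
    if not pool:
        return None
    ranks = average_ranks(pool)
    return min(pool, key=lambda c: (ranks[c.accession], c.accession))


# Hypothetical example data, not taken from any real database.
candidates = [
    Candidate("SEQ_A", 1450, 0),
    Candidate("SEQ_B", 1500, 3),
    Candidate("SEQ_C", 1400, 1),
    Candidate("SEQ_D", 900, 0),    # fails the length filter
    Candidate("SEQ_E", 1300, 20),  # fails the ambiguity filter
]

if __name__ == "__main__":
    best = recommend(candidates)
    print(best.accession if best else "no candidate passed the filters")
```

With this example data, SEQ_D and SEQ_E are filtered out. SEQ_A dominates SEQ_C, while SEQ_B is incomparable to both because it is longer but has more ambiguities. Of the three linear extensions, SEQ_A comes first in two and second in one. It therefore obtains the best average rank (4/3) and is recommended. The partial order avoids collapsing the criteria into a single weighted score: incomparable candidates are treated as such, and only the averaging step produces a total ranking.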