FBB: a fast Bayesian-bound tool to calibrate RNA-seq aligners

Despite RNA-seq reads provide quality scores that represent the probability of calling a correct base, these values are not probabilistically integrated in most alignment algorithms. Based on the quality scores of the reads, we propose to calculate a lower bound of the probability of alignment of an...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Bioinformatics (Oxford, England) England), 2017-01, Vol.33 (2), p.210-218
Hauptverfasser: Rodriguez-Lujan, Irene, Hasty, Jeff, Huerta, Ramón
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Despite RNA-seq reads provide quality scores that represent the probability of calling a correct base, these values are not probabilistically integrated in most alignment algorithms. Based on the quality scores of the reads, we propose to calculate a lower bound of the probability of alignment of any fast alignment algorithm that generates SAM files. This bound is called Fast Bayesian Bound (FBB) and serves as a canonical reference to compare alignment results across different algorithms. This Bayesian Bound intends to provide additional support to the current state-of-the-art aligners, not to replace them. We propose a feasible Bayesian bound that uses quality scores of the reads to align them to a genome of reference. Two theorems are provided to efficiently calculate the Bayesian bound that under some conditions becomes the equality. The algorithm reads the SAM files generated by the alignment algorithms using multiple command option values. The program options are mapped into the FBB reference values, and all the aligners can be compared respect to the same accuracy values provided by the FBB. Stranded paired read RNA-seq data was used for evaluation purposes. The errors of the alignments can be calculated based on the information contained in the distance between the pairs given by Theorem 2, and the alignments to the incorrect strand. Most of the algorithms (Bowtie, Bowtie 2, SHRiMP2, Soap 2, Novoalign) provide similar results with subtle variations. Current version of the FBB software is provided at https://bitbucket.org/irenerodriguez/fbb CONTACT: rhuerta@ucsd.eduSupplementary information: Supplementary data are available at Bioinformatics online.
ISSN:1367-4803
1367-4811
DOI:10.1093/bioinformatics/btw608