Baiting out a full length sequence from unmapped RNA-seq data

Bibliographic details
Published in:	BMC Genomics, 2021-11, Vol. 22 (1), p. 857, Article 857
Main authors:	Li, Dongwei; Huang, Qitong; Huang, Lei; Wen, Jikai; Luo, Jing; Li, Qing; Peng, Yanling; Zhang, Yubo
Format:	Article
Language:	English
Subjects:	Analysis; DNA, Complementary; Exome Sequencing; Full length sequence; High-Throughput Nucleotide Sequencing; High-throughput screening (Biochemical assaying); Methodology; Methods; RNA sequencing; RNA-Seq; Sequence Analysis, RNA; Statistical model; Statistical models; Unmapped reads
Online access:	Full text

Description
Abstract:	RNA-Seq is a powerful tool that has been widely used in many kinds of studies. Unmapped RNA-Seq reads are usually considered useless and are discarded or ignored. We develop a strategy for mining a full-length sequence from unmapped reads by combining the design of specific reverse transcription primers with high-throughput sequencing. In this study, we salvage 36 unmapped reads from standard RNA-Seq data and randomly select one 149 bp read as a model. Specific reverse transcription primers are designed to amplify both of its ends, followed by next-generation sequencing. We then design a statistical model based on a power-law distribution to estimate the integrity and significance of the resulting sequence, and we validate it by Sanger sequencing. The results show that the full-length sequence is 1556 bp long, with insertion mutations in a microsatellite structure. We believe this method is a useful strategy for extracting sequence information from unmapped RNA-Seq data. It also offers an alternative way to obtain the full-length sequence of an unknown cDNA.
ISSN:	1471-2164
DOI:	10.1186/s12864-021-08146-4