Systematic identification of intron retention associated variants from massive publicly available transcriptome sequencing data

Detailed description

Bibliographic details
Published in:	Nature communications 2022-09, Vol.13 (1), p.5357-13, Article 5357
Main authors:	Shiraishi, Yuichi; Okada, Ai; Chiba, Kenichi; Kawachi, Asuka; Omori, Ikuko; Mateos, Raúl Nicolás; Iida, Naoko; Yamauchi, Hirofumi; Kosaki, Kenjiro; Yoshimi, Akihide
Format:	Article
Language:	English
Subjects:	49/91; 631/114; 631/208/1792; 631/208/737; Archives & records; Bioinformatics; Cancer; Exome Sequencing; Genomes; Genomics; Humanities and Social Sciences; Introns - genetics; Knowledge acquisition; Levamisole - analogs & derivatives; multidisciplinary; Mutation; Ratios; Retention; RNA Splicing - genetics; Science; Science (multidisciplinary); Sensitivity analysis; Splicing; Transcriptome - genetics; Transcriptomes; Transcriptomics
Online access:	Full text

Description
Abstract:	Many disease-associated genomic variants disrupt gene function through abnormal splicing. With the advancement of genomic medicine, identifying disease-associated, splicing-associated variants has become more important than ever. Most bioinformatics approaches to detecting splicing-associated variants require both genomic and transcriptomic data. However, few datasets provide both. In this study, we develop a methodology to detect genomic variants that cause splicing changes (more specifically, intron retention) using transcriptome sequencing data alone. After evaluating its sensitivity and precision, we apply it to 230,988 transcriptome sequencing datasets from a publicly available repository and identify 27,049 intron retention associated variants (IRAVs). In addition, by exploring positional relationships with variants registered in existing disease databases, we extract 3,000 putative disease-associated IRAVs, which range from cancer drivers to variants linked with autosomal recessive disorders. The in-silico screening framework demonstrates the possibility of near-automatically acquiring medical knowledge by making the most of massively accumulated, publicly available sequencing data. Collections of IRAVs identified in this study are available through IRAVDB (https://iravdb.io/). This paper proposes a novel in-silico framework for automatically screening disease-related variants and applies it to over 200,000 transcriptomes, providing an example of how to acquire medically relevant knowledge from publicly available sequence data.
ISSN:	2041-1723
DOI:	10.1038/s41467-022-32887-9