Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling

Motivation: Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with te...

Detailed description

Bibliographic details
Published in: Bioinformatics 2011-07, Vol.27 (13), p.i383-i391
Main authors: Łabaj, Paweł P., Leparc, Germán G., Linggi, Bryan E., Markillie, Lye Meng, Wiley, H. Steven, Kreil, David P.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page i391
container_issue 13
container_start_page i383
container_title Bioinformatics
container_volume 27
creator Łabaj, Paweł P.
Leparc, Germán G.
Linggi, Bryan E.
Markillie, Lye Meng
Wiley, H. Steven
Kreil, David P.
description Motivation: Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. Results: We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even less gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently,
doi_str_mv 10.1093/bioinformatics/btr247
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3117338</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btr247</oup_id><sourcerecordid>873120325</sourcerecordid><originalsourceid>FETCH-LOGICAL-c544t-c8783a192db4a0dd8d413bae5aeac638e8f208d043af987f44e2a77e49965eed3</originalsourceid><addsrcrecordid>eNqNkc1u1DAURq2qiJbCIxRF3bAK9V8SZ1OpGpWCVIFEy9q6cW46RomdsZ1R4elxmaGiu65syecef1cfIaeMfmS0Feed9dYNPkyQrInnXQpcNgfkmMmalpxW7WG-i7oppaLiiLyJ8SelFZNSviZHnNWqom19THC1hgAmYbC_s8m7Alxf2GkOfosTulT4ofj-9bK8xU0xBzQ2PkLWFZsFXLIpD22xSAFcNMHOqcCHjMW_VJYMdrTu_i15NcAY8d3-PCE_Pl3drT6XN9-uv6wub0pTSZlKoxolgLW87yTQvle9ZKIDrADB1EKhGjhVPZUChlY1g5TIoWlQtm1dIfbihFzsvPPSTdibnD_AqOdgJwi_tAern784u9b3fqsFY40QKgvOdgIfk9XR2IRmbbxzaJJmlAumWIY-7H8JfrNgTHqy0eA4gkO_RK0awTgVvMpktSNN8DEGHJ6iMKofa9TPa9S7GvPc-__3eJr611sG6D7nMr_Q-Qfe-7QG</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>873120325</pqid></control><display><type>article</type><title>Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling</title><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Oxford Journals Open Access Collection</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>abaj, Pawe P. ; Leparc, Germán G. ; Linggi, Bryan E. ; Markillie, Lye Meng ; Wiley, H. Steven ; Kreil, David P.</creator><creatorcontrib>abaj, Pawe P. ; Leparc, Germán G. ; Linggi, Bryan E. ; Markillie, Lye Meng ; Wiley, H. Steven ; Kreil, David P. ; Pacific Northwest National Lab. 
(PNNL), Richland, WA (United States)</creatorcontrib><description>Motivation: Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. Results: We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even less gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, &lt;30% of all transcripts could be quantified reliably with a relative error &lt;20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision. 
Contact: rnaseq10@boku.ac.at Supplementary information: Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btr247</identifier><identifier>PMID: 21685096</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>ACCURACY ; BASIC BIOLOGICAL SCIENCES ; Bioinformatics ; Cell Line ; DESIGN ; DNA SEQUENCERS ; Gene Expression Profiling - methods ; GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE ; GENES ; High-Throughput Nucleotide Sequencing - methods ; Humans ; Microarray Analysis ; MICROARRAY TECHNOLOGY ; microarrays ; Original Papers ; PARALLEL PROCESSING ; RNA ; RNA - analysis ; RNA-Seq ; SCREENS ; Sequence Analysis, RNA - methods ; Software ; statistics ; TARGETS</subject><ispartof>Bioinformatics, 2011-07, Vol.27 (13), p.i383-i391</ispartof><rights>The Author(s) 2011. Published by Oxford University Press. 
2011</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c544t-c8783a192db4a0dd8d413bae5aeac638e8f208d043af987f44e2a77e49965eed3</citedby><cites>FETCH-LOGICAL-c544t-c8783a192db4a0dd8d413bae5aeac638e8f208d043af987f44e2a77e49965eed3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117338/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117338/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,724,777,781,882,1599,27905,27906,53772,53774</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/21685096$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/biblio/1023181$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>abaj, Pawe P.</creatorcontrib><creatorcontrib>Leparc, Germán G.</creatorcontrib><creatorcontrib>Linggi, Bryan E.</creatorcontrib><creatorcontrib>Markillie, Lye Meng</creatorcontrib><creatorcontrib>Wiley, H. Steven</creatorcontrib><creatorcontrib>Kreil, David P.</creatorcontrib><creatorcontrib>Pacific Northwest National Lab. (PNNL), Richland, WA (United States)</creatorcontrib><title>Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. 
With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. Results: We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even less gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, &lt;30% of all transcripts could be quantified reliably with a relative error &lt;20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision. 
Contact: rnaseq10@boku.ac.at Supplementary information: Supplementary data are available at Bioinformatics online.</description><subject>ACCURACY</subject><subject>BASIC BIOLOGICAL SCIENCES</subject><subject>Bioinformatics</subject><subject>Cell Line</subject><subject>DESIGN</subject><subject>DNA SEQUENCERS</subject><subject>Gene Expression Profiling - methods</subject><subject>GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE</subject><subject>GENES</subject><subject>High-Throughput Nucleotide Sequencing - methods</subject><subject>Humans</subject><subject>Microarray Analysis</subject><subject>MICROARRAY TECHNOLOGY</subject><subject>microarrays</subject><subject>Original Papers</subject><subject>PARALLEL PROCESSING</subject><subject>RNA</subject><subject>RNA - analysis</subject><subject>RNA-Seq</subject><subject>SCREENS</subject><subject>Sequence Analysis, RNA - methods</subject><subject>Software</subject><subject>statistics</subject><subject>TARGETS</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNqNkc1u1DAURq2qiJbCIxRF3bAK9V8SZ1OpGpWCVIFEy9q6cW46RomdsZ1R4elxmaGiu65syecef1cfIaeMfmS0Feed9dYNPkyQrInnXQpcNgfkmMmalpxW7WG-i7oppaLiiLyJ8SelFZNSviZHnNWqom19THC1hgAmYbC_s8m7Alxf2GkOfosTulT4ofj-9bK8xU0xBzQ2PkLWFZsFXLIpD22xSAFcNMHOqcCHjMW_VJYMdrTu_i15NcAY8d3-PCE_Pl3drT6XN9-uv6wub0pTSZlKoxolgLW87yTQvle9ZKIDrADB1EKhGjhVPZUChlY1g5TIoWlQtm1dIfbihFzsvPPSTdibnD_AqOdgJwi_tAern784u9b3fqsFY40QKgvOdgIfk9XR2IRmbbxzaJJmlAumWIY-7H8JfrNgTHqy0eA4gkO_RK0awTgVvMpktSNN8DEGHJ6iMKofa9TPa9S7GvPc-__3eJr611sG6D7nMr_Q-Qfe-7QG</recordid><startdate>20110701</startdate><enddate>20110701</enddate><creator>abaj, Pawe P.</creator><creator>Leparc, Germán G.</creator><creator>Linggi, Bryan E.</creator><creator>Markillie, Lye Meng</creator><creator>Wiley, H. 
Steven</creator><creator>Kreil, David P.</creator><general>Oxford University Press</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>OTOTI</scope><scope>5PM</scope></search><sort><creationdate>20110701</creationdate><title>Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling</title><author>abaj, Pawe P. ; Leparc, Germán G. ; Linggi, Bryan E. ; Markillie, Lye Meng ; Wiley, H. Steven ; Kreil, David P.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c544t-c8783a192db4a0dd8d413bae5aeac638e8f208d043af987f44e2a77e49965eed3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>ACCURACY</topic><topic>BASIC BIOLOGICAL SCIENCES</topic><topic>Bioinformatics</topic><topic>Cell Line</topic><topic>DESIGN</topic><topic>DNA SEQUENCERS</topic><topic>Gene Expression Profiling - methods</topic><topic>GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE</topic><topic>GENES</topic><topic>High-Throughput Nucleotide Sequencing - methods</topic><topic>Humans</topic><topic>Microarray Analysis</topic><topic>MICROARRAY TECHNOLOGY</topic><topic>microarrays</topic><topic>Original Papers</topic><topic>PARALLEL PROCESSING</topic><topic>RNA</topic><topic>RNA - analysis</topic><topic>RNA-Seq</topic><topic>SCREENS</topic><topic>Sequence Analysis, RNA - methods</topic><topic>Software</topic><topic>statistics</topic><topic>TARGETS</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>abaj, Pawe P.</creatorcontrib><creatorcontrib>Leparc, Germán G.</creatorcontrib><creatorcontrib>Linggi, Bryan E.</creatorcontrib><creatorcontrib>Markillie, Lye Meng</creatorcontrib><creatorcontrib>Wiley, H. 
Steven</creatorcontrib><creatorcontrib>Kreil, David P.</creatorcontrib><creatorcontrib>Pacific Northwest National Lab. (PNNL), Richland, WA (United States)</creatorcontrib><collection>Oxford Journals Open Access Collection</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>abaj, Pawe P.</au><au>Leparc, Germán G.</au><au>Linggi, Bryan E.</au><au>Markillie, Lye Meng</au><au>Wiley, H. Steven</au><au>Kreil, David P.</au><aucorp>Pacific Northwest National Lab. (PNNL), Richland, WA (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2011-07-01</date><risdate>2011</risdate><volume>27</volume><issue>13</issue><spage>i383</spage><epage>i391</epage><pages>i383-i391</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Motivation: Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. 
This then allows considerations for its improvement by computational or experimental means. Results: We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even less gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, &lt;30% of all transcripts could be quantified reliably with a relative error &lt;20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision. Contact: rnaseq10@boku.ac.at Supplementary information: Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>21685096</pmid><doi>10.1093/bioinformatics/btr247</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2011-07, Vol.27 (13), p.i383-i391
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3117338
source MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Oxford Journals Open Access Collection; PubMed Central; Alma/SFX Local Collection
subjects ACCURACY
BASIC BIOLOGICAL SCIENCES
Bioinformatics
Cell Line
DESIGN
DNA SEQUENCERS
Gene Expression Profiling - methods
GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
GENES
High-Throughput Nucleotide Sequencing - methods
Humans
Microarray Analysis
MICROARRAY TECHNOLOGY
microarrays
Original Papers
PARALLEL PROCESSING
RNA
RNA - analysis
RNA-Seq
SCREENS
Sequence Analysis, RNA - methods
Software
statistics
TARGETS
title Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T19%3A34%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Characterization%20and%20improvement%20of%20RNA-Seq%20precision%20in%20quantitative%20transcript%20expression%20profiling&rft.jtitle=Bioinformatics&rft.au=abaj,%20Pawe%20P.&rft.aucorp=Pacific%20Northwest%20National%20Lab.%20(PNNL),%20Richland,%20WA%20(United%20States)&rft.date=2011-07-01&rft.volume=27&rft.issue=13&rft.spage=i383&rft.epage=i391&rft.pages=i383-i391&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btr247&rft_dat=%3Cproquest_pubme%3E873120325%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=873120325&rft_id=info:pmid/21685096&rft_oup_id=10.1093/bioinformatics/btr247&rfr_iscdi=true