Adjusted Sample Size Calculation for RNA-seq Data in the Presence of Confounding Covariates

Sample size calculation for adequate power is critical in optimizing RNA-seq experimental design. However, directly estimating sample size becomes more complex when confounding covariates are taken into consideration. Although a number of approaches for sample size calculation have bee...

Detailed description

Bibliographic details
Published in: BioMedInformatics 2021-09, Vol.1 (2), p.47-63
Main authors: Li, Xiaohong; Rai, Shesh N.; Rouchka, Eric C.; O’Toole, Timothy E.; Cooper, Nigel G. F.
Format: Article
Language: eng
Online access: Full text
container_end_page 63
container_issue 2
container_start_page 47
container_title BioMedInformatics
container_volume 1
creator Li, Xiaohong
Rai, Shesh N.
Rouchka, Eric C.
O’Toole, Timothy E.
Cooper, Nigel G. F.
description Sample size calculation for adequate power is critical in optimizing RNA-seq experimental design. However, directly estimating sample size becomes more complex when confounding covariates are taken into consideration. Although a number of approaches for sample size calculation have been proposed for RNA-seq data, most ignore any potential heterogeneity. In this study, we implemented a simulation-based and confounder-adjusted method to provide sample size recommendations for RNA-seq differential expression analysis. The data were generated using Monte Carlo simulation, given an underlying distribution of confounding covariates and parameters for a negative binomial distribution. The relationship between sample size, power, and parameters such as dispersion, fold change, and mean read counts can be visualized. We demonstrate that the adjusted sample size for a desired power and type I error rate α is usually larger when confounding covariates are taken into account. More importantly, our simulation study reveals that sample size may be underestimated by existing methods if a confounding covariate exists in RNA-seq data. Consequently, this underestimate could reduce the detection power of the differential expression analysis. Therefore, we introduce confounding covariates into sample size estimation for heterogeneous RNA-seq data.
doi_str_mv 10.3390/biomedinformatics1020004
format Article
fulltext fulltext
identifier ISSN: 2673-7426
ispartof BioMedInformatics, 2021-09, Vol.1 (2), p.47-63
issn 2673-7426
2673-7426
language eng
recordid cdi_crossref_primary_10_3390_biomedinformatics1020004
source DOAJ Directory of Open Access Journals; MDPI - Multidisciplinary Digital Publishing Institute
title Adjusted Sample Size Calculation for RNA-seq Data in the Presence of Confounding Covariates
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T12%3A29%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Adjusted%20Sample%20Size%20Calculation%20for%20RNA-seq%20Data%20in%20the%20Presence%20of%20Confounding%20Covariates&rft.jtitle=BioMedInformatics&rft.au=Li,%20Xiaohong&rft.date=2021-09-01&rft.volume=1&rft.issue=2&rft.spage=47&rft.epage=63&rft.pages=47-63&rft.issn=2673-7426&rft.eissn=2673-7426&rft_id=info:doi/10.3390/biomedinformatics1020004&rft_dat=%3Ccrossref%3E10_3390_biomedinformatics1020004%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
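
The description field above outlines a Monte Carlo approach: simulate negative binomial read counts under a given covariate distribution, test for differential expression, and read off the power for each sample size. The following is a minimal single-gene sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`rnbinom`, `invert`, `power`, `smallest_n`), the default parameter values, the normally distributed covariate whose mean differs between groups, and the test itself, which fits a linear model to log-transformed counts and applies a Wald z-test as a simple stand-in for the negative binomial models normally used for RNA-seq data.

```python
import math
import random
from statistics import NormalDist


def rnbinom(rng, mean, dispersion):
    """Draw one negative binomial count (variance = mean + dispersion * mean**2)."""
    # Gamma-Poisson mixture: a gamma-distributed rate, then a Poisson count.
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    k, t = 0, rng.expovariate(1.0)
    while t < lam:  # count unit-rate arrivals that fall before time lam
        k += 1
        t += rng.expovariate(1.0)
    return k


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def power(n_per_group, fold_change, mean=20.0, dispersion=0.2,
          cov_effect=0.5, cov_shift=0.5, alpha=0.05, n_sim=200, seed=1):
    """Estimate power to detect the group effect while adjusting for a covariate."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _sim in range(n_sim):
        rows, y = [], []
        for g in (0, 1):
            for _rep in range(n_per_group):
                # Hypothetical confounder: its mean is shifted in group 1.
                z = rng.gauss(cov_shift * g, 1.0)
                mu = mean * fold_change ** g * math.exp(cov_effect * z)
                rows.append([1.0, float(g), z])
                y.append(math.log(rnbinom(rng, mu, dispersion) + 1.0))
        # Ordinary least squares on log counts: intercept, group, covariate.
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
        inv = invert(xtx)
        beta = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]
        resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, y)]
        sigma2 = sum(e * e for e in resid) / (len(y) - 3)
        se = math.sqrt(sigma2 * inv[1][1])
        if abs(beta[1] / se) > z_crit:  # Wald test on the group coefficient
            hits += 1
    return hits / n_sim


def smallest_n(target_power, candidates, **kwargs):
    """Return the first candidate per-group sample size reaching the target power."""
    for n in candidates:
        if power(n, **kwargs) >= target_power:
            return n
    return None
```

Calling `smallest_n(0.8, range(5, 65, 5), fold_change=1.5)` scans per-group sample sizes in steps of five. Rerunning with a larger `cov_effect` or `cov_shift` shows how a stronger confounder changes the sample size needed in this toy setup; it does not reproduce the paper's reported numbers.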