Effect of high variation in transcript expression on identifying differentially expressed genes in RNA‐seq analysis

Summary Great efforts have been made on the algorithms that deal with RNA‐seq data to enhance the accuracy and efficiency of differential expression (DE) analysis. However, no consensus has been reached on the proper threshold values of fold change and adjusted p‐value for filtering differentially e...

Detailed description

Bibliographic details
Published in: Annals of human genetics 2021-11, Vol.85 (6), p.235-244
Main authors: Cui, Weitong, Xue, Huaru, Geng, Yifan, Zhang, Jing, Liang, Yajun, Tian, Xuewen, Wang, Qinglu
Format: Article
Language: eng
Online access: Full text
container_end_page 244
container_issue 6
container_start_page 235
container_title Annals of human genetics
container_volume 85
creator Cui, Weitong
Xue, Huaru
Geng, Yifan
Zhang, Jing
Liang, Yajun
Tian, Xuewen
Wang, Qinglu
description Summary Great efforts have been made on the algorithms that deal with RNA‐seq data to enhance the accuracy and efficiency of differential expression (DE) analysis. However, no consensus has been reached on the proper threshold values of fold change and adjusted p‐value for filtering differentially expressed genes (DEGs). It is generally believed that the more stringent the filtering threshold, the more reliable the result of a DE analysis. Nevertheless, by analyzing the impact of both adjusted p‐value and fold change thresholds on DE analyses, with RNA‐seq data obtained for three different cancer types from the Cancer Genome Atlas (TCGA) database, we found that, for a given sample size, the reproducibility of DE results became poorer when more stringent thresholds were applied. No matter which threshold level was applied, the overlap rates of DEGs were generally lower for small sample sizes than for large sample sizes. The raw read count analysis demonstrated that the transcript expression of the same gene in different samples, whether in tumor groups or in normal groups, showed high variations, which resulted in a drastic fluctuation in fold change values and adjusted p‐values when different sets of samples were used. Overall, more stringent thresholds did not yield more reliable DEGs due to high variations in transcript expression; the reliability of DEGs obtained with small sample sizes was more susceptible to these variations. Therefore, less stringent thresholds are recommended for screening DEGs. Moreover, large sample sizes should be considered in RNA‐seq experimental designs to reduce the interfering effect of variations in transcript expression on DEG identification.
doi_str_mv 10.1111/ahg.12441
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2557534314</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2581701361</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3531-fbca98ed911752d53e9765a3e3325d2a3c3a8d721e88294a65f678d786fbdc8d3</originalsourceid><addsrcrecordid>eNp1kc9q3DAQh0Vp6W6THPoCRZBLevBGY0m2fFxC_kFoICRno7VGu1q89q5kt_Etj9Bn7JNEziY5BDoXMZqPj2F-hHwHNoNYp3q1nEEqBHwiUxBZkYBixWcyZYzxRCjGJuRbCGvGIFWCfyUTLriAQmVT0p9bi1VHW0tXbrmiv7V3unNtQ11DO6-bUHm37Sg-bj2GMA7GmcGmc3ZwzZIaFw1-7HVdD28gGrrEBsOoufs1__f0N-CO6kbXQ3DhkHyxug549PoekIeL8_uzq-Tm9vL6bH6TVFxySOyi0oVCUwDkMjWSY5FnUnPkPJUm1bziWpk8BVQqLYTOpM3y-KEyuzCVMvyAnOy9W9_uegxduXGhwrrWDbZ9KFMpcxmPASKixx_Qddv7uO9IKcgZ8Awi9XNPVb4NwaMtt95ttB9KYOWYRRmzKF-yiOyPV2O_2KB5J9-OH4HTPfDH1Tj831TOry73ymeHt5S3</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2581701361</pqid></control><display><type>article</type><title>Effect of high variation in transcript expression on identifying differentially expressed genes in RNA‐seq analysis</title><source>MEDLINE</source><source>Wiley Online Library Free Content</source><source>Access via Wiley Online Library</source><creator>Cui, Weitong ; Xue, Huaru ; Geng, Yifan ; Zhang, Jing ; Liang, Yajun ; Tian, Xuewen ; Wang, Qinglu</creator><creatorcontrib>Cui, Weitong ; Xue, Huaru ; Geng, Yifan ; Zhang, Jing ; Liang, Yajun ; Tian, Xuewen ; Wang, Qinglu</creatorcontrib><description>Summary Great efforts have been made on the algorithms that deal with RNA‐seq data to enhance the accuracy and efficiency of differential expression (DE) analysis. However, no consensus has been reached on the proper threshold values of fold change and adjusted p‐value for filtering differentially expressed genes (DEGs). It is generally believed that the more stringent the filtering threshold, the more reliable the result of a DE analysis. 
Nevertheless, by analyzing the impact of both adjusted p‐value and fold change thresholds on DE analyses, with RNA‐seq data obtained for three different cancer types from the Cancer Genome Atlas (TCGA) database, we found that, for a given sample size, the reproducibility of DE results became poorer when more stringent thresholds were applied. No matter which threshold level was applied, the overlap rates of DEGs were generally lower for small sample sizes than for large sample sizes. The raw read count analysis demonstrated that the transcript expression of the same gene in different samples, whether in tumor groups or in normal groups, showed high variations, which resulted in a drastic fluctuation in fold change values and adjusted p‐values when different sets of samples were used. Overall, more stringent thresholds did not yield more reliable DEGs due to high variations in transcript expression; the reliability of DEGs obtained with small sample sizes was more susceptible to these variations. Therefore, less stringent thresholds are recommended for screening DEGs. 
Moreover, large sample sizes should be considered in RNA‐seq experimental designs to reduce the interfering effect of variations in transcript expression on DEG identification.</description><identifier>ISSN: 0003-4800</identifier><identifier>EISSN: 1469-1809</identifier><identifier>DOI: 10.1111/ahg.12441</identifier><identifier>PMID: 34341986</identifier><language>eng</language><publisher>England: Wiley Subscription Services, Inc</publisher><subject>Algorithms ; Differential expression ; false discovery rate ; fold change ; Gene Expression ; Genomes ; Humans ; Neoplasms - genetics ; Ribonucleic acid ; RNA ; RNA, Messenger - genetics ; RNA-Seq ; sample size ; threshold ; Transcription ; Variation</subject><ispartof>Annals of human genetics, 2021-11, Vol.85 (6), p.235-244</ispartof><rights>2021 John Wiley &amp; Sons Ltd/University College London</rights><rights>2021 John Wiley &amp; Sons Ltd/University College London.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3531-fbca98ed911752d53e9765a3e3325d2a3c3a8d721e88294a65f678d786fbdc8d3</citedby><cites>FETCH-LOGICAL-c3531-fbca98ed911752d53e9765a3e3325d2a3c3a8d721e88294a65f678d786fbdc8d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fahg.12441$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2Fahg.12441$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,780,784,1417,1433,27924,27925,45574,45575,46409,46833</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34341986$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Cui, Weitong</creatorcontrib><creatorcontrib>Xue, Huaru</creatorcontrib><creatorcontrib>Geng, Yifan</creatorcontrib><creatorcontrib>Zhang, 
Jing</creatorcontrib><creatorcontrib>Liang, Yajun</creatorcontrib><creatorcontrib>Tian, Xuewen</creatorcontrib><creatorcontrib>Wang, Qinglu</creatorcontrib><title>Effect of high variation in transcript expression on identifying differentially expressed genes in RNA‐seq analysis</title><title>Annals of human genetics</title><addtitle>Ann Hum Genet</addtitle><description>Summary Great efforts have been made on the algorithms that deal with RNA‐seq data to enhance the accuracy and efficiency of differential expression (DE) analysis. However, no consensus has been reached on the proper threshold values of fold change and adjusted p‐value for filtering differentially expressed genes (DEGs). It is generally believed that the more stringent the filtering threshold, the more reliable the result of a DE analysis. Nevertheless, by analyzing the impact of both adjusted p‐value and fold change thresholds on DE analyses, with RNA‐seq data obtained for three different cancer types from the Cancer Genome Atlas (TCGA) database, we found that, for a given sample size, the reproducibility of DE results became poorer when more stringent thresholds were applied. No matter which threshold level was applied, the overlap rates of DEGs were generally lower for small sample sizes than for large sample sizes. The raw read count analysis demonstrated that the transcript expression of the same gene in different samples, whether in tumor groups or in normal groups, showed high variations, which resulted in a drastic fluctuation in fold change values and adjusted p‐values when different sets of samples were used. Overall, more stringent thresholds did not yield more reliable DEGs due to high variations in transcript expression; the reliability of DEGs obtained with small sample sizes was more susceptible to these variations. Therefore, less stringent thresholds are recommended for screening DEGs. 
Moreover, large sample sizes should be considered in RNA‐seq experimental designs to reduce the interfering effect of variations in transcript expression on DEG identification.</description><subject>Algorithms</subject><subject>Differential expression</subject><subject>false discovery rate</subject><subject>fold change</subject><subject>Gene Expression</subject><subject>Genomes</subject><subject>Humans</subject><subject>Neoplasms - genetics</subject><subject>Ribonucleic acid</subject><subject>RNA</subject><subject>RNA, Messenger - genetics</subject><subject>RNA-Seq</subject><subject>sample size</subject><subject>threshold</subject><subject>Transcription</subject><subject>Variation</subject><issn>0003-4800</issn><issn>1469-1809</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp1kc9q3DAQh0Vp6W6THPoCRZBLevBGY0m2fFxC_kFoICRno7VGu1q89q5kt_Etj9Bn7JNEziY5BDoXMZqPj2F-hHwHNoNYp3q1nEEqBHwiUxBZkYBixWcyZYzxRCjGJuRbCGvGIFWCfyUTLriAQmVT0p9bi1VHW0tXbrmiv7V3unNtQ11DO6-bUHm37Sg-bj2GMA7GmcGmc3ZwzZIaFw1-7HVdD28gGrrEBsOoufs1__f0N-CO6kbXQ3DhkHyxug549PoekIeL8_uzq-Tm9vL6bH6TVFxySOyi0oVCUwDkMjWSY5FnUnPkPJUm1bziWpk8BVQqLYTOpM3y-KEyuzCVMvyAnOy9W9_uegxduXGhwrrWDbZ9KFMpcxmPASKixx_Qddv7uO9IKcgZ8Awi9XNPVb4NwaMtt95ttB9KYOWYRRmzKF-yiOyPV2O_2KB5J9-OH4HTPfDH1Tj831TOry73ymeHt5S3</recordid><startdate>202111</startdate><enddate>202111</enddate><creator>Cui, Weitong</creator><creator>Xue, Huaru</creator><creator>Geng, Yifan</creator><creator>Zhang, Jing</creator><creator>Liang, Yajun</creator><creator>Tian, Xuewen</creator><creator>Wang, Qinglu</creator><general>Wiley Subscription Services, 
Inc</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>202111</creationdate><title>Effect of high variation in transcript expression on identifying differentially expressed genes in RNA‐seq analysis</title><author>Cui, Weitong ; Xue, Huaru ; Geng, Yifan ; Zhang, Jing ; Liang, Yajun ; Tian, Xuewen ; Wang, Qinglu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3531-fbca98ed911752d53e9765a3e3325d2a3c3a8d721e88294a65f678d786fbdc8d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Differential expression</topic><topic>false discovery rate</topic><topic>fold change</topic><topic>Gene Expression</topic><topic>Genomes</topic><topic>Humans</topic><topic>Neoplasms - genetics</topic><topic>Ribonucleic acid</topic><topic>RNA</topic><topic>RNA, Messenger - genetics</topic><topic>RNA-Seq</topic><topic>sample size</topic><topic>threshold</topic><topic>Transcription</topic><topic>Variation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cui, Weitong</creatorcontrib><creatorcontrib>Xue, Huaru</creatorcontrib><creatorcontrib>Geng, Yifan</creatorcontrib><creatorcontrib>Zhang, Jing</creatorcontrib><creatorcontrib>Liang, Yajun</creatorcontrib><creatorcontrib>Tian, Xuewen</creatorcontrib><creatorcontrib>Wang, Qinglu</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Technology Research Database</collection><collection>Engineering Research 
Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Annals of human genetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cui, Weitong</au><au>Xue, Huaru</au><au>Geng, Yifan</au><au>Zhang, Jing</au><au>Liang, Yajun</au><au>Tian, Xuewen</au><au>Wang, Qinglu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Effect of high variation in transcript expression on identifying differentially expressed genes in RNA‐seq analysis</atitle><jtitle>Annals of human genetics</jtitle><addtitle>Ann Hum Genet</addtitle><date>2021-11</date><risdate>2021</risdate><volume>85</volume><issue>6</issue><spage>235</spage><epage>244</epage><pages>235-244</pages><issn>0003-4800</issn><eissn>1469-1809</eissn><abstract>Summary Great efforts have been made on the algorithms that deal with RNA‐seq data to enhance the accuracy and efficiency of differential expression (DE) analysis. However, no consensus has been reached on the proper threshold values of fold change and adjusted p‐value for filtering differentially expressed genes (DEGs). It is generally believed that the more stringent the filtering threshold, the more reliable the result of a DE analysis. Nevertheless, by analyzing the impact of both adjusted p‐value and fold change thresholds on DE analyses, with RNA‐seq data obtained for three different cancer types from the Cancer Genome Atlas (TCGA) database, we found that, for a given sample size, the reproducibility of DE results became poorer when more stringent thresholds were applied. No matter which threshold level was applied, the overlap rates of DEGs were generally lower for small sample sizes than for large sample sizes. 
The raw read count analysis demonstrated that the transcript expression of the same gene in different samples, whether in tumor groups or in normal groups, showed high variations, which resulted in a drastic fluctuation in fold change values and adjusted p‐values when different sets of samples were used. Overall, more stringent thresholds did not yield more reliable DEGs due to high variations in transcript expression; the reliability of DEGs obtained with small sample sizes was more susceptible to these variations. Therefore, less stringent thresholds are recommended for screening DEGs. Moreover, large sample sizes should be considered in RNA‐seq experimental designs to reduce the interfering effect of variations in transcript expression on DEG identification.</abstract><cop>England</cop><pub>Wiley Subscription Services, Inc</pub><pmid>34341986</pmid><doi>10.1111/ahg.12441</doi><tpages>10</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0003-4800
ispartof Annals of human genetics, 2021-11, Vol.85 (6), p.235-244
issn 0003-4800
1469-1809
language eng
recordid cdi_proquest_miscellaneous_2557534314
source MEDLINE; Wiley Online Library Free Content; Access via Wiley Online Library
subjects Algorithms
Differential expression
false discovery rate
fold change
Gene Expression
Genomes
Humans
Neoplasms - genetics
Ribonucleic acid
RNA
RNA, Messenger - genetics
RNA-Seq
sample size
threshold
Transcription
Variation
title Effect of high variation in transcript expression on identifying differentially expressed genes in RNA‐seq analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T18%3A24%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effect%20of%20high%20variation%20in%20transcript%20expression%20on%20identifying%20differentially%20expressed%20genes%20in%20RNA%E2%80%90seq%20analysis&rft.jtitle=Annals%20of%20human%20genetics&rft.au=Cui,%20Weitong&rft.date=2021-11&rft.volume=85&rft.issue=6&rft.spage=235&rft.epage=244&rft.pages=235-244&rft.issn=0003-4800&rft.eissn=1469-1809&rft_id=info:doi/10.1111/ahg.12441&rft_dat=%3Cproquest_cross%3E2581701361%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2581701361&rft_id=info:pmid/34341986&rfr_iscdi=true