Sequence biases in large scale gene expression profiling data

We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (Long...
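
The abstract relates the G+C content of a tag or probe set to whether a gene is detected. The sketch below only illustrates that idea and is not the authors' assay, which the abstract does not specify: it computes the G+C fraction of each tag and the fraction of tags detected per G+C bin. The function names, the bin count, and the example tags are hypothetical.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def detection_rate_by_gc(tags, detected, n_bins=4):
    """Fraction of tags detected within each equal-width G+C bin.

    tags     -- iterable of tag sequences expected in the library (assumed input)
    detected -- set of tag sequences actually observed (assumed input)
    Returns a list of length n_bins; bins with no tags are None.
    """
    totals = [0] * n_bins
    hits = [0] * n_bins
    for tag in tags:
        # A G+C fraction of exactly 1.0 falls in the last bin.
        b = min(int(gc_fraction(tag) * n_bins), n_bins - 1)
        totals[b] += 1
        if tag in detected:
            hits[b] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]


# Made-up tags for illustration only; not data from the paper.
example_tags = ["ATATATATAT", "ATGCATGCAT", "GCGCGCGCGC"]
example_detected = {"GCGCGCGCGC"}
rates = detection_rate_by_gc(example_tags, example_detected)
print(rates)
```

A detection rate that rises or falls steadily across bins would hint at a G+C-dependent bias; a real analysis would need many more tags and a statistical test.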

Bibliographic Details
Published in:	Nucleic acids research 2006-01, Vol.34 (12), p.e83-e83
Main authors:	Siddiqui, Asim S., Delaney, Allen D., Schnerch, Angelique, Griffith, Obi L., Jones, Steven J. M., Marra, Marco A.
Format:	Article
Language:	eng
Subjects:	Animals; Base Composition; Cytosine - analysis; DNA - chemistry; Gene Expression Profiling; Genes; Guanine - analysis; Humans; Methods Online; Mice; Nucleic Acid Probes - chemistry; Oligonucleotide Array Sequence Analysis
Online access:	Full text

container_end_page	e83
container_issue	12
container_start_page	e83
container_title	Nucleic acids research
container_volume	34
creator	Siddiqui, Asim S. ; Delaney, Allen D. ; Schnerch, Angelique ; Griffith, Obi L. ; Jones, Steven J. M. ; Marra, Marco A.
description	We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, ‘Classic’ Massively Parallel Signature Sequencing (MPSS) and ‘Signature’ MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).
doi_str_mv	10.1093/nar/gkl404
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1524917</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>68627837</sourcerecordid><originalsourceid>FETCH-LOGICAL-c441t-ce92e6c0c1df46162e7dc388be368267b1b68a6232b30f2bdb72ac7711aef99a3</originalsourceid><addsrcrecordid>eNpdkUtLxDAUhYMoOj42_gApLlwI1byapAsF8TXqoAsVxE1I09sa7aRj0hH991Zm8LW6i_NxOJcPoU2C9wjO2b43Yb9-aTjmC2hAmKApzwVdRAPMcJYSzNUKWo3xGWPCScaX0QoRiuOMygE6uIXXKXgLSeFMhJg4nzQm1JBEaxpIavCQwPskQIyu9ckktJVrnK-T0nRmHS1VpomwMb9r6P7s9O54mI5uzi-Oj0ap5Zx0qYWcgrDYkrLigggKsrRMqQKYUFTIghRCGUEZLRiuaFEWkhorJSEGqjw3bA0dznon02IMpQXfBdPoSXBjEz50a5z-m3j3pOv2TZOM8pzIvmBnXhDa_t_Y6bGLFprGeGinUQslqFTsC9z-Bz630-D75zTFWGCFc9FDuzPIhjbGANX3EoL1lxLdK9EzJT289Xv7Dzp30APpDHCxg_fv3IQXLSSTmR4-POqrSzZ8HJ1c64x9Ai07l-Y</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200608096</pqid></control><display><type>article</type><title>Sequence biases in large scale gene expression profiling data</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Siddiqui, Asim S. ; Delaney, Allen D. ; Schnerch, Angelique ; Griffith, Obi L. ; Jones, Steven J. M. ; Marra, Marco A.</creator><creatorcontrib>Siddiqui, Asim S. ; Delaney, Allen D. ; Schnerch, Angelique ; Griffith, Obi L. ; Jones, Steven J. M. ; Marra, Marco A.</creatorcontrib><description>We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. 
We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, ‘Classic’ Massively Parallel Signature Sequencing (MPSS) and ‘Signature’ MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).</description><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkl404</identifier><identifier>PMID: 16840527</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Animals ; Base Composition ; Cytosine - analysis ; DNA - chemistry ; Gene Expression Profiling ; Genes ; Guanine - analysis ; Humans ; Methods Online ; Mice ; Nucleic Acid Probes - chemistry ; Oligonucleotide Array Sequence Analysis</subject><ispartof>Nucleic acids research, 2006-01, Vol.34 (12), p.e83-e83</ispartof><rights>2006 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commerical use, distribution, and reproduction in any medium, provided the original work is properly cited.</rights><rights>2006 The Author(s) 
2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c441t-ce92e6c0c1df46162e7dc388be368267b1b68a6232b30f2bdb72ac7711aef99a3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1524917/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1524917/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16840527$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Siddiqui, Asim S.</creatorcontrib><creatorcontrib>Delaney, Allen D.</creatorcontrib><creatorcontrib>Schnerch, Angelique</creatorcontrib><creatorcontrib>Griffith, Obi L.</creatorcontrib><creatorcontrib>Jones, Steven J. M.</creatorcontrib><creatorcontrib>Marra, Marco A.</creatorcontrib><title>Sequence biases in large scale gene expression profiling data</title><title>Nucleic acids research</title><addtitle>Nucl. Acids Res</addtitle><description>We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, ‘Classic’ Massively Parallel Signature Sequencing (MPSS) and ‘Signature’ MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. 
The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).</description><subject>Animals</subject><subject>Base Composition</subject><subject>Cytosine - analysis</subject><subject>DNA - chemistry</subject><subject>Gene Expression Profiling</subject><subject>Genes</subject><subject>Guanine - analysis</subject><subject>Humans</subject><subject>Methods Online</subject><subject>Mice</subject><subject>Nucleic Acid Probes - chemistry</subject><subject>Oligonucleotide Array Sequence Analysis</subject><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpdkUtLxDAUhYMoOj42_gApLlwI1byapAsF8TXqoAsVxE1I09sa7aRj0hH991Zm8LW6i_NxOJcPoU2C9wjO2b43Yb9-aTjmC2hAmKApzwVdRAPMcJYSzNUKWo3xGWPCScaX0QoRiuOMygE6uIXXKXgLSeFMhJg4nzQm1JBEaxpIavCQwPskQIyu9ckktJVrnK-T0nRmHS1VpomwMb9r6P7s9O54mI5uzi-Oj0ap5Zx0qYWcgrDYkrLigggKsrRMqQKYUFTIghRCGUEZLRiuaFEWkhorJSEGqjw3bA0dznon02IMpQXfBdPoSXBjEz50a5z-m3j3pOv2TZOM8pzIvmBnXhDa_t_Y6bGLFprGeGinUQslqFTsC9z-Bz630-D75zTFWGCFc9FDuzPIhjbGANX3EoL1lxLdK9EzJT289Xv7Dzp30APpDHCxg_fv3IQXLSSTmR4-POqrSzZ8HJ1c64x9Ai07l-Y</recordid><startdate>20060101</startdate><enddate>20060101</enddate><creator>Siddiqui, Asim S.</creator><creator>Delaney, Allen D.</creator><creator>Schnerch, Angelique</creator><creator>Griffith, Obi L.</creator><creator>Jones, Steven J. 
M.</creator><creator>Marra, Marco A.</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20060101</creationdate><title>Sequence biases in large scale gene expression profiling data</title><author>Siddiqui, Asim S. ; Delaney, Allen D. ; Schnerch, Angelique ; Griffith, Obi L. ; Jones, Steven J. M. ; Marra, Marco A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c441t-ce92e6c0c1df46162e7dc388be368267b1b68a6232b30f2bdb72ac7711aef99a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Animals</topic><topic>Base Composition</topic><topic>Cytosine - analysis</topic><topic>DNA - chemistry</topic><topic>Gene Expression Profiling</topic><topic>Genes</topic><topic>Guanine - analysis</topic><topic>Humans</topic><topic>Methods Online</topic><topic>Mice</topic><topic>Nucleic Acid Probes - chemistry</topic><topic>Oligonucleotide Array Sequence Analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Siddiqui, Asim S.</creatorcontrib><creatorcontrib>Delaney, Allen D.</creatorcontrib><creatorcontrib>Schnerch, Angelique</creatorcontrib><creatorcontrib>Griffith, Obi L.</creatorcontrib><creatorcontrib>Jones, Steven J. 
M.</creatorcontrib><creatorcontrib>Marra, Marco A.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Siddiqui, Asim S.</au><au>Delaney, Allen D.</au><au>Schnerch, Angelique</au><au>Griffith, Obi L.</au><au>Jones, Steven J. M.</au><au>Marra, Marco A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Sequence biases in large scale gene expression profiling data</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucl. 
Acids Res</addtitle><date>2006-01-01</date><risdate>2006</risdate><volume>34</volume><issue>12</issue><spage>e83</spage><epage>e83</epage><pages>e83-e83</pages><issn>0305-1048</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>We present the results of a simple, statistical assay that measures the G+C content sensitivity bias of gene expression experiments without the requirement of a duplicate experiment. We analyse five gene expression profiling methods: Affymetrix GeneChip, Long Serial Analysis of Gene Expression (LongSAGE), LongSAGELite, ‘Classic’ Massively Parallel Signature Sequencing (MPSS) and ‘Signature’ MPSS. We demonstrate the methods have systematic and random errors leading to a different G+C content sensitivity. The relationship between this experimental error and the G+C content of the probe set or tag that identifies each gene influences whether the gene is detected and, if detected, the level of gene expression measured. LongSAGE has the least bias, while Signature MPSS shows a strong bias to G+C rich tags and Affymetrix data show different bias depending on the data processing method (MAS 5.0, RMA or GC-RMA). The bias in the Affymetrix data primarily impacts genes expressed at lower levels. Despite the larger sampling of the MPSS library, SAGE identifies significantly more genes (60% more RefSeq genes in a single comparison).</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>16840527</pmid><doi>10.1093/nar/gkl404</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0305-1048
ispartof	Nucleic acids research, 2006-01, Vol.34 (12), p.e83-e83
issn	0305-1048 1362-4962
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1524917
source	Oxford Journals Open Access Collection; MEDLINE; DOAJ Directory of Open Access Journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects	Animals ; Base Composition ; Cytosine - analysis ; DNA - chemistry ; Gene Expression Profiling ; Genes ; Guanine - analysis ; Humans ; Methods Online ; Mice ; Nucleic Acid Probes - chemistry ; Oligonucleotide Array Sequence Analysis
title	Sequence biases in large scale gene expression profiling data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T13%3A38%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sequence%20biases%20in%20large%20scale%20gene%20expression%20profiling%20data&rft.jtitle=Nucleic%20acids%20research&rft.au=Siddiqui,%20Asim%20S.&rft.date=2006-01-01&rft.volume=34&rft.issue=12&rft.spage=e83&rft.epage=e83&rft.pages=e83-e83&rft.issn=0305-1048&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/gkl404&rft_dat=%3Cproquest_pubme%3E68627837%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200608096&rft_id=info:pmid/16840527&rfr_iscdi=true
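
The `url` field is an OpenURL (Z39.88-2004) link whose query string repeats the citation as `rft.*` key/value pairs. As a minimal sketch, assuming only Python's standard `urllib.parse`, the citation fields can be read back out of such a link. The URL below is a shortened copy of the one in this record, and the helper names `parse_openurl` and `identifiers` are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the record's resolver link; only a few keys are kept.
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=Sequence%20biases%20in%20large%20scale%20gene%20expression%20profiling%20data"
    "&rft.jtitle=Nucleic%20acids%20research"
    "&rft.volume=34&rft.issue=12&rft.spage=e83"
    "&rft_id=info:doi/10.1093/nar/gkl404"
    "&rft_id=info:pmid/16840527"
)


def parse_openurl(url):
    """Return a dict mapping each query key to the list of its decoded values."""
    return parse_qs(urlsplit(url).query)


def identifiers(fields):
    """Turn repeated rft_id values such as 'info:doi/...' into a scheme -> value dict."""
    ids = {}
    for value in fields.get("rft_id", []):
        if value.startswith("info:") and "/" in value:
            scheme, _, rest = value[len("info:"):].partition("/")
            ids[scheme] = rest
    return ids


fields = parse_openurl(OPENURL)
ids = identifiers(fields)
print(fields["rft.atitle"][0])
print(ids)
```

`parse_qs` returns a list per key because keys such as `rft_id` legitimately repeat; here one occurrence carries the DOI and another the PubMed ID.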