The specious art of single-cell genomics

Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to th...

Detailed description

Bibliographic details
Published in: PLoS computational biology 2023-08, Vol.19 (8), p.e1011288-e1011288
Main authors: Chari, Tara; Pachter, Lior
Format: Article
Language: eng
Online access: Full text
container_end_page e1011288
container_issue 8
container_start_page e1011288
container_title PLoS computational biology
container_volume 19
creator Chari, Tara
Pachter, Lior
description Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.
doi_str_mv 10.1371/journal.pcbi.1011288
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2865519713</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A763760127</galeid><doaj_id>oai_doaj_org_article_654e356cae5d477fae9207a05bca1181</doaj_id><sourcerecordid>A763760127</sourcerecordid><originalsourceid>FETCH-LOGICAL-c595t-bfd293cce9f25d33550b99c928307aafba1fd415d1bfa9c7b72502b6fb9900393</originalsourceid><addsrcrecordid>eNqFkk9v1DAQxSMEoqXwDRBE4lIOWTx2Jo5PqKr4s1IFEpSzZTt26lU2XuwEwbfHy6ZVFyFx8sj-zfN7oymK50BWwDi82YQ5jmpY7Yz2KyAAtG0fFKeAyCrOsH14rz4pnqS0ISSXonlcnDCOglDanhbn1ze2TDtrfJhTqeJUBlcmP_aDrYwdhrK3Y9h6k54Wj5wakn22nGfFt_fvri8_VlefP6wvL64qgwKnSruOCmaMFY5ixxgi0UIYQVtGuFJOK3BdDdiBdkoYrjlFQnXjMpX9CXZWvDzo7oaQ5BIySdo2iCA4sEysD0QX1Ebuot-q-EsG5eWfixB7mXN4M1jZYG0ZNkZZ7GrOnbKCZhsEtVEALWStt8tvs97azthximo4Ej1-Gf2N7MMPCaRmtaibrHC-KMTwfbZpkluf9pNTo80jzcaRicxyntFXf6H_jvf6QPUqJ_CjCeNkf069mlOS669f5AVvGG8IUP4_9tMxWx9YE0NK0bq7lEDkfqNu3cj9Rsllo3Lbi_sTumu6XSH2G157xjM</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2865519713</pqid></control><display><type>article</type><title>The specious art of single-cell genomics</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Public Library of Science (PLoS)</source><creator>Chari, Tara ; Pachter, Lior</creator><creatorcontrib>Chari, Tara ; Pachter, Lior</creatorcontrib><description>Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. 
However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1011288</identifier><identifier>PMID: 37590228</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Biological analysis ; Biology and Life Sciences ; Computer and Information Sciences ; Data Analysis ; Data mining ; Datasets ; Embedding ; Engineering and Technology ; Gene expression ; Genetic research ; Genomics ; Humans ; Information management ; Methods ; Physical Sciences ; Qualitative analysis ; Research and Analysis Methods ; Social Sciences ; Stem cells ; Thermometers</subject><ispartof>PLoS computational biology, 2023-08, Vol.19 (8), p.e1011288-e1011288</ispartof><rights>Copyright: © 2023 Chari, Pachter. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</rights><rights>COPYRIGHT 2023 Public Library of Science</rights><rights>2023 Chari, Pachter. 
This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2023 Chari, Pachter 2023 Chari, Pachter</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c595t-bfd293cce9f25d33550b99c928307aafba1fd415d1bfa9c7b72502b6fb9900393</citedby><cites>FETCH-LOGICAL-c595t-bfd293cce9f25d33550b99c928307aafba1fd415d1bfa9c7b72502b6fb9900393</cites><orcidid>0000-0002-9164-6231 ; 0000-0002-6953-4313</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10434946/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10434946/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2096,2915,23845,27901,27902,53766,53768,79569,79570</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37590228$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Chari, Tara</creatorcontrib><creatorcontrib>Pachter, Lior</creatorcontrib><title>The specious art of single-cell genomics</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. 
In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.</description><subject>Biological analysis</subject><subject>Biology and Life Sciences</subject><subject>Computer and Information Sciences</subject><subject>Data Analysis</subject><subject>Data mining</subject><subject>Datasets</subject><subject>Embedding</subject><subject>Engineering and Technology</subject><subject>Gene expression</subject><subject>Genetic research</subject><subject>Genomics</subject><subject>Humans</subject><subject>Information management</subject><subject>Methods</subject><subject>Physical Sciences</subject><subject>Qualitative analysis</subject><subject>Research and Analysis Methods</subject><subject>Social Sciences</subject><subject>Stem 
cells</subject><subject>Thermometers</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><sourceid>DOA</sourceid><recordid>eNqFkk9v1DAQxSMEoqXwDRBE4lIOWTx2Jo5PqKr4s1IFEpSzZTt26lU2XuwEwbfHy6ZVFyFx8sj-zfN7oymK50BWwDi82YQ5jmpY7Yz2KyAAtG0fFKeAyCrOsH14rz4pnqS0ISSXonlcnDCOglDanhbn1ze2TDtrfJhTqeJUBlcmP_aDrYwdhrK3Y9h6k54Wj5wakn22nGfFt_fvri8_VlefP6wvL64qgwKnSruOCmaMFY5ixxgi0UIYQVtGuFJOK3BdDdiBdkoYrjlFQnXjMpX9CXZWvDzo7oaQ5BIySdo2iCA4sEysD0QX1Ebuot-q-EsG5eWfixB7mXN4M1jZYG0ZNkZZ7GrOnbKCZhsEtVEALWStt8tvs97azthximo4Ej1-Gf2N7MMPCaRmtaibrHC-KMTwfbZpkluf9pNTo80jzcaRicxyntFXf6H_jvf6QPUqJ_CjCeNkf069mlOS669f5AVvGG8IUP4_9tMxWx9YE0NK0bq7lEDkfqNu3cj9Rsllo3Lbi_sTumu6XSH2G157xjM</recordid><startdate>20230801</startdate><enddate>20230801</enddate><creator>Chari, Tara</creator><creator>Pachter, Lior</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>COVID</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>LK8</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PHGZM</scope><scope>PHGZT</scope><scope>PIMPY</scope><scope>PJZUB</scope><scope>PKEHL</scope><scope>PPXIY</scope><scope>PQEST</scope><scope>PQGLB</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-9164-6231</orcidid><orcidid>https://orcid.org/0000-0002-6953-4313</orcidid></search><sort><creationdate>20230801</creationdate><title>The specious art of single-cell genomics</title><author>Chari, Tara ; Pachter, Lior</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c595t-bfd293cce9f25d33550b99c928307aafba1fd415d1bfa9c7b72502b6fb9900393</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Biological analysis</topic><topic>Biology and Life Sciences</topic><topic>Computer and Information Sciences</topic><topic>Data 
Analysis</topic><topic>Data mining</topic><topic>Datasets</topic><topic>Embedding</topic><topic>Engineering and Technology</topic><topic>Gene expression</topic><topic>Genetic research</topic><topic>Genomics</topic><topic>Humans</topic><topic>Information management</topic><topic>Methods</topic><topic>Physical Sciences</topic><topic>Qualitative analysis</topic><topic>Research and Analysis Methods</topic><topic>Social Sciences</topic><topic>Stem cells</topic><topic>Thermometers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chari, Tara</creatorcontrib><creatorcontrib>Pachter, Lior</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One 
Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest Central (New)</collection><collection>ProQuest One Academic (New)</collection><collection>Publicly Available Content Database</collection><collection>ProQuest Health &amp; Medical Research Collection</collection><collection>ProQuest One Academic Middle East (New)</collection><collection>ProQuest One Health &amp; Nursing</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Applied &amp; Life 
Sciences</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chari, Tara</au><au>Pachter, Lior</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The specious art of single-cell genomics</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2023-08-01</date><risdate>2023</risdate><volume>19</volume><issue>8</issue><spage>e1011288</spage><epage>e1011288</epage><pages>e1011288-e1011288</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. 
In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>37590228</pmid><doi>10.1371/journal.pcbi.1011288</doi><tpages>e1011288</tpages><orcidid>https://orcid.org/0000-0002-9164-6231</orcidid><orcidid>https://orcid.org/0000-0002-6953-4313</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2023-08, Vol.19 (8), p.e1011288-e1011288
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_plos_journals_2865519713
source MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Public Library of Science (PLoS)
subjects Biological analysis
Biology and Life Sciences
Computer and Information Sciences
Data Analysis
Data mining
Datasets
Embedding
Engineering and Technology
Gene expression
Genetic research
Genomics
Humans
Information management
Methods
Physical Sciences
Qualitative analysis
Research and Analysis Methods
Social Sciences
Stem cells
Thermometers
title The specious art of single-cell genomics
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-18T00%3A12%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20specious%20art%20of%20single-cell%20genomics&rft.jtitle=PLoS%20computational%20biology&rft.au=Chari,%20Tara&rft.date=2023-08-01&rft.volume=19&rft.issue=8&rft.spage=e1011288&rft.epage=e1011288&rft.pages=e1011288-e1011288&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1011288&rft_dat=%3Cgale_plos_%3EA763760127%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2865519713&rft_id=info:pmid/37590228&rft_galeid=A763760127&rft_doaj_id=oai_doaj_org_article_654e356cae5d477fae9207a05bca1181&rfr_iscdi=true