The specious art of single-cell genomics
Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to th...
Saved in:
Published in: | PLoS computational biology 2023-08, Vol.19 (8), p.e1011288-e1011288 |
---|---|
Main authors: | Chari, Tara ; Pachter, Lior |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e1011288 |
---|---|
container_issue | 8 |
container_start_page | e1011288 |
container_title | PLoS computational biology |
container_volume | 19 |
creator | Chari, Tara ; Pachter, Lior |
description | Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery. |
doi_str_mv | 10.1371/journal.pcbi.1011288 |
format | Article |
date | 2023-08-01 |
publisher | United States: Public Library of Science |
pmid | 37590228 |
orcid | https://orcid.org/0000-0002-9164-6231 ; https://orcid.org/0000-0002-6953-4313 |
rights | Copyright: © 2023 Chari, Pachter. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
toplevel | peer_reviewed ; online_resources |
oa | free_for_read |
linktohtml | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10434946/ |
linktopdf | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10434946/pdf/ |
fulltext | fulltext |
identifier | ISSN: 1553-7358 |
ispartof | PLoS computational biology, 2023-08, Vol.19 (8), p.e1011288-e1011288 |
issn | 1553-734X ; 1553-7358 |
language | eng |
recordid | cdi_plos_journals_2865519713 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Public Library of Science (PLoS) |
subjects | Biological analysis ; Biology and Life Sciences ; Computer and Information Sciences ; Data Analysis ; Data mining ; Datasets ; Embedding ; Engineering and Technology ; Gene expression ; Genetic research ; Genomics ; Humans ; Information management ; Methods ; Physical Sciences ; Qualitative analysis ; Research and Analysis Methods ; Social Sciences ; Stem cells ; Thermometers |
title | The specious art of single-cell genomics |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-18T00%3A12%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20specious%20art%20of%20single-cell%20genomics&rft.jtitle=PLoS%20computational%20biology&rft.au=Chari,%20Tara&rft.date=2023-08-01&rft.volume=19&rft.issue=8&rft.spage=e1011288&rft.epage=e1011288&rft.pages=e1011288-e1011288&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1011288&rft_dat=%3Cgale_plos_%3EA763760127%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2865519713&rft_id=info:pmid/37590228&rft_galeid=A763760127&rft_doaj_id=oai_doaj_org_article_654e356cae5d477fae9207a05bca1181&rfr_iscdi=true |