Determining clinically relevant features in cytometry data using persistent homology

Bibliographic Details
Published in: arXiv.org 2022-03
Main authors: Mukherjee, Soham, Wethington, Darren, Dey, Tamal K, Das, Jayajit
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Mukherjee, Soham
Wethington, Darren
Dey, Tamal K
Das, Jayajit
description Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques, coupled with comparisons of the relative abundances of cellular subsets, are the current standard for cytometry data analysis. However, this approach cannot capture more subtle topological features hidden in the data, especially if those features are further masked by data transforms, significant batch effects, or donor-to-donor variation in clinical data. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between the single-cell protein expression of COVID-19 patients and that of healthy controls. We identify proteins of interest with a decision-tree-based classifier, sample points randomly, and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions of varying density in cytometry datasets, as well as protruding structures such as "elbows". We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find systematic structural differences between the two groups in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes is significantly downregulated in the non-naïve CD8+ T cells of COVID-19 patients compared to healthy controls. This counterintuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than in healthy controls. The method is applicable to any cytometry dataset and can surface, through topological data analysis, insights that may be difficult to obtain with a standard gating strategy or existing bioinformatic tools.
doi_str_mv 10.48550/arxiv.2203.06263
format Article
fullrecord <record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2203_06263</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2639111812</sourcerecordid><originalsourceid>FETCH-LOGICAL-a522-204cf3237a3f2f5ec2a6ce6650e9a7730a0d71cfc60e6b9f1c004dd5d85f68693</originalsourceid><addsrcrecordid>eNotj8tqwzAQRUWh0JDmA7qqoGuno5El28uSPiHQjfdGlUepgx-pJIf67-skXd3FPTPcw9idgHWaKwWPxv82xzUiyDVo1PKKLVBKkeQp4g1bhbAHANQZKiUXrHymSL5r-qbfcdvOaU3bTtxTS0fTR-7IxNFT4E3P7RSHjqKfeG2i4WM4HR3IhyZEmtnvoRvaYTfdsmtn2kCr_1yy8vWl3Lwn28-3j83TNjEKMUFIrZMoMyMdOkUWjbaktQIqTJZJMFBnwjqrgfRX4YQFSOta1blyOteFXLL7y9uzcnXwTWf8VJ3Uq7P6TDxciIMffkYKsdoPo-_nTdVcF0KIXKD8A0ffXZE</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2639111812</pqid></control>
<display><type>article</type><title>Determining clinically relevant features in cytometry data using persistent homology</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Mukherjee, Soham ; Wethington, Darren ; Dey, Tamal K ; Das, Jayajit</creator><creatorcontrib>Mukherjee, Soham ; Wethington, Darren ; Dey, Tamal K ; Das, Jayajit</creatorcontrib><description>Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques, coupled with comparisons of the relative abundances of cellular subsets, are the current standard for cytometry data analysis. However, this approach cannot capture more subtle topological features hidden in the data, especially if those features are further masked by data transforms, significant batch effects, or donor-to-donor variation in clinical data. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between the single-cell protein expression of COVID-19 patients and that of healthy controls. We identify proteins of interest with a decision-tree-based classifier, sample points randomly, and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions of varying density in cytometry datasets, as well as protruding structures such as "elbows". We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find systematic structural differences between the two groups in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes is significantly downregulated in the non-naïve CD8+ T cells of COVID-19 patients compared to healthy controls. This counterintuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than in healthy controls. The method is applicable to any cytometry dataset and can surface, through topological data analysis, insights that may be difficult to obtain with a standard gating strategy or existing bioinformatic tools.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2203.06263</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Boolean algebra ; Coronaviruses ; COVID-19 ; Cytometry ; Data analysis ; Datasets ; Decision trees ; Homology ; Lymphocytes ; Proteins ; Quantitative Biology - Quantitative Methods ; Topology</subject><ispartof>arXiv.org, 2022-03</ispartof><rights>2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display>
<links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,780,881,27904</link.rule.ids><backlink>$$Uhttps://doi.org/10.1371/journal.pcbi.1009931$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.2203.06263$$DView paper in arXiv$$Hfree_for_read</backlink></links>
<search><creatorcontrib>Mukherjee, Soham</creatorcontrib><creatorcontrib>Wethington, Darren</creatorcontrib><creatorcontrib>Dey, Tamal K</creatorcontrib><creatorcontrib>Das, Jayajit</creatorcontrib><title>Determining clinically relevant features in cytometry data using persistent homology</title><title>arXiv.org</title><subject>Boolean algebra</subject><subject>Coronaviruses</subject><subject>COVID-19</subject><subject>Cytometry</subject><subject>Data analysis</subject><subject>Datasets</subject><subject>Decision trees</subject><subject>Homology</subject><subject>Lymphocytes</subject><subject>Proteins</subject><subject>Quantitative Biology - Quantitative Methods</subject><subject>Topology</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotj8tqwzAQRUWh0JDmA7qqoGuno5El28uSPiHQjfdGlUepgx-pJIf67-skXd3FPTPcw9idgHWaKwWPxv82xzUiyDVo1PKKLVBKkeQp4g1bhbAHANQZKiUXrHymSL5r-qbfcdvOaU3bTtxTS0fTR-7IxNFT4E3P7RSHjqKfeG2i4WM4HR3IhyZEmtnvoRvaYTfdsmtn2kCr_1yy8vWl3Lwn28-3j83TNjEKMUFIrZMoMyMdOkUWjbaktQIqTJZJMFBnwjqrgfRX4YQFSOta1blyOteFXLL7y9uzcnXwTWf8VJ3Uq7P6TDxciIMffkYKsdoPo-_nTdVcF0KIXKD8A0ffXZE</recordid><startdate>20220311</startdate><enddate>20220311</enddate><creator>Mukherjee, Soham</creator><creator>Wethington, Darren</creator><creator>Dey, Tamal K</creator><creator>Das, Jayajit</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>COVID</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>ALC</scope><scope>GOX</scope></search>
<sort><creationdate>20220311</creationdate><title>Determining clinically relevant features in cytometry data using persistent homology</title><author>Mukherjee, Soham ; Wethington, Darren ; Dey, Tamal K ; Das, Jayajit</author></sort>
<facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a522-204cf3237a3f2f5ec2a6ce6650e9a7730a0d71cfc60e6b9f1c004dd5d85f68693</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Boolean algebra</topic><topic>Coronaviruses</topic><topic>COVID-19</topic><topic>Cytometry</topic><topic>Data analysis</topic><topic>Datasets</topic><topic>Decision trees</topic><topic>Homology</topic><topic>Lymphocytes</topic><topic>Proteins</topic><topic>Quantitative Biology - Quantitative Methods</topic><topic>Topology</topic><toplevel>online_resources</toplevel><creatorcontrib>Mukherjee, Soham</creatorcontrib><creatorcontrib>Wethington, Darren</creatorcontrib><creatorcontrib>Dey, Tamal K</creatorcontrib><creatorcontrib>Das, Jayajit</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Quantitative Biology</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets>
<delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery>
<addata><au>Mukherjee, Soham</au><au>Wethington, Darren</au><au>Dey, Tamal K</au><au>Das, Jayajit</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Determining clinically relevant features in cytometry data using persistent homology</atitle><jtitle>arXiv.org</jtitle><date>2022-03-11</date><risdate>2022</risdate><eissn>2331-8422</eissn><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2203.06263</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2022-03
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2203_06263
source arXiv.org; Free E- Journals
subjects Boolean algebra
Coronaviruses
COVID-19
Cytometry
Data analysis
Datasets
Decision trees
Homology
Lymphocytes
Proteins
Quantitative Biology - Quantitative Methods
Topology
title Determining clinically relevant features in cytometry data using persistent homology
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T11%3A11%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Determining%20clinically%20relevant%20features%20in%20cytometry%20data%20using%20persistent%20homology&rft.jtitle=arXiv.org&rft.au=Mukherjee,%20Soham&rft.date=2022-03-11&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2203.06263&rft_dat=%3Cproquest_arxiv%3E2639111812%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2639111812&rft_id=info:pmid/&rfr_iscdi=true
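The core computation summarized in the description (random subsampling, persistence diagrams of the sampled points, Wasserstein distances between diagrams) can be illustrated with a small, standard-library-only Python sketch. This is not the authors' implementation. It assumes a Vietoris-Rips filtration on Euclidean distances, keeps only 0-dimensional homology (connected components, with the single infinite class dropped), uses the L-infinity ground metric for matching, and solves the diagram matching exactly with the Hungarian algorithm. The function names subsample(), h0_diagram(), and wasserstein() are hypothetical, the Gaussian "donor" clouds are made-up stand-ins rather than the study's data, and the decision-tree step that selects proteins is not shown.

```python
# Sketch (assumptions, not the paper's code): Rips filtration, H0 only, L-inf
# ground metric, exact matching via the Hungarian algorithm. Hypothetical names.
import math
import random


def subsample(points, k, seed=0):
    """Draw up to k points uniformly at random, without replacement."""
    return random.Random(seed).sample(points, min(k, len(points)))


def h0_diagram(points):
    """Finite H0 (birth, death) pairs of the Rips filtration: every point is
    born at 0; a component dies at the length of the edge that merges it.
    Uses Kruskal's algorithm with union-find; the infinite class is dropped."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            diagram.append((0.0, length))
    return diagram


def _min_assignment_cost(cost):
    """Minimum total cost of a square assignment problem (Hungarian method)."""
    n = len(cost)
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # dual potentials
    p, way = [0] * (n + 1), [0] * (n + 1)  # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [math.inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, n + 1))


def wasserstein(diag_a, diag_b, p=2.0):
    """p-Wasserstein distance between two diagrams, L-inf ground metric."""
    m, n = len(diag_a), len(diag_b)

    def to_diagonal(pt):
        return ((pt[1] - pt[0]) / 2) ** p  # L-inf distance to diagonal, ^p
    # Rows: A's points, then n diagonal slots; columns: B's points, then m
    # diagonal slots. Diagonal-to-diagonal pairs keep cost 0.
    cost = [[0.0] * (m + n) for _ in range(m + n)]
    for i, a in enumerate(diag_a):
        for j, b in enumerate(diag_b):
            cost[i][j] = max(abs(a[0] - b[0]), abs(a[1] - b[1])) ** p
        for k in range(m):
            cost[i][n + k] = to_diagonal(a)
    for s in range(n):
        for j, b in enumerate(diag_b):
            cost[m + s][j] = to_diagonal(b)
    return _min_assignment_cost(cost) ** (1.0 / p)


if __name__ == "__main__":
    # Made-up 2-D Gaussian clouds standing in for two donors' marker data.
    rng = random.Random(1)
    donor_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    donor_b = [(rng.gauss(0, 1), rng.gauss(0, 3)) for _ in range(500)]
    pd_a = h0_diagram(subsample(donor_a, 60, seed=2))
    pd_b = h0_diagram(subsample(donor_b, 60, seed=3))
    print(f"W2 between H0 diagrams: {wasserstein(pd_a, pd_b):.3f}")
```

H0 diagrams from a Rips filtration record the scales at which clusters merge, so they respond to the density differences the description mentions; capturing other shape features would need higher-dimensional homology or a different filtration, typically via a dedicated TDA library. The exact matching is cubic in the number of diagram points, which is one reason to work on random subsamples; comparing many random donor pairs, as the description states, amounts to repeating this with different seeds and donors.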