A semi-supervised ensemble clustering algorithm for discovering relationships between different diseases by extracting cell-to-cell biological communications

Bibliographic details
Published in:	Journal of cancer research and clinical oncology 2024-01, Vol.150 (1), p.3-3, Article 3
Main authors:	Shi, Xiuchao; Yue, Chunxiao; Quan, Meiping; Li, Yalin; Nashwan Sam, Hiba
Format:	Article
Language:	eng
Subjects:	Algorithms; blood; Blood cancer; breasts; Cancer; Cancer Research; Cell interactions; Clustering; Communication; data collection; Environmental factors; Gene expression; genes; Hematology; Hereditary diseases; Internal Medicine; Leukocytes (basophilic); Medicine; Medicine & Public Health; Oncology; Promoters

container_end_page	3
container_issue	1
container_start_page	3
container_title	Journal of cancer research and clinical oncology
container_volume	150
creator	Shi, Xiuchao; Yue, Chunxiao; Quan, Meiping; Li, Yalin; Nashwan Sam, Hiba
description	Introduction: In recent decades, many theories have been proposed about the causes of hereditary diseases such as cancer; however, most studies identify genetic and environmental factors as the most important parameters. Gene expression data have been shown to carry valuable information about hereditary diseases, and their analysis can identify relationships between these diseases.
Objective: Damaged genes from various diseases can be identified by discovering cell-to-cell biological communications, and extracting these intercellular communications can also reveal relationships between different diseases, for example, gene disorders that damage the same cells in both breast and blood cancers. The purpose of this work is therefore to discover cell-to-cell biological communications in gene expression data.
Methodology: Clustering algorithms have been widely used to identify cell-to-cell biological communications in various cancers. However, the field remains open because of the abundance of unprocessed gene expression data. Accordingly, this paper develops a semi-supervised ensemble clustering algorithm that can discover relationships between different diseases by extracting cell-to-cell biological communications. The proposed clustering framework includes a stratified feature sampling mechanism and a novel similarity metric to handle high-dimensional data and improve the diversity of the primary partitions.
Results: The performance of the proposed algorithm is verified on several datasets from the UCI Machine Learning Repository, and the algorithm is then applied to the FANTOM5 dataset to extract cell-to-cell biological communications. The version of the dataset used contains 108 cells and 86,427 promoters from 702 samples. The strength of communication between two similar cells from different diseases indicates the relationship between those diseases. Here, communication strength is measured in promoters; the highest cell-to-cell biological communication was found between “basophils” and “ciliary.epithelial.cells”, with 62,809 promoters.
Conclusion: The maximum cell-to-cell biological similarity in each cluster can be used to detect relationships between different diseases such as cancer.
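The description names an ensemble of primary partitions built from stratified feature samples but does not specify the stratification rule, the similarity metric, or the consensus step. The Python sketch below only illustrates the general idea of feature-sampled ensemble clustering under stated assumptions; it is not the authors' method. Ranking features by variance to form strata, combining partitions through a co-association matrix cut at 0.5, and using must-link pairs as the semi-supervised signal are all hypothetical choices, and every function name (stratified_features, ensemble_cluster, and so on) is invented for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, rng: random.Random, iters: int = 20) -> List[int]:
    """Lloyd's k-means with farthest-first initialization; returns one label per point."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Next seed: the point whose nearest existing centroid is farthest away.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def stratified_features(data: List[Vector], n_strata: int, per_stratum: int,
                        rng: random.Random) -> List[int]:
    """HYPOTHETICAL stratification: rank features by variance, cut the ranking into
    contiguous strata and draw up to per_stratum feature indices from each one."""
    n_rows, n_feat = len(data), len(data[0])

    def variance(j: int) -> float:
        mean = sum(row[j] for row in data) / n_rows
        return sum((row[j] - mean) ** 2 for row in data) / n_rows

    ranked = sorted(range(n_feat), key=variance)
    size = -(-n_feat // n_strata)  # ceiling division
    chosen: List[int] = []
    for start in range(0, n_feat, size):
        stratum = ranked[start:start + size]
        chosen.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return sorted(chosen)


def co_association(partitions: List[List[int]]) -> List[List[float]]:
    """Fraction of primary partitions in which points i and j share a cluster."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)] for i in range(n)]


def consensus(co: List[List[float]], must_link: Sequence[Tuple[int, int]] = (),
              threshold: float = 0.5) -> List[int]:
    """Link pairs whose co-association exceeds threshold (must-link pairs are forced
    to 1.0) and return the connected components as consensus labels."""
    n = len(co)
    co = [row[:] for row in co]
    for i, j in must_link:
        co[i][j] = co[j][i] = 1.0
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > threshold:
                parent[find(i)] = find(j)
    roots: dict = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def ensemble_cluster(data: List[Vector], k: int, n_members: int = 10, n_strata: int = 3,
                     per_stratum: int = 1, must_link: Sequence[Tuple[int, int]] = (),
                     seed: int = 0) -> List[int]:
    """Build n_members k-means partitions on stratified feature samples, then combine them."""
    rng = random.Random(seed)
    partitions = []
    for _ in range(n_members):
        feats = stratified_features(data, n_strata, per_stratum, rng)
        view = [[row[j] for j in feats] for row in data]
        partitions.append(kmeans(view, k, rng))
    return consensus(co_association(partitions), must_link)


if __name__ == "__main__":
    gen = random.Random(1)
    blob_a = [[gen.gauss(0.0, 0.3) for _ in range(6)] for _ in range(5)]
    blob_b = [[gen.gauss(10.0, 0.3) for _ in range(6)] for _ in range(5)]
    print(ensemble_cluster(blob_a + blob_b, k=2))
```

On two well-separated groups of five six-dimensional points, every primary partition splits the groups the same way, so the demo prints [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]; passing must_link=[(0, 5)] merges both groups into one cluster. Taking connected components of the thresholded co-association matrix is only one common consensus rule; hierarchical clustering on 1 minus the co-association values is a frequent alternative.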
doi_str_mv	10.1007/s00432-023-05559-4
format	Article
fullrecord	<record><control><sourceid>proquest_C6C</sourceid><recordid>TN_cdi_proquest_miscellaneous_2910190426</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3153571721</sourcerecordid><originalsourceid>FETCH-LOGICAL-c431t-f3217c350e9cdf591e0c9375b8c4965b84ac42fd93d970b3b2a49377b57b1c173</originalsourceid><addsrcrecordid>eNqFksFu1DAQhi0EosvCC3BAlrj0YvDYcbw-VhUUpEpc4Bw5zmTrKomD7ZT2YXhXnKaAxKGcxp755rdH8xPyGvg74Fy_T5xXUjAuJONKKcOqJ2QHawqkVE_JjoMGpgTUJ-RFSte83JUWz8mJPEB94CB25OcZTTh6lpYZ441P2FGcSqYdkLphSRmjn47UDscQfb4aaR8i7Xxy4WarRBxs9mFKV35OtMX8A3EqRN9jxCmvLNqEpXRH8TZH6_La5nAYWA5sjbT1YQhH7-xAXRjHZSrHe82X5Flvh4SvHuKefPv44ev5J3b55eLz-dklc5WEzHopQDupOBrX9coAcmekVu3BVaYuobKuEn1nZGc0b2UrbFXqulW6BQda7snppjvH8H3BlJuxjFi-ZicMS2okKKk0aAH_RYUBDoZXoi7o23_Q67DEqQxSKCGVBJD8cYofTC2rsuY9ERvlYkgpYt_M0Y823jXAm9UNzeaGprihuXdDsza9eZBe2hG7Py2_118AuQFpXpeJ8e_bj8j-Ajwjwdo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2923531130</pqid></control><display><type>article</type><title>A semi-supervised ensemble clustering algorithm for discovering relationships between different diseases by extracting cell-to-cell biological communications</title><source>Springer Nature OA Free Journals</source><creator>Shi, Xiuchao ; Yue, Chunxiao ; Quan, Meiping ; Li, Yalin ; Nashwan Sam, Hiba</creator><creatorcontrib>Shi, Xiuchao ; Yue, Chunxiao ; Quan, Meiping ; Li, Yalin ; Nashwan Sam, Hiba</creatorcontrib><description>Introduction In recent decades, many theories have been proposed about the cause of hereditary diseases such as cancer. However, most studies state genetic and environmental factors as the most important parameters. It has been shown that gene expression data are valuable information about hereditary diseases and their analysis can identify the relationships between these diseases. 
Objective Identification of damaged genes from various diseases can be done through the discovery of cell-to-cell biological communications. Also, extraction of intercellular communications can identify relationships between different diseases. For example, gene disorders that cause damage to the same cells in both breast and blood cancers. Hence, the purpose is to discover cell-to-cell biological communications in gene expression data. Methodology The identification of cell-to-cell biological communications for various cancer diseases has been widely performed by clustering algorithms. However, this field remains open due to the abundance of unprocessed gene expression data. Accordingly, this paper focuses on the development of a semi-supervised ensemble clustering algorithm that can discover relationships between different diseases through the extraction of cell-to-cell biological communications. The proposed clustering framework includes a stratified feature sampling mechanism and a novel similarity metric to deal with high-dimensional data and improve the diversity of primary partitions. Results The performance of the proposed clustering algorithm is verified with several datasets from the UCI machine learning repository and then applied to the FANTOM5 dataset to extract cell-to-cell biological communications. The used version of this dataset contains 108 cells and 86,427 promoters from 702 samples. The strength of communication between two similar cells from different diseases indicates the relationship of those diseases. Here, the strength of communication is determined by promoter, so we found the highest cell-to-cell biological communication between “basophils” and “ciliary.epithelial.cells” with 62,809 promoters. 
Conclusion The maximum cell-to-cell biological similarity in each cluster can be used to detect the relationship between different diseases such as cancer.</description><identifier>ISSN: 0171-5216</identifier><identifier>EISSN: 1432-1335</identifier><identifier>DOI: 10.1007/s00432-023-05559-4</identifier><identifier>PMID: 38168012</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Algorithms ; blood ; Blood cancer ; breasts ; Cancer ; Cancer Research ; Cell interactions ; Clustering ; Communication ; data collection ; Environmental factors ; Gene expression ; genes ; Hematology ; Hereditary diseases ; Internal Medicine ; Leukocytes (basophilic) ; Medicine ; Medicine & Public Health ; Oncology ; Promoters</subject><ispartof>Journal of cancer research and clinical oncology, 2024-01, Vol.150 (1), p.3-3, Article 3</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><rights>2024. 
The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c431t-f3217c350e9cdf591e0c9375b8c4965b84ac42fd93d970b3b2a49377b57b1c173</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00432-023-05559-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://doi.org/10.1007/s00432-023-05559-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,860,27901,27902,41096,41464,42165,42533,51294,51551</link.rule.ids><linktorsrc>$$Uhttps://doi.org/10.1007/s00432-023-05559-4$$EView_record_in_Springer_Nature$$FView_record_in_$$GSpringer_Nature</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38168012$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><sort><creationdate>20240101</creationdate><title>A semi-supervised ensemble clustering algorithm for discovering relationships between different diseases by extracting cell-to-cell biological communications</title><author>Shi, Xiuchao ; Yue, Chunxiao ; Quan, Meiping ; Li, Yalin ; Nashwan Sam, Hiba</author></sort><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Shi, Xiuchao</au><au>Yue, Chunxiao</au><au>Quan, Meiping</au><au>Li, Yalin</au><au>Nashwan Sam, Hiba</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A semi-supervised ensemble clustering algorithm for discovering relationships between different diseases by extracting cell-to-cell biological communications</atitle><jtitle>Journal of cancer research and clinical oncology</jtitle><stitle>J Cancer Res Clin Oncol</stitle><addtitle>J Cancer Res Clin Oncol</addtitle><date>2024-01-01</date><risdate>2024</risdate><volume>150</volume><issue>1</issue><spage>3</spage><epage>3</epage><pages>3-3</pages><artnum>3</artnum><issn>0171-5216</issn><eissn>1432-1335</eissn><abstract>Introduction In recent decades, many theories have been proposed about the cause of hereditary diseases such as cancer. However, most studies state genetic and environmental factors as the most important parameters. It has been shown that gene expression data are valuable information about hereditary diseases and their analysis can identify the relationships between these diseases. Objective Identification of damaged genes from various diseases can be done through the discovery of cell-to-cell biological communications. Also, extraction of intercellular communications can identify relationships between different diseases. For example, gene disorders that cause damage to the same cells in both breast and blood cancers. Hence, the purpose is to discover cell-to-cell biological communications in gene expression data. 
Methodology The identification of cell-to-cell biological communications for various cancer diseases has been widely performed by clustering algorithms. However, this field remains open due to the abundance of unprocessed gene expression data. Accordingly, this paper focuses on the development of a semi-supervised ensemble clustering algorithm that can discover relationships between different diseases through the extraction of cell-to-cell biological communications. The proposed clustering framework includes a stratified feature sampling mechanism and a novel similarity metric to deal with high-dimensional data and improve the diversity of primary partitions. Results The performance of the proposed clustering algorithm is verified with several datasets from the UCI machine learning repository and then applied to the FANTOM5 dataset to extract cell-to-cell biological communications. The used version of this dataset contains 108 cells and 86,427 promoters from 702 samples. The strength of communication between two similar cells from different diseases indicates the relationship of those diseases. Here, the strength of communication is determined by promoter, so we found the highest cell-to-cell biological communication between “basophils” and “ciliary.epithelial.cells” with 62,809 promoters. Conclusion The maximum cell-to-cell biological similarity in each cluster can be used to detect the relationship between different diseases such as cancer.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><pmid>38168012</pmid><doi>10.1007/s00432-023-05559-4</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 0171-5216
issn	0171-5216 1432-1335
language	eng
recordid	cdi_proquest_miscellaneous_2910190426
source	Springer Nature OA Free Journals
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T20%3A44%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_C6C&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20semi-supervised%20ensemble%20clustering%20algorithm%20for%20discovering%20relationships%20between%20different%20diseases%20by%20extracting%20cell-to-cell%20biological%20communications&rft.jtitle=Journal%20of%20cancer%20research%20and%20clinical%20oncology&rft.au=Shi,%20Xiuchao&rft.date=2024-01-01&rft.volume=150&rft.issue=1&rft.spage=3&rft.epage=3&rft.pages=3-3&rft.artnum=3&rft.issn=0171-5216&rft.eissn=1432-1335&rft_id=info:doi/10.1007/s00432-023-05559-4&rft_dat=%3Cproquest_C6C%3E3153571721%3C/proquest_C6C%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2923531130&rft_id=info:pmid/38168012&rfr_iscdi=true