dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning

MOTIVATION: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learnin...

Full description

Bibliographic details
Published in:	Bioinformatics (Oxford, England), 2022-10, Vol.38 (21), p.4919-4926
Main authors:	Cao, Han; Zhang, Youcheng; Baumbach, Jan; Burton, Paul R; Dwyer, Dominic; Koutsouleris, Nikolaos; Matschinske, Julian; Marcon, Yannick; Rajan, Sivanesan; Rieg, Thilo; Ryser-Welch, Patricia; Späth, Julian; Herrmann, Carl; Schwarz, Emanuel
Format:	Article
Language:	English
Subjects:	Original Papers
Online access:	Full text

container_end_page	4926
container_issue	21
container_start_page	4919
container_title	Bioinformatics (Oxford, England)
container_volume	38
creator	Cao, Han; Zhang, Youcheng; Baumbach, Jan; Burton, Paul R; Dwyer, Dominic; Koutsouleris, Nikolaos; Matschinske, Julian; Marcon, Yannick; Rajan, Sivanesan; Rieg, Thilo; Ryser-Welch, Patricia; Späth, Julian; Herrmann, Carl; Schwarz, Emanuel
description	MOTIVATION: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, there would be the substantial utility of an MTL application for geographically distributed data sources. RESULTS: Here, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data given the actual network latency. AVAILABILITY AND IMPLEMENTATION: dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
doi_str_mv	10.1093/bioinformatics/btac616
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9620828</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2711844230</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-d61ffa2ba06af739565af3b5da221895ecff2ac7410b87ef836eef50b0d17e543</originalsourceid><addsrcrecordid>eNpVkV9LwzAUxYMoTqdfQfLog3VJ06adD4IM_8HEl_kcb9ObLa5tZpJO9u2tOASf7oF7-J0Dh5ALzq45m4pJZZ3tjPMtRKvDpIqgJZcH5IQLWSRZyfnhn2ZiRE5D-GCM5SyXx2QkJCvElPMT8l6Hl8X8hgLVrt30ccC5DhpqPLT45fyaDiF04-0W9C7ZeAzot7ZbXtHahuht1Uesads30SYRwpq2oFe2Q9og-G4wnpEjA03A8_0dk7eH-8XsKZm_Pj7P7uaJHorEpJbcGEgrYBLM0C2XORhR5TWkKS-nOWpjUtBFxllVFmhKIRFNzipW8wLzTIzJ7S9301ct1hq76KFRQ_MW_E45sOr_p7MrtXRbNZUpK9NyAFzuAd599hiiam3Q2DTQoeuDSgvOyyxLBRus8teqvQvBo_mL4Uz9zKP-z6P284hvaMyL-A</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2711844230</pqid></control><display><type>article</type><title>dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning</title><source>Oxford Journals Open Access Collection</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Cao, Han ; Zhang, Youcheng ; Baumbach, Jan ; Burton, Paul R ; Dwyer, Dominic ; Koutsouleris, Nikolaos ; Matschinske, Julian ; Marcon, Yannick ; Rajan, Sivanesan ; Rieg, Thilo ; Ryser-Welch, Patricia ; Späth, Julian ; Herrmann, Carl ; Schwarz, Emanuel</creator><creatorcontrib>Cao, Han ; Zhang, Youcheng ; Baumbach, Jan ; Burton, Paul R ; Dwyer, Dominic ; Koutsouleris, Nikolaos ; Matschinske, Julian ; Marcon, Yannick ; Rajan, Sivanesan ; Rieg, Thilo ; Ryser-Welch, Patricia ; Späth, Julian ; Herrmann, Carl ; Schwarz, Emanuel ; The COMMITMENT Consortium</creatorcontrib><description>MOTIVATIONIn multi-cohort machine learning studies, it is critical to 
differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, there would be the substantial utility of an MTL application for geographically distributed data sources. RESULTSHere, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data given the actual network latency. AVAILABILITY AND IMPLEMENTATIONdsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). 
SUPPLEMENTARY INFORMATIONSupplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btac616</identifier><identifier>PMID: 36073911</identifier><language>eng</language><publisher>Oxford University Press</publisher><subject>Original Papers</subject><ispartof>Bioinformatics (Oxford, England), 2022-10, Vol.38 (21), p.4919-4926</ispartof><rights>The Author(s) 2022. Published by Oxford University Press. 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-d61ffa2ba06af739565af3b5da221895ecff2ac7410b87ef836eef50b0d17e543</citedby><cites>FETCH-LOGICAL-c391t-d61ffa2ba06af739565af3b5da221895ecff2ac7410b87ef836eef50b0d17e543</cites><orcidid>0000-0002-5998-1363 ; 0000-0001-5799-9634 ; 0000-0003-4989-4722 ; 0000-0002-5226-218X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9620828/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9620828/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids></links><search><creatorcontrib>Cao, Han</creatorcontrib><creatorcontrib>Zhang, Youcheng</creatorcontrib><creatorcontrib>Baumbach, Jan</creatorcontrib><creatorcontrib>Burton, Paul R</creatorcontrib><creatorcontrib>Dwyer, Dominic</creatorcontrib><creatorcontrib>Koutsouleris, Nikolaos</creatorcontrib><creatorcontrib>Matschinske, Julian</creatorcontrib><creatorcontrib>Marcon, Yannick</creatorcontrib><creatorcontrib>Rajan, Sivanesan</creatorcontrib><creatorcontrib>Rieg, Thilo</creatorcontrib><creatorcontrib>Ryser-Welch, 
Patricia</creatorcontrib><creatorcontrib>Späth, Julian</creatorcontrib><creatorcontrib>Herrmann, Carl</creatorcontrib><creatorcontrib>Schwarz, Emanuel</creatorcontrib><creatorcontrib>The COMMITMENT Consortium</creatorcontrib><title>dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning</title><title>Bioinformatics (Oxford, England)</title><description>MOTIVATIONIn multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, there would be the substantial utility of an MTL application for geographically distributed data sources. RESULTSHere, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data given the actual network latency. 
AVAILABILITY AND IMPLEMENTATIONdsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). SUPPLEMENTARY INFORMATIONSupplementary data are available at Bioinformatics online.</description><subject>Original Papers</subject><issn>1367-4803</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNpVkV9LwzAUxYMoTqdfQfLog3VJ06adD4IM_8HEl_kcb9ObLa5tZpJO9u2tOASf7oF7-J0Dh5ALzq45m4pJZZ3tjPMtRKvDpIqgJZcH5IQLWSRZyfnhn2ZiRE5D-GCM5SyXx2QkJCvElPMT8l6Hl8X8hgLVrt30ccC5DhpqPLT45fyaDiF04-0W9C7ZeAzot7ZbXtHahuht1Uesads30SYRwpq2oFe2Q9og-G4wnpEjA03A8_0dk7eH-8XsKZm_Pj7P7uaJHorEpJbcGEgrYBLM0C2XORhR5TWkKS-nOWpjUtBFxllVFmhKIRFNzipW8wLzTIzJ7S9301ct1hq76KFRQ_MW_E45sOr_p7MrtXRbNZUpK9NyAFzuAd599hiiam3Q2DTQoeuDSgvOyyxLBRus8teqvQvBo_mL4Uz9zKP-z6P284hvaMyL-A</recordid><startdate>20221031</startdate><enddate>20221031</enddate><creator>Cao, Han</creator><creator>Zhang, Youcheng</creator><creator>Baumbach, Jan</creator><creator>Burton, Paul R</creator><creator>Dwyer, Dominic</creator><creator>Koutsouleris, Nikolaos</creator><creator>Matschinske, Julian</creator><creator>Marcon, Yannick</creator><creator>Rajan, Sivanesan</creator><creator>Rieg, Thilo</creator><creator>Ryser-Welch, Patricia</creator><creator>Späth, Julian</creator><creator>Herrmann, Carl</creator><creator>Schwarz, Emanuel</creator><general>Oxford University Press</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-5998-1363</orcidid><orcidid>https://orcid.org/0000-0001-5799-9634</orcidid><orcidid>https://orcid.org/0000-0003-4989-4722</orcidid><orcidid>https://orcid.org/0000-0002-5226-218X</orcidid></search><sort><creationdate>20221031</creationdate><title>dsMTL: a computational framework for privacy-preserving, distributed multi-task machine 
learning</title><author>Cao, Han ; Zhang, Youcheng ; Baumbach, Jan ; Burton, Paul R ; Dwyer, Dominic ; Koutsouleris, Nikolaos ; Matschinske, Julian ; Marcon, Yannick ; Rajan, Sivanesan ; Rieg, Thilo ; Ryser-Welch, Patricia ; Späth, Julian ; Herrmann, Carl ; Schwarz, Emanuel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-d61ffa2ba06af739565af3b5da221895ecff2ac7410b87ef836eef50b0d17e543</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Original Papers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cao, Han</creatorcontrib><creatorcontrib>Zhang, Youcheng</creatorcontrib><creatorcontrib>Baumbach, Jan</creatorcontrib><creatorcontrib>Burton, Paul R</creatorcontrib><creatorcontrib>Dwyer, Dominic</creatorcontrib><creatorcontrib>Koutsouleris, Nikolaos</creatorcontrib><creatorcontrib>Matschinske, Julian</creatorcontrib><creatorcontrib>Marcon, Yannick</creatorcontrib><creatorcontrib>Rajan, Sivanesan</creatorcontrib><creatorcontrib>Rieg, Thilo</creatorcontrib><creatorcontrib>Ryser-Welch, Patricia</creatorcontrib><creatorcontrib>Späth, Julian</creatorcontrib><creatorcontrib>Herrmann, Carl</creatorcontrib><creatorcontrib>Schwarz, Emanuel</creatorcontrib><creatorcontrib>The COMMITMENT Consortium</creatorcontrib><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cao, Han</au><au>Zhang, Youcheng</au><au>Baumbach, Jan</au><au>Burton, Paul R</au><au>Dwyer, Dominic</au><au>Koutsouleris, Nikolaos</au><au>Matschinske, Julian</au><au>Marcon, Yannick</au><au>Rajan, Sivanesan</au><au>Rieg, Thilo</au><au>Ryser-Welch, Patricia</au><au>Späth, Julian</au><au>Herrmann, 
Carl</au><au>Schwarz, Emanuel</au><aucorp>The COMMITMENT Consortium</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><date>2022-10-31</date><risdate>2022</risdate><volume>38</volume><issue>21</issue><spage>4919</spage><epage>4926</epage><pages>4919-4926</pages><issn>1367-4803</issn><eissn>1367-4811</eissn><abstract>MOTIVATIONIn multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, there would be the substantial utility of an MTL application for geographically distributed data sources. RESULTSHere, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. 
The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data given the actual network latency. AVAILABILITY AND IMPLEMENTATIONdsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). SUPPLEMENTARY INFORMATIONSupplementary data are available at Bioinformatics online.</abstract><pub>Oxford University Press</pub><pmid>36073911</pmid><doi>10.1093/bioinformatics/btac616</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-5998-1363</orcidid><orcidid>https://orcid.org/0000-0001-5799-9634</orcidid><orcidid>https://orcid.org/0000-0003-4989-4722</orcidid><orcidid>https://orcid.org/0000-0002-5226-218X</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1367-4803
ispartof	Bioinformatics (Oxford, England), 2022-10, Vol.38 (21), p.4919-4926
issn	1367-4803 1367-4811
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9620828
source	Oxford Journals Open Access Collection; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection
subjects	Original Papers
title	dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T12%3A08%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=dsMTL:%20a%20computational%20framework%20for%20privacy-preserving,%20distributed%20multi-task%20machine%20learning&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Cao,%20Han&rft.aucorp=The%20COMMITMENT%20Consortium&rft.date=2022-10-31&rft.volume=38&rft.issue=21&rft.spage=4919&rft.epage=4926&rft.pages=4919-4926&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btac616&rft_dat=%3Cproquest_pubme%3E2711844230%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2711844230&rft_id=info:pmid/36073911&rfr_iscdi=true