dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning

MOTIVATION: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learnin...

Full description

Bibliographic details
Published in: Bioinformatics (Oxford, England), 2022-10, Vol.38 (21), p.4919-4926
Main authors: Cao, Han; Zhang, Youcheng; Baumbach, Jan; Burton, Paul R; Dwyer, Dominic; Koutsouleris, Nikolaos; Matschinske, Julian; Marcon, Yannick; Rajan, Sivanesan; Rieg, Thilo; Ryser-Welch, Patricia; Späth, Julian; Herrmann, Carl; Schwarz, Emanuel
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 4926
container_issue 21
container_start_page 4919
container_title Bioinformatics (Oxford, England)
container_volume 38
creator Cao, Han
Zhang, Youcheng
Baumbach, Jan
Burton, Paul R
Dwyer, Dominic
Koutsouleris, Nikolaos
Matschinske, Julian
Marcon, Yannick
Rajan, Sivanesan
Rieg, Thilo
Ryser-Welch, Patricia
Späth, Julian
Herrmann, Carl
Schwarz, Emanuel
description MOTIVATION: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, there would be the substantial utility of an MTL application for geographically distributed data sources. RESULTS: Here, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n 
doi_str_mv 10.1093/bioinformatics/btac616
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9620828</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2711844230</sourcerecordid><originalsourceid>FETCH-LOGICAL-c391t-d61ffa2ba06af739565af3b5da221895ecff2ac7410b87ef836eef50b0d17e543</originalsourceid><addsrcrecordid>eNpVkV9LwzAUxYMoTqdfQfLog3VJ06adD4IM_8HEl_kcb9ObLa5tZpJO9u2tOASf7oF7-J0Dh5ALzq45m4pJZZ3tjPMtRKvDpIqgJZcH5IQLWSRZyfnhn2ZiRE5D-GCM5SyXx2QkJCvElPMT8l6Hl8X8hgLVrt30ccC5DhpqPLT45fyaDiF04-0W9C7ZeAzot7ZbXtHahuht1Uesads30SYRwpq2oFe2Q9og-G4wnpEjA03A8_0dk7eH-8XsKZm_Pj7P7uaJHorEpJbcGEgrYBLM0C2XORhR5TWkKS-nOWpjUtBFxllVFmhKIRFNzipW8wLzTIzJ7S9301ct1hq76KFRQ_MW_E45sOr_p7MrtXRbNZUpK9NyAFzuAd599hiiam3Q2DTQoeuDSgvOyyxLBRus8teqvQvBo_mL4Uz9zKP-z6P284hvaMyL-A</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2711844230</pqid></control><display><type>article</type><title>dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning</title><source>Oxford Journals Open Access Collection</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Cao, Han ; Zhang, Youcheng ; Baumbach, Jan ; Burton, Paul R ; Dwyer, Dominic ; Koutsouleris, Nikolaos ; Matschinske, Julian ; Marcon, Yannick ; Rajan, Sivanesan ; Rieg, Thilo ; Ryser-Welch, Patricia ; Späth, Julian ; Herrmann, Carl ; Schwarz, Emanuel</creator><creatorcontrib>Cao, Han ; Zhang, Youcheng ; Baumbach, Jan ; Burton, Paul R ; Dwyer, Dominic ; Koutsouleris, Nikolaos ; Matschinske, Julian ; Marcon, Yannick ; Rajan, Sivanesan ; Rieg, Thilo ; Ryser-Welch, Patricia ; Späth, Julian ; Herrmann, Carl ; Schwarz, Emanuel ; The COMMITMENT Consortium</creatorcontrib><description>MOTIVATIONIn multi-cohort machine learning studies, it is critical to 
differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, there would be the substantial utility of an MTL application for geographically distributed data sources. RESULTSHere, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n &lt; 500), real expression data given the actual network latency. AVAILABILITY AND IMPLEMENTATIONdsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). 
SUPPLEMENTARY INFORMATIONSupplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btac616</identifier><identifier>PMID: 36073911</identifier><language>eng</language><publisher>Oxford University Press</publisher><subject>Original Papers</subject><ispartof>Bioinformatics (Oxford, England), 2022-10, Vol.38 (21), p.4919-4926</ispartof><rights>The Author(s) 2022. Published by Oxford University Press. 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c391t-d61ffa2ba06af739565af3b5da221895ecff2ac7410b87ef836eef50b0d17e543</citedby><cites>FETCH-LOGICAL-c391t-d61ffa2ba06af739565af3b5da221895ecff2ac7410b87ef836eef50b0d17e543</cites><orcidid>0000-0002-5998-1363 ; 0000-0001-5799-9634 ; 0000-0003-4989-4722 ; 0000-0002-5226-218X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9620828/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9620828/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids></links><search><creatorcontrib>Cao, Han</creatorcontrib><creatorcontrib>Zhang, Youcheng</creatorcontrib><creatorcontrib>Baumbach, Jan</creatorcontrib><creatorcontrib>Burton, Paul R</creatorcontrib><creatorcontrib>Dwyer, Dominic</creatorcontrib><creatorcontrib>Koutsouleris, Nikolaos</creatorcontrib><creatorcontrib>Matschinske, Julian</creatorcontrib><creatorcontrib>Marcon, Yannick</creatorcontrib><creatorcontrib>Rajan, Sivanesan</creatorcontrib><creatorcontrib>Rieg, Thilo</creatorcontrib><creatorcontrib>Ryser-Welch, 
Patricia</creatorcontrib><creatorcontrib>Späth, Julian</creatorcontrib><creatorcontrib>Herrmann, Carl</creatorcontrib><creatorcontrib>Schwarz, Emanuel</creatorcontrib><creatorcontrib>The COMMITMENT Consortium</creatorcontrib><title>dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning</title><title>Bioinformatics (Oxford, England)</title><description>MOTIVATIONIn multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, there would be the substantial utility of an MTL application for geographically distributed data sources. RESULTSHere, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n &lt; 500), real expression data given the actual network latency. 
AVAILABILITY AND IMPLEMENTATIONdsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). SUPPLEMENTARY INFORMATIONSupplementary data are available at Bioinformatics online.</description><subject>Original Papers</subject><issn>1367-4803</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNpVkV9LwzAUxYMoTqdfQfLog3VJ06adD4IM_8HEl_kcb9ObLa5tZpJO9u2tOASf7oF7-J0Dh5ALzq45m4pJZZ3tjPMtRKvDpIqgJZcH5IQLWSRZyfnhn2ZiRE5D-GCM5SyXx2QkJCvElPMT8l6Hl8X8hgLVrt30ccC5DhpqPLT45fyaDiF04-0W9C7ZeAzot7ZbXtHahuht1Uesads30SYRwpq2oFe2Q9og-G4wnpEjA03A8_0dk7eH-8XsKZm_Pj7P7uaJHorEpJbcGEgrYBLM0C2XORhR5TWkKS-nOWpjUtBFxllVFmhKIRFNzipW8wLzTIzJ7S9301ct1hq76KFRQ_MW_E45sOr_p7MrtXRbNZUpK9NyAFzuAd599hiiam3Q2DTQoeuDSgvOyyxLBRus8teqvQvBo_mL4Uz9zKP-z6P284hvaMyL-A</recordid><startdate>20221031</startdate><enddate>20221031</enddate><creator>Cao, Han</creator><creator>Zhang, Youcheng</creator><creator>Baumbach, Jan</creator><creator>Burton, Paul R</creator><creator>Dwyer, Dominic</creator><creator>Koutsouleris, Nikolaos</creator><creator>Matschinske, Julian</creator><creator>Marcon, Yannick</creator><creator>Rajan, Sivanesan</creator><creator>Rieg, Thilo</creator><creator>Ryser-Welch, Patricia</creator><creator>Späth, Julian</creator><creator>Herrmann, Carl</creator><creator>Schwarz, Emanuel</creator><general>Oxford University Press</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-5998-1363</orcidid><orcidid>https://orcid.org/0000-0001-5799-9634</orcidid><orcidid>https://orcid.org/0000-0003-4989-4722</orcidid><orcidid>https://orcid.org/0000-0002-5226-218X</orcidid></search><sort><creationdate>20221031</creationdate><title>dsMTL: a computational framework for privacy-preserving, distributed multi-task machine 
learning</title><author>Cao, Han ; Zhang, Youcheng ; Baumbach, Jan ; Burton, Paul R ; Dwyer, Dominic ; Koutsouleris, Nikolaos ; Matschinske, Julian ; Marcon, Yannick ; Rajan, Sivanesan ; Rieg, Thilo ; Ryser-Welch, Patricia ; Späth, Julian ; Herrmann, Carl ; Schwarz, Emanuel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c391t-d61ffa2ba06af739565af3b5da221895ecff2ac7410b87ef836eef50b0d17e543</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Original Papers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cao, Han</creatorcontrib><creatorcontrib>Zhang, Youcheng</creatorcontrib><creatorcontrib>Baumbach, Jan</creatorcontrib><creatorcontrib>Burton, Paul R</creatorcontrib><creatorcontrib>Dwyer, Dominic</creatorcontrib><creatorcontrib>Koutsouleris, Nikolaos</creatorcontrib><creatorcontrib>Matschinske, Julian</creatorcontrib><creatorcontrib>Marcon, Yannick</creatorcontrib><creatorcontrib>Rajan, Sivanesan</creatorcontrib><creatorcontrib>Rieg, Thilo</creatorcontrib><creatorcontrib>Ryser-Welch, Patricia</creatorcontrib><creatorcontrib>Späth, Julian</creatorcontrib><creatorcontrib>Herrmann, Carl</creatorcontrib><creatorcontrib>Schwarz, Emanuel</creatorcontrib><creatorcontrib>The COMMITMENT Consortium</creatorcontrib><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cao, Han</au><au>Zhang, Youcheng</au><au>Baumbach, Jan</au><au>Burton, Paul R</au><au>Dwyer, Dominic</au><au>Koutsouleris, Nikolaos</au><au>Matschinske, Julian</au><au>Marcon, Yannick</au><au>Rajan, Sivanesan</au><au>Rieg, Thilo</au><au>Ryser-Welch, Patricia</au><au>Späth, Julian</au><au>Herrmann, 
Carl</au><au>Schwarz, Emanuel</au><aucorp>The COMMITMENT Consortium</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><date>2022-10-31</date><risdate>2022</risdate><volume>38</volume><issue>21</issue><spage>4919</spage><epage>4926</epage><pages>4919-4926</pages><issn>1367-4803</issn><eissn>1367-4811</eissn><abstract>MOTIVATIONIn multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, there would be the substantial utility of an MTL application for geographically distributed data sources. RESULTSHere, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. 
The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n &lt; 500), real expression data given the actual network latency. AVAILABILITY AND IMPLEMENTATIONdsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). SUPPLEMENTARY INFORMATIONSupplementary data are available at Bioinformatics online.</abstract><pub>Oxford University Press</pub><pmid>36073911</pmid><doi>10.1093/bioinformatics/btac616</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-5998-1363</orcidid><orcidid>https://orcid.org/0000-0001-5799-9634</orcidid><orcidid>https://orcid.org/0000-0003-4989-4722</orcidid><orcidid>https://orcid.org/0000-0002-5226-218X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics (Oxford, England), 2022-10, Vol.38 (21), p.4919-4926
issn 1367-4803
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9620828
source Oxford Journals Open Access Collection; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection
subjects Original Papers
title dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T12%3A08%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=dsMTL:%20a%20computational%20framework%20for%20privacy-preserving,%20distributed%20multi-task%20machine%20learning&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Cao,%20Han&rft.aucorp=The%20COMMITMENT%20Consortium&rft.date=2022-10-31&rft.volume=38&rft.issue=21&rft.spage=4919&rft.epage=4926&rft.pages=4919-4926&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btac616&rft_dat=%3Cproquest_pubme%3E2711844230%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2711844230&rft_id=info:pmid/36073911&rfr_iscdi=true