Confederated learning in healthcare: Training machine learning models using disconnected data separated by individual, data type and identity for Large-Scale health system Intelligence

• Patients' data are horizontally separated by individual and vertically separated by type.
• Confederated machine learning to model data both horizontally and vertically separated.
• Real-world medical data from many locations across the country.
A patient’s health information is...

Bibliographic details
Published in:	Journal of biomedical informatics 2022-10, Vol.134, p.104151-104151, Article 104151
Main authors:	Liu, Dianbo; Fox, Kathe; Weber, Griffin; Miller, Tim
Format:	Article
Language:	eng
Subjects:	Confederated machine learning; Diagnosis; Disease prediction; Federated machine learning; Healthcare insurance claims; Lab results; Medication
Online access:	Full text

container_end_page	104151
container_issue
container_start_page	104151
container_title	Journal of biomedical informatics
container_volume	134
creator	Liu, Dianbo; Fox, Kathe; Weber, Griffin; Miller, Tim
description	• Patients' data are horizontally separated by individual and vertically separated by type.
• Confederated machine learning to model data both horizontally and vertically separated.
• Real-world medical data from many locations across the country.
A patient’s health information is generally fragmented across silos because it follows how care is delivered: multiple providers in multiple settings. Though it is technically feasible to reunite data for analysis in a manner that underpins a rapid learning healthcare system, privacy concerns and regulatory barriers limit data centralization for this purpose. Machine learning can be conducted in a federated manner on patient datasets that share the same set of variables but are stored separately. However, federated learning cannot handle the situation in which different data types for a given patient are separated vertically across organizations and patient ID matching across institutions is difficult. We call methods that enable machine learning model training on data separated by two or more dimensions “confederated machine learning”, and we aim to develop such a method in this study. We propose and evaluate confederated learning for training machine learning models to stratify the risk of several diseases among silos when data are horizontally separated by individual, vertically separated by data type, and separated by identity without patient ID matching. The confederated learning method can be intuitively understood as a distributed learning method with elements of representation learning, generative modeling, imputation, and data augmentation. Our confederated learning method achieves an AUCROC (area under the receiver operating characteristic curve) of 0.787 for diabetes prediction, 0.718 for psychological disorders prediction, and 0.698 for ischemic heart disease prediction using nationwide health insurance claims. Our proposed confederated learning method successfully trained machine learning models on health insurance data separated by two or more dimensions.
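The two separation axes named in the abstract can be illustrated with a toy example. The following is a minimal sketch only, not the authors' method: the table values, the region names, and the helper `split_silos` are all hypothetical, and the sketch shows only how a single patient table ends up spread over silos when it is split horizontally (by individual) and vertically (by data type). In the setting the abstract describes, the silos additionally lack a shared patient ID, so the row alignment that this toy keeps would not be available.

```python
# Toy patient table (hypothetical values): one row per patient,
# columns = [diagnosis_1, diagnosis_2, medication_1, medication_2].
table = [
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]

# Horizontal separation: which patients (rows) each region holds.
row_groups = {"region_a": [0, 1, 2], "region_b": [3, 4, 5]}
# Vertical separation: which feature columns each data type covers.
col_groups = {"diagnosis": [0, 1], "medication": [2, 3]}


def split_silos(data, row_groups, col_groups):
    """Return one block per (region, data type) pair; no block holds a full row."""
    silos = {}
    for region, rows in row_groups.items():
        for data_type, cols in col_groups.items():
            silos[(region, data_type)] = [[data[r][c] for c in cols] for r in rows]
    return silos


silos = split_silos(table, row_groups, col_groups)
for key, block in silos.items():
    print(key, len(block), "patients x", len(block[0]), "features")
```

Each of the four resulting blocks covers only some patients and only some feature types, which is the situation that plain federated learning (same variables at every site) does not address.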
doi_str_mv	10.1016/j.jbi.2022.104151
format	Article
collection	ScienceDirect Open Access Titles; Elsevier:ScienceDirect:Open Access; CrossRef; MEDLINE - Academic
creationdate	2022
els_id	S1532046422001630
oa	free_for_read
publisher	Elsevier Inc
rights	2022 Elsevier Inc.
sourcetype	Aggregation Database
toplevel	peer_reviewed; online_resources
fulltext	fulltext
identifier	ISSN: 1532-0464
ispartof	Journal of biomedical informatics, 2022-10, Vol.134, p.104151-104151, Article 104151
issn	1532-0464 1532-0480
language	eng
recordid	cdi_proquest_miscellaneous_2694415110
source	Elsevier ScienceDirect Journals Complete; EZB-FREE-00999 freely available EZB journals
subjects	Confederated machine learning; Diagnosis; Disease prediction; Federated machine learning; Healthcare insurance claims; Lab results; Medication
title	Confederated learning in healthcare: Training machine learning models using disconnected data separated by individual, data type and identity for Large-Scale health system Intelligence
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T03%3A19%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Confederated%20learning%20in%20healthcare:%20Training%20machine%20learning%20models%20using%20disconnected%20data%20separated%20by%20individual,%20data%20type%20and%20identity%20for%20Large-Scale%20health%20system%20Intelligence&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=Liu,%20Dianbo&rft.date=2022-10&rft.volume=134&rft.spage=104151&rft.epage=104151&rft.pages=104151-104151&rft.artnum=104151&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2022.104151&rft_dat=%3Cproquest_cross%3E2694415110%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2694415110&rft_id=info:pmid/&rft_els_id=S1532046422001630&rfr_iscdi=true