Securing a Local Training Dataset Size in Federated Learning

Federated learning (FL) is an emerging paradigm that trains a global machine learning (ML) model using decentralized data held by clients, without sharing the data themselves. Although FL is a more secure way of training models than conventional ML, industries whose training data are primarily personal information, such as MRI images or Electronic Health Records (EHR), should be especially cautious about privacy and security issues when using FL. For example, unbalanced dataset sizes may reveal meaningful information that can lead to security vulnerabilities even if the clients' training data are never exposed. In this paper, we present a Privacy-Preserving Federated Averaging (PP-FedAvg) protocol, specialized for healthcare settings, that limits leakage of user data privacy in FL. In particular, we protect the sizes of local datasets as well as the aggregated local update parameters through secure computation among clients based on homomorphic encryption. This approach ensures that the server can update the global model without accessing the dataset sizes or the local update parameters. Our protocol has the further advantage of protecting dataset sizes when datasets are not uniformly distributed among clients and when some clients drop out in each iteration.
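
The aggregation idea in the abstract can be illustrated with a short, self-contained sketch. The Python code below (Python 3.8+) implements textbook Paillier encryption and uses its additive homomorphism for size-hiding federated averaging: each client submits Enc(n_i) and Enc(n_i * w_i), the server combines ciphertexts without decrypting anything, and only the two totals needed for the size-weighted average ever become visible. This is a minimal sketch under loud assumptions, not the paper's construction: it uses a single demo key pair (PP-FedAvg instead has the clients decrypt jointly via secure computation, and it tolerates dropouts), toy 20-bit primes, and a single scalar in place of a model; all names and numbers are illustrative.

import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p=1000003, q=1000033):
    # Paillier keys from fixed ~20-bit primes -- demo only, never production.
    n = p * q
    lam = lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)               # valid since g = n + 1 is used below
    return (n, n * n), (lam, mu)

def encrypt(pub, m):
    n, n2 = pub
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n (binomial theorem).
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(pub, priv, c):
    n, n2 = pub
    lam, mu = priv
    m = (pow(c, lam, n2) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # centered decoding handles negatives

def e_add(pub, c1, c2):
    # Additive homomorphism: Enc(a) * Enc(b) mod n^2 decrypts to a + b.
    return c1 * c2 % pub[1]

SCALE = 10**6                          # fixed-point scale for real weights

pub, priv = keygen()

# Three clients with unbalanced dataset sizes n_i and a locally trained
# scalar parameter w_i for this round (hypothetical numbers).
clients = [(120, 0.80), (450, 0.50), (80, 1.10)]

# Each client uploads Enc(n_i) and Enc(round(n_i * w_i * SCALE)).
enc_sizes = [encrypt(pub, n) for n, _ in clients]
enc_updates = [encrypt(pub, round(n * w * SCALE)) for n, w in clients]

# The server only multiplies ciphertexts; it never sees any n_i or w_i.
agg_n, agg_u = enc_sizes[0], enc_updates[0]
for cs, cu in zip(enc_sizes[1:], enc_updates[1:]):
    agg_n, agg_u = e_add(pub, agg_n, cs), e_add(pub, agg_u, cu)

# Decrypting the two aggregates (done jointly by the clients in the actual
# protocol) reveals only the totals, never an individual contribution.
total_n = decrypt(pub, priv, agg_n)
total_u = decrypt(pub, priv, agg_u)
w_global = total_u / (SCALE * total_n)

plain = sum(n * w for n, w in clients) / sum(n for n, _ in clients)
print(f"global weight: {w_global:.6f}  (plain FedAvg: {plain:.6f})")

Because Paillier is only additively homomorphic, each client must form the product n_i * w_i itself before encrypting; the aggregator can then compute the size-weighted average while the individual dataset sizes stay hidden, which is exactly the leak the protocol is designed to close.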

Bibliographic Details
Published in: IEEE Access, 2022, Vol. 10, pp. 104135-104143
Main Authors: Shin, Young Ah; Noh, Geontae; Jeong, Ik Rae; Chun, Ji Young
Format: Article
Language: English
Subjects: Clients; Computational modeling; Cryptography; Data models; Data privacy; Datasets; Electronic health records; Federated learning; Homomorphic encryption; Iterative methods; Machine learning; Mathematical models; Parameters; Privacy; privacy-preserving; Security; Servers; Training data; training dataset
DOI: 10.1109/ACCESS.2022.3210702
ISSN: 2169-3536
Online Access: Full text