Securing a Local Training Dataset Size in Federated Learning

Federated learning (FL) is an emerging paradigm that trains a global machine learning (ML) model using decentralized data held by clients, without sharing the data themselves. Although FL is a more secure way of training models than conventional ML, industries whose training data are primarily personal information, such as MRI images or Electronic Health Records (EHR), should be especially cautious about privacy and security issues when using FL. For example, unbalanced dataset sizes may reveal meaningful information that can lead to security vulnerabilities even if the clients' training data are never exposed. In this paper, we present a Privacy-Preserving Federated Averaging (PP-FedAvg) protocol, specialized for healthcare settings, that limits leakage of user data privacy in FL. In particular, we protect the sizes of local datasets as well as the aggregated local update parameters through secure computation among clients based on homomorphic encryption. This approach ensures that the server can update the global model without accessing the dataset sizes or the local update parameters. Our protocol has the further advantage of protecting dataset sizes when datasets are not uniformly distributed among clients and when some clients drop out in each iteration.
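
The aggregation idea in the abstract can be illustrated with a short, self-contained sketch. The Python code below (Python 3.8+) implements textbook Paillier encryption and uses its additive homomorphism for size-hiding federated averaging: each client submits Enc(n_i) and Enc(n_i * w_i), the server combines ciphertexts without decrypting anything, and only the two totals needed for the size-weighted average ever become visible. This is a minimal sketch under loud assumptions, not the paper's construction: it uses a single demo key pair (PP-FedAvg instead has the clients decrypt jointly via secure computation, and it tolerates dropouts), toy 20-bit primes, and a single scalar in place of a model; all names and numbers are illustrative.

import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p=1000003, q=1000033):
    # Paillier keys from fixed ~20-bit primes -- demo only, never production.
    n = p * q
    lam = lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)               # valid since g = n + 1 is used below
    return (n, n * n), (lam, mu)

def encrypt(pub, m):
    n, n2 = pub
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n (binomial theorem).
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(pub, priv, c):
    n, n2 = pub
    lam, mu = priv
    m = (pow(c, lam, n2) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # centered decoding handles negatives

def e_add(pub, c1, c2):
    # Additive homomorphism: Enc(a) * Enc(b) mod n^2 decrypts to a + b.
    return c1 * c2 % pub[1]

SCALE = 10**6                          # fixed-point scale for real weights

pub, priv = keygen()

# Three clients with unbalanced dataset sizes n_i and a locally trained
# scalar parameter w_i for this round (hypothetical numbers).
clients = [(120, 0.80), (450, 0.50), (80, 1.10)]

# Each client uploads Enc(n_i) and Enc(round(n_i * w_i * SCALE)).
enc_sizes = [encrypt(pub, n) for n, _ in clients]
enc_updates = [encrypt(pub, round(n * w * SCALE)) for n, w in clients]

# The server only multiplies ciphertexts; it never sees any n_i or w_i.
agg_n, agg_u = enc_sizes[0], enc_updates[0]
for cs, cu in zip(enc_sizes[1:], enc_updates[1:]):
    agg_n, agg_u = e_add(pub, agg_n, cs), e_add(pub, agg_u, cu)

# Decrypting the two aggregates (done jointly by the clients in the actual
# protocol) reveals only the totals, never an individual contribution.
total_n = decrypt(pub, priv, agg_n)
total_u = decrypt(pub, priv, agg_u)
w_global = total_u / (SCALE * total_n)

plain = sum(n * w for n, w in clients) / sum(n for n, _ in clients)
print(f"global weight: {w_global:.6f}  (plain FedAvg: {plain:.6f})")

Because Paillier is only additively homomorphic, each client must form the product n_i * w_i itself before encrypting; the aggregator can then compute the size-weighted average while the individual dataset sizes stay hidden, which is exactly the leak the protocol is designed to close.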

Bibliographic Details
Published in: IEEE Access, 2022, Vol. 10, pp. 104135-104143
Main Authors: Shin, Young Ah; Noh, Geontae; Jeong, Ik Rae; Chun, Ji Young
Format: Article
Language: English
Subjects: Clients; Computational modeling; Cryptography; Data models; Data privacy; Datasets; Electronic health records; Federated learning; Homomorphic encryption; Iterative methods; Machine learning; Mathematical models; Parameters; Privacy; privacy-preserving; Security; Servers; Training data; training dataset
DOI: 10.1109/ACCESS.2022.3210702
ISSN: 2169-3536
Online Access: Full text