ViP: A Differentially Private Foundation Model for Computer Vision
Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as they often contain personal information or...
Main authors: | Yu, Yaodong ; Sanjabi, Maziar ; Ma, Yi ; Chaudhuri, Kamalika ; Guo, Chuan |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Yu, Yaodong ; Sanjabi, Maziar ; Ma, Yi ; Chaudhuri, Kamalika ; Guo, Chuan |
description | Artificial intelligence (AI) has seen a tremendous surge in capabilities
thanks to the use of foundation models trained on internet-scale data. On the
flip side, the uncurated nature of internet-scale data also poses significant
privacy and legal risks, as they often contain personal information or
copyrighted material that should not be trained on without permission. In this
work, as a mitigation measure, we propose a recipe to train foundation vision
models with a differential privacy (DP) guarantee. We identify masked
autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and
train ViP, a Vision transformer with differential Privacy, under a strict
privacy budget of $\epsilon=8$ on the LAION400M dataset. We evaluate the
quality of the representations learned by ViP using standard downstream vision
tasks; in particular, ViP achieves a (non-private) linear probing accuracy of
$55.7\%$ on ImageNet, comparable to that of end-to-end trained AlexNet (trained
and evaluated on ImageNet). Our result suggests that scaling to internet-scale
data can be practical for private learning. Code is available at
https://github.com/facebookresearch/ViP-MAE. |
doi_str_mv | 10.48550/arxiv.2306.08842 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2306_08842</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2306_08842</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>ViP: A Differentially Private Foundation Model for Computer Vision</title><source>arXiv.org</source><creator>Yu, Yaodong ; Sanjabi, Maziar ; Ma, Yi ; Chaudhuri, Kamalika ; Guo, Chuan</creator><identifier>DOI: 10.48550/arxiv.2306.08842</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Cryptography and Security ; Computer Science - Learning</subject><creationdate>2023-06</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa></display><links><linktorsrc>$$Uhttps://arxiv.org/abs/2306.08842$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2306.08842$$DView paper in arXiv$$Hfree_for_read</backlink></links><addata><format>journal</format><genre>article</genre><ristype>JOUR</ristype><date>2023-06-15</date><risdate>2023</risdate><doi>10.48550/arxiv.2306.08842</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2306.08842 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2306_08842 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Cryptography and Security ; Computer Science - Learning |
title | ViP: A Differentially Private Foundation Model for Computer Vision |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T17%3A00%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ViP:%20A%20Differentially%20Private%20Foundation%20Model%20for%20Computer%20Vision&rft.au=Yu,%20Yaodong&rft.date=2023-06-15&rft_id=info:doi/10.48550/arxiv.2306.08842&rft_dat=%3Carxiv_GOX%3E2306_08842%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |