Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
Humans can effortlessly draw new categories from a single exemplar, a feat that has long posed a challenge for generative models. However, this gap has started to close with recent advances in diffusion models. This one-shot drawing task requires powerful inductive biases that have not been systematically investigated. Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). Along with standard LDM regularizers (KL and vector quantization), we explore supervised regularizations (including classification and prototype-based representation) and contrastive inductive biases (using SimCLR and redundancy reduction objectives). We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings (regarding both samples' recognizability and originality) -- better mimicking human perception (as evaluated psychophysically). Overall, our results suggest that the gap between humans and machines in one-shot drawings is almost closed.
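The abstract names several latent-space regularizers (KL, vector quantization, SimCLR, redundancy reduction). The paper's own code is not part of this record; purely as an illustrative sketch, a Barlow Twins-style redundancy-reduction objective on a batch of latent codes could look like the following (the function name, the `lam` weight, and the shapes are assumptions of this sketch, not the authors' implementation):

```python
import numpy as np

def redundancy_reduction_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins-style loss on two batches of latent codes.

    z_a, z_b: (N, D) latent codes of two augmented views of the same inputs.
    The diagonal of the cross-correlation matrix is pushed toward 1
    (invariance); off-diagonal entries are pushed toward 0 (decorrelation).
    """
    n, _ = z_a.shape
    # standardize each latent dimension across the batch
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    c = z_a.T @ z_b / n                                   # (D, D) cross-correlation
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()             # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()   # redundancy term
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 32))
# identical views: diagonal of c is ~1, so the loss is dominated by the
# small weighted off-diagonal term
print(redundancy_reduction_loss(z, z))
```

With identical views the invariance term vanishes, so the loss stays near zero; for unrelated views the diagonal correlations collapse and the loss grows, which is the property such a regularizer exploits when shaping an LDM's latent space.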
Saved in:
Main authors: | Boutin, Victor; Mukherji, Rishav; Agrawal, Aditya; Muzellec, Sabine; Fel, Thomas; Serre, Thomas; VanRullen, Rufin |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | Computer Science; Computer Science - Computer Vision and Pattern Recognition |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Boutin, Victor; Mukherji, Rishav; Agrawal, Aditya; Muzellec, Sabine; Fel, Thomas; Serre, Thomas; VanRullen, Rufin |
description | Humans can effortlessly draw new categories from a single exemplar, a feat
that has long posed a challenge for generative models. However, this gap has
started to close with recent advances in diffusion models. This one-shot
drawing task requires powerful inductive biases that have not been
systematically investigated. Here, we study how different inductive biases
shape the latent space of Latent Diffusion Models (LDMs). Along with standard
LDM regularizers (KL and vector quantization), we explore supervised
regularizations (including classification and prototype-based representation)
and contrastive inductive biases (using SimCLR and redundancy reduction
objectives). We demonstrate that LDMs with redundancy reduction and
prototype-based regularizations produce near-human-like drawings (regarding
both samples' recognizability and originality) -- better mimicking human
perception (as evaluated psychophysically). Overall, our results suggest that
the gap between humans and machines in one-shot drawings is almost closed. |
doi_str_mv | 10.48550/arxiv.2406.06079 |
format | Conference Proceeding |
creationdate | 2024-06-10 |
rights | Distributed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0) |
orcidid | 0000-0002-3611-7716; 0000-0003-3372-5940 |
backlink | https://arxiv.org/abs/2406.06079; https://doi.org/10.48550/arXiv.2406.06079; https://hal.science/hal-04800050 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2406.06079 |
ispartof | Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), 2024 |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2406_06079 |
source | arXiv.org |
subjects | Computer Science; Computer Science - Computer Vision and Pattern Recognition |
title | Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T12%3A02%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-hal_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Latent%20Representation%20Matters:%20Human-like%20Sketches%20in%20One-shot%20Drawing%20Tasks&rft.btitle=Proceedings%20of%20the%2038th%20Conference%20on%20Neural%20Information%20Processing%20Systems%20(NeurIPS)&rft.au=Boutin,%20Victor&rft.date=2024-06-10&rft_id=info:doi/10.48550/arxiv.2406.06079&rft_dat=%3Chal_GOX%3Eoai_HAL_hal_04800050v1%3C/hal_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
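The record's description also credits prototype-based regularization for near-human-like drawings. The paper's actual objective is not given here; a minimal prototype-pull penalty in the spirit of prototypical losses can be sketched as follows (the function name and the toy data are hypothetical, chosen only to show the mechanics of pulling each latent code toward its class prototype):

```python
import numpy as np

def prototype_loss(z, labels, prototypes):
    """Mean squared distance from each latent code to its class prototype.

    z:          (N, D) latent codes
    labels:     (N,) integer class labels indexing into `prototypes`
    prototypes: (K, D) one prototype vector per class
    """
    return ((z - prototypes[labels]) ** 2).sum(axis=1).mean()

protos = np.array([[0.0, 0.0], [10.0, 10.0]])   # two class prototypes
labels = np.array([0, 0, 1])
z = np.array([[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
# squared distances are 1, 1, and 0, so the mean is 2/3
print(prototype_loss(z, labels, protos))  # → 0.6666666666666666
```

Minimizing such a term alongside the reconstruction objective clusters latent codes around per-class prototypes, one plausible way a supervised regularizer can shape an LDM's latent space as the abstract describes.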