Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks

Humans can effortlessly draw new categories from a single exemplar, a feat that has long posed a challenge for generative models. However, this gap has started to close with recent advances in diffusion models. This one-shot drawing task requires powerful inductive biases that have not been systematically investigated. Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). Along with standard LDM regularizers (KL and vector quantization), we explore supervised regularizations (including classification and prototype-based representation) and contrastive inductive biases (using SimCLR and redundancy reduction objectives). We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings (regarding both samples' recognizability and originality) -- better mimicking human perception (as evaluated psychophysically). Overall, our results suggest that the gap between humans and machines in one-shot drawings is almost closed.
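The "KL" regularizer referenced in this record's abstract is the standard latent penalty used in VAE/LDM-style autoencoders: the KL divergence between a diagonal-Gaussian latent and a standard normal prior. A minimal sketch of its closed form (the function name and pure-Python style are illustrative assumptions, not the authors' code):

```python
import math

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    # the usual latent regularizer in VAE/LDM-style autoencoders:
    #   KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )
```

When the latent already matches the prior (mu = 0, log_var = 0) the penalty is zero, and it grows as the encoder's posterior drifts away from N(0, I).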

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Boutin, Victor; Mukherji, Rishav; Agrawal, Aditya; Muzellec, Sabine; Fel, Thomas; Serre, Thomas; VanRullen, Rufin
Format: Conference Proceeding
Language: eng
Subjects:
Online Access: Order full text
container_title Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS)
creator Boutin, Victor
Mukherji, Rishav
Agrawal, Aditya
Muzellec, Sabine
Fel, Thomas
Serre, Thomas
VanRullen, Rufin
description Humans can effortlessly draw new categories from a single exemplar, a feat that has long posed a challenge for generative models. However, this gap has started to close with recent advances in diffusion models. This one-shot drawing task requires powerful inductive biases that have not been systematically investigated. Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). Along with standard LDM regularizers (KL and vector quantization), we explore supervised regularizations (including classification and prototype-based representation) and contrastive inductive biases (using SimCLR and redundancy reduction objectives). We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings (regarding both samples' recognizability and originality) -- better mimicking human perception (as evaluated psychophysically). Overall, our results suggest that the gap between humans and machines in one-shot drawings is almost closed.
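The "redundancy reduction" objective named in the abstract can be illustrated with a Barlow Twins-style loss: push the cross-correlation matrix between two views' embeddings toward the identity, so each latent dimension is informative (diagonal near 1) and decorrelated from the others (off-diagonal near 0). The tiny pure-Python implementation below is a sketch of the general technique, not the authors' code; `lam` weights the off-diagonal term.

```python
def normalize_columns(Z):
    # Standardize each embedding dimension to zero mean, unit variance.
    n, d = len(Z), len(Z[0])
    means = [sum(row[j] for row in Z) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in Z) / n
        # Guard against constant dimensions (std would be 0).
        stds.append(var ** 0.5 or 1.0)
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in Z]

def redundancy_reduction_loss(Za, Zb, lam=5e-3):
    # Barlow Twins-style objective: the cross-correlation matrix C between
    # the two views' standardized embeddings should approach the identity.
    Za, Zb = normalize_columns(Za), normalize_columns(Zb)
    n, d = len(Za), len(Za[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c = sum(Za[k][i] * Zb[k][j] for k in range(n)) / n
            # Diagonal: pull correlations to 1; off-diagonal: push to 0.
            loss += (c - 1.0) ** 2 if i == j else lam * c ** 2
    return loss
```

With perfectly decorrelated, identical views the loss is zero; duplicated (redundant) latent dimensions raise the off-diagonal penalty.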
doi_str_mv 10.48550/arxiv.2406.06079
format Conference Proceeding
identifier DOI: 10.48550/arxiv.2406.06079
ispartof Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), 2024
language eng
recordid cdi_arxiv_primary_2406_06079
source arXiv.org
subjects Computer Science
Computer Science - Computer Vision and Pattern Recognition
title Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
url https://arxiv.org/abs/2406.06079