Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks

Humans can effortlessly draw new categories from a single exemplar, a feat that has long posed a challenge for generative models. However, this gap has started to close with recent advances in diffusion models. This one-shot drawing task requires powerful inductive biases that have not been systematically investigated. Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). Along with standard LDM regularizers (KL and vector quantization), we explore supervised regularizations (including classification and prototype-based representation) and contrastive inductive biases (using SimCLR and redundancy reduction objectives). We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings (regarding both samples' recognizability and originality) -- better mimicking human perception (as evaluated psychophysically). Overall, our results suggest that the gap between humans and machines in one-shot drawings is almost closed.
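The "KL" regularizer referenced in this record's abstract is the standard latent penalty used in VAE/LDM-style autoencoders: the KL divergence between a diagonal-Gaussian latent and a standard normal prior. A minimal sketch of its closed form (the function name and pure-Python style are illustrative assumptions, not the authors' code):

```python
import math

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    # the usual latent regularizer in VAE/LDM-style autoencoders:
    #   KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )
```

When the latent already matches the prior (mu = 0, log_var = 0) the penalty is zero, and it grows as the encoder's posterior drifts away from N(0, I).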

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Boutin, Victor; Mukherji, Rishav; Agrawal, Aditya; Muzellec, Sabine; Fel, Thomas; Serre, Thomas; VanRullen, Rufin
Format: Conference Proceeding
Language: eng
Subjects:
Online Access: Order full text
container_title Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS)
creator Boutin, Victor
Mukherji, Rishav
Agrawal, Aditya
Muzellec, Sabine
Fel, Thomas
Serre, Thomas
VanRullen, Rufin
description Humans can effortlessly draw new categories from a single exemplar, a feat that has long posed a challenge for generative models. However, this gap has started to close with recent advances in diffusion models. This one-shot drawing task requires powerful inductive biases that have not been systematically investigated. Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). Along with standard LDM regularizers (KL and vector quantization), we explore supervised regularizations (including classification and prototype-based representation) and contrastive inductive biases (using SimCLR and redundancy reduction objectives). We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings (regarding both samples' recognizability and originality) -- better mimicking human perception (as evaluated psychophysically). Overall, our results suggest that the gap between humans and machines in one-shot drawings is almost closed.
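The "redundancy reduction" objective named in the abstract can be illustrated with a Barlow Twins-style loss: push the cross-correlation matrix between two views' embeddings toward the identity, so each latent dimension is informative (diagonal near 1) and decorrelated from the others (off-diagonal near 0). The tiny pure-Python implementation below is a sketch of the general technique, not the authors' code; `lam` weights the off-diagonal term.

```python
def normalize_columns(Z):
    # Standardize each embedding dimension to zero mean, unit variance.
    n, d = len(Z), len(Z[0])
    means = [sum(row[j] for row in Z) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in Z) / n
        # Guard against constant dimensions (std would be 0).
        stds.append(var ** 0.5 or 1.0)
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in Z]

def redundancy_reduction_loss(Za, Zb, lam=5e-3):
    # Barlow Twins-style objective: the cross-correlation matrix C between
    # the two views' standardized embeddings should approach the identity.
    Za, Zb = normalize_columns(Za), normalize_columns(Zb)
    n, d = len(Za), len(Za[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c = sum(Za[k][i] * Zb[k][j] for k in range(n)) / n
            # Diagonal: pull correlations to 1; off-diagonal: push to 0.
            loss += (c - 1.0) ** 2 if i == j else lam * c ** 2
    return loss
```

With perfectly decorrelated, identical views the loss is zero; duplicated (redundant) latent dimensions raise the off-diagonal penalty.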
doi_str_mv 10.48550/arxiv.2406.06079
format Conference Proceeding
identifier DOI: 10.48550/arxiv.2406.06079
ispartof Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), 2024
language eng
recordid cdi_arxiv_primary_2406_06079
source arXiv.org
subjects Computer Science
Computer Science - Computer Vision and Pattern Recognition
title Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
url https://arxiv.org/abs/2406.06079