Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance

Prior studies have made significant progress in image inpainting guided by either text or a subject image. However, research on editing with their combined guidance is still in the early stages. To tackle this challenge, we present LAR-Gen, a novel approach for image inpainting that enables seamless inpainting of masked scene images, incorporating both textual prompts and specified subjects. Our approach adopts a coarse-to-fine manner to ensure subject identity preservation and local semantic coherence. The process involves (i) Locate: concatenating the noise with the masked scene image to achieve precise regional editing, (ii) Assign: employing a decoupled cross-attention mechanism to accommodate multi-modal guidance, and (iii) Refine: using a novel RefineNet to supplement subject details. Additionally, to address the issue of scarce training data, we introduce a novel data construction pipeline. This pipeline extracts substantial pairs of data, consisting of local text prompts and corresponding visual instances, from a vast image dataset, leveraging publicly available large models. Extensive experiments and varied application scenarios demonstrate the superiority of LAR-Gen in terms of both identity preservation and text semantic consistency. Project page: https://ali-vilab.github.io/largen-page/
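The abstract's mechanism can be made concrete with two short sketches. First, the Locate step conditions the denoiser on the masked scene by channel-wise concatenation. This is a minimal illustration assuming a Stable-Diffusion-style latent space; the function name and tensor shapes are assumptions, not LAR-Gen's released code.

```python
import torch

def locate_unet_input(noise, scene_latent, mask):
    """Hypothetical Locate step: concatenate the noisy latent with the
    masked scene latent and the mask along the channel axis, so the
    denoiser sees exactly which region it must fill.
    Assumed shapes: noise/scene_latent (B, 4, h, w), mask (B, 1, h, w),
    where mask == 1 marks the region to inpaint."""
    masked_scene = scene_latent * (1.0 - mask)  # hide the edit region
    return torch.cat([noise, masked_scene, mask], dim=1)  # (B, 9, h, w)
```

Second, the Assign step's decoupled cross-attention gives text tokens and subject-image tokens separate key/value projections whose attention outputs are summed, so the two modalities guide generation without competing for a single projection. Again a sketch under assumptions: single-head attention, the layer names, and the `scale` knob are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class DecoupledCrossAttention(nn.Module):
    """Hypothetical Assign step: one query projection over the UNet
    features, but separate key/value projections for text tokens and
    subject-image tokens; the two attention results are summed."""

    def __init__(self, dim, text_dim, image_dim, scale=1.0):
        super().__init__()
        self.scale = scale  # weight of the subject-image branch (assumed)
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k_text = nn.Linear(text_dim, dim, bias=False)
        self.to_v_text = nn.Linear(text_dim, dim, bias=False)
        self.to_k_img = nn.Linear(image_dim, dim, bias=False)  # added branch
        self.to_v_img = nn.Linear(image_dim, dim, bias=False)

    def _attend(self, q, k, v):
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ v

    def forward(self, x, text_tokens, image_tokens):
        q = self.to_q(x)
        out_text = self._attend(q, self.to_k_text(text_tokens),
                                self.to_v_text(text_tokens))
        out_img = self._attend(q, self.to_k_img(image_tokens),
                               self.to_v_img(image_tokens))
        return out_text + self.scale * out_img
```

In adapter-style setups of this kind, typically only the image-branch projections are trained while the text branch stays frozen; whether LAR-Gen trains the same subset of parameters is not stated in this record.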

Bibliographic Details

Published in: arXiv.org, 2024-03
Main authors: Pan, Yulin; Mao, Chaojie; Jiang, Zeyinzi; Han, Zhen; Zhang, Jingfeng
Publisher: Ithaca: Cornell University Library, arXiv.org
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Editing; Semantics
Online access: Full text