Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models


Bibliographic details
Main authors: Mayr, Martin; Dreier, Marcel; Kordon, Florian; Seuret, Mathias; Zöllner, Jochen; Wu, Fei; Maier, Andreas; Christlein, Vincent
Format: Article
Language: eng
creator Mayr, Martin
Dreier, Marcel
Kordon, Florian
Seuret, Mathias
Zöllner, Jochen
Wu, Fei
Maier, Andreas
Christlein, Vincent
description The imitation of cursive handwriting is mainly limited to generating handwritten words or lines. Multiple synthetic outputs must be stitched together to create paragraphs or whole pages, whereby consistency and layout information are lost. To close this gap, we propose a method for imitating handwriting at the paragraph level that also works for unseen writing styles. Therefore, we introduce a modified latent diffusion model that enriches the encoder-decoder mechanism with specialized loss functions that explicitly preserve the style and content. We enhance the attention mechanism of the diffusion model with adaptive 2D positional encoding and the conditioning mechanism to work with two modalities simultaneously: a style image and the target text. This significantly improves the realism of the generated handwriting. Our approach sets a new benchmark in our comprehensive evaluation. It outperforms all existing imitation methods at both line and paragraph levels, considering combined style and content preservation.
doi_str_mv 10.48550/arxiv.2409.00786
format Article
creationdate 2024-09-01
sourcetype Open Access Repository
collection arXiv Computer Science; arXiv.org
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
linktorsrc https://arxiv.org/abs/2409.00786
backlink https://doi.org/10.48550/arXiv.2409.00786
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2409.00786
language eng
recordid cdi_arxiv_primary_2409_00786
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T21%3A44%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Zero-Shot%20Paragraph-level%20Handwriting%20Imitation%20with%20Latent%20Diffusion%20Models&rft.au=Mayr,%20Martin&rft.date=2024-09-01&rft_id=info:doi/10.48550/arxiv.2409.00786&rft_dat=%3Carxiv_GOX%3E2409_00786%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true