Lester: rotoscope animation through video object segmentation and tracking

This article introduces Lester, a novel method to automatically synthetise retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tr...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
1. Verfasser: Tous, Ruben
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Tous, Ruben
description This article introduces Lester, a novel method to automatically synthetise retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT, a method of hierarchical propagation for semi-supervised video object segmentation. The geometry of the masks' contours is simplified with the Douglas-Peucker algorithm. Finally, facial traits, pixelation and a basic shadow effect can be optionally added. The results show that the method exhibits an excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots and diverse backgrounds. The proposed method provides a more simple and deterministic approach than diffusion models based video-to-video translation pipelines, which suffer from temporal consistency problems and do not cope well with pixelated and schematic outputs. The method is also much most practical than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited with respect to the type of scenes they can process.
doi_str_mv 10.48550/arxiv.2402.09883
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2402_09883</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2402_09883</sourcerecordid><originalsourceid>FETCH-LOGICAL-a673-334d979d7418d203f19881bcc730750035d8e52ccfc2fb0b140c15a6642603723</originalsourceid><addsrcrecordid>eNotj71OwzAURr0woMIDMOEXSLj2tWOHDVX8KhJL98ixndQttSvHVPD2lLbTtxx9OoeQOwa10FLCg8k_4VBzAbyGVmu8Jh-dn4vPjzSnkmab9p6aGHamhBRpWef0Pa3pITifaBo23hY6-2nnYzkTJjpasrHbEKcbcjWar9nfXnZBVi_Pq-Vb1X2-vi-fuso0CitE4VrVOiWYdhxwZEcRNlirEJQEQOm0l9za0fJxgIEJsEyaphG8AVQcF-T-fHuK6ff5aJt_-_-o_hSFf5I8RxU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Lester: rotoscope animation through video object segmentation and tracking</title><source>arXiv.org</source><creator>Tous, Ruben</creator><creatorcontrib>Tous, Ruben</creatorcontrib><description>This article introduces Lester, a novel method to automatically synthetise retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT, a method of hierarchical propagation for semi-supervised video object segmentation. The geometry of the masks' contours is simplified with the Douglas-Peucker algorithm. Finally, facial traits, pixelation and a basic shadow effect can be optionally added. The results show that the method exhibits an excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots and diverse backgrounds. The proposed method provides a more simple and deterministic approach than diffusion models based video-to-video translation pipelines, which suffer from temporal consistency problems and do not cope well with pixelated and schematic outputs. The method is also much most practical than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited with respect to the type of scenes they can process.</description><identifier>DOI: 10.48550/arxiv.2402.09883</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Graphics ; Computer Science - Multimedia</subject><creationdate>2024-02</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2402.09883$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2402.09883$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Tous, Ruben</creatorcontrib><title>Lester: rotoscope animation through video object segmentation and tracking</title><description>This article introduces Lester, a novel method to automatically synthetise retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT, a method of hierarchical propagation for semi-supervised video object segmentation. The geometry of the masks' contours is simplified with the Douglas-Peucker algorithm. Finally, facial traits, pixelation and a basic shadow effect can be optionally added. The results show that the method exhibits an excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots and diverse backgrounds. The proposed method provides a more simple and deterministic approach than diffusion models based video-to-video translation pipelines, which suffer from temporal consistency problems and do not cope well with pixelated and schematic outputs. The method is also much most practical than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited with respect to the type of scenes they can process.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Graphics</subject><subject>Computer Science - Multimedia</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj71OwzAURr0woMIDMOEXSLj2tWOHDVX8KhJL98ixndQttSvHVPD2lLbTtxx9OoeQOwa10FLCg8k_4VBzAbyGVmu8Jh-dn4vPjzSnkmab9p6aGHamhBRpWef0Pa3pITifaBo23hY6-2nnYzkTJjpasrHbEKcbcjWar9nfXnZBVi_Pq-Vb1X2-vi-fuso0CitE4VrVOiWYdhxwZEcRNlirEJQEQOm0l9za0fJxgIEJsEyaphG8AVQcF-T-fHuK6ff5aJt_-_-o_hSFf5I8RxU</recordid><startdate>20240215</startdate><enddate>20240215</enddate><creator>Tous, Ruben</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240215</creationdate><title>Lester: rotoscope animation through video object segmentation and tracking</title><author>Tous, Ruben</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a673-334d979d7418d203f19881bcc730750035d8e52ccfc2fb0b140c15a6642603723</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Graphics</topic><topic>Computer Science - Multimedia</topic><toplevel>online_resources</toplevel><creatorcontrib>Tous, Ruben</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Tous, Ruben</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Lester: rotoscope animation through video object segmentation and tracking</atitle><date>2024-02-15</date><risdate>2024</risdate><abstract>This article introduces Lester, a novel method to automatically synthetise retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT, a method of hierarchical propagation for semi-supervised video object segmentation. The geometry of the masks' contours is simplified with the Douglas-Peucker algorithm. Finally, facial traits, pixelation and a basic shadow effect can be optionally added. The results show that the method exhibits an excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots and diverse backgrounds. The proposed method provides a more simple and deterministic approach than diffusion models based video-to-video translation pipelines, which suffer from temporal consistency problems and do not cope well with pixelated and schematic outputs. The method is also much most practical than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited with respect to the type of scenes they can process.</abstract><doi>10.48550/arxiv.2402.09883</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2402.09883
ispartof
issn
language eng
recordid cdi_arxiv_primary_2402_09883
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Graphics
Computer Science - Multimedia
title Lester: rotoscope animation through video object segmentation and tracking
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T11%3A50%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Lester:%20rotoscope%20animation%20through%20video%20object%20segmentation%20and%20tracking&rft.au=Tous,%20Ruben&rft.date=2024-02-15&rft_id=info:doi/10.48550/arxiv.2402.09883&rft_dat=%3Carxiv_GOX%3E2402_09883%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true