ReConvNet: Video Object Segmentation with Spatio-Temporal Features Modulation

We introduce ReConvNet, a recurrent convolutional architecture for semi-supervised video object segmentation that is able to quickly adapt its features to focus on any specific object of interest at inference time. Generalization to new objects never observed during training is known to be a hard task...

Detailed description

Bibliographic Details
Main authors: Lattari, Francesco, Ciccone, Marco, Matteucci, Matteo, Masci, Jonathan, Visin, Francesco
Format: Article
Language: eng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online access: Order full text
creator Lattari, Francesco; Ciccone, Marco; Matteucci, Matteo; Masci, Jonathan; Visin, Francesco
description We introduce ReConvNet, a recurrent convolutional architecture for semi-supervised video object segmentation that is able to quickly adapt its features to focus on any specific object of interest at inference time. Generalization to new objects never observed during training is known to be a hard task for supervised approaches, which would need to be retrained. To tackle this problem, we propose a more efficient solution that learns spatio-temporal features that self-adapt to the object of interest via conditional affine transformations. This approach is simple, can be trained end-to-end, and does not necessarily require extra training steps at inference time. Our method shows competitive results on DAVIS2016 with respect to state-of-the-art approaches that use online fine-tuning, and outperforms them on DAVIS2017. ReConvNet also shows promising results on the DAVIS Challenge 2018, achieving the 10th position.
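The conditional affine transformation mentioned in the abstract can be pictured as feature-wise (FiLM-style) modulation: each channel of a spatio-temporal feature map is scaled and shifted by coefficients conditioned on the target object. The sketch below is illustrative only; the function name and shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def conditional_affine(features, gamma, beta):
    """Feature-wise affine modulation (FiLM-style sketch).

    features: (C, H, W) feature map from the segmentation backbone.
    gamma, beta: (C,) per-channel scale and shift, assumed to be
    predicted from an embedding of the object of interest.
    """
    # Broadcast the per-channel coefficients over the spatial dims.
    return gamma[:, None, None] * features + beta[:, None, None]

rng = np.random.default_rng(0)
features = rng.standard_normal((64, 32, 32))
gamma = rng.standard_normal(64)
beta = rng.standard_normal(64)

out = conditional_affine(features, gamma, beta)
print(out.shape)  # (64, 32, 32)
```

Because only the (gamma, beta) coefficients depend on the object, the same backbone features can be re-focused on a new object at inference time without retraining the network, which is the efficiency argument the abstract makes.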
doi_str_mv 10.48550/arxiv.1806.05510
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1806.05510
language eng
recordid cdi_arxiv_primary_1806_05510
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title ReConvNet: Video Object Segmentation with Spatio-Temporal Features Modulation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T18%3A00%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ReConvNet:%20Video%20Object%20Segmentation%20with%20Spatio-Temporal%20Features%20Modulation&rft.au=Lattari,%20Francesco&rft.date=2018-06-14&rft_id=info:doi/10.48550/arxiv.1806.05510&rft_dat=%3Carxiv_GOX%3E1806_05510%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true