Triple-cooperative Video Shadow Detection

Shadow detection in a single image has received significant research interest in recent years. However, far fewer works have explored shadow detection in dynamic scenes. The bottleneck is the lack of a well-established dataset with high-quality annotations for video shadow detection. In this work, we collect a new video shadow detection dataset, named ViSha, which contains 120 videos with 11,685 frames, covering 60 object categories, varying lengths, and different motion/lighting conditions. All the frames are annotated with high-quality pixel-level shadow masks. To the best of our knowledge, this is the first learning-oriented dataset for video shadow detection. Furthermore, we develop a new baseline model, named the triple-cooperative video shadow detection network (TVSD-Net). It utilizes triple parallel networks in a cooperative manner to learn discriminative representations at intra-video and inter-video levels. Within the network, a dual gated co-attention module is proposed to constrain features from neighboring frames in the same video, while an auxiliary similarity loss is introduced to mine semantic information between different videos. Finally, we conduct a comprehensive study on ViSha, evaluating 12 state-of-the-art models (including single-image shadow detectors, video object segmentation, and saliency detection methods). Experiments demonstrate that our model outperforms SOTA competitors.
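The auxiliary similarity loss referred to in the abstract is not specified in this record. A minimal contrastive-style sketch, assuming a hinge formulation and pooled per-region feature vectors (the function name, margin, and cosine form are illustrative assumptions, not the paper's actual loss):

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two 1-D feature vectors
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def inter_video_similarity_loss(shadow_a, shadow_b, nonshadow_b, margin=0.5):
    """Hypothetical hinge-style inter-video similarity loss.

    Pulls pooled shadow-region features from two different videos
    together while pushing shadow features away from non-shadow
    features. The exact form used by TVSD-Net may differ.
    """
    pos = cosine(shadow_a, shadow_b)      # same semantic class, different videos
    neg = cosine(shadow_a, nonshadow_b)   # opposite classes
    return max(0.0, margin - pos + neg)

# identical shadow features, orthogonal non-shadow features -> zero loss
loss = inter_video_similarity_loss(
    np.array([1.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(loss)  # 0.0
```

The hinge keeps the loss at zero once cross-video shadow features are at least `margin` more similar to each other than to non-shadow features.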

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Chen, Zhihao, Wan, Liang, Zhu, Lei, Shen, Jia, Fu, Huazhu, Liu, Wennan, Qin, Jing
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Chen, Zhihao; Wan, Liang; Zhu, Lei; Shen, Jia; Fu, Huazhu; Liu, Wennan; Qin, Jing
description Shadow detection in a single image has received significant research interest in recent years. However, far fewer works have explored shadow detection in dynamic scenes. The bottleneck is the lack of a well-established dataset with high-quality annotations for video shadow detection. In this work, we collect a new video shadow detection dataset, named ViSha, which contains 120 videos with 11,685 frames, covering 60 object categories, varying lengths, and different motion/lighting conditions. All the frames are annotated with high-quality pixel-level shadow masks. To the best of our knowledge, this is the first learning-oriented dataset for video shadow detection. Furthermore, we develop a new baseline model, named the triple-cooperative video shadow detection network (TVSD-Net). It utilizes triple parallel networks in a cooperative manner to learn discriminative representations at intra-video and inter-video levels. Within the network, a dual gated co-attention module is proposed to constrain features from neighboring frames in the same video, while an auxiliary similarity loss is introduced to mine semantic information between different videos. Finally, we conduct a comprehensive study on ViSha, evaluating 12 state-of-the-art models (including single-image shadow detectors, video object segmentation, and saliency detection methods). Experiments demonstrate that our model outperforms SOTA competitors.
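As a rough illustration of the dual gated co-attention idea described above, here is a NumPy sketch. The shapes, the bilinear affinity, and the sigmoid gating scheme are all assumptions for illustration; the actual TVSD-Net formulation may differ:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_gated_co_attention(f_a, f_b, w):
    """Toy dual gated co-attention between two frames' features.

    f_a, f_b: (C, N) feature maps from neighboring frames (N = H*W).
    w:        (C, C) bilinear affinity weight (a random stand-in for a
              learned parameter). Each frame attends to the other, and
              a sigmoid gate modulates attended features before fusion.
    """
    affinity = f_a.T @ w @ f_b                  # (N, N) pairwise similarity
    att_a = f_b @ softmax(affinity, axis=1).T   # frame a reads frame b -> (C, N)
    att_b = f_a @ softmax(affinity, axis=0)     # frame b reads frame a -> (C, N)
    gate_a = sigmoid(att_a.mean(axis=0, keepdims=True))  # (1, N) gates
    gate_b = sigmoid(att_b.mean(axis=0, keepdims=True))
    return f_a + gate_a * att_a, f_b + gate_b * att_b

rng = np.random.default_rng(0)
f_a = rng.standard_normal((8, 16))   # C=8 channels, N=16 spatial positions
f_b = rng.standard_normal((8, 16))
w = 0.1 * rng.standard_normal((8, 8))
out_a, out_b = dual_gated_co_attention(f_a, f_b, w)
print(out_a.shape, out_b.shape)  # (8, 16) (8, 16)
```

The two softmax directions make the attention symmetric (each frame queries the other), while the gates let the module suppress unreliable cross-frame information, which matches the "constrain features from neighboring frames" role described in the abstract.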
doi_str_mv 10.48550/arxiv.2103.06533
format Article
identifier DOI: 10.48550/arxiv.2103.06533
language eng
recordid cdi_arxiv_primary_2103_06533
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Triple-cooperative Video Shadow Detection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T02%3A35%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Triple-cooperative%20Video%20Shadow%20Detection&rft.au=Chen,%20Zhihao&rft.date=2021-03-11&rft_id=info:doi/10.48550/arxiv.2103.06533&rft_dat=%3Carxiv_GOX%3E2103_06533%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true