Weakly-Supervised RGBD Video Object Segmentation


Bibliographic Details
Published in: IEEE Transactions on Image Processing, 2024-01, Vol. PP, p. 1-1
Authors: Yang, Jinyu; Gao, Mingqi; Zheng, Feng; Zhen, Xiantong; Ji, Rongrong; Shao, Ling; Leonardis, Ales
Format: Article
Language: English
Description

Abstract: Depth information opens up opportunities for video object segmentation (VOS) to become more accurate and robust in complex scenes. However, RGBD VOS remains largely unexplored because collecting RGBD segmentation data is expensive and annotating it is time-consuming. In this work, we first introduce a new benchmark for RGBD VOS, named DepthVOS, which contains 350 videos (over 55k frames) annotated with masks and bounding boxes. We then propose a novel and strong baseline model, the Fused Color-Depth Network (FusedCDNet), which can be trained merely under bounding-box supervision and then used to generate masks guided only by a bounding box in the first frame. In summary, our model offers three major advantages: a weakly-supervised training strategy that overcomes costly labeling, a cross-modal fusion module that handles complex scenes, and weakly-supervised prediction that promotes ease of use. Extensive experiments demonstrate that our proposed method performs on par with top fully-supervised algorithms. We will open-source our project at http://github.com/yjybuaa/depthvos/, which will facilitate the development of RGBD VOS.
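The record does not specify how FusedCDNet's cross-modal fusion module combines RGB and depth features. Purely as an illustration of the general idea, a common gated-fusion design might look like the following minimal sketch; the function name, parameter shapes, and the use of NumPy (in place of a deep-learning framework with learned convolutions) are all assumptions, not the paper's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(rgb_feat, depth_feat, w, b):
    """Fuse RGB and depth feature maps of shape (C, H, W) with a channel gate.

    The gate is predicted from globally pooled statistics of both modalities;
    `w` (C, 2C) and `b` (C,) stand in for hypothetical learned parameters.
    """
    # Global average pooling over spatial dims -> one (C,) vector per modality
    pooled = np.concatenate([rgb_feat.mean(axis=(1, 2)),
                             depth_feat.mean(axis=(1, 2))])    # (2C,)
    gate = sigmoid(w @ pooled + b)                             # (C,) in (0, 1)
    # Convex per-channel combination: the gate weighs RGB vs. depth evidence
    return (gate[:, None, None] * rgb_feat
            + (1.0 - gate)[:, None, None] * depth_feat)

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
rgb = rng.standard_normal((C, H, W))
depth = rng.standard_normal((C, H, W))
fused = gated_fusion(rgb, depth,
                     rng.standard_normal((C, 2 * C)) * 0.1,
                     np.zeros(C))
print(fused.shape)  # prints (4, 8, 8)
```

With zero-valued gate parameters the sigmoid outputs 0.5 for every channel, so the fusion reduces to a plain average of the two modalities; in a trained model the gate would instead learn, per channel, when depth is more reliable than color (e.g., under ambiguous appearance in complex scenes).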
ISSN:1057-7149
1941-0042
DOI:10.1109/TIP.2024.3374130