READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation

| Field | Value |
|---|---|
| Main authors | |
| Format | Article |
| Language | English |
| Keywords | |
| Online access | Order full text |

Abstract: 34th British Machine Vision Conference 2023 (BMVC 2023), Aberdeen, UK, November 20-24, 2023. We present READMem (Robust Embedding Association for a Diverse Memory), a modular framework for semi-automatic video object segmentation (sVOS) methods designed to handle unconstrained videos. Contemporary sVOS works typically aggregate video frames in an ever-expanding memory, demanding high hardware resources for long-term applications. To mitigate memory requirements and prevent near-duplicate object representations (caused by the information of adjacent frames), previous methods introduce a hyper-parameter that controls how frequently frames become eligible for storage. This parameter has to be adjusted to the properties of each video (such as how quickly the appearance changes and the video length) and does not generalize well. Instead, we integrate the embedding of a new frame into the memory only if it increases the diversity of the memory content. Furthermore, we propose a robust association of the embeddings stored in the memory with query embeddings during the update process. Our approach avoids the accumulation of redundant data, which in return allows us to restrict the memory size and prevent extreme memory demands on long videos. We extend popular sVOS baselines with READMem, which previously showed limited performance on long videos. Our approach achieves competitive results on the Long-time Video dataset (LV1) while not hindering performance on short sequences. Our code is publicly available.

DOI: 10.48550/arxiv.2305.12823
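
The abstract names two ingredients: a diversity-gated memory update and a robust association between memory and query embeddings. The sketch below is a minimal, illustrative reading of those ideas, not the paper's implementation: the diversity measure (determinant of a Gram matrix of normalised embeddings), the cosine-similarity cost, the Hungarian matching, and the helper names `diversity`, `associate`, and `maybe_update_memory` are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def diversity(memory: np.ndarray) -> float:
    # Diversity proxy (assumed): determinant of the Gram matrix of L2-normalised
    # memory embeddings; it grows as the stored embeddings become less redundant.
    normed = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return float(np.linalg.det(normed @ normed.T))


def associate(memory: np.ndarray, queries: np.ndarray) -> np.ndarray:
    # One-to-one matching of memory slots to query embeddings via the
    # Hungarian algorithm on a negated cosine-similarity cost (illustrative choice).
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    _, cols = linear_sum_assignment(-(m @ q.T))
    return cols  # cols[i] is the query index associated with memory slot i


def maybe_update_memory(memory: np.ndarray, new_embedding: np.ndarray) -> np.ndarray:
    # Try substituting the new frame embedding into each memory slot and keep
    # the variant with the highest diversity; if no substitution improves
    # diversity, the memory is left unchanged, so its size never grows.
    best_div, best_mem = diversity(memory), memory
    for i in range(len(memory)):
        candidate = memory.copy()
        candidate[i] = new_embedding
        cand_div = diversity(candidate)
        if cand_div > best_div:
            best_div, best_mem = cand_div, candidate
    return best_mem


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    memory = rng.normal(size=(5, 64))   # 5 memory slots, 64-dim embeddings
    new_frame = rng.normal(size=64)
    memory = maybe_update_memory(memory, new_frame)
    print(associate(memory, rng.normal(size=(5, 64))))
```

Because such a memory never grows beyond its fixed number of slots, memory consumption stays bounded regardless of video length, which is the property the abstract targets for long, unconstrained videos.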