Thinking Inside Uncertainty: Interest Moment Perception for Diverse Temporal Grounding
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2022-10, Vol. 32 (10), pp. 7190-7203
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Given a language query, the temporal grounding task is to localize the temporal boundaries of the described event in an untrimmed video. A long-standing challenge is that multiple moments may be associated with the same video-query pair, termed label uncertainty. However, existing methods struggle to localize diverse moments due to the lack of multi-label annotations. In this paper, we propose a novel Diverse Temporal Grounding (DTG) framework to achieve diverse moment localization with only single-label annotations. By delving into the label uncertainty, we find that the retrieved diverse moments tend to involve similar actions/objects, which drives us to perceive these interest moments. Specifically, we construct soft multi-labels from the semantic similarity of multiple video-query pairs. These soft labels reveal whether multiple moments within the same video contain similar verbs/nouns, thereby guiding interest moment generation. Meanwhile, we put forward a diverse moment regression network (DMRNet) to produce multiple predictions in a single pass, where plausible moments are dynamically picked out from the interest moments for joint optimization. Moreover, we introduce new metrics that better reveal multi-output performance. Extensive experiments on Charades-STA and ActivityNet Captions show that our method achieves state-of-the-art performance in terms of both standard and new metrics.
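The soft multi-label idea sketched in the abstract can be illustrated with a minimal, hedged example (not the authors' code): it assumes verbs/nouns have already been extracted for each query of a video (e.g. by a POS tagger) and uses a simple set-overlap similarity with an illustrative threshold to decide which intra-video annotations are shared as additional interest moments. The function names, the Jaccard measure, and the 0.5 threshold are assumptions for illustration only; the paper's actual similarity computation may differ.

```python
# Minimal sketch (assumptions only): build soft multi-labels for the queries of
# one video from verb/noun overlap, so that semantically similar queries share
# their annotated moments as additional "interest moments".

from itertools import combinations
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set-overlap similarity between two verb/noun sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0


def soft_multi_labels(
    query_terms: List[Set[str]],                   # verbs/nouns per query (e.g. from a POS tagger)
    annotated_moments: List[Tuple[float, float]],  # single-label (start, end) per query
    sim_threshold: float = 0.5,                    # assumed cut-off for "similar" queries
) -> List[List[Tuple[float, float]]]:
    """For each query, collect the annotated moments of semantically similar
    intra-video queries as additional (soft) positive moments."""
    labels = [[m] for m in annotated_moments]
    for i, j in combinations(range(len(query_terms)), 2):
        if jaccard(query_terms[i], query_terms[j]) >= sim_threshold:
            labels[i].append(annotated_moments[j])
            labels[j].append(annotated_moments[i])
    return labels


# Toy usage: the first two queries share "open"/"door", so they exchange moments.
terms = [{"open", "door"}, {"open", "door", "walk"}, {"cook", "food"}]
moments = [(2.0, 6.5), (3.0, 7.0), (20.0, 31.0)]
print(soft_multi_labels(terms, moments))
```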
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2022.3179314