Weakly supervised temporal action localization: a survey

Bibliographic details
Published in: Multimedia Tools and Applications, 2024-02, Vol. 83 (32), p. 78361-78386
Main authors: Li, Ronglu; Zhang, Tianyi; Zhang, Rubo
Format: Article
Language: English
Description
Summary: Temporal action localization (TAL) is one of the most important tasks in video understanding. Weakly supervised temporal action localization (WTAL) involves classifying and localizing all action instances in untrimmed videos under the supervision of only video-level category labels, which is a challenging task because of the absence of frame-level annotations. In this study, we first review the development of the WTAL task in recent years and summarize and analyze its main problems. Second, we classify and compare the research approaches of existing models and thoroughly discuss methods based on multiple instance learning (MIL), feature erasing, the attention mechanism, similarity propagation, pseudo-ground truth generation, contrastive learning, and adversarial learning. Then, we present the datasets and evaluation criteria for the WTAL task. Finally, we discuss the main application areas and further developments in WTAL.
ISSN: 1573-7721, 1380-7501
DOI: 10.1007/s11042-024-18554-9