Weakly supervised temporal action localization: a survey
Temporal action localization (TAL) is one of the most important tasks in video understanding. Weakly supervised temporal action localization (WTAL) involves classifying and localizing all the action instances in untrimmed videos under the supervision of only video-level category labels, which is a c...
Gespeichert in:
Veröffentlicht in: | Multimedia tools and applications 2024-02, Vol.83 (32), p.78361-78386 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Temporal action localization (TAL) is one of the most important tasks in video understanding. Weakly supervised temporal action localization (WTAL) involves classifying and localizing all the action instances in untrimmed videos under the supervision of only video-level category labels, which is a challenging task because of the absence of frame-level annotations. In this study, first, we review the development process of the WTAL task in recent years, summarize and analyze the main problems of WTAL. Second, we classify and compare the research approaches of existing models and thoroughly discuss methods based on multiple instance learning (MIL), feature erasing, the attention mechanism, similarity propagation, pseudo-ground truth generation, contrastive learning, and adversarial learning. Then, we present the datasets and evaluation criteria for the WTAL task. Finally, we discuss the main application areas and further developments in WTAL. |
---|---|
ISSN: | 1573-7721 1380-7501 1573-7721 |
DOI: | 10.1007/s11042-024-18554-9 |