MKP-Net: Memory knowledge propagation network for point-supervised temporal action localization in livestreaming

Bibliographic details
Published in: Computer vision and image understanding 2024-11, Vol.248, p.104109, Article 104109
Main authors: Chen, Lin; Zhang, Jing; Zhang, Yian; Kang, Junpeng; Zhuo, Li
Format: Article
Language: English
Description
Abstract: Standardized regulation of livestreaming is an important element of cyberspace governance. Temporal action localization (TAL) localizes the occurrence of specific actions in a video, which supports a better understanding of human activities. Because human-specific actions are short in duration and have inconspicuous boundaries, obtaining sufficient labeled training data from untrimmed livestreaming video is very cumbersome. The point-supervised approach requires only a single-frame annotation for each action instance and can effectively balance annotation cost and performance. We therefore propose a memory knowledge propagation network (MKP-Net) for point-supervised temporal action localization in livestreaming. It comprises (1) a plug-and-play memory module that models prototype features of foreground actions and background knowledge using point-level annotations, (2) a memory knowledge propagation mechanism that generates discriminative feature representations in a multi-instance learning pipeline, and (3) localization completeness learning, performed with a dual optimization loss designed to refine and localize temporal actions. Experimental results show that our method achieves state-of-the-art results of 61.4% and 49.1% on THUMOS14 and the self-built BJUT-PTAL dataset, respectively, with an inference speed of 711 FPS.
• We present a plug-and-play memory module that uses point-level annotations to model prototype features.
• A memory knowledge propagation mechanism is proposed to provide more discriminative feature representations.
• A dual optimization loss is designed to improve the quality of pseudo-labels.
• A point-supervised temporal action localization pipeline is provided with SOTA performance in livestreaming.
• We collect a BJUT-PTAL dataset for temporal action localization evaluation in livestreaming.
ISSN:1077-3142
DOI:10.1016/j.cviu.2024.104109
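
Illustrative note: the abstract describes modeling prototype features of foreground actions and background from point-level annotations. The sketch below is not the MKP-Net implementation, whose details this record does not give. It is a generic toy illustration of the prototype idea, assuming each video snippet is a small feature vector, a prototype is the mean of the annotated snippets of a class, and unlabeled snippets are matched to prototypes by cosine similarity. The function names, the `point_labels` format, and the explicit "background" annotation are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def build_prototypes(features, point_labels):
    """Average the annotated snippet features of each class.

    features: list of per-snippet feature vectors (assumed input format).
    point_labels: dict mapping snippet index -> class name (hypothetical).
    """
    grouped = {}
    for idx, cls in point_labels.items():
        grouped.setdefault(cls, []).append(features[idx])
    return {cls: mean_vector(vecs) for cls, vecs in grouped.items()}


def score_snippets(features, prototypes):
    """Assign each snippet the class of its most similar prototype."""
    return [
        max(prototypes, key=lambda c: cosine(f, prototypes[c]))
        for f in features
    ]


# Toy data: five snippets with 2-D features and one labeled point per class.
features = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8], [1.0, 0.2]]
point_labels = {0: "action", 2: "background"}

prototypes = build_prototypes(features, point_labels)
labels = score_snippets(features, prototypes)
print(labels)
```

In this toy setting, two annotated snippets are enough to label all five, which is the intuition behind point supervision: sparse single-frame labels seed prototypes that then inform the unlabeled snippets.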