TSCANet: a two-stream context aggregation network for weakly-supervised temporal action localization
Published in: | The Journal of supercomputing 2025-01, Vol.81 (1), Article 311 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Weakly supervised temporal action localization (WSTAL) classifies and localizes actions in untrimmed videos using only video-level labels. Many current methods employ feature extractors originally designed for action classification on trimmed videos. Such extractors reduce localization accuracy because they may introduce redundant information into the localization task. To overcome this limitation, we propose a WSTAL method based on the two-stream context aggregation network (TSCANet), which consists of two main modules: a multistage temporal feature aggregation module (MSTFA) and a feature alignment module (FA). By stacking dilated convolutional layers, the MSTFA rapidly expands the receptive field of TSCANet and captures temporal dependencies between distant segments. The MSTFA thereby allows the model to better aggregate temporal information in the optical flow features and to reduce redundant information in the original features. To avoid inconsistencies between the enhanced optical flow features and the RGB features, the FA calibrates the RGB features using the optimized optical flow features through a mutual learning approach. Extensive comparative experiments on the THUMOS14 and ActivityNet datasets show improved localization performance; in particular, at low t-IoU thresholds the method outperforms many existing WSTAL methods. |
---|---|
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-024-06810-6 |
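
The abstract's claim that stacking dilated convolutional layers rapidly expands the receptive field can be illustrated with a small calculation. The sketch below is not taken from the article: the kernel size of 3, the four-layer depth, and the doubling dilation rates are assumptions chosen only for illustration, and `receptive_field` is a hypothetical helper. It uses the standard formula for stride-1 1D convolutions, where each layer adds (kernel_size - 1) * dilation positions.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in temporal segments, of stacked stride-1 1D convolutions.

    kernel_size and dilations are illustrative assumptions, not values
    reported for TSCANet.
    """
    rf = 1  # a single output position sees itself
    for d in dilations:
        # each layer widens the view by (kernel_size - 1) * dilation segments
        rf += (kernel_size - 1) * d
    return rf


# Four ordinary layers versus four layers with doubling dilation rates.
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8])
print(plain, dilated)
```

Under these assumed settings the dilated stack covers 31 segments where the plain stack covers 9, at the same depth and parameter count, which is the kind of growth the abstract refers to.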