Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the
activity performed together by a group of individuals with the video-level
label and without actor-level labels. We propose Flow-Assisted Motion Learning
Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor
encoder to extract actor features and the two-pathways relation module to infer
the interaction among actors and their activity. Flaming-Net leverages an
additional optical flow modality in the training stage to enhance its motion
awareness when finding locally active actors. The first pathway of the relation
module, the actor-centric path, initially captures the temporal dynamics of
individual actors and then constructs inter-actor relationships. In parallel,
the group-centric path starts by building spatial connections between actors
within the same timeframe and then captures simultaneous spatio-temporal
dynamics among them. We demonstrate that Flaming-Net achieves new
state-of-the-art WSGAR results on two benchmarks, including a 2.8%p higher MPCA
score on the NBA dataset. Importantly, we use the optical flow modality only
for training and not for inference. |
---|---|
DOI: | 10.48550/arxiv.2405.18012 |
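
The abstract describes two pathways that process the same actor features in different orders. The following is a minimal sketch of that ordering only, not the authors' implementation: it assumes actor features arrive as a nested list `feats[t][n]` (frame `t`, actor `n`), uses a parameter-free self-attention and mean pooling as stand-ins for the learned layers, and fuses the two paths by concatenation. The function names, the pooling steps, and the fusion choice are all hypothetical, and the optical-flow training signal is not modeled here.

```python
import math


def self_attention(tokens):
    """Parameter-free scaled dot-product self-attention (stand-in for a learned layer)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[i] for w, v in zip(weights, tokens))
                    for i in range(dim)])
    return out


def mean_pool(tokens):
    """Average a list of vectors into one vector (assumed pooling choice)."""
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def actor_centric_path(feats):
    """Temporal dynamics per actor first, then inter-actor relations."""
    num_frames, num_actors = len(feats), len(feats[0])
    per_actor = []
    for n in range(num_actors):
        track = [feats[t][n] for t in range(num_frames)]   # one actor over time
        per_actor.append(mean_pool(self_attention(track)))
    return mean_pool(self_attention(per_actor))            # relate the actors


def group_centric_path(feats):
    """Spatial relations within each frame first, then joint spatio-temporal mixing."""
    spatial = [self_attention(frame) for frame in feats]   # actors in the same frame
    flat = [tok for frame in spatial for tok in frame]     # all (frame, actor) tokens
    return mean_pool(self_attention(flat))


def two_pathway_relation(feats):
    """Concatenate the outputs of both paths (assumed fusion)."""
    return actor_centric_path(feats) + group_centric_path(feats)


# Toy input: 2 frames, 3 actors, 2-dimensional features.
demo = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [0.0, 2.0], [2.0, 0.0]],
]
group_feature = two_pathway_relation(demo)
```

Here `group_feature` has twice the per-actor feature dimension, one half from each path. In a trained model each attention step would carry learned projections, and a classifier over the fused feature would predict the video-level activity label.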