FinePseudo: Improving Pseudo-Labelling through Temporal-Alignability for Semi-Supervised Fine-Grained Action Recognition
Format: | Article |
Language: | English |
Abstract: | Real-life applications of action recognition often require a fine-grained
understanding of subtle movements, e.g., in sports analytics, user interactions
in AR/VR, and surgical videos. Although fine-grained actions are more costly to
annotate, existing semi-supervised action recognition methods have mainly focused
on coarse-grained actions. Since fine-grained actions are more
challenging due to the absence of scene bias, classifying these actions
requires an understanding of action phases. Hence, existing coarse-grained
semi-supervised methods are not effective for them. In this work, we present the
first thorough investigation of semi-supervised fine-grained action recognition
(FGAR). We observe that alignment distances like dynamic time warping (DTW)
provide a suitable action-phase-aware measure for comparing fine-grained
actions, a concept previously unexploited in FGAR. However, since the standard DTW
distance is pairwise and assumes strict alignment between pairs, it is not
directly suitable for classifying fine-grained actions. To utilize such
alignment distances in a limited-label setting, we propose an
Alignability-Verification-based Metric learning technique to effectively
discriminate between fine-grained action pairs. Our learnable alignability
score provides a better phase-aware measure, which we use to refine the
pseudo-labels of the primary video encoder. Our collaborative
pseudo-labeling-based framework, FinePseudo, significantly outperforms
prior methods on four fine-grained action recognition datasets (Diving48,
FineGym99, FineGym288, and FineDiving) and also improves results on the
coarse-grained datasets Kinetics400 and Something-SomethingV2. We also
demonstrate the robustness of our collaborative pseudo-labeling in handling
novel unlabeled classes in open-world semi-supervised setups. Project Page:
https://daveishan.github.io/finepsuedo-webpage/. |
DOI: | 10.48550/arxiv.2409.01448 |
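The abstract notes that a standard DTW distance is pairwise and assumes strict alignment between the two sequences. The sketch below illustrates that with the textbook DTW recurrence. It is a minimal illustration under stated assumptions and is not the paper's learnable alignability score. It assumes each video is already encoded as a sequence of per-frame feature vectors and uses Euclidean frame distance. The helper names (`frame_distance`, `dtw_distance`, `nearest_exemplar_label`) and the 1-nearest-neighbour classifier are hypothetical choices made for this example.

```python
import math
from typing import Hashable, List, Optional, Sequence, Tuple

# Assumption: each video is already encoded as a sequence of per-frame
# feature vectors (the encoder itself is out of scope for this sketch).
Frame = Sequence[float]
Video = Sequence[Frame]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: Video, seq_b: Video) -> float:
    """Textbook dynamic time warping (DTW) distance between two videos.

    The warping path is forced to start at the first frames, end at the last
    frames and move monotonically, so every frame of both videos must be
    matched: a strict end-to-end alignment.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: minimal cumulative cost of aligning seq_a[:i] with seq_b[:j].
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_a advances, seq_b frame j-1 is reused
                cost[i][j - 1],      # seq_b advances, seq_a frame i-1 is reused
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def nearest_exemplar_label(
    query: Video, exemplars: Sequence[Tuple[Video, Hashable]]
) -> Tuple[Optional[Hashable], float]:
    """Hypothetical 1-nearest-neighbour classifier under DTW.

    DTW is pairwise, so labelling one query costs one full DTW computation
    per labelled exemplar.
    """
    best_label: Optional[Hashable] = None
    best_dist = float("inf")
    for video, label in exemplars:
        dist = dtw_distance(query, video)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist
```

Consider one-dimensional frame features. A slowed-down video `[[0.0], [0.0], [1.0], [1.0], [2.0]]` and a faster one `[[0.0], [1.0], [2.0]]` have a DTW distance of 0.0 because the warping absorbs the speed difference. In contrast, `[[0.0], [1.0], [2.0]]` compared with its time-reversed version `[[2.0], [1.0], [0.0]]` gives 4.0, since the same frame values occur in the opposite phase order. The sketch also shows why a pairwise distance is awkward to use directly as a classifier: every query needs an O(n·m) alignment against each labelled exemplar.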