Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Action recognition and detection in long untrimmed video
sequences have seen increased attention from the research community. However,
annotating complex activities is usually time-consuming and challenging in
practice. Therefore, recent works have started to tackle the problem of unsupervised
learning of sub-actions in complex activities. This paper proposes a novel
approach for unsupervised sub-action learning in complex activities. The
proposed method maps both visual and temporal representations to a latent space
where the sub-actions are learnt discriminatively in an end-to-end fashion. To
this end, we propose to learn sub-actions as latent concepts, aided by a novel
discriminative latent concept learning (DLCL) module. The proposed DLCL
module builds on the idea of latent concepts to
learn compact representations in the latent embedding space in an unsupervised
way. The result is a set of latent vectors that can be interpreted as cluster
centers in the embedding space. The latent space itself is formed by a joint
visual and temporal embedding capturing the visual similarity and temporal
ordering of the data. Our joint learning with the discriminative latent concept
module is novel and eliminates the need for explicit clustering. We validate
our approach on three benchmark datasets and show that the proposed combination
of visual-temporal embedding and discriminative latent concepts allows learning
robust action representations in an unsupervised setting. |
---|---|
DOI: | 10.48550/arxiv.2105.00067 |
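
The summary states that the learned latent vectors can be interpreted as cluster centers in the embedding space. The following is a minimal sketch of that interpretation only, not the paper's DLCL module: it assumes frame embeddings and latent concept vectors are already available as plain lists of floats, and it assumes dot-product similarity with a softmax for the assignment. The names `softmax` and `assign_to_concepts` and the toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign_to_concepts(embeddings, concepts):
    """Assign each embedding to its most similar latent concept.

    Assumption: similarity is a dot product; the abstract does not
    specify the similarity measure used by the authors.
    """
    labels = []
    for z in embeddings:
        # Similarity of this frame embedding to every concept vector.
        scores = [sum(a * b for a, b in zip(z, c)) for c in concepts]
        probs = softmax(scores)
        # Index of the concept with the highest soft-assignment weight.
        labels.append(max(range(len(probs)), key=probs.__getitem__))
    return labels


# Toy data (hypothetical): two concept vectors and four frame embeddings.
concepts = [[1.0, 0.0], [0.0, 1.0]]
frames = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.95]]
labels = assign_to_concepts(frames, concepts)
```

In this toy setup each frame receives the index of its nearest concept, which plays the role of a sub-action label. In the approach described in the summary, the concept vectors are learned jointly with the embedding rather than fixed in advance.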