Activity Grammars for Temporal Action Segmentation
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Sequence prediction on temporal data requires the ability to understand
compositional structures of multi-level semantics beyond individual and
contextual properties. The task of temporal action segmentation, which aims at
translating an untrimmed activity video into a sequence of action segments,
remains challenging for this reason. This paper addresses the problem by
introducing an effective activity grammar to guide neural predictions for
temporal action segmentation. We propose a novel grammar induction algorithm
that extracts a powerful context-free grammar from action sequence data. We
also develop an efficient generalized parser that transforms frame-level
probability distributions into a reliable sequence of actions according to the
induced grammar with recursive rules. Our approach can be combined with any
neural network for temporal action segmentation to enhance the sequence
prediction and discover its compositional structure. Experimental results
demonstrate that our method significantly improves temporal action segmentation
in terms of both performance and interpretability on two standard benchmarks,
Breakfast and 50 Salads. |
---|---|
DOI: | 10.48550/arxiv.2312.04266 |