Distill and Collect for Semi-Supervised Temporal Action Segmentation
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Recent temporal action segmentation approaches need frame annotations during
training to be effective. These annotations are very expensive and
time-consuming to obtain, which limits performance when only limited
annotated data is available. In contrast, we can easily collect a large corpus
of in-domain unannotated videos by scavenging through the internet. Thus, this
paper proposes an approach for the temporal action segmentation task that can
simultaneously leverage knowledge from annotated and unannotated video
sequences. Our approach uses multi-stream distillation that repeatedly refines
and finally combines the streams' frame predictions. Our model also predicts the
action order, which is later used as a temporal constraint while estimating
frame labels to counter the lack of supervision for unannotated videos. Finally,
our evaluation of the proposed approach on two different datasets
demonstrates that it can achieve performance comparable to full
supervision despite limited annotation. |
---|---|
DOI: | 10.48550/arxiv.2211.01311 |
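
The summary mentions using a predicted action order as a temporal constraint when estimating frame labels. The record does not say how the paper implements this, so the following is only a minimal sketch of one common way to apply such a constraint: a dynamic program that picks the highest-scoring frame labelling in which the actions appear as contiguous segments in the given order. The function name `constrained_labels`, its inputs, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def constrained_labels(log_probs, order):
    """Label each frame so that the labels follow `order` as contiguous segments.

    log_probs[t][c] is the log-probability of class c at frame t (assumed input).
    order is the sequence of action classes; each gets at least one frame.
    This is an illustrative decoder, not the method from the paper.
    """
    T, K = len(log_probs), len(order)
    if K == 0 or T < K:
        raise ValueError("need at least one frame per action in the order")

    NEG = -math.inf
    # score[t][k]: best total log-probability with frame t assigned to order[k]
    score = [[NEG] * K for _ in range(T)]
    # stay[t][k]: True if the best path kept the same action as frame t-1
    stay = [[True] * K for _ in range(T)]
    score[0][0] = log_probs[0][order[0]]

    for t in range(1, T):
        for k in range(K):
            keep = score[t - 1][k]
            advance = score[t - 1][k - 1] if k > 0 else NEG
            best = max(keep, advance)
            if best == NEG:
                continue  # position k is not reachable by frame t
            stay[t][k] = keep >= advance
            score[t][k] = best + log_probs[t][order[k]]

    # Backtrack from the last action at the last frame.
    labels = [0] * T
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = order[k]
        if t > 0 and not stay[t][k]:
            k -= 1
    return labels


# Toy example: 5 frames, 3 classes, predicted order "class 2, then class 0".
probs = [
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
    [0.1, 0.6, 0.3],  # per-frame argmax is class 1, which is not in the order
    [0.7, 0.1, 0.2],
    [0.3, 0.1, 0.6],  # per-frame argmax is class 2, which would break the order
]
log_probs = [[math.log(p) for p in row] for row in probs]
print(constrained_labels(log_probs, [2, 0]))  # [2, 2, 2, 0, 0]
```

In this toy case the per-frame argmax would give `[2, 2, 1, 0, 2]`, which violates the order; the constrained decoding overrides the two inconsistent frames.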