InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
In this report, we present our champion solutions to five tracks at Ego4D challenge. We leverage our developed InternVideo, a video foundation model, for five Ego4D tasks, including Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object...
Gespeichert in:
Hauptverfasser: | , , , , , , , , , , , , , , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In this report, we present our champion solutions to five tracks at Ego4D
challenge. We leverage our developed InternVideo, a video foundation model, for
five Ego4D tasks, including Moment Queries, Natural Language Queries, Future
Hand Prediction, State Change Object Detection, and Short-term Object
Interaction Anticipation. InternVideo-Ego4D is an effective paradigm to adapt
the strong foundation model to the downstream ego-centric video understanding
tasks with simple head designs. In these five tasks, the performance of
InternVideo-Ego4D comprehensively surpasses the baseline methods and the
champions of CVPR2022, demonstrating the powerful representation ability of
InternVideo as a video foundation model. Our code will be released at
https://github.com/OpenGVLab/ego4d-eccv2022-solutions |
---|---|
DOI: | 10.48550/arxiv.2211.09529 |