InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

Bibliographic details
Authors: Chen, Guo; Xing, Sen; Chen, Zhe; Wang, Yi; Li, Kunchang; Li, Yizhuo; Liu, Yi; Wang, Jiahao; Zheng, Yin-Dong; Huang, Bingkun; Zhao, Zhiyu; Pan, Junting; Huang, Yifei; Wang, Zun; Yu, Jiashuo; He, Yinan; Zhang, Hongjie; Lu, Tong; Wang, Yali; Wang, Limin; Qiao, Yu
Format: Article
Language: English
Description
Abstract: In this report, we present our champion solutions to five tracks of the Ego4D challenge. We apply InternVideo, the video foundation model we developed, to five Ego4D tasks: Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object Interaction Anticipation. InternVideo-Ego4D is an effective paradigm for adapting a strong foundation model to downstream egocentric video understanding tasks using simple head designs. On all five tasks, InternVideo-Ego4D outperforms both the baseline methods and the CVPR 2022 challenge champions, demonstrating the powerful representation ability of InternVideo as a video foundation model. Our code will be released at https://github.com/OpenGVLab/ego4d-eccv2022-solutions
DOI:10.48550/arxiv.2211.09529