Research Progress on Vision–Language Multimodal Pretraining Model Technology


Bibliographic Details
Published in: Electronics (Basel), 2022-11, Vol. 11 (21), p. 3556
Authors: Wang, Huansha; Huang, Ruiyang; Zhang, Jianpeng
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Because pretraining models are not limited by the scale of annotated data and can learn general semantic information, they perform well in natural language processing and computer vision tasks. In recent years, research on multimodal pretraining models has attracted increasing attention, and many vision–language multimodal datasets and related models have been proposed. To better summarize and analyze the development status and future trends of vision–language multimodal pretraining model technology, this paper first surveys the taxonomy and related tasks of vision–language multimodal pretraining. It then summarizes and analyzes research progress along two dimensions: image–language models and video–language models. Finally, it discusses open problems and development trends in vision–language multimodal pretraining.
ISSN: 2079-9292
DOI: 10.3390/electronics11213556