METHOD FOR MULTIMODAL EMBEDDING AND SYSTEM THEREFOR




Bibliographic details
Main authors:	HAN, Bo Hyung, PARK, Jae Yoo, PARK, Jeong Hyung
Format:	Patent
Language:	English
Subject terms:	CALCULATING; COMPUTING; COUNTING; PHYSICS

Description
Abstract:	Provided are a method for multimodal embedding and a system therefor. According to some embodiments, the method may include: generating a plurality of patch features for an image sample through an image encoder; generating a plurality of token features for a text sample through a text encoder, wherein the image sample and the text sample form a positive pair; softly masking the patch features associated with a specific token of the text sample; generating a joint embedding by inputting the masked patch features and the token features into a multimodal encoder; and updating the multimodal encoder by performing an image-text matching (ITM) task based on the joint embedding.
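The pipeline in the abstract can be illustrated with a toy, dependency-free Python sketch. It is not the patented implementation. Every name in it is hypothetical: `soft_mask_patches`, `joint_embedding`, `itm_step`, `strength`, the 2-D feature values, and the choice of `tokens[0]` as the "specific token". The abstract does not say how a patch is judged to be associated with a token. This sketch assumes a softmax over dot-product similarity and scales each patch down in proportion to that relevance, so patches are attenuated rather than removed. The multimodal encoder is replaced by a simple token-to-patch attention pooling. The "update" step trains only a linear ITM head with binary cross-entropy, which is a deliberate simplification of updating a full encoder. The `label` value used for a softly masked pair is also an assumption, since the abstract does not state the ITM target.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_mask_patches(patches: List[Vector], token: Vector,
                      strength: float = 0.5) -> List[Vector]:
    """Attenuate (rather than zero out) patches in proportion to their
    relevance to `token`. ASSUMPTION: relevance = softmax of dot products."""
    relevance = softmax([dot(p, token) for p in patches])
    top = max(relevance)
    # The most relevant patch is scaled by (1 - strength); the others by less.
    return [[v * (1.0 - strength * r / top) for v in p]
            for p, r in zip(patches, relevance)]


def joint_embedding(patches: List[Vector], tokens: List[Vector]) -> Vector:
    """HYPOTHETICAL stand-in for the multimodal encoder: each token attends
    over the patches, then the mean token feature is concatenated with the
    mean attended patch feature."""
    d = len(tokens[0])
    attended = [0.0] * d
    for t in tokens:
        weights = softmax([dot(p, t) for p in patches])
        for w, p in zip(weights, patches):
            for i in range(d):
                attended[i] += w * p[i] / len(tokens)
    mean_tok = [sum(t[i] for t in tokens) / len(tokens) for i in range(d)]
    return mean_tok + attended


def itm_step(head: Vector, joint: Vector, label: float,
             lr: float = 0.1) -> Tuple[Vector, float]:
    """One SGD step on a HYPOTHETICAL linear ITM head (match vs. no match),
    using numerically stable binary cross-entropy on the logit."""
    logit = dot(head, joint)
    loss = max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))
    prob = 1.0 / (1.0 + math.exp(-logit))
    # For sigmoid + BCE, d(loss)/d(head) = (prob - label) * joint.
    new_head = [w - lr * (prob - label) * x for w, x in zip(head, joint)]
    return new_head, loss


# Toy 2-D features: 3 image patches and 2 text tokens (illustrative values).
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
tokens = [[1.0, 0.0], [0.2, 0.8]]

masked = soft_mask_patches(patches, tokens[0])  # mask w.r.t. the first token
joint = joint_embedding(masked, tokens)
head = [0.0] * len(joint)
head, loss_before = itm_step(head, joint, label=1.0)
_, loss_after = itm_step(head, joint, label=1.0)
print(masked[0], loss_before > loss_after)  # prints [0.5, 0.0] True
```

Here, scaling instead of zeroing is what makes the mask "soft". With `strength=1.0`, the patch most relevant to the chosen token would be zeroed out, and the other patches would still be attenuated by amounts that depend on their relevance.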