METHOD FOR MULTIMODAL EMBEDDING AND SYSTEM THEREFOR


Bibliographic details
Main authors: HAN, Bo Hyung, PARK, Jae Yoo, PARK, Jeong Hyung
Format: Patent
Language: English
Description
Summary: Provided are a method for multimodal embedding and a system therefor. The method according to some embodiments may include generating a plurality of patch features for an image sample through an image encoder, wherein the image sample and a text sample form a positive pair; generating a plurality of token features for the text sample through a text encoder; softly masking patch features associated with a specific token of the text sample; generating a joint embedding by inputting the masked patch features and the token features into a multimodal encoder; and updating the multimodal encoder by performing an image-text matching (ITM) task based on the joint embedding.
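
The soft-masking step in the summary can be illustrated with a small sketch. The record does not say how the association between a patch and a token is measured, so the scoring rule below (a softmax over patch-token dot products, rescaled so the best-matching patch has relevance 1) is purely an assumption for illustration, as are the function name `soft_mask_patches` and the toy feature values. The point is only to show what "softly" masking means: each patch feature is scaled by a continuous weight rather than kept or dropped by a binary mask.

```python
import math


def soft_mask_patches(patch_feats, token_feat):
    """Attenuate each patch in proportion to its relevance to one token.

    ASSUMPTION: relevance is a softmax over patch-token dot products,
    rescaled so the most relevant patch has relevance 1. The summary does
    not specify the scoring rule; this one is hypothetical.
    """
    # Dot product between every patch feature and the chosen token feature.
    scores = [sum(p * t for p, t in zip(patch, token_feat)) for patch in patch_feats]

    # Numerically stable softmax over the patches.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rescale so relevance lies in (0, 1], with 1 for the best match.
    top = max(probs)
    relevance = [p / top for p in probs]

    # Soft masking: scale each patch by the continuous weight (1 - relevance)
    # rather than applying a binary keep/drop mask.
    masked = [
        [(1.0 - r) * x for x in patch]
        for patch, r in zip(patch_feats, relevance)
    ]
    return masked, relevance


# Toy example: three 2-dimensional patch features and one token feature.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
token = [2.0, 0.0]
masked, relevance = soft_mask_patches(patches, token)
```

In this toy setup the first patch matches the token most strongly and is fully suppressed, while the other two are only partly attenuated. In the pipeline the summary describes, the masked patch features and the token features would then be passed to the multimodal encoder to produce the joint embedding used for the ITM task; that stage is not sketched here.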