METHOD FOR MULTIMODAL EMBEDDING AND SYSTEM THEREFOR
Main authors: | , , |
---|---|
Format: | Patent |
Language: | English |
Abstract: | Provided are a method for multimodal embedding and a system therefor. According to some embodiments, the method may include: generating a plurality of patch features for an image sample through an image encoder; generating a plurality of token features for a text sample through a text encoder, wherein the image sample and the text sample form a positive pair; softly masking the patch features associated with a specific token of the text sample; generating a joint embedding by inputting the masked patch features and the token features into a multimodal encoder; and updating the multimodal encoder by performing an image-text matching (ITM) task based on the joint embedding. |
---|
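
The soft-masking step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the association between a patch and a token is measured, so everything specific here is an assumption: the function name `soft_mask_patches` is hypothetical, relevance is taken to be a dot product min-max normalised to [0, 1], and each patch is scaled by one minus its relevance rather than being zeroed outright. The encoders, the multimodal encoder, and the ITM update are not modelled.

```python
def soft_mask_patches(patch_features, token_feature):
    """Attenuate each patch in proportion to its relevance to one token.

    Assumption (not from the source): relevance is the dot product of
    patch and token, min-max normalised to [0, 1] across the patches.
    """
    scores = [
        sum(p * t for p, t in zip(patch, token_feature))
        for patch in patch_features
    ]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every patch scores the same, no patch stands out: leave all unmasked.
    relevance = [(s - lo) / span if span > 0 else 0.0 for s in scores]
    # Soft mask: multiply by (1 - relevance) instead of dropping the patch.
    masked = [
        [(1.0 - r) * x for x in patch]
        for patch, r in zip(patch_features, relevance)
    ]
    return masked, relevance


# Toy inputs: three 2-dimensional patch features and one token feature.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
token = [1.0, 0.0]
masked, relevance = soft_mask_patches(patches, token)
```

In this toy case the first patch aligns most strongly with the token and is suppressed entirely, the second is untouched, and the third is halved. In the method described, the masked patch features would then be passed to the multimodal encoder together with the token features.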