A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

Using natural language as supervision for training visual recognition models holds great promise. Recent work has shown that when such supervision takes the form of alignment between images and captions in large training datasets, the resulting aligned models perform well on zero-shot cl...

Bibliographic details
Main authors: Tejankar, Ajinkya; Sanjabi, Maziar; Wu, Bichen; Xie, Saining; Khabsa, Madian; Pirsiavash, Hamed; Firooz, Hamed
Format: Article
Language: English