Explaining CLIP's performance disparities on data from blind/low vision users

Large multi-modal models (LMMs) hold the potential to usher in a new era of automated visual assistance for people who are blind or low vision (BLV). Yet, these models have not been systematically evaluated on data captured by BLV users. We address this by empirically assessing CLIP, a widely-used L...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Massiceti, Daniela, Longden, Camilla, Słowik, Agnieszka, Wills, Samuel, Grayson, Martin, Morrison, Cecily
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Science - Computer Vision and Pattern Recognition
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Schreiben Sie den ersten Kommentar!