Totally Looks Like - How Humans Compare, Compared to Machines
| Main authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
| Abstract: | Perceptual judgment of image similarity by humans relies on rich internal representations, ranging from low-level features to high-level concepts, scene properties, and even cultural associations. However, existing methods and datasets that attempt to explain perceived similarity use stimuli which arguably do not cover the full breadth of factors affecting human similarity judgments, even those geared toward this goal. We introduce a new dataset, dubbed Totally-Looks-Like (TLL) after a popular entertainment website, which contains images paired by humans as being visually similar. The dataset contains 6,016 image pairs from the wild, shedding light on the rich and diverse set of criteria employed by human beings. We conduct experiments that attempt to reproduce the pairings via features extracted from state-of-the-art deep convolutional neural networks, as well as additional human experiments to verify the consistency of the collected data. Even though we create conditions that artificially make the matching task progressively easier, we show that machine-extracted representations perform very poorly at reproducing the matches selected by humans. We discuss and analyze these results, suggesting future directions for the improvement of learned image representations. |
| DOI: | 10.48550/arxiv.1803.01485 |
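
The matching experiment described in the abstract can be illustrated with a short sketch: embed both halves of every TLL pair with a pretrained CNN, then check how often the human-chosen partner is the nearest neighbor under cosine similarity. This is a minimal illustration under stated assumptions, not the paper's exact protocol; the `TLL/left`/`TLL/right` directory layout, the ResNet-50 backbone, and the top-1 metric are choices made for the sketch.

```python
# Minimal sketch of a CNN-feature matching experiment on paired images.
# Assumptions (not from the paper): pair i is (TLL/left/<name>.jpg,
# TLL/right/<name>.jpg), ResNet-50 features, top-1 retrieval accuracy.
from pathlib import Path

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Pretrained backbone with the classification head removed, so each
# image is mapped to a 2048-d feature vector.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()


@torch.no_grad()
def embed(paths, batch_size=32):
    """Return L2-normalized CNN features for a list of image paths."""
    feats = []
    for i in range(0, len(paths), batch_size):
        batch = torch.stack([
            preprocess(Image.open(p).convert("RGB"))
            for p in paths[i:i + batch_size]
        ])
        feats.append(F.normalize(backbone(batch), dim=1))
    return torch.cat(feats)


left_paths = sorted(Path("TLL/left").glob("*.jpg"))    # assumed layout
right_paths = sorted(Path("TLL/right").glob("*.jpg"))  # pair i = (left[i], right[i])

left, right = embed(left_paths), embed(right_paths)

# Cosine similarity between every left image and every right image;
# matching succeeds for pair i when row i's argmax is column i.
sim = left @ right.T
top1 = (sim.argmax(dim=1) == torch.arange(len(left_paths))).float().mean()
print(f"top-1 matching accuracy: {top1:.3f}")
```

The abstract's finding would correspond to this accuracy staying low even when the candidate set is artificially restricted, e.g. by scoring each pair against a small random subset of distractors instead of all 6,016 right-hand images.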