Shape-Based Measures Improve Scene Categorization

Converging evidence indicates that deep neural network models that are trained on large datasets are biased toward color and texture information. Humans, on the other hand, can easily recognize objects and scenes from images as well as from bounding contours. Mid-level vision is characterized by the...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on pattern analysis and machine intelligence 2024-04, Vol.46 (4), p.2041-2053
Hauptverfasser:	Rezanejad, Morteza, Wilder, John, Walther, Dirk B., Jepson, Allan D., Dickinson, Sven, Siddiqi, Kaleem
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Artificial neural networks Classification Contour geometry Contours Convolutional neural networks Feature extraction Gestalt grouping cues medial axis transform (MAT) Object recognition Observers Organizations scene categorization scene perception shape based measures Task analysis Transforms Visualization
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Converging evidence indicates that deep neural network models that are trained on large datasets are biased toward color and texture information. Humans, on the other hand, can easily recognize objects and scenes from images as well as from bounding contours. Mid-level vision is characterized by the recombination and organization of simple primary features into more complex ones by a set of so-called Gestalt grouping rules. While described qualitatively in the human literature, a computational implementation of these perceptual grouping rules is so far missing. In this article, we contribute a novel set of algorithms for the detection of contour-based cues in complex scenes. We use the medial axis transform (MAT) to locally score contours according to these grouping rules. We demonstrate the benefit of these cues for scene categorization in two ways: (i) Both human observers and CNN models categorize scenes most accurately when perceptual grouping information is emphasized. (ii) Weighting the contours with these measures boosts performance of a CNN model significantly compared to the use of unweighted contours. Our work suggests that, even though these measures are computed directly from contours in the image, current CNN models do not appear to extract or utilize these grouping cues.
ISSN:	0162-8828 1939-3539 2160-9292
DOI:	10.1109/TPAMI.2023.3333352