Zero-shot counting with a dual-stream neural network model

Bibliographic details
Published in: Neuron (Cambridge, Mass.), 2024-12, Vol. 112 (24), p. 4147-4158.e5
Main authors: Thompson, Jessica A.F.; Sheahan, Hannah; Dumbalska, Tsvetomira; Sandbrink, Julian D.; Piazza, Manuela; Summerfield, Christopher
Format: Article
Language: English
Description
Summary: To understand a visual scene, observers need to both recognize objects and encode relational structure. For example, a scene comprising three apples requires the observer to encode concepts of “apple” and “three.” In the primate brain, these functions rely on dual (ventral and dorsal) processing streams. Object recognition in primates has been successfully modeled with deep neural networks, but how scene structure (including numerosity) is encoded remains poorly understood. Here, we built a deep learning model, based on the dual-stream architecture of the primate brain, which is able to count items “zero-shot,” even if the objects themselves are unfamiliar. Our dual-stream network forms spatial response fields and lognormal number codes that resemble those observed in the macaque posterior parietal cortex. The dual-stream network also makes successful predictions about human counting behavior. Our results provide evidence for an enactive theory of the role of the posterior parietal cortex in visual scene understanding.

• We describe a dual-stream neural network model that displays zero-shot counting
• With ablations, we show how our dual-stream architecture supports this ability
• The model replicates several aspects of human counting behavior and development
• The learned representations mimic properties of neural codes for number and space

How does the brain represent the structure of a visual scene (the relations among items, e.g., the cardinality) independent of scene contents (the objects in the scene, e.g., item identity)? Thompson et al. propose a dual-stream neural network model based on the parallel pathways of the primate visual system.
ISSN: 0896-6273, 1097-4199
DOI: 10.1016/j.neuron.2024.10.008