Going beyond still images to improve input variance resilience in multi-stream vision understanding models

Bibliographic Details
Published in: Scientific Reports 2024-07, Vol. 14 (1), p. 15366-13, Article 15366
Main authors: Fadaei, Amir Hosein; Dehaqani, Mohammad-Reza A.
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Abstract: Traditionally, vision models have predominantly relied on spatial features extracted from static images, departing from the continuous stream of spatiotemporal features processed by the brain in natural vision. While numerous video-understanding models have emerged, incorporating videos and their spatiotemporal features into image-understanding models has been limited. Drawing inspiration from natural vision, which exhibits remarkable resilience to input changes, our research focuses on developing a brain-inspired model for vision understanding trained with videos. Our findings demonstrate that models trained on videos rather than still images, with temporal features included, become more resilient to various alterations of the input media.
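
To make the abstract's core idea concrete, the following is a minimal PyTorch sketch, not the authors' published architecture: the TwoStreamVideoNet class, its layer sizes, and the fusion scheme are illustrative assumptions. It pairs a per-frame spatial stream (2D convolutions) with a clip-level temporal stream (3D convolutions), reflecting the premise that training on video clips rather than still images adds temporal features to the learned representation.

# Hypothetical sketch (assumed names and layer sizes), illustrating the
# general idea of augmenting per-frame spatial features with clip-level
# temporal features when training on videos instead of still images.
import torch
import torch.nn as nn

class TwoStreamVideoNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Spatial stream: 2D convolutions applied frame by frame.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Temporal stream: 3D convolutions over the whole clip.
        self.temporal = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # Classifier over the concatenated spatial + temporal descriptors.
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, time, height, width)
        b, c, t, h, w = clip.shape
        # Spatial features: average per-frame descriptors over time.
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        spatial = self.spatial(frames).reshape(b, t, -1).mean(dim=1)
        # Temporal features: a single descriptor for the whole clip.
        temporal = self.temporal(clip).flatten(1)
        return self.classifier(torch.cat([spatial, temporal], dim=1))

model = TwoStreamVideoNet()
clip = torch.randn(2, 3, 8, 32, 32)  # two 8-frame RGB clips
print(model(clip).shape)  # torch.Size([2, 10])

A model like this can then be probed for the resilience the abstract describes by comparing its outputs on clean clips against clips with perturbed inputs (e.g., noise, blur, or frame dropout); the specific alterations tested in the paper are described in the full text.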
ISSN: 2045-2322
DOI: 10.1038/s41598-024-66346-w