Tobias: A Random CNN Sees Objects

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-02, Vol. 46 (2), p. 1-15
Authors: Cao, Yun-Hao; Wu, Jianxin
Format: Article
Language: English
Abstract: This paper starts by revealing a surprising finding: without any learning, a randomly initialized CNN can localize objects surprisingly well. That is, a CNN has an inductive bias to naturally focus on objects, named Tobias ("The object is at sight") in this paper. This empirical inductive bias is further analyzed theoretically and verified empirically, and is successfully applied to both self-supervised and supervised learning. For self-supervised learning, a CNN is encouraged to learn representations that focus on the foreground object by transforming every image into various versions with different backgrounds, where the foreground-background separation is guided by Tobias. Experimental results show that the proposed Tobias significantly improves downstream tasks, especially object detection. This paper also shows that Tobias yields consistent improvements on training sets of different sizes and is more resilient to changes in image augmentations. Furthermore, we apply Tobias to supervised image classification by letting the average pooling layer focus on foreground regions, which improves performance on various benchmarks.
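The abstract gives no implementation details, so the following is only a minimal sketch of the core idea under stated assumptions: aggregate the activations of a randomly initialized CNN, threshold them into a coarse foreground mask, and use that mask to weight average pooling. The backbone choice (a torchvision ResNet-50 with random weights), the squared-activation energy, the above-mean thresholding rule, and the helper names random_cnn_foreground_mask and foreground_weighted_pool are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch of the Tobias idea: an untrained CNN's activations as a foreground cue.
# Layer choice, energy definition, and threshold are assumptions for illustration.
import torch
import torch.nn.functional as F
from torchvision import models

def random_cnn_foreground_mask(images: torch.Tensor) -> torch.Tensor:
    """Return a coarse binary foreground mask (B, 1, H, W) from an untrained CNN.

    images: (B, 3, H, W) batch of normalized images.
    """
    backbone = models.resnet50(weights=None)   # randomly initialized, no training
    backbone.eval()
    # Keep only the convolutional trunk (drop the average pooling and fc head).
    features = torch.nn.Sequential(*list(backbone.children())[:-2])
    with torch.no_grad():
        fmap = features(images)                            # (B, C, h, w)
    energy = fmap.pow(2).sum(dim=1, keepdim=True)          # per-location activation energy
    energy = F.interpolate(energy, size=images.shape[-2:],
                           mode="bilinear", align_corners=False)
    threshold = energy.mean(dim=(2, 3), keepdim=True)      # assumed rule: above-mean = foreground
    return (energy > threshold).float()

def foreground_weighted_pool(fmap: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average-pool a feature map (B, C, h, w) using the foreground mask as weights."""
    mask = F.interpolate(mask, size=fmap.shape[-2:], mode="bilinear", align_corners=False)
    weighted = (fmap * mask).sum(dim=(2, 3))
    return weighted / mask.sum(dim=(2, 3)).clamp_min(1e-6)
```

In this sketch, the mask from the untrained backbone would guide the background-swapping augmentation described for self-supervised pre-training, while foreground_weighted_pool would stand in for global average pooling in the supervised classification setting.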
ISSN: 0162-8828, 1939-3539, 2160-9292
DOI: 10.1109/TPAMI.2023.3329498