Leveraging Color Channel Independence for Improved Unsupervised Object Detection
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Object-centric architectures can learn to extract distinct object
representations from visual scenes, enabling downstream applications on the
object level. Similarly to autoencoder-based image models, object-centric
approaches have been trained on the unsupervised reconstruction loss of images
encoded in the RGB color space. In our work, we challenge the common assumption
that RGB is the optimal color space for unsupervised learning in
computer vision. We argue conceptually and show empirically that other color
spaces, such as HSV, bear essential characteristics for object-centric
representation learning, like robustness to lighting conditions. We further
show that models improve when requiring them to predict additional color
channels. Specifically, we propose to transform the predicted targets to the
RGB-S space, which extends RGB with HSV's saturation component and leads to
markedly better reconstruction and disentanglement for five common evaluation
datasets. Composite color spaces can be implemented with essentially
no computational overhead, is agnostic of the models' architecture, and is
universally applicable across a wide range of visual computing tasks and
training types. The findings of our approach encourage additional
investigations in computer vision tasks beyond object-centric learning. |
---|---|
DOI: | 10.48550/arxiv.2412.15150 |
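
The summary describes RGB-S as RGB extended with the saturation component of HSV. A minimal sketch of building such a four-channel target with NumPy is given below. It assumes float images scaled to [0, 1] and the standard HSV saturation definition, (max - min) / max, set to 0 for black pixels. The function name `rgb_to_rgbs` is hypothetical, and the record does not state how the authors integrate the transform into training, so this is an illustration rather than their implementation.

```python
import numpy as np


def rgb_to_rgbs(rgb):
    """Append HSV saturation as a fourth channel to an RGB image.

    Hypothetical helper: expects an array of shape (..., 3) with values
    in [0, 1] and returns an array of shape (..., 4).
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    c_max = rgb.max(axis=-1)
    c_min = rgb.min(axis=-1)
    # Standard HSV saturation: (max - min) / max, defined as 0 where max == 0.
    safe_max = np.where(c_max > 0, c_max, 1.0)  # avoid division by zero
    saturation = np.where(c_max > 0, (c_max - c_min) / safe_max, 0.0)
    return np.concatenate([rgb, saturation[..., None]], axis=-1)


# Usage: one row of three pixels (pure red, mid gray, black).
pixels = np.array([[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.0, 0.0, 0.0]]])
rgbs = rgb_to_rgbs(pixels)
print(rgbs.shape)      # (1, 3, 4)
print(rgbs[0, :, 3])   # saturation channel: red -> 1.0, gray -> 0.0, black -> 0.0
```

Because the extra channel is a cheap element-wise function of the RGB values, computing it per batch adds very little cost, which is in line with the summary's claim of negligible overhead.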