Leveraging Color Channel Independence for Improved Unsupervised Object Detection

Object-centric architectures can learn to extract distinct object representations from visual scenes, enabling downstream applications on the object level. Similarly to autoencoder-based image models, object-centric approaches have been trained on the unsupervised reconstruction loss of images encod...

Bibliographic details
Main authors:	Jäckl, Bastian; Metz, Yannick; Schlegel, Udo; Keim, Daniel A; Fischer, Maximilian T
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning
Online access:	Order full text

creator	Jäckl, Bastian; Metz, Yannick; Schlegel, Udo; Keim, Daniel A; Fischer, Maximilian T
description	Object-centric architectures can learn to extract distinct object representations from visual scenes, enabling downstream applications on the object level. Similarly to autoencoder-based image models, object-centric approaches have been trained on the unsupervised reconstruction loss of images encoded by RGB color spaces. In our work, we challenge the common assumption that RGB images are the optimal color space for unsupervised learning in computer vision. We discuss conceptually and empirically that other color spaces, such as HSV, bear essential characteristics for object-centric representation learning, like robustness to lighting conditions. We further show that models improve when requiring them to predict additional color channels. Specifically, we propose to transform the predicted targets to the RGB-S space, which extends RGB with HSV's saturation component and leads to markedly better reconstruction and disentanglement for five common evaluation datasets. The use of composite color spaces can be implemented with basically no computational overhead, is agnostic of the models' architecture, and is universally applicable across a wide range of visual computing tasks and training types. The findings of our approach encourage additional investigations in computer vision tasks beyond object-centric learning.
doi_str_mv	10.48550/arxiv.2412.15150
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2412_15150</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2412_15150</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2412_151503</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE00jM0NTQ14GQI8EktSy1KTM_MS1dwzs_JL1JwzkjMy0vNUfDMS0ktSAUSecmpCmlACc_cgqL8stQUhdC84tKC1KKyzGIgxz8pKzW5RMEltQRIZebn8TCwpiXmFKfyQmluBnk31xBnD12w3fEFRZm5iUWV8SA3xIPdYExYBQAVNTzR</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Leveraging Color Channel Independence for Improved Unsupervised Object Detection</title><source>arXiv.org</source><creator>Jäckl, Bastian ; Metz, Yannick ; Schlegel, Udo ; Keim, Daniel A ; Fischer, Maximilian T</creator><creatorcontrib>Jäckl, Bastian ; Metz, Yannick ; Schlegel, Udo ; Keim, Daniel A ; Fischer, Maximilian T</creatorcontrib><description>Object-centric architectures can learn to extract distinct object representations from visual scenes, enabling downstream applications on the object level. Similarly to autoencoder-based image models, object-centric approaches have been trained on the unsupervised reconstruction loss of images encoded by RGB color spaces. In our work, we challenge the common assumption that RGB images are the optimal color space for unsupervised learning in computer vision. We discuss conceptually and empirically that other color spaces, such as HSV, bear essential characteristics for object-centric representation learning, like robustness to lighting conditions. We further show that models improve when requiring them to predict additional color channels. Specifically, we propose to transform the predicted targets to the RGB-S space, which extends RGB with HSV's saturation component and leads to markedly better reconstruction and disentanglement for five common evaluation datasets. 
The use of composite color spaces can be implemented with basically no computational overhead, is agnostic of the models' architecture, and is universally applicable across a wide range of visual computing tasks and training types. The findings of our approach encourage additional investigations in computer vision tasks beyond object-centric learning.</description><identifier>DOI: 10.48550/arxiv.2412.15150</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning</subject><creationdate>2024-12</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2412.15150$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2412.15150$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Jäckl, Bastian</creatorcontrib><creatorcontrib>Metz, Yannick</creatorcontrib><creatorcontrib>Schlegel, Udo</creatorcontrib><creatorcontrib>Keim, Daniel A</creatorcontrib><creatorcontrib>Fischer, Maximilian T</creatorcontrib><title>Leveraging Color Channel Independence for Improved Unsupervised Object Detection</title><description>Object-centric architectures can learn to extract distinct object representations from visual scenes, enabling downstream applications on the object level. Similarly to autoencoder-based image models, object-centric approaches have been trained on the unsupervised reconstruction loss of images encoded by RGB color spaces. 
In our work, we challenge the common assumption that RGB images are the optimal color space for unsupervised learning in computer vision. We discuss conceptually and empirically that other color spaces, such as HSV, bear essential characteristics for object-centric representation learning, like robustness to lighting conditions. We further show that models improve when requiring them to predict additional color channels. Specifically, we propose to transform the predicted targets to the RGB-S space, which extends RGB with HSV's saturation component and leads to markedly better reconstruction and disentanglement for five common evaluation datasets. The use of composite color spaces can be implemented with basically no computational overhead, is agnostic of the models' architecture, and is universally applicable across a wide range of visual computing tasks and training types. The findings of our approach encourage additional investigations in computer vision tasks beyond object-centric learning.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE00jM0NTQ14GQI8EktSy1KTM_MS1dwzs_JL1JwzkjMy0vNUfDMS0ktSAUSecmpCmlACc_cgqL8stQUhdC84tKC1KKyzGIgxz8pKzW5RMEltQRIZebn8TCwpiXmFKfyQmluBnk31xBnD12w3fEFRZm5iUWV8SA3xIPdYExYBQAVNTzR</recordid><startdate>20241219</startdate><enddate>20241219</enddate><creator>Jäckl, Bastian</creator><creator>Metz, Yannick</creator><creator>Schlegel, Udo</creator><creator>Keim, Daniel A</creator><creator>Fischer, Maximilian T</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241219</creationdate><title>Leveraging Color Channel Independence for Improved Unsupervised Object Detection</title><author>Jäckl, 
Bastian ; Metz, Yannick ; Schlegel, Udo ; Keim, Daniel A ; Fischer, Maximilian T</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2412_151503</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Jäckl, Bastian</creatorcontrib><creatorcontrib>Metz, Yannick</creatorcontrib><creatorcontrib>Schlegel, Udo</creatorcontrib><creatorcontrib>Keim, Daniel A</creatorcontrib><creatorcontrib>Fischer, Maximilian T</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jäckl, Bastian</au><au>Metz, Yannick</au><au>Schlegel, Udo</au><au>Keim, Daniel A</au><au>Fischer, Maximilian T</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Leveraging Color Channel Independence for Improved Unsupervised Object Detection</atitle><date>2024-12-19</date><risdate>2024</risdate><abstract>Object-centric architectures can learn to extract distinct object representations from visual scenes, enabling downstream applications on the object level. Similarly to autoencoder-based image models, object-centric approaches have been trained on the unsupervised reconstruction loss of images encoded by RGB color spaces. In our work, we challenge the common assumption that RGB images are the optimal color space for unsupervised learning in computer vision. We discuss conceptually and empirically that other color spaces, such as HSV, bear essential characteristics for object-centric representation learning, like robustness to lighting conditions. 
We further show that models improve when requiring them to predict additional color channels. Specifically, we propose to transform the predicted targets to the RGB-S space, which extends RGB with HSV's saturation component and leads to markedly better reconstruction and disentanglement for five common evaluation datasets. The use of composite color spaces can be implemented with basically no computational overhead, is agnostic of the models' architecture, and is universally applicable across a wide range of visual computing tasks and training types. The findings of our approach encourage additional investigations in computer vision tasks beyond object-centric learning.</abstract><doi>10.48550/arxiv.2412.15150</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2412.15150
language	eng
recordid	cdi_arxiv_primary_2412_15150
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning
title	Leveraging Color Channel Independence for Improved Unsupervised Object Detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T12%3A19%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Leveraging%20Color%20Channel%20Independence%20for%20Improved%20Unsupervised%20Object%20Detection&rft.au=J%C3%A4ckl,%20Bastian&rft.date=2024-12-19&rft_id=info:doi/10.48550/arxiv.2412.15150&rft_dat=%3Carxiv_GOX%3E2412_15150%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
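The abstract in this record says that the proposed RGB-S space extends RGB with HSV's saturation component and that models are asked to predict this additional channel. The following is a minimal sketch, not the authors' code, of how such a target transform and a reconstruction loss might look. It assumes images are NumPy float arrays of shape (..., 3) with values in [0, 1], that the decoder emits four channels (R, G, B, S), and that a plain mean-squared error is used; the function names `rgb_to_rgbs` and `rgbs_mse` are hypothetical.

```python
import numpy as np


def rgb_to_rgbs(img: np.ndarray) -> np.ndarray:
    """Append the HSV saturation channel to an RGB image (hypothetical helper).

    Assumes `img` is a float array of shape (..., 3) with values in [0, 1].
    Returns an array of shape (..., 4) holding R, G, B, S.
    """
    c_max = img.max(axis=-1)
    c_min = img.min(axis=-1)
    # HSV saturation S = (max - min) / max; by convention S = 0 for black pixels.
    # np.maximum keeps the denominator positive so no divide-by-zero warning is raised.
    sat = np.where(c_max > 0, (c_max - c_min) / np.maximum(c_max, 1e-12), 0.0)
    return np.concatenate([img, sat[..., None]], axis=-1)


def rgbs_mse(pred_rgbs: np.ndarray, target_rgb: np.ndarray) -> float:
    """Mean squared reconstruction error in RGB-S space.

    Assumption: the model predicts four channels directly, so only the
    RGB target is converted to RGB-S before comparison.
    """
    target_rgbs = rgb_to_rgbs(target_rgb)
    return float(np.mean((pred_rgbs - target_rgbs) ** 2))
```

In this sketch the extra work is one per-pixel max, min and division, which is in line with the abstract's statement that the composite color space adds little computational overhead. Because it only changes the target and the loss, the encoder and decoder are left untouched, matching the claim that the approach is architecture-agnostic.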