Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales


Bibliographic Details

Main Authors: Qi, Shuren; Zhang, Yushu; Wang, Chao; Xia, Zhihua; Cao, Xiaochun; Weng, Jian
Format: Article
Language: English
Online Access: Order full text
Description: Developing robust and interpretable vision systems is a crucial step towards trustworthy artificial intelligence. In this regard, a promising paradigm considers embedding task-required invariant structures, e.g., geometric invariance, in the fundamental image representation. However, such invariant representations typically exhibit limited discriminability, limiting their applications in larger-scale trustworthy vision tasks. For this open problem, we conduct a systematic investigation of hierarchical invariance, exploring this topic from theoretical, practical, and application perspectives. At the theoretical level, we show how to construct over-complete invariants with a Convolutional Neural Networks (CNN)-like hierarchical architecture yet in a fully interpretable manner. The general blueprint, specific definitions, invariant properties, and numerical implementations are provided. At the practical level, we discuss how to customize this theoretical framework into a given task. With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner. We demonstrate the above arguments with accuracy, invariance, and efficiency results on texture, digit, and parasite classification experiments. Furthermore, at the application level, our representations are explored in real-world forensics tasks on adversarial perturbations and Artificial Intelligence Generated Content (AIGC). Such applications reveal that the proposed strategy not only realizes the theoretically promised invariance, but also exhibits competitive discriminability even in the era of deep learning. For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered as an effective alternative to traditional CNN and invariants.
DOI: 10.48550/arxiv.2402.15430
Source: arXiv.org
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning
Date: 2024-02-23