Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales


Bibliographic Details

Main Authors: Qi, Shuren; Zhang, Yushu; Wang, Chao; Xia, Zhihua; Cao, Xiaochun; Weng, Jian
Format: Article
Language: English
Online Access: Order full text
Description: Developing robust and interpretable vision systems is a crucial step towards trustworthy artificial intelligence. In this regard, a promising paradigm considers embedding task-required invariant structures, e.g., geometric invariance, in the fundamental image representation. However, such invariant representations typically exhibit limited discriminability, limiting their applications in larger-scale trustworthy vision tasks. For this open problem, we conduct a systematic investigation of hierarchical invariance, exploring this topic from theoretical, practical, and application perspectives. At the theoretical level, we show how to construct over-complete invariants with a Convolutional Neural Networks (CNN)-like hierarchical architecture yet in a fully interpretable manner. The general blueprint, specific definitions, invariant properties, and numerical implementations are provided. At the practical level, we discuss how to customize this theoretical framework into a given task. With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner. We demonstrate the above arguments with accuracy, invariance, and efficiency results on texture, digit, and parasite classification experiments. Furthermore, at the application level, our representations are explored in real-world forensics tasks on adversarial perturbations and Artificial Intelligence Generated Content (AIGC). Such applications reveal that the proposed strategy not only realizes the theoretically promised invariance, but also exhibits competitive discriminability even in the era of deep learning. For robust and interpretable vision tasks at larger scales, hierarchical invariant representation can be considered as an effective alternative to traditional CNN and invariants.
DOI: 10.48550/arxiv.2402.15430
Source: arXiv.org
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning
Date: 2024-02-23