Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales
Saved in:
Main authors: | Qi, Shuren; Zhang, Yushu; Wang, Chao; Xia, Zhihua; Cao, Xiaochun; Weng, Jian |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning |
Online access: | Order full text |
creator | Qi, Shuren; Zhang, Yushu; Wang, Chao; Xia, Zhihua; Cao, Xiaochun; Weng, Jian |
description | Developing robust and interpretable vision systems is a crucial step towards
trustworthy artificial intelligence. In this regard, a promising paradigm considers
embedding task-required invariant structures, e.g., geometric invariance, in the
fundamental image representation. However, such invariant representations typically
exhibit limited discriminability, which restricts their application in larger-scale
trustworthy vision tasks. To address this open problem, we conduct a systematic
investigation of hierarchical invariance from theoretical, practical, and application
perspectives. At the theoretical level, we show how to construct over-complete
invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture,
yet in a fully interpretable manner; the general blueprint, specific definitions,
invariant properties, and numerical implementations are provided. At the practical
level, we discuss how to tailor this theoretical framework to a given task: thanks to
the over-completeness, discriminative features for the task can be formed adaptively,
in a Neural Architecture Search (NAS)-like manner. We support these arguments with
accuracy, invariance, and efficiency results on texture, digit, and parasite
classification experiments. Furthermore, at the application level, our representations
are explored in real-world forensic tasks involving adversarial perturbations and
Artificial Intelligence Generated Content (AIGC). These applications reveal that the
proposed strategy not only realizes the theoretically promised invariance but also
exhibits competitive discriminability, even in the era of deep learning. For robust
and interpretable vision tasks at larger scales, hierarchical invariant
representations can be considered an effective alternative to traditional CNNs and
invariants. |
doi_str_mv | 10.48550/arxiv.2402.15430 |
format | Article |
creationdate | 2024-02-23 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2402.15430 |
language | eng |
recordid | cdi_arxiv_primary_2402_15430 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning |
title | Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-15T06%3A41%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hierarchical%20Invariance%20for%20Robust%20and%20Interpretable%20Vision%20Tasks%20at%20Larger%20Scales&rft.au=Qi,%20Shuren&rft.date=2024-02-23&rft_id=info:doi/10.48550/arxiv.2402.15430&rft_dat=%3Carxiv_GOX%3E2402_15430%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
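The abstract's two key mechanisms, over-complete invariants built by a CNN-like hierarchical cascade and NAS-like selection of task-discriminative features, can be made concrete. Below is a minimal sketch only, not the authors' implementation: it assumes circular-harmonic filters with modulus nonlinearities as the invariant primitive (a standard construction for rotation invariance) and a Fisher-score ranking as a crude stand-in for the NAS-like search; all function names (`circular_harmonic_filter`, `hierarchical_invariants`, `fisher_scores`) are hypothetical.

```python
# Illustrative sketch of hierarchical rotation invariants (hypothetical,
# not the paper's code): cascade circular-harmonic filtering with modulus
# nonlinearities, then rank features by a simple Fisher score.
import numpy as np
from scipy.signal import fftconvolve

def circular_harmonic_filter(size, n, m):
    """Polar-separable filter R_n(r) * exp(i*m*theta) on an odd size x size grid.
    Rotating the image by alpha multiplies responses by exp(-i*m*alpha) and
    rotates the response map, so the modulus map is rotation-equivariant."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.hypot(xs, ys) / max(half, 1)
    theta = np.arctan2(ys, xs)
    radial = np.cos(np.pi * n * r) * (r <= 1.0)  # simple cosine radial profile
    return radial * np.exp(1j * m * theta)

def invariant_layer(maps, filters):
    """One hierarchy level: filter every map with every filter and take the
    pointwise modulus. Output count = len(maps) * len(filters)."""
    return [np.abs(fftconvolve(x, h, mode="same")) for x in maps for h in filters]

def hierarchical_invariants(img, depth=2, size=15):
    """Cascade of invariant layers; global means of the final modulus maps give
    an over-complete feature vector, rotation-invariant up to boundary effects.
    Over-completeness grows as len(filters) ** depth."""
    filters = [circular_harmonic_filter(size, n, m)
               for n in (1, 2) for m in (0, 1, 2)]
    maps = [np.asarray(img, dtype=float)]
    for _ in range(depth):
        maps = invariant_layer(maps, filters)
    return np.array([x.mean() for x in maps])

def fisher_scores(feats, labels):
    """Between-class over within-class variance per feature (higher = more
    discriminative); a crude stand-in for the paper's NAS-like search."""
    mu = feats.mean(axis=0)
    num = np.zeros(feats.shape[1])
    den = np.zeros(feats.shape[1])
    for c in np.unique(labels):
        fc = feats[labels == c]
        num += len(fc) * (fc.mean(axis=0) - mu) ** 2
        den += ((fc - fc.mean(axis=0)) ** 2).sum(axis=0)
    return num / (den + 1e-12)

# Toy usage: extract invariants for random "images" and keep the 8 features
# that best separate two (random) classes.
rng = np.random.default_rng(0)
imgs = rng.random((8, 32, 32))
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
feats = np.stack([hierarchical_invariants(im) for im in imgs])
top = np.argsort(fisher_scores(feats, labels))[::-1][:8]  # task-adaptive subset
```

With 6 filters and depth 2 the representation here is 36-dimensional, so it is over-complete relative to any single task, and the Fisher ranking adaptively keeps the subset that discriminates the given classes, mirroring (in a much simplified form) the task-customization step described in the abstract.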