Towards Unbiased Multi-label Zero-Shot Learning with Pyramid and Semantic Attention

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2023-01, Vol. 25, pp. 1-15
Authors: Liu, Ziming; Guo, Song; Guo, Jingcai; Xu, Yuanyuan; Huo, Fushuo
Format: Article
Language: English
Subjects: Attention Mechanism; Classification; Computational modeling; Correlation; Feature extraction; Image recognition; Labels; Multi-label zero-shot learning; Pattern Recognition; Representations; Semantic Feature Space; Semantics; Task analysis; Training; Zero-shot learning
Online access: Order full text

Description

Multi-label zero-shot learning extends conventional single-label zero-shot learning to a more realistic scenario: recognizing multiple unseen class labels in each input sample. Existing works usually exploit attention mechanisms to model the correlation among different labels. However, most of them are biased toward a few major classes while neglecting minor classes of equal importance in the input samples, which can result in overly diffused attention maps that fail to cover the minor classes. We argue that the cause of this problem is disregarding the connection between major and minor classes, which correspond to global and local information, respectively. In this paper, we propose a novel framework for unbiased multi-label zero-shot learning that considers various class-specific regions to calibrate the training of the classifier. Specifically, Pyramid Feature Attention (PFA) builds the correlation between the global and local information of samples to balance the presence of each class. Meanwhile, for the generated semantic representations of input samples, Semantic Attention (SA) strengthens the element-wise correlation among these vectors, encouraging their coordinated representation. Extensive experiments on the large-scale multi-label benchmarks MS-COCO, NUS-WIDE, and Open Images demonstrate that the proposed method surpasses other representative methods by significant margins.
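
The record itself carries no code, but the abstract describes its two modules concretely enough to sketch. Below is a minimal, hypothetical PyTorch sketch of the two ideas named above: a pyramid attention that pools multi-scale feature maps with learned per-location weights, so local (minor-class) regions are not drowned out by global context, and an element-wise gate over per-class semantic vectors. All names, shapes, and design choices here (PyramidFeatureAttention, SemanticAttention, the sigmoid gates, mean fusion across scales, shared channel counts across pyramid levels) are assumptions for illustration, not the authors' implementation.

# Illustrative sketch only, not the authors' code. Assumes PyTorch and
# that all pyramid levels share the same channel count (e.g., FPN outputs).
import torch
import torch.nn as nn


class PyramidFeatureAttention(nn.Module):
    """Pools multi-scale feature maps with per-location attention weights,
    so fine-scale (local) regions contribute alongside coarse (global) ones."""

    def __init__(self, in_channels: int, dim: int):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=1)
        self.attn = nn.Conv2d(dim, 1, kernel_size=1)  # one score per location

    def forward(self, feature_maps):
        # feature_maps: list of (B, C, H_i, W_i) tensors from a CNN pyramid
        pooled = []
        for fmap in feature_maps:
            x = self.proj(fmap)                         # (B, D, H, W)
            a = torch.sigmoid(self.attn(x))             # (B, 1, H, W) attention
            num = (x * a).flatten(2).sum(-1)            # attention-weighted sum
            den = a.flatten(2).sum(-1).clamp(min=1e-6)  # normalizer
            pooled.append(num / den)                    # (B, D) per scale
        return torch.stack(pooled, dim=1).mean(dim=1)   # (B, D) fused over scales


class SemanticAttention(nn.Module):
    """Element-wise gate over a set of per-class semantic vectors, computed
    from their shared context, so the vectors are re-weighted in coordination."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, sem):                             # sem: (B, K, D)
        context = sem.mean(dim=1, keepdim=True)         # (B, 1, D) shared context
        return sem * self.gate(context)                 # element-wise re-weighting

Under these assumptions, SemanticAttention(dim)(sem) re-weights a (batch, num_classes, dim) tensor of semantic vectors. The sigmoid gates keep both attentions bounded in (0, 1), and mean fusion over scales keeps the pooled dimensionality fixed regardless of how many pyramid levels feed in.
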
DOI: 10.1109/TMM.2022.3222657
ISSN: 1520-9210
EISSN: 1941-0077
Source: IEEE Electronic Library (IEL)