Towards Unbiased Multi-label Zero-Shot Learning with Pyramid and Semantic Attention
| Published in: | IEEE transactions on multimedia, 2023-01, Vol. 25, p. 1-15 |
|---|---|
| Main authors: | Liu, Ziming; Guo, Song; Guo, Jingcai; Xu, Yuanyuan; Huo, Fushuo |
| Format: | Article |
| Language: | English |
| Online access: | Order full text |
Description: Multi-label zero-shot learning extends conventional single-label zero-shot learning to a more realistic scenario that aims at recognizing multiple unseen class labels in each input sample. Existing works usually exploit attention mechanisms to model the correlation among different labels. However, most of them are biased toward a few major classes while neglecting minor classes of equal importance in the input samples, and may thus produce overly diffused attention maps that cannot sufficiently cover the minor classes. We argue that the cause of this problem is disregarding the connection between major and minor classes, which correspond to global and local information, respectively. In this paper, we propose a novel framework for unbiased multi-label zero-shot learning that considers various class-specific regions to calibrate the training of the classifier. Specifically, Pyramid Feature Attention (PFA) is proposed to build the correlation between the global and local information of samples and balance the presence of each class. Meanwhile, for the generated semantic representations of input samples, we propose Semantic Attention (SA) to strengthen the element-wise correlation among these vectors and encourage their coordinated representation. Extensive experiments on the large-scale multi-label benchmarks MS-COCO, NUS-WIDE, and Open-Images demonstrate that the proposed method surpasses other representative methods by significant margins.
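The description names two attention modules but gives no implementation detail. As a rough illustration only, the following PyTorch sketch shows one plausible reading of PFA (global-to-local attention over multi-scale region features) and SA (element-wise gating of the per-label semantic vectors). The module structure, tensor shapes, and the scaled dot-product and gating formulations are assumptions made for this example, not the paper's actual method.

```python
# Hedged sketch of the two attention ideas in the abstract; all names,
# shapes, and formulations here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidFeatureAttention(nn.Module):
    """Hypothetical PFA: attends from a global image feature over local
    region features gathered from several backbone scales, so regions
    belonging to minor classes still contribute to the fused feature."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the global feature
        self.key = nn.Linear(dim, dim)    # projects each local region
        self.value = nn.Linear(dim, dim)

    def forward(self, global_feat: torch.Tensor, pyramid_feats: list) -> torch.Tensor:
        # global_feat: (B, D); pyramid_feats: list of (B, N_i, D) region
        # features flattened from different spatial scales.
        regions = torch.cat(pyramid_feats, dim=1)                # (B, N, D)
        q = self.query(global_feat).unsqueeze(1)                 # (B, 1, D)
        k, v = self.key(regions), self.value(regions)            # (B, N, D)
        attn = F.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        return (attn @ v).squeeze(1)                             # (B, D)


class SemanticAttention(nn.Module):
    """Hypothetical SA: element-wise gating computed from a shared context
    vector, strengthening correlation among per-label semantic vectors."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, semantic_vecs: torch.Tensor) -> torch.Tensor:
        # semantic_vecs: (B, K, D), one D-dimensional vector per label.
        context = semantic_vecs.mean(dim=1, keepdim=True)        # (B, 1, D)
        return semantic_vecs * self.gate(context)                # element-wise reweighting


if __name__ == "__main__":
    B, D = 2, 512
    pfa, sa = PyramidFeatureAttention(D), SemanticAttention(D)
    scales = [torch.randn(B, n, D) for n in (49, 196)]           # two pyramid levels
    fused = pfa(torch.randn(B, D), scales)                       # (2, 512)
    coordinated = sa(torch.randn(B, 10, D))                      # (2, 10, 512)
    print(fused.shape, coordinated.shape)
```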
| DOI | 10.1109/TMM.2022.3222657 |
|---|---|
| Publisher | Piscataway: IEEE |
| CODEN | ITMUF8 |
| Rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
| ISSN | 1520-9210 |
| EISSN | 1941-0077 |
| Source | IEEE Electronic Library (IEL) |
| Subjects | Attention Mechanism; Classification; Computational modeling; Correlation; Feature extraction; Image recognition; Labels; Multi-label Zero-shot learning; Pattern Recognition; Representations; Semantic Feature Space; Semantics; Task analysis; Training; Zero-shot learning |