Towards Unbiased Multi-label Zero-Shot Learning with Pyramid and Semantic Attention
| Published in: | IEEE transactions on multimedia, 2023-01, Vol. 25, p. 1-15 |
|---|---|
| Main authors: | Liu, Ziming; Guo, Song; Guo, Jingcai; Xu, Yuanyuan; Huo, Fushuo |
| Format: | Article |
| Language: | English |
| Online access: | Order full text |
Description: Multi-label zero-shot learning extends conventional single-label zero-shot learning to a more realistic scenario that aims at recognizing multiple unseen class labels in each input sample. Existing works usually exploit attention mechanisms to model the correlation among different labels. However, most of them are biased toward a few major classes while neglecting minor classes of equal importance in the input samples, and may thus produce overly diffused attention maps that cannot sufficiently cover the minor classes. We argue that the cause of this problem is disregarding the connection between major and minor classes, which correspond to global and local information, respectively. In this paper, we propose a novel framework for unbiased multi-label zero-shot learning that considers various class-specific regions to calibrate the training of the classifier. Specifically, Pyramid Feature Attention (PFA) is proposed to build the correlation between the global and local information of samples and balance the presence of each class. Meanwhile, for the generated semantic representations of input samples, we propose Semantic Attention (SA) to strengthen the element-wise correlation among these vectors and encourage their coordinated representation. Extensive experiments on the large-scale multi-label benchmarks MS-COCO, NUS-WIDE, and Open-Images demonstrate that the proposed method surpasses other representative methods by significant margins.
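The description names two attention modules but gives no implementation detail. As a rough illustration only, the following PyTorch sketch shows one plausible reading of PFA (global-to-local attention over multi-scale region features) and SA (element-wise gating of the per-label semantic vectors). The module structure, tensor shapes, and the scaled dot-product and gating formulations are assumptions made for this example, not the paper's actual method.

```python
# Hedged sketch of the two attention ideas in the abstract; all names,
# shapes, and formulations here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidFeatureAttention(nn.Module):
    """Hypothetical PFA: attends from a global image feature over local
    region features gathered from several backbone scales, so regions
    belonging to minor classes still contribute to the fused feature."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the global feature
        self.key = nn.Linear(dim, dim)    # projects each local region
        self.value = nn.Linear(dim, dim)

    def forward(self, global_feat: torch.Tensor, pyramid_feats: list) -> torch.Tensor:
        # global_feat: (B, D); pyramid_feats: list of (B, N_i, D) region
        # features flattened from different spatial scales.
        regions = torch.cat(pyramid_feats, dim=1)                # (B, N, D)
        q = self.query(global_feat).unsqueeze(1)                 # (B, 1, D)
        k, v = self.key(regions), self.value(regions)            # (B, N, D)
        attn = F.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        return (attn @ v).squeeze(1)                             # (B, D)


class SemanticAttention(nn.Module):
    """Hypothetical SA: element-wise gating computed from a shared context
    vector, strengthening correlation among per-label semantic vectors."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, semantic_vecs: torch.Tensor) -> torch.Tensor:
        # semantic_vecs: (B, K, D), one D-dimensional vector per label.
        context = semantic_vecs.mean(dim=1, keepdim=True)        # (B, 1, D)
        return semantic_vecs * self.gate(context)                # element-wise reweighting


if __name__ == "__main__":
    B, D = 2, 512
    pfa, sa = PyramidFeatureAttention(D), SemanticAttention(D)
    scales = [torch.randn(B, n, D) for n in (49, 196)]           # two pyramid levels
    fused = pfa(torch.randn(B, D), scales)                       # (2, 512)
    coordinated = sa(torch.randn(B, 10, D))                      # (2, 10, 512)
    print(fused.shape, coordinated.shape)
```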
| DOI | 10.1109/TMM.2022.3222657 |
|---|---|
| Publisher | Piscataway: IEEE |
| CODEN | ITMUF8 |
| Rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
| ISSN | 1520-9210 |
| EISSN | 1941-0077 |
| Source | IEEE Electronic Library (IEL) |
| Subjects | Attention Mechanism; Classification; Computational modeling; Correlation; Feature extraction; Image recognition; Labels; Multi-label Zero-shot learning; Pattern Recognition; Representations; Semantic Feature Space; Semantics; Task analysis; Training; Zero-shot learning |