Multimodal Framework for Long-Tailed Recognition

A long-tailed data distribution (i.e., a minority of classes account for most of the data, while most classes have very few samples) is a common problem in image classification. In this paper, we propose a novel multimodal framework for long-tailed data recognition. In the first stage, long-tailed data are used...

Bibliographic Details
Published in: Applied sciences 2024-11, Vol.14 (22), p.10572
Main Authors: Chen, Jian, Zhao, Jianyin, Gu, Jiaojiao, Qin, Yufeng, Ji, Hong
Format: Article
Language: English
Subjects: Classification, Semantics
Online Access: Full text
container_end_page
container_issue 22
container_start_page 10572
container_title Applied sciences
container_volume 14
creator Chen, Jian
Zhao, Jianyin
Gu, Jiaojiao
Qin, Yufeng
Ji, Hong
description A long-tailed data distribution (i.e., a minority of classes account for most of the data, while most classes have very few samples) is a common problem in image classification. In this paper, we propose a novel multimodal framework for long-tailed data recognition. In the first stage, long-tailed data are used for visual-semantic contrastive learning to obtain good features, while in the second stage, class-balanced data are used for classifier training. The proposed framework leverages the advantages of multimodal models and mitigates the problem of class imbalance in long-tailed data recognition. Experimental results demonstrate that the proposed framework achieves competitive performance on the CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist2018 datasets for image classification.
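The description says the second stage trains the classifier on class-balanced data but does not say how that data is obtained. One common way to get it from a long-tailed set is class-balanced resampling: pick a class uniformly at random, then pick one of its examples. The sketch below illustrates that idea only; the helper name `class_balanced_sample`, the toy label counts, and the choice of sampling with replacement are assumptions for illustration, not details taken from the article.

```python
import random
from collections import Counter, defaultdict


def class_balanced_sample(labels, num_samples, seed=0):
    """Return sample indices drawn so that every class is equally likely.

    Hypothetical helper (not from the article): choose a class uniformly
    at random, then choose one of its examples uniformly at random,
    with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    classes = sorted(by_class)
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(num_samples)]


# Toy long-tailed label set (assumed numbers): one head class, two smaller ones.
labels = ["head"] * 900 + ["mid"] * 90 + ["tail"] * 10
indices = class_balanced_sample(labels, num_samples=3000)
counts = Counter(labels[i] for i in indices)
print(counts)  # each class appears roughly 1000 times
```

Because tail examples are drawn repeatedly, this kind of resampling is usually paired with augmentation or with a frozen feature extractor, which fits the two-stage split the description outlines.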
doi_str_mv 10.3390/app142210572
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3132840289</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3132840289</sourcerecordid><sourcetype>Aggregation Database</sourcetype><recordtype>article</recordtype><pqid>3132840289</pqid></control><display><type>article</type><title>Multimodal Framework for Long-Tailed Recognition</title><creator>Chen, Jian ; Zhao, Jianyin ; Gu, Jiaojiao ; Qin, Yufeng ; Ji, Hong</creator><identifier>ISSN: 2076-3417</identifier><identifier>EISSN: 2076-3417</identifier><identifier>DOI: 10.3390/app142210572</identifier><language>eng</language><publisher>Basel: MDPI AG</publisher><subject>Classification ; Semantics</subject><ispartof>Applied sciences, 2024-11, Vol.14 (22), p.10572</ispartof><rights>2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><orcidid>0000-0002-7301-824X ; 0000-0003-4711-4374</orcidid></display><addata><format>journal</format><genre>article</genre><date>2024-11-16</date><volume>14</volume><issue>22</issue><spage>10572</spage><cop>Basel</cop><pub>MDPI AG</pub></addata></record>
fulltext fulltext
identifier ISSN: 2076-3417
ispartof Applied sciences, 2024-11, Vol.14 (22), p.10572
issn 2076-3417
2076-3417
language eng
recordid cdi_proquest_journals_3132840289
source DOAJ Directory of Open Access Journals; MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals
subjects Classification
Semantics
title Multimodal Framework for Long-Tailed Recognition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-03T13%3A26%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multimodal%20Framework%20for%20Long-Tailed%20Recognition&rft.jtitle=Applied%20sciences&rft.au=Chen,%20Jian&rft.date=2024-11-16&rft.volume=14&rft.issue=22&rft.spage=10572&rft.pages=10572-&rft.issn=2076-3417&rft.eissn=2076-3417&rft_id=info:doi/10.3390/app142210572&rft_dat=%3Cproquest_cross%3E3132840289%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3132840289&rft_id=info:pmid/&rfr_iscdi=true