Class-Difficulty Based Methods for Long-Tailed Visual Recognition

Bibliographic Details
Published in: International Journal of Computer Vision, 2022-10, Vol. 130 (10), p. 2517-2531
Main authors: Sinha, Saptarshi; Ohashi, Hiroki; Nakamura, Katsuyuki
Format: Article
Language: English
Online access: Full text
Abstract: Long-tailed datasets are frequently encountered in real-world use cases, where a few classes or categories (known as majority or head classes) have a much larger number of data samples than the other classes (known as minority or tail classes). Training deep neural networks on such datasets yields results biased towards the head classes. So far, researchers have come up with multiple weighted-loss and data re-sampling techniques in an effort to reduce this bias. However, most such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weight or attention. Here, we argue that this assumption might not always hold true. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. Further, we use the difficulty measure of each class to design a novel weighted-loss technique called ‘class-wise difficulty based weighted (CDB-W) loss’ and a novel data-sampling technique called ‘class-wise difficulty based sampling (CDB-S)’. To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. Results verified that CDB-W loss and CDB-S could achieve state-of-the-art results on many class-imbalanced datasets, such as ImageNet-LT, LVIS and EGTEA, that resemble real-world use cases.
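The abstract describes the mechanism only at a high level: class difficulty is re-estimated during training and then drives both the loss weights (CDB-W) and the sampling probabilities (CDB-S). The Python sketch below is only an illustration of that general idea, not the paper's exact formulation; it assumes difficulty is measured as 1 minus the current per-class accuracy, and the function names, the exponent tau, and the normalisation choices are hypothetical.

import numpy as np

def class_difficulty(per_class_accuracy):
    # Instantaneous difficulty of each class, taken here as 1 - accuracy,
    # where the accuracies are measured at the current point in training.
    return 1.0 - np.asarray(per_class_accuracy, dtype=np.float64)

def difficulty_based_weights(per_class_accuracy, tau=1.0, eps=1e-8):
    # Per-class weights for a weighted loss (CDB-W-style idea): harder
    # classes get larger weights, and tau controls how sharply the weights
    # react to difficulty.  Rescaling to mean 1 keeps the overall loss
    # magnitude comparable to the unweighted case (an assumed choice).
    d = class_difficulty(per_class_accuracy) + eps
    w = d ** tau
    return w / w.mean()

def difficulty_based_sampling_probs(per_class_accuracy, tau=1.0, eps=1e-8):
    # Class-sampling probabilities for a difficulty-based sampler
    # (CDB-S-style idea): normalise the same difficulty scores to sum to 1.
    d = class_difficulty(per_class_accuracy) + eps
    p = d ** tau
    return p / p.sum()

if __name__ == "__main__":
    # One easy head class (95% accuracy) and two harder classes: the harder
    # classes receive larger loss weights and higher sampling probabilities.
    acc = [0.95, 0.60, 0.30]
    print(difficulty_based_weights(acc, tau=1.0))
    print(difficulty_based_sampling_probs(acc, tau=1.0))

In a real training loop, the per-class accuracies would be refreshed periodically (for example once per epoch on a held-out split), so that the weights and sampling probabilities track the instantaneous difficulty rather than a fixed class-frequency prior.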
DOI: 10.1007/s11263-022-01643-3
Publisher: Springer US (New York)
Rights: The Author(s) 2022. Published under a Creative Commons Attribution 4.0 licence (http://creativecommons.org/licenses/by/4.0/).
ISSN: 0920-5691
EISSN: 1573-1405
Source: SpringerNature Journals
Subjects:
Artificial Intelligence
Artificial neural networks
Computer Imaging
Computer Science
Data sampling
Datasets
Image classification
Image Processing and Computer Vision
Image segmentation
Methods
Neural networks
Object recognition
Pattern Recognition
Pattern Recognition and Graphics
Sampling methods
Training
Vision