Class-Difficulty Based Methods for Long-Tailed Visual Recognition
Saved in:
Published in: | International journal of computer vision, 2022-10, Vol.130 (10), p.2517-2531 |
Main authors: | Sinha, Saptarshi; Ohashi, Hiroki; Nakamura, Katsuyuki |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
container_end_page | 2531 |
container_issue | 10 |
container_start_page | 2517 |
container_title | International journal of computer vision |
container_volume | 130 |
creator | Sinha, Saptarshi; Ohashi, Hiroki; Nakamura, Katsuyuki |
description | Long-tailed datasets are frequently encountered in real-world use cases where a few classes or categories (known as majority or head classes) have a higher number of data samples than the other classes (known as minority or tail classes). Training deep neural networks on such datasets yields results that are biased towards the head classes. So far, researchers have proposed multiple weighted-loss and data re-sampling techniques in an effort to reduce this bias. However, most such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weight or attention. Here, we argue that this assumption might not always hold true. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. We then use the per-class difficulty measures to design a novel weighted-loss technique called ‘class-wise difficulty based weighted (CDB-W) loss’ and a novel data-sampling technique called ‘class-wise difficulty based sampling (CDB-S)’. To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. The results verified that CDB-W loss and CDB-S achieve state-of-the-art results on many class-imbalanced datasets, such as ImageNet-LT, LVIS and EGTEA, that resemble real-world use cases. |
doi_str_mv | 10.1007/s11263-022-01643-3 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0920-5691 |
ispartof | International journal of computer vision, 2022-10, Vol.130 (10), p.2517-2531 |
issn | 0920-5691; 1573-1405 |
language | eng |
recordid | cdi_proquest_journals_2712349949 |
source | SpringerNature Journals |
subjects | Artificial Intelligence; Artificial neural networks; Computer Imaging; Computer Science; Data sampling; Datasets; Image classification; Image Processing and Computer Vision; Image segmentation; Methods; Neural networks; Object recognition; Pattern Recognition; Pattern Recognition and Graphics; Sampling methods; Training; Vision |
title | Class-Difficulty Based Methods for Long-Tailed Visual Recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T09%3A05%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Class-Difficulty%20Based%20Methods%20for%20Long-Tailed%20Visual%20Recognition&rft.jtitle=International%20journal%20of%20computer%20vision&rft.au=Sinha,%20Saptarshi&rft.date=2022-10-01&rft.volume=130&rft.issue=10&rft.spage=2517&rft.epage=2531&rft.pages=2517-2531&rft.issn=0920-5691&rft.eissn=1573-1405&rft_id=info:doi/10.1007/s11263-022-01643-3&rft_dat=%3Cgale_proqu%3EA716919217%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2712349949&rft_id=info:pmid/&rft_galeid=A716919217&rfr_iscdi=true |
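
The abstract describes dynamically measuring each class's difficulty during training and using that measure to weight both the loss (CDB-W) and the data sampling (CDB-S). Below is a minimal PyTorch sketch of the general idea, assuming difficulty is taken as one minus per-class validation accuracy and that loss weights and sampling probabilities scale as difficulty raised to a power tau; the function names and the normalisation are illustrative assumptions, not the paper's published formulation.

```python
# Hypothetical sketch of class-difficulty-based weighting and sampling.
# Assumptions (not from the paper): difficulty = 1 - per-class validation
# accuracy; weights and sampling probabilities are proportional to
# difficulty ** tau.

import torch
import torch.nn.functional as F


def class_difficulty(correct: torch.Tensor, total: torch.Tensor) -> torch.Tensor:
    """Difficulty of each class = 1 - accuracy, measured on held-out data."""
    accuracy = correct / total.clamp(min=1)
    return 1.0 - accuracy


def cdb_loss_weights(difficulty: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Per-class loss weights; harder classes get larger weights (mean weight = 1)."""
    w = difficulty.pow(tau)
    return w * (w.numel() / w.sum().clamp(min=1e-8))


def cdb_sampling_probs(difficulty: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Per-class sampling probabilities (sum to 1); per-sample weights derived
    from these could drive torch.utils.data.WeightedRandomSampler."""
    p = difficulty.pow(tau)
    return p / p.sum().clamp(min=1e-8)


# Example with 3 classes; difficulty would typically be re-estimated every epoch
# on a validation split so the weights track the model's current state.
correct = torch.tensor([90.0, 35.0, 60.0])   # correctly classified validation samples
total = torch.tensor([100.0, 50.0, 80.0])    # validation samples per class
difficulty = class_difficulty(correct, total)

weights = cdb_loss_weights(difficulty, tau=1.5)
logits = torch.randn(8, 3)                    # dummy mini-batch of logits
targets = torch.randint(0, 3, (8,))
loss = F.cross_entropy(logits, targets, weight=weights)  # difficulty-weighted CE

probs = cdb_sampling_probs(difficulty, tau=1.5)  # class-level sampling distribution
```

Because the difficulty estimates are recomputed as training progresses, classes the model already handles well receive progressively less weight, which is the behaviour the abstract contrasts with static reweighting that always favours the tail classes.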