Class-Difficulty Based Methods for Long-Tailed Visual Recognition

Bibliographic Details
Published in: International Journal of Computer Vision, 2022-10, Vol. 130 (10), p. 2517-2531
Main authors: Sinha, Saptarshi; Ohashi, Hiroki; Nakamura, Katsuyuki
Format: Article
Language: English
Online access: Full text
Abstract: Long-tailed datasets are frequently encountered in real-world use cases, where a few classes or categories (known as majority or head classes) have a much larger number of data samples than the other classes (known as minority or tail classes). Training deep neural networks on such datasets yields results biased towards the head classes. So far, researchers have come up with multiple weighted-loss and data re-sampling techniques in an effort to reduce this bias. However, most such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weight or attention. Here, we argue that this assumption might not always hold true. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. Further, we use the difficulty measure of each class to design a novel weighted-loss technique called ‘class-wise difficulty based weighted (CDB-W) loss’ and a novel data-sampling technique called ‘class-wise difficulty based sampling (CDB-S)’. To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. Results verified that CDB-W loss and CDB-S could achieve state-of-the-art results on many class-imbalanced datasets, such as ImageNet-LT, LVIS and EGTEA, that resemble real-world use cases.
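The abstract describes the mechanism only at a high level: class difficulty is re-estimated during training and then drives both the loss weights (CDB-W) and the sampling probabilities (CDB-S). The Python sketch below is only an illustration of that general idea, not the paper's exact formulation; it assumes difficulty is measured as 1 minus the current per-class accuracy, and the function names, the exponent tau, and the normalisation choices are hypothetical.

import numpy as np

def class_difficulty(per_class_accuracy):
    # Instantaneous difficulty of each class, taken here as 1 - accuracy,
    # where the accuracies are measured at the current point in training.
    return 1.0 - np.asarray(per_class_accuracy, dtype=np.float64)

def difficulty_based_weights(per_class_accuracy, tau=1.0, eps=1e-8):
    # Per-class weights for a weighted loss (CDB-W-style idea): harder
    # classes get larger weights, and tau controls how sharply the weights
    # react to difficulty.  Rescaling to mean 1 keeps the overall loss
    # magnitude comparable to the unweighted case (an assumed choice).
    d = class_difficulty(per_class_accuracy) + eps
    w = d ** tau
    return w / w.mean()

def difficulty_based_sampling_probs(per_class_accuracy, tau=1.0, eps=1e-8):
    # Class-sampling probabilities for a difficulty-based sampler
    # (CDB-S-style idea): normalise the same difficulty scores to sum to 1.
    d = class_difficulty(per_class_accuracy) + eps
    p = d ** tau
    return p / p.sum()

if __name__ == "__main__":
    # One easy head class (95% accuracy) and two harder classes: the harder
    # classes receive larger loss weights and higher sampling probabilities.
    acc = [0.95, 0.60, 0.30]
    print(difficulty_based_weights(acc, tau=1.0))
    print(difficulty_based_sampling_probs(acc, tau=1.0))

In a real training loop, the per-class accuracies would be refreshed periodically (for example once per epoch on a held-out split), so that the weights and sampling probabilities track the instantaneous difficulty rather than a fixed class-frequency prior.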
DOI: 10.1007/s11263-022-01643-3
Publisher: Springer US (New York)
Rights: The Author(s) 2022. Published under a Creative Commons Attribution 4.0 licence (http://creativecommons.org/licenses/by/4.0/).
ISSN: 0920-5691
EISSN: 1573-1405
Source: SpringerNature Journals
Subjects:
Artificial Intelligence
Artificial neural networks
Computer Imaging
Computer Science
Data sampling
Datasets
Image classification
Image Processing and Computer Vision
Image segmentation
Methods
Neural networks
Object recognition
Pattern Recognition
Pattern Recognition and Graphics
Sampling methods
Training
Vision