Boosting Crowdsourced Annotation Accuracy: Small Loss Filtering and Augmentation-Driven Training

Crowdsourcing platforms provide an efficient and cost-effective means to acquire the extensive labeled data necessary for supervised learning. However, the labels provided by untrained crowdsourcing workers often contain a considerable amount of noise. Although applying ground truth inference algorithms to deduce integrated labels effectively enhances label quality, a certain level of noise persists. To further reduce the noise in crowdsourced labeling, this paper introduces a novel Small Loss-based Noise Correction algorithm (SLNC). SLNC first filters the crowdsourced data, leveraging the tendency of neural networks to preferentially fit clean samples, thereby obtaining a relatively clean set and a noisy set. It then employs data augmentation to enlarge the clean set and trains a corrector on this augmented set to rectify the noisy set. SLNC has been evaluated on 16 simulated and two real-world datasets; the results indicate that SLNC surpasses comparative algorithms in the quality of the final labels.
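The small-loss filtering step described in the abstract can be illustrated with a minimal sketch (this is a hypothetical illustration of the general technique, not the authors' implementation; the function name `small_loss_split`, the clean-fraction parameter, and the loss values are assumptions):

```python
import numpy as np

def small_loss_split(losses, clean_fraction=0.7):
    """Split sample indices into (clean, noisy) by ranking per-sample loss.

    Samples with small training loss early in training are assumed to be
    correctly labeled; the remaining high-loss samples form the noisy set.
    """
    order = np.argsort(losses)               # ascending: smallest loss first
    n_clean = int(len(losses) * clean_fraction)
    clean_idx = np.sort(order[:n_clean])     # low-loss samples -> "clean" set
    noisy_idx = np.sort(order[n_clean:])     # high-loss samples -> "noisy" set
    return clean_idx, noisy_idx

# Toy usage: mislabeled samples tend to incur larger loss early in training.
losses = np.array([0.05, 0.9, 0.1, 1.2, 0.07, 0.3])
clean, noisy = small_loss_split(losses, clean_fraction=0.5)
print(clean.tolist(), noisy.tolist())  # -> [0, 2, 4] [1, 3, 5]
```

In the paper's pipeline, the clean set obtained this way would then be enlarged by data augmentation and used to train the corrector that relabels the noisy set.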

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, p. 101745-101755
Main Authors: Fu, Yanming; Han, Weigeng; Yang, Jingsang; Lu, Haodong; Yu, Xin
Format: Article
Language: English
Online Access: Full text
Description: Crowdsourcing platforms provide an efficient and cost-effective means to acquire the extensive labeled data necessary for supervised learning. However, the labels provided by untrained crowdsourcing workers often contain a considerable amount of noise. Although applying ground truth inference algorithms to deduce integrated labels effectively enhances label quality, a certain level of noise persists. To further reduce the noise in crowdsourced labeling, this paper introduces a novel Small Loss-based Noise Correction algorithm (SLNC). SLNC first filters the crowdsourced data, leveraging the tendency of neural networks to preferentially fit clean samples, thereby obtaining a relatively clean set and a noisy set. It then employs data augmentation to enlarge the clean set and trains a corrector on this augmented set to rectify the noisy set. SLNC has been evaluated on 16 simulated and two real-world datasets; the results indicate that SLNC surpasses comparative algorithms in the quality of the final labels.
DOI: 10.1109/ACCESS.2024.3432729
ISSN: 2169-3536
Source: DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; IEEE Xplore Open Access Journals
Subjects: Accuracy; Algorithms; Annotations; Crowdsourcing; Data augmentation; Filtering; Labels; Machine learning; neural network; Neural networks; Noise correction; Noise measurement; Supervised learning; Training