Goldilocks: Consistent Crowdsourced Scalar Annotations with Relative Uncertainty

Bibliographic details
Published in:	Proceedings of the ACM on human-computer interaction 2021-10, Vol.5 (CSCW2), p.1-25, Article 335
Authors:	Chen, Quan Ze ; Weld, Daniel S. ; Zhang, Amy X.
Format:	Article
Language:	English
Subjects:	Collaborative interaction ; Human computer interaction (HCI) ; Human-centered computing ; Interaction paradigms
Online access:	Full text

container_end_page	25
container_issue	CSCW2
container_start_page	1
container_title	Proceedings of the ACM on human-computer interaction
container_volume	5
creator	Chen, Quan Ze ; Weld, Daniel S. ; Zhang, Amy X.
description	Human ratings have become a crucial resource for training and evaluating machine learning systems. However, traditional elicitation methods for absolute and comparative rating suffer from issues with consistency and often do not distinguish between uncertainty due to disagreement between annotators and ambiguity inherent to the item being rated. In this work, we present Goldilocks, a novel crowd rating elicitation technique for collecting calibrated scalar annotations that also distinguishes inherent ambiguity from inter-annotator disagreement. We introduce two main ideas: grounding absolute rating scales with examples and using a two-step bounding process to establish a range for an item's placement. We test our designs in three domains: judging toxicity of online comments, estimating satiety of food depicted in images, and estimating age based on portraits. We show that (1) Goldilocks can improve consistency in domains where interpretation of the scale is not universal, and that (2) representing items with ranges lets us simultaneously capture different sources of uncertainty leading to better estimates of pairwise relationship distributions.
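The abstract states that each item is represented by a range rather than a single score, and that this separates ambiguity inherent to the item from disagreement between annotators while improving estimates of pairwise relationships. The Python sketch below illustrates one simple probabilistic reading of that idea. It is our own assumption, not the Goldilocks aggregation procedure from the paper. Each annotator's (lower, upper) range is treated as an independent uniform distribution. An item is modelled as an equal-weight mixture of its annotators' ranges. Mean range width stands in for ambiguity, and the spread of range midpoints stands in for disagreement. The names `prob_greater`, `pairwise_prob` and `uncertainty_summary` and the example ratings are hypothetical.

```python
from itertools import product
from statistics import mean, pstdev

Range = tuple[float, float]  # hypothetical: one annotator's (lower, upper) bound on the scale


def uniform_cdf(x: float, lo: float, hi: float) -> float:
    """P(X <= x) for X ~ Uniform(lo, hi); requires lo < hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def prob_greater(a: Range, b: Range) -> float:
    """P(U > V) for independent U ~ Uniform(a), V ~ Uniform(b).

    A point range (lo == hi) is treated as a constant; an exact tie counts half.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a_lo == a_hi and b_lo == b_hi:
        return 1.0 if a_lo > b_lo else 0.5 if a_lo == b_lo else 0.0
    if b_lo == b_hi:
        return 1.0 - uniform_cdf(b_lo, a_lo, a_hi)
    if a_lo == a_hi:
        return uniform_cdf(a_lo, b_lo, b_hi)
    # P(U > V) = E[F_V(U)]. F_V is piecewise linear with kinks at b_lo and b_hi,
    # so the trapezoid rule over [a_lo, a_hi], split at those kinks, is exact.
    pts = sorted({a_lo, a_hi} | {p for p in (b_lo, b_hi) if a_lo < p < a_hi})
    area = sum(
        (x1 - x0) * (uniform_cdf(x0, b_lo, b_hi) + uniform_cdf(x1, b_lo, b_hi)) / 2
        for x0, x1 in zip(pts, pts[1:])
    )
    return area / (a_hi - a_lo)


def pairwise_prob(ranges_a: list[Range], ranges_b: list[Range]) -> float:
    """P(item A > item B), each item being an equal-weight mixture of its annotators' ranges."""
    return mean(prob_greater(ra, rb) for ra, rb in product(ranges_a, ranges_b))


def uncertainty_summary(ranges: list[Range]) -> dict[str, float]:
    """Hypothetical split: mean range width as ambiguity, spread of range midpoints as disagreement."""
    widths = [hi - lo for lo, hi in ranges]
    mids = [(lo + hi) / 2 for lo, hi in ranges]
    return {"ambiguity": mean(widths), "disagreement": pstdev(mids)}


if __name__ == "__main__":
    # Made-up ranges for two comments on a 0-10 toxicity scale, for illustration only.
    comment_a = [(6.0, 8.0), (5.0, 9.0), (7.0, 8.0)]
    comment_b = [(2.0, 4.0), (3.0, 6.0)]
    print(pairwise_prob(comment_a, comment_b))
    print(uncertainty_summary(comment_a))
```

Computing P(U > V) exactly from the piecewise-linear CDF, instead of by Monte Carlo sampling, keeps the pairwise estimates deterministic. Because each term satisfies P(U > V) + P(V > U) = 1, the estimate for A over B and the one for B over A always sum to one.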
doi_str_mv	10.1145/3476076
format	Article
fullrecord	<record><display><type>article</type><title>Goldilocks: Consistent Crowdsourced Scalar Annotations with Relative Uncertainty</title><source>ACM Digital Library Complete</source><creator>Chen, Quan Ze ; Weld, Daniel S. ; Zhang, Amy X.</creator><identifier>ISSN: 2573-0142</identifier><identifier>EISSN: 2573-0142</identifier><identifier>DOI: 10.1145/3476076</identifier><language>eng</language><publisher>New York, NY, USA: ACM</publisher><subject>Collaborative interaction ; Human computer interaction (HCI) ; Human-centered computing ; Interaction paradigms</subject><ispartof>Proceedings of the ACM on human-computer interaction, 2021-10, Vol.5 (CSCW2), p.1-25, Article 335</ispartof><rights>ACM</rights><lds50>peer_reviewed</lds50></display><links><linktopdf>https://dl.acm.org/doi/pdf/10.1145/3476076</linktopdf></links><addata><au>Chen, Quan Ze</au><au>Weld, Daniel S.</au><au>Zhang, Amy X.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Goldilocks: Consistent Crowdsourced Scalar Annotations with Relative Uncertainty</atitle><jtitle>Proceedings of the ACM on human-computer interaction</jtitle><stitle>ACM PACMHCI</stitle><date>2021-10-18</date><risdate>2021</risdate><volume>5</volume><issue>CSCW2</issue><spage>1</spage><epage>25</epage><pages>1-25</pages><artnum>335</artnum><issn>2573-0142</issn><eissn>2573-0142</eissn><cop>New York, NY, USA</cop><pub>ACM</pub><doi>10.1145/3476076</doi><tpages>25</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 2573-0142
ispartof	Proceedings of the ACM on human-computer interaction, 2021-10, Vol.5 (CSCW2), p.1-25, Article 335
issn	2573-0142 (ISSN) ; 2573-0142 (EISSN)
language	eng
recordid	cdi_crossref_primary_10_1145_3476076
source	ACM Digital Library Complete
subjects	Collaborative interaction ; Human computer interaction (HCI) ; Human-centered computing ; Interaction paradigms
title	Goldilocks: Consistent Crowdsourced Scalar Annotations with Relative Uncertainty
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-14T07%3A33%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Goldilocks:%20Consistent%20Crowdsourced%20Scalar%20Annotations%20with%20Relative%20Uncertainty&rft.jtitle=Proceedings%20of%20the%20ACM%20on%20human-computer%20interaction&rft.au=Chen,%20Quan%20Ze&rft.date=2021-10-18&rft.volume=5&rft.issue=CSCW2&rft.spage=1&rft.epage=25&rft.pages=1-25&rft.artnum=335&rft.issn=2573-0142&rft.eissn=2573-0142&rft_id=info:doi/10.1145/3476076&rft_dat=%3Cacm_cross%3E3476076%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true