Crowdsourcing Ground Truth for Medical Relation Extraction

Cognitive computing systems require human labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have prop...

Bibliographic Details
Published in:	ACM transactions on interactive intelligent systems 2018-07, Vol.8 (2), p.1-20
Main Authors:	Dumitrache, Anca; Aroyo, Lora; Welty, Chris
Format:	Article
Language:	eng
Online Access:	Full text

container_end_page	20
container_issue	2
container_start_page	1
container_title	ACM transactions on interactive intelligent systems
container_volume	8
creator	Dumitrache, Anca ; Aroyo, Lora ; Welty, Chris
description	Cognitive computing systems require human labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, which account for ambiguity in both human and machine performance on this task.
doi_str_mv	10.1145/3152889
format	Article
fulltext	fulltext
identifier	ISSN: 2160-6455
ispartof	ACM transactions on interactive intelligent systems, 2018-07, Vol.8 (2), p.1-20
issn	2160-6455 2160-6463
language	eng
recordid	cdi_crossref_primary_10_1145_3152889
source	ACM Digital Library Complete
title	Crowdsourcing Ground Truth for Medical Relation Extraction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T23%3A02%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Crowdsourcing%20Ground%20Truth%20for%20Medical%20Relation%20Extraction&rft.jtitle=ACM%20transactions%20on%20interactive%20intelligent%20systems&rft.au=Dumitrache,%20Anca&rft.date=2018-07-01&rft.volume=8&rft.issue=2&rft.spage=1&rft.epage=20&rft.pages=1-20&rft.issn=2160-6455&rft.eissn=2160-6463&rft_id=info:doi/10.1145/3152889&rft_dat=%3Ccrossref%3E10_1145_3152889%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true