A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction

Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present CAMEL (Confidence-based Acquisition Model for Efficient self-supervised active Learning), a pool-based active learning framework tailored to sequential multi-output problems. CAMEL possesses two core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, and (2) it facilitates self-supervision for the remainder of the sequence. By deploying a label correction mechanism, CAMEL can also be utilised for data cleaning. We evaluate CAMEL on two sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMEL significantly outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.

Detailed Description

Saved in:
Bibliographic Details
Main Authors: van Niekerk, Carel, Geishauser, Christian, Heck, Michael, Feng, Shutong, Lin, Hsien-chin, Lubis, Nurul, Ruppik, Benjamin, Vukovic, Renato, Gašić, Milica
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator van Niekerk, Carel
Geishauser, Christian
Heck, Michael
Feng, Shutong
Lin, Hsien-chin
Lubis, Nurul
Ruppik, Benjamin
Vukovic, Renato
Gašić, Milica
description Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present CAMEL (Confidence-based Acquisition Model for Efficient self-supervised active Learning), a pool-based active learning framework tailored to sequential multi-output problems. CAMEL possesses two core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, and (2) it facilitates self-supervision for the remainder of the sequence. By deploying a label correction mechanism, CAMEL can also be utilised for data cleaning. We evaluate CAMEL on two sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMEL significantly outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.
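The abstract's two core features can be illustrated with a minimal sketch (this is not the authors' implementation; the function name, threshold value, and dialogue-act labels below are hypothetical): steps of a sequence where the model's confidence falls below a threshold are routed to an expert annotator, while the remaining steps are self-labelled with the model's own predictions.

```python
# Illustrative sketch of confidence-based partial labelling, in the spirit
# of CAMEL's two core features (hypothetical names and threshold; not the
# authors' actual method).

def split_sequence(confidences, predictions, threshold=0.9):
    """Partition sequence steps into expert-label indices and self-labels.

    Steps with confidence below `threshold` are sent to an expert;
    the rest keep the model's own prediction as a self-supervised label.
    """
    expert_idx = [i for i, c in enumerate(confidences) if c < threshold]
    self_labels = {
        i: predictions[i] for i, c in enumerate(confidences) if c >= threshold
    }
    return expert_idx, self_labels

# Toy example: a four-step sequence with per-step model confidences.
confs = [0.98, 0.55, 0.93, 0.40]
preds = ["inform", "request", "inform", "bye"]
expert_idx, self_labels = split_sequence(confs, preds)
# expert_idx  → [1, 3]              (low-confidence steps go to the expert)
# self_labels → {0: "inform", 2: "inform"}  (high-confidence steps self-labelled)
```

In this sketch, only two of the four steps require expert annotation, which is the source of the labelling-cost savings the abstract describes; the same confidence signal could flag suspect existing labels for correction.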
doi_str_mv 10.48550/arxiv.2310.08944
format Article
fullrecord raw XML source record (omitted)
creationdate 2023-10-13
rights http://creativecommons.org/licenses/by/4.0
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2310.08944
language eng
recordid cdi_arxiv_primary_2310_08944
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Learning
title A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T11%3A22%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Confidence-based%20Acquisition%20Model%20for%20Self-supervised%20Active%20Learning%20and%20Label%20Correction&rft.au=van%20Niekerk,%20Carel&rft.date=2023-10-13&rft_id=info:doi/10.48550/arxiv.2310.08944&rft_dat=%3Carxiv_GOX%3E2310_08944%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true