A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction

Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present CAMEL (Confidence-based Acquisition Model for Efficient self-supervised active Learning), a pool-based active learning framework tailored to sequential multi-output problems. CAMEL possesses two core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, and (2) it facilitates self-supervision for the remainder of the sequence. By deploying a label correction mechanism, CAMEL can also be utilised for data cleaning. We evaluate CAMEL on two sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMEL significantly outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.

Detailed Description

Saved in:
Bibliographic Details
Main Authors: van Niekerk, Carel, Geishauser, Christian, Heck, Michael, Feng, Shutong, Lin, Hsien-chin, Lubis, Nurul, Ruppik, Benjamin, Vukovic, Renato, Gašić, Milica
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator van Niekerk, Carel
Geishauser, Christian
Heck, Michael
Feng, Shutong
Lin, Hsien-chin
Lubis, Nurul
Ruppik, Benjamin
Vukovic, Renato
Gašić, Milica
description Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present CAMEL (Confidence-based Acquisition Model for Efficient self-supervised active Learning), a pool-based active learning framework tailored to sequential multi-output problems. CAMEL possesses two core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, and (2) it facilitates self-supervision for the remainder of the sequence. By deploying a label correction mechanism, CAMEL can also be utilised for data cleaning. We evaluate CAMEL on two sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMEL significantly outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.
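The abstract's two core features can be illustrated with a minimal sketch (this is not the authors' implementation; the function name, threshold value, and dialogue-act labels below are hypothetical): steps of a sequence where the model's confidence falls below a threshold are routed to an expert annotator, while the remaining steps are self-labelled with the model's own predictions.

```python
# Illustrative sketch of confidence-based partial labelling, in the spirit
# of CAMEL's two core features (hypothetical names and threshold; not the
# authors' actual method).

def split_sequence(confidences, predictions, threshold=0.9):
    """Partition sequence steps into expert-label indices and self-labels.

    Steps with confidence below `threshold` are sent to an expert;
    the rest keep the model's own prediction as a self-supervised label.
    """
    expert_idx = [i for i, c in enumerate(confidences) if c < threshold]
    self_labels = {
        i: predictions[i] for i, c in enumerate(confidences) if c >= threshold
    }
    return expert_idx, self_labels

# Toy example: a four-step sequence with per-step model confidences.
confs = [0.98, 0.55, 0.93, 0.40]
preds = ["inform", "request", "inform", "bye"]
expert_idx, self_labels = split_sequence(confs, preds)
# expert_idx  → [1, 3]              (low-confidence steps go to the expert)
# self_labels → {0: "inform", 2: "inform"}  (high-confidence steps self-labelled)
```

In this sketch, only two of the four steps require expert annotation, which is the source of the labelling-cost savings the abstract describes; the same confidence signal could flag suspect existing labels for correction.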
doi_str_mv 10.48550/arxiv.2310.08944
format Article
fullrecord raw XML source record (omitted)
creationdate 2023-10-13
rights http://creativecommons.org/licenses/by/4.0
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2310.08944
language eng
recordid cdi_arxiv_primary_2310_08944
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Learning
title A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T11%3A22%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Confidence-based%20Acquisition%20Model%20for%20Self-supervised%20Active%20Learning%20and%20Label%20Correction&rft.au=van%20Niekerk,%20Carel&rft.date=2023-10-13&rft_id=info:doi/10.48550/arxiv.2310.08944&rft_dat=%3Carxiv_GOX%3E2310_08944%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true