Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding

Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages. Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy and thus impede the performance of SLU models. In this paper we focus on mitigating noise in augmented data. We develop a denoising training approach in which multiple models are trained on data produced by various augmentation methods, and those models provide supervision signals to each other. The experimental results show that our method outperforms the existing state of the art by 3.05 and 4.24 percentage points on two benchmark datasets, respectively. The code will be open-sourced on GitHub.
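The abstract's core idea — models trained on different noisy augmented data sets supervising each other — can be illustrated with a minimal sketch. This is not the paper's actual training procedure; the function name, the majority-vote rule, and the agreement threshold are all my own assumptions, standing in for whatever supervision signal the authors use:

```python
from collections import Counter

def cross_supervise(predictions, min_agree=2):
    """Denoise labels by cross-model agreement.

    predictions: one list of predicted labels per model, where each
    model was trained on a differently augmented data set; all lists
    cover the same examples in the same order.

    Returns (indices, labels): the examples whose majority label is
    supported by at least `min_agree` models, together with that
    label. Examples without sufficient agreement are dropped as noise.
    """
    n_examples = len(predictions[0])
    kept_idx, kept_labels = [], []
    for i in range(n_examples):
        # Tally the label each model assigns to example i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        label, count = votes.most_common(1)[0]
        if count >= min_agree:
            kept_idx.append(i)
            kept_labels.append(label)
    return kept_idx, kept_labels
```

For example, with three models voting on four intent labels, `min_agree=3` keeps only the examples on which all models concur, while `min_agree=2` also retains those with a two-model majority. In a real pipeline, the retained (index, label) pairs would become cleaned training data for the next round.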

Detailed description

Saved in:
Bibliographic details
Main authors: Guo, Yingmei, Shou, Linjun, Pei, Jian, Gong, Ming, Xu, Mingxing, Wu, Zhiyong, Jiang, Daxin
Format: Article
Language: English
Subjects:
Online access: Order full text
date 2021-09-03
identifier DOI: 10.48550/arxiv.2109.01583
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language