Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unsupervised data and adds that to the training pool. In this work, we investigate and use pseudo-labeling for a re...

Bibliographic details
Main authors:	Gheini, Mozhdeh; Likhomanenko, Tatiana; Sperber, Matthias; Setiawan, Hendra
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language; Computer Science - Sound

creator	Gheini, Mozhdeh ; Likhomanenko, Tatiana ; Sperber, Matthias ; Setiawan, Hendra
description	Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unsupervised data and adds that to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed novel setup: joint transcription and translation of speech, which suffers from an absence of sufficient data resources. We show that under such data-deficient circumstances, the unlabeled data can significantly vary in domain from the supervised data, which results in pseudo-label quality degradation. We investigate two categories of remedies that require no additional supervision and target the domain mismatch: pseudo-label filtering and data augmentation. We show that pseudo-label analysis and processing as such results in additional gains on top of the vanilla pseudo-labeling setup resulting in total improvements of up to 0.6% absolute WER and 2.2 BLEU points.
doi_str_mv	10.48550/arxiv.2212.09982
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2212_09982</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2212_09982</sourcerecordid><originalsourceid>FETCH-LOGICAL-a672-d8ef557d755fa72ccef8679f1cf960800e861d074274d9747ef83c01e438a56f3</originalsourceid><addsrcrecordid>eNotj8lOwzAYhH3hgAoPwAm_gIPt2LHNDbVsVaQikQun6K8XaikkkeOwvD005TSaGc1IH0JXjBZCS0lvIH3Hz4JzxgtqjObn6G07xD7j19F7e8BNgn6yKY45Dj2G3p2SDo7-Fr9MfnYDqWHvu9i_46-YD3g3ZzIEsolTTnE_L8sNZLhAZwG6yV_-6wo1D_fN-onUu8fn9V1NoFKcOO2DlMopKQMobq0PulImMBtMRTWlXlfMUSW4Es4oof760lLmRalBVqFcoevT7cLWjil-QPppj4ztwlj-AoRLTNQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data</title><source>arXiv.org</source><creator>Gheini, Mozhdeh ; Likhomanenko, Tatiana ; Sperber, Matthias ; Setiawan, Hendra</creator><creatorcontrib>Gheini, Mozhdeh ; Likhomanenko, Tatiana ; Sperber, Matthias ; Setiawan, Hendra</creatorcontrib><description>Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unsupervised data and adds that to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed novel setup: joint transcription and translation of speech, which suffers from an absence of sufficient data resources. We show that under such data-deficient circumstances, the unlabeled data can significantly vary in domain from the supervised data, which results in pseudo-label quality degradation. We investigate two categories of remedies that require no additional supervision and target the domain mismatch: pseudo-label filtering and data augmentation. 
We show that pseudo-label analysis and processing as such results in additional gains on top of the vanilla pseudo-labeling setup resulting in total improvements of up to 0.6% absolute WER and 2.2 BLEU points.</description><identifier>DOI: 10.48550/arxiv.2212.09982</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Sound</subject><creationdate>2022-12</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2212.09982$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2212.09982$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Gheini, Mozhdeh</creatorcontrib><creatorcontrib>Likhomanenko, Tatiana</creatorcontrib><creatorcontrib>Sperber, Matthias</creatorcontrib><creatorcontrib>Setiawan, Hendra</creatorcontrib><title>Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data</title><description>Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unsupervised data and adds that to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed novel setup: joint transcription and translation of speech, which suffers from an absence of sufficient data resources. We show that under such data-deficient circumstances, the unlabeled data can significantly vary in domain from the supervised data, which results in pseudo-label quality degradation. 
We investigate two categories of remedies that require no additional supervision and target the domain mismatch: pseudo-label filtering and data augmentation. We show that pseudo-label analysis and processing as such results in additional gains on top of the vanilla pseudo-labeling setup resulting in total improvements of up to 0.6% absolute WER and 2.2 BLEU points.</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj8lOwzAYhH3hgAoPwAm_gIPt2LHNDbVsVaQikQun6K8XaikkkeOwvD005TSaGc1IH0JXjBZCS0lvIH3Hz4JzxgtqjObn6G07xD7j19F7e8BNgn6yKY45Dj2G3p2SDo7-Fr9MfnYDqWHvu9i_46-YD3g3ZzIEsolTTnE_L8sNZLhAZwG6yV_-6wo1D_fN-onUu8fn9V1NoFKcOO2DlMopKQMobq0PulImMBtMRTWlXlfMUSW4Es4oof760lLmRalBVqFcoevT7cLWjil-QPppj4ztwlj-AoRLTNQ</recordid><startdate>20221219</startdate><enddate>20221219</enddate><creator>Gheini, Mozhdeh</creator><creator>Likhomanenko, Tatiana</creator><creator>Sperber, Matthias</creator><creator>Setiawan, Hendra</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20221219</creationdate><title>Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data</title><author>Gheini, Mozhdeh ; Likhomanenko, Tatiana ; Sperber, Matthias ; Setiawan, Hendra</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a672-d8ef557d755fa72ccef8679f1cf960800e861d074274d9747ef83c01e438a56f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Gheini, Mozhdeh</creatorcontrib><creatorcontrib>Likhomanenko, Tatiana</creatorcontrib><creatorcontrib>Sperber, 
Matthias</creatorcontrib><creatorcontrib>Setiawan, Hendra</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Gheini, Mozhdeh</au><au>Likhomanenko, Tatiana</au><au>Sperber, Matthias</au><au>Setiawan, Hendra</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data</atitle><date>2022-12-19</date><risdate>2022</risdate><abstract>Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unsupervised data and adds that to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed novel setup: joint transcription and translation of speech, which suffers from an absence of sufficient data resources. We show that under such data-deficient circumstances, the unlabeled data can significantly vary in domain from the supervised data, which results in pseudo-label quality degradation. We investigate two categories of remedies that require no additional supervision and target the domain mismatch: pseudo-label filtering and data augmentation. We show that pseudo-label analysis and processing as such results in additional gains on top of the vanilla pseudo-labeling setup resulting in total improvements of up to 0.6% absolute WER and 2.2 BLEU points.</abstract><doi>10.48550/arxiv.2212.09982</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2212.09982
language	eng
recordid	cdi_arxiv_primary_2212_09982
source	arXiv.org
subjects	Computer Science - Computation and Language ; Computer Science - Sound
title	Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T18%3A28%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Joint%20Speech%20Transcription%20and%20Translation:%20Pseudo-Labeling%20with%20Out-of-Distribution%20Data&rft.au=Gheini,%20Mozhdeh&rft.date=2022-12-19&rft_id=info:doi/10.48550/arxiv.2212.09982&rft_dat=%3Carxiv_GOX%3E2212_09982%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true