Sequence-level self-learning with multiple hypotheses
In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). For untranscribed speech data, the hypothesis from an ASR system must be used as a label. However, the imperfect ASR result makes it difficult for unsupervised learning to consistently improve recognition performance, especially when multiple powerful teacher models are unavailable. In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework, where the n-th best ASR hypothesis is used as the label of each task. The seq2seq network is updated through the MTL framework so as to find a common representation that covers multiple hypotheses, alleviating the effect of hard-decision errors. We first demonstrate the effectiveness of our self-learning methods through ASR experiments on an accent adaptation task between US and British English speech, reducing the WER on the British speech data from 14.55% to 10.36% relative to a baseline trained on US English data only. Moreover, we investigate the effect of the proposed methods in a federated learning scenario.
Saved in:
Published in: | arXiv.org 2021-12 |
---|---|
Main authors: | Kumatani, Kenichi; Dimitriadis, Dimitrios; Gaur, Yashesh; Gmyr, Robert; Eskimez, Sefik Emre; Li, Jinyu; Zeng, Michael |
Format: | Article |
Language: | eng |
Subjects: | Automatic speech recognition; Hypotheses; Unsupervised learning; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Kumatani, Kenichi; Dimitriadis, Dimitrios; Gaur, Yashesh; Gmyr, Robert; Eskimez, Sefik Emre; Li, Jinyu; Zeng, Michael |
description | In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). For untranscribed speech data, the hypothesis from an ASR system must be used as a label. However, the imperfect ASR result makes it difficult for unsupervised learning to consistently improve recognition performance, especially when multiple powerful teacher models are unavailable. In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework, where the n-th best ASR hypothesis is used as the label of each task. The seq2seq network is updated through the MTL framework so as to find a common representation that covers multiple hypotheses. By doing so, the effect of hard-decision errors can be alleviated. We first demonstrate the effectiveness of our self-learning methods through ASR experiments on an accent adaptation task between US and British English speech. Our experimental results show that our method reduces the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with US English data only. Moreover, we investigate the effect of the proposed methods in a federated learning scenario. |
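The description outlines an MTL objective in which each of the n best ASR hypotheses serves as the label of one task and the seq2seq model is trained on all of them jointly. A minimal NumPy sketch of such a combined objective, assuming teacher-forced decoder logits and per-hypothesis ASR posterior scores; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def sequence_nll(logits, labels):
    """Teacher-forced cross-entropy of one hypothesis.

    logits: (T, V) decoder outputs per step; labels: (T,) token ids.
    """
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise softmax
    log_z = np.log(np.exp(logits).sum(axis=1))           # per-step log partition
    tok_logp = logits[np.arange(len(labels)), labels] - log_z
    return -tok_logp.sum()

def mtl_nbest_loss(logits_per_hyp, hyps, scores):
    """Weight each n-best 'task' by its normalised ASR posterior score
    and sum the per-hypothesis sequence losses."""
    w = np.exp(scores - np.max(scores))  # softmax over hypothesis scores
    w = w / w.sum()
    return sum(wi * sequence_nll(lg, h)
               for wi, lg, h in zip(w, logits_per_hyp, hyps))
```

With equal scores this reduces to a plain average of the per-hypothesis losses; sharper score differences push the objective back toward the 1-best hard decision that the paper's soft-label scheme is meant to avoid.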
doi_str_mv | 10.48550/arxiv.2112.05826 |
format | Article |
fullrecord | Raw Primo/ProQuest XML record (duplicates the title, authors, and abstract above). Additional details it carries: publisher: Cornell University Library, Ithaca; publication date: 2021-12-10; license: CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/); published-paper DOI: 10.21437/Interspeech.2020-2020; arXiv DOI: 10.48550/arXiv.2112.05826 |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2021-12 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_2112_05826 |
source | arXiv.org; Free E-Journals |
subjects | Automatic speech recognition; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning; Hypotheses; Unsupervised learning |
title | Sequence-level self-learning with multiple hypotheses |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T10%3A25%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sequence-level%20self-learning%20with%20multiple%20hypotheses&rft.jtitle=arXiv.org&rft.au=Kumatani,%20Kenichi&rft.date=2021-12-10&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2112.05826&rft_dat=%3Cproquest_arxiv%3E2609873823%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2609873823&rft_id=info:pmid/&rfr_iscdi=true |