On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech

While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem,...

Bibliographic details
Published in: arXiv.org 2021-06
Main authors: Tomanek, Katrin; Beaufays, Françoise; Cattiau, Julie; Chandorkar, Angad; Khe Chai Sim
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Tomanek, Katrin
Beaufays, Françoise
Cattiau, Julie
Chandorkar, Angad
Khe Chai Sim
description While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment, posing problems around data privacy, delayed model-update times, and communication cost for copying data and models between mobile device and server infrastructure. In this paper, we present an approach to on-device ASR personalization with very small amounts of speaker-specific data. We test our approach on a diverse set of 100 speakers with disordered speech and find a median relative word error rate improvement of 71% with only 50 short utterances required per speaker. When tested on a voice-controlled home automation platform, on-device personalized models show a median task success rate of 81%, compared to only 40% for the unadapted models.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2543583183</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2543583183</sourcerecordid><originalsourceid>FETCH-proquest_journals_25435831833</originalsourceid><addsrcrecordid>eNqNjcEKgkAURYcgSMp_GGgt6IxTbiOLNlGUtBUZnzVi82yetujrk_ADWl0O58CdME9IGQVJLMSM-UR1GIZitRZKSY_dTjZI4W008DM4Qls05lN0Bi3Him_6Dp8DaX5tAfSDX0Dj3ZqfP2IJDfEKHU8NoSvBQTmGCzatiobAH3fOlvtdtj0ErcNXD9TlNfZuOKNcqFiqREaJlP9VXwWQQSg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2543583183</pqid></control><display><type>article</type><title>On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech</title><source>Free E- Journals</source><creator>Tomanek, Katrin ; Beaufays, Françoise ; Cattiau, Julie ; Chandorkar, Angad ; Khe Chai Sim</creator><creatorcontrib>Tomanek, Katrin ; Beaufays, Françoise ; Cattiau, Julie ; Chandorkar, Angad ; Khe Chai Sim</creatorcontrib><description>While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment posing problems around data privacy, delayed model-update times, and communication cost for copying data and models between mobile device and server infrastructure. In this paper, we present an approach to on-device based ASR personalization with very small amounts of speaker-specific data. We test our approach on a diverse set of 100 speakers with disordered speech and find median relative word error rate improvement of 71% with only 50 short utterances required per speaker. 
When tested on a voice-controlled home automation platform, on-device personalized models show a median task success rate of 81%, compared to only 40% of the unadapted models.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Automatic speech recognition ; Copying ; Customization ; Electronic devices ; Performance degradation ; Speech ; Voice communication ; Voice control ; Voice recognition</subject><ispartof>arXiv.org, 2021-06</ispartof><rights>2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Tomanek, Katrin</creatorcontrib><creatorcontrib>Beaufays, Françoise</creatorcontrib><creatorcontrib>Cattiau, Julie</creatorcontrib><creatorcontrib>Chandorkar, Angad</creatorcontrib><creatorcontrib>Khe Chai Sim</creatorcontrib><title>On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech</title><title>arXiv.org</title><description>While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment posing problems around data privacy, delayed model-update times, and communication cost for copying data and models between mobile device and server infrastructure. 
In this paper, we present an approach to on-device based ASR personalization with very small amounts of speaker-specific data. We test our approach on a diverse set of 100 speakers with disordered speech and find median relative word error rate improvement of 71% with only 50 short utterances required per speaker. When tested on a voice-controlled home automation platform, on-device personalized models show a median task success rate of 81%, compared to only 40% of the unadapted models.</description><subject>Automatic speech recognition</subject><subject>Copying</subject><subject>Customization</subject><subject>Electronic devices</subject><subject>Performance degradation</subject><subject>Speech</subject><subject>Voice communication</subject><subject>Voice control</subject><subject>Voice recognition</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNjcEKgkAURYcgSMp_GGgt6IxTbiOLNlGUtBUZnzVi82yetujrk_ADWl0O58CdME9IGQVJLMSM-UR1GIZitRZKSY_dTjZI4W008DM4Qls05lN0Bi3Him_6Dp8DaX5tAfSDX0Dj3ZqfP2IJDfEKHU8NoSvBQTmGCzatiobAH3fOlvtdtj0ErcNXD9TlNfZuOKNcqFiqREaJlP9VXwWQQSg</recordid><startdate>20210618</startdate><enddate>20210618</enddate><creator>Tomanek, Katrin</creator><creator>Beaufays, Françoise</creator><creator>Cattiau, Julie</creator><creator>Chandorkar, Angad</creator><creator>Khe Chai Sim</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20210618</creationdate><title>On-Device Personalization of Automatic Speech Recognition 
Models for Disordered Speech</title><author>Tomanek, Katrin ; Beaufays, Françoise ; Cattiau, Julie ; Chandorkar, Angad ; Khe Chai Sim</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_25435831833</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Automatic speech recognition</topic><topic>Copying</topic><topic>Customization</topic><topic>Electronic devices</topic><topic>Performance degradation</topic><topic>Speech</topic><topic>Voice communication</topic><topic>Voice control</topic><topic>Voice recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Tomanek, Katrin</creatorcontrib><creatorcontrib>Beaufays, Françoise</creatorcontrib><creatorcontrib>Cattiau, Julie</creatorcontrib><creatorcontrib>Chandorkar, Angad</creatorcontrib><creatorcontrib>Khe Chai Sim</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tomanek, Katrin</au><au>Beaufays, Françoise</au><au>Cattiau, Julie</au><au>Chandorkar, Angad</au><au>Khe Chai Sim</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech</atitle><jtitle>arXiv.org</jtitle><date>2021-06-18</date><risdate>2021</risdate><eissn>2331-8422</eissn><abstract>While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment posing problems around data privacy, delayed model-update times, and communication cost for copying data and models between mobile device and server infrastructure. In this paper, we present an approach to on-device based ASR personalization with very small amounts of speaker-specific data. We test our approach on a diverse set of 100 speakers with disordered speech and find median relative word error rate improvement of 71% with only 50 short utterances required per speaker. When tested on a voice-controlled home automation platform, on-device personalized models show a median task success rate of 81%, compared to only 40% of the unadapted models.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2021-06
issn 2331-8422
language eng
recordid cdi_proquest_journals_2543583183
source Free E- Journals
subjects Automatic speech recognition
Copying
Customization
Electronic devices
Performance degradation
Speech
Voice communication
Voice control
Voice recognition
title On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T05%3A19%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=On-Device%20Personalization%20of%20Automatic%20Speech%20Recognition%20Models%20for%20Disordered%20Speech&rft.jtitle=arXiv.org&rft.au=Tomanek,%20Katrin&rft.date=2021-06-18&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2543583183%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2543583183&rft_id=info:pmid/&rfr_iscdi=true