UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization
Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric speech into normal-sounding speech. The technology eases communication with speakers affected by the neuromotor disorder and enhances their social inclusion. NED-based (Neural Encoder-Decoder) systems have signifi...
Main authors: | Wang, Yuejiao ; Wu, Xixin ; Wang, Disong ; Meng, Lingwei ; Meng, Helen |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computation and Language ; Computer Science - Sound |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Wang, Yuejiao ; Wu, Xixin ; Wang, Disong ; Meng, Lingwei ; Meng, Helen |
description | Dysarthric speech reconstruction (DSR) systems aim to automatically convert
dysarthric speech into normal-sounding speech. The technology eases
communication with speakers affected by the neuromotor disorder and enhances
their social inclusion. NED-based (Neural Encoder-Decoder) systems have
significantly improved the intelligibility of the reconstructed speech as
compared with GAN-based (Generative Adversarial Network) approaches, but the
approach is still limited by training inefficiency caused by the cascaded
pipeline and auxiliary tasks of the content encoder, which may in turn affect
the quality of reconstruction. Inspired by self-supervised speech
representation learning and discrete speech units, we propose a Unit-DSR
system, which harnesses the powerful domain-adaptation capacity of HuBERT for
training efficiency improvement and utilizes speech units to constrain the
dysarthric content restoration in a discrete linguistic space. Compared with
NED approaches, the Unit-DSR system only consists of a speech unit normalizer
and a Unit HiFi-GAN vocoder, which is considerably simpler without cascaded
sub-modules or auxiliary tasks. Results on the UASpeech corpus indicate that
Unit-DSR outperforms competitive baselines in terms of content restoration,
reaching a 28.2% relative average word error rate reduction when compared to
original dysarthric speech, and shows robustness against speed perturbation and
noise. |
doi_str_mv | 10.48550/arxiv.2401.14664 |
format | Article |
creationdate | 2024-01-26 |
rights | http://creativecommons.org/licenses/by/4.0 |
oa | free_for_read |
sourcetype | Open Access Repository |
collection | arXiv Computer Science ; arXiv.org |
linktorsrc | https://arxiv.org/abs/2401.14664 |
backlink | https://doi.org/10.48550/arXiv.2401.14664 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2401.14664 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2401_14664 |
source | arXiv.org |
subjects | Computer Science - Computation and Language ; Computer Science - Sound |
title | UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T19%3A09%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=UNIT-DSR:%20Dysarthric%20Speech%20Reconstruction%20System%20Using%20Speech%20Unit%20Normalization&rft.au=Wang,%20Yuejiao&rft.date=2024-01-26&rft_id=info:doi/10.48550/arxiv.2401.14664&rft_dat=%3Carxiv_GOX%3E2401_14664%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
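
The description reports a relative word error rate (WER) reduction against the original dysarthric speech. As a reading aid, the sketch below shows how a relative reduction is conventionally computed from two WER values. The function name and the numbers are hypothetical and chosen only for illustration. They are not taken from the paper, and this record does not state how the paper averages across speakers.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction of a system over a baseline, as a fraction.

    Both arguments must use the same scale (for example, percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the paper):
# WER of the original speech vs. WER of the reconstructed speech.
example = relative_wer_reduction(baseline_wer=50.0, system_wer=40.0)
print(f"{example:.1%}")  # prints "20.0%"
```

A relative reduction is expressed as a share of the baseline error, so the same absolute improvement gives a larger relative figure when the baseline WER is lower.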