Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition

Bibliographic details
Main authors: Lee, Jeehyun, Choi, Yerin, Song, Tae-Jin, Koo, Myoung-Wan
Format: Article
Language: eng
creator Lee, Jeehyun
Choi, Yerin
Song, Tae-Jin
Koo, Myoung-Wan
description Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We propose to extend a large-scale speech recognition model for inappropriate pause detection in dysarthric speech. To this end, we propose a task design, a labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. First, we treat pause detection as speech recognition, using an automatic speech recognition (ASR) model to convert speech into text with pause tags. Following the newly designed task, we label pause locations at the text level, together with their appropriateness. We collaborate with speech-language pathologists to establish labeling criteria, ensuring high-quality annotated data. Finally, we extend the ASR model with an inappropriate pause prediction layer for end-to-end inappropriate pause detection. Moreover, we propose a task-tailored metric for evaluating inappropriate pause detection independent of ASR performance. Our experiments show that the proposed method detects inappropriate pauses in dysarthric speech better than the baselines (Inappropriate Pause Error Rate: 14.47%).
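The abstract frames pause detection as speech recognition: the ASR output is text interleaved with pause tags, and each pause is labeled at the text level with its appropriateness. A minimal Python sketch of reading such tagged output back into pause positions is given here, assuming hypothetical tag tokens `<pause>` and `<ipause>` (this record does not state the paper's actual tag vocabulary or output format). It is an illustration only, not the authors' implementation, and it does not reproduce their Inappropriate Pause Error Rate.

```python
from typing import List, Tuple

# Hypothetical tag tokens; the paper's real tag vocabulary is not given in this record.
PAUSE_TAG = "<pause>"        # assumed: appropriate pause
INAPPROPRIATE_TAG = "<ipause>"  # assumed: inappropriate pause
TAGS = (PAUSE_TAG, INAPPROPRIATE_TAG)


def extract_pauses(tagged: str) -> List[Tuple[int, bool]]:
    """Return (position, is_inappropriate) for every pause tag.

    The position is the number of words spoken before the pause, so each
    pause is located at the text level, between two words of the transcript.
    """
    pauses: List[Tuple[int, bool]] = []
    words_seen = 0
    for token in tagged.split():
        if token == PAUSE_TAG:
            pauses.append((words_seen, False))
        elif token == INAPPROPRIATE_TAG:
            pauses.append((words_seen, True))
        else:
            words_seen += 1  # an ordinary word advances the position
    return pauses


def strip_tags(tagged: str) -> str:
    """Drop all pause tags, leaving the plain word transcript."""
    return " ".join(t for t in tagged.split() if t not in TAGS)


if __name__ == "__main__":
    hyp = "the cat <ipause> sat on <pause> the mat"
    print(extract_pauses(hyp))  # [(2, True), (4, False)]
    print(strip_tags(hyp))      # the cat sat on the mat
```

Indexing each pause by the number of preceding words ties the label to the transcript rather than to audio timestamps, in the spirit of the text-level labeling described above; how the paper itself aligns predicted and reference pauses is not stated in this record.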
doi_str_mv 10.48550/arxiv.2402.18923
format Article
date 2024-02-29
rights http://creativecommons.org/licenses/by-nc-nd/4.0
linktorsrc https://arxiv.org/abs/2402.18923
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2402.18923
language eng
recordid cdi_arxiv_primary_2402_18923
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Sound
title Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T08%3A34%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Inappropriate%20Pause%20Detection%20In%20Dysarthric%20Speech%20Using%20Large-Scale%20Speech%20Recognition&rft.au=Lee,%20Jeehyun&rft.date=2024-02-29&rft_id=info:doi/10.48550/arxiv.2402.18923&rft_dat=%3Carxiv_GOX%3E2402_18923%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true