Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition

Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We propose to extend a large-scale speech recognition model for inappropriate pause detection in dysarthric speec...

Detailed description

Bibliographic details
Main authors:	Lee, Jeehyun ; Choi, Yerin ; Song, Tae-Jin ; Koo, Myoung-Wan
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language ; Computer Science - Sound
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Lee, Jeehyun ; Choi, Yerin ; Song, Tae-Jin ; Koo, Myoung-Wan
description	Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We propose to extend a large-scale speech recognition model for inappropriate pause detection in dysarthric speech. To this end, we propose task design, labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. First, we treat pause detection as speech recognition, using an automatic speech recognition (ASR) model to convert speech into text with pause tags. According to the newly designed task, we label pause locations at the text level and their appropriateness. We collaborate with speech-language pathologists to establish labeling criteria, ensuring high-quality annotated data. Finally, we extend the ASR model with an inappropriate pause prediction layer for end-to-end inappropriate pause detection. Moreover, we propose a task-tailored metric for evaluating inappropriate pause detection independent of ASR performance. Our experiments show that the proposed method better detects inappropriate pauses in dysarthric speech than baselines. (Inappropriate Pause Error Rate: 14.47%)
doi_str_mv	10.48550/arxiv.2402.18923
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2402_18923</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2402_18923</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2402_189233</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw0jO0sDQy5mQI98xLLCgoyi8oykwsSVUISCwtTlVwSS1JTS7JzM9T8MxTcKksTiwqySjKTFYILkhNTc5QCC3OzEtX8EksSk_VDU5OzEmFSQSlJuen52WCdPIwsKYl5hSn8kJpbgZ5N9cQZw9dsBPigdblJhZVxoOcEg92ijFhFQDWJT8O</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition</title><source>arXiv.org</source><creator>Lee, Jeehyun ; Choi, Yerin ; Song, Tae-Jin ; Koo, Myoung-Wan</creator><creatorcontrib>Lee, Jeehyun ; Choi, Yerin ; Song, Tae-Jin ; Koo, Myoung-Wan</creatorcontrib><description>Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We propose to extend a large-scale speech recognition model for inappropriate pause detection in dysarthric speech. To this end, we propose task design, labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. First, we treat pause detection as speech recognition, using an automatic speech recognition (ASR) model to convert speech into text with pause tags. According to the newly designed task, we label pause locations at the text level and their appropriateness. We collaborate with speech-language pathologists to establish labeling criteria, ensuring high-quality annotated data. Finally, we extend the ASR model with an inappropriate pause prediction layer for end-to-end inappropriate pause detection. 
Moreover, we propose a task-tailored metric for evaluating inappropriate pause detection independent of ASR performance. Our experiments show that the proposed method better detects inappropriate pauses in dysarthric speech than baselines. (Inappropriate Pause Error Rate: 14.47%)</description><identifier>DOI: 10.48550/arxiv.2402.18923</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Sound</subject><creationdate>2024-02</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2402.18923$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2402.18923$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Lee, Jeehyun</creatorcontrib><creatorcontrib>Choi, Yerin</creatorcontrib><creatorcontrib>Song, Tae-Jin</creatorcontrib><creatorcontrib>Koo, Myoung-Wan</creatorcontrib><title>Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition</title><description>Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We propose to extend a large-scale speech recognition model for inappropriate pause detection in dysarthric speech. To this end, we propose task design, labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. 
First, we treat pause detection as speech recognition, using an automatic speech recognition (ASR) model to convert speech into text with pause tags. According to the newly designed task, we label pause locations at the text level and their appropriateness. We collaborate with speech-language pathologists to establish labeling criteria, ensuring high-quality annotated data. Finally, we extend the ASR model with an inappropriate pause prediction layer for end-to-end inappropriate pause detection. Moreover, we propose a task-tailored metric for evaluating inappropriate pause detection independent of ASR performance. Our experiments show that the proposed method better detects inappropriate pauses in dysarthric speech than baselines. (Inappropriate Pause Error Rate: 14.47%)</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw0jO0sDQy5mQI98xLLCgoyi8oykwsSVUISCwtTlVwSS1JTS7JzM9T8MxTcKksTiwqySjKTFYILkhNTc5QCC3OzEtX8EksSk_VDU5OzEmFSQSlJuen52WCdPIwsKYl5hSn8kJpbgZ5N9cQZw9dsBPigdblJhZVxoOcEg92ijFhFQDWJT8O</recordid><startdate>20240229</startdate><enddate>20240229</enddate><creator>Lee, Jeehyun</creator><creator>Choi, Yerin</creator><creator>Song, Tae-Jin</creator><creator>Koo, Myoung-Wan</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240229</creationdate><title>Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition</title><author>Lee, Jeehyun ; Choi, Yerin ; Song, Tae-Jin ; Koo, Myoung-Wan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2402_189233</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computation and 
Language</topic><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Lee, Jeehyun</creatorcontrib><creatorcontrib>Choi, Yerin</creatorcontrib><creatorcontrib>Song, Tae-Jin</creatorcontrib><creatorcontrib>Koo, Myoung-Wan</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lee, Jeehyun</au><au>Choi, Yerin</au><au>Song, Tae-Jin</au><au>Koo, Myoung-Wan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition</atitle><date>2024-02-29</date><risdate>2024</risdate><abstract>Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We propose to extend a large-scale speech recognition model for inappropriate pause detection in dysarthric speech. To this end, we propose task design, labeling strategy, and a speech recognition model with an inappropriate pause prediction layer. First, we treat pause detection as speech recognition, using an automatic speech recognition (ASR) model to convert speech into text with pause tags. According to the newly designed task, we label pause locations at the text level and their appropriateness. We collaborate with speech-language pathologists to establish labeling criteria, ensuring high-quality annotated data. Finally, we extend the ASR model with an inappropriate pause prediction layer for end-to-end inappropriate pause detection. Moreover, we propose a task-tailored metric for evaluating inappropriate pause detection independent of ASR performance. Our experiments show that the proposed method better detects inappropriate pauses in dysarthric speech than baselines. 
(Inappropriate Pause Error Rate: 14.47%)</abstract><doi>10.48550/arxiv.2402.18923</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2402.18923
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2402_18923
source	arXiv.org
subjects	Computer Science - Computation and Language ; Computer Science - Sound
title	Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T08%3A34%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Inappropriate%20Pause%20Detection%20In%20Dysarthric%20Speech%20Using%20Large-Scale%20Speech%20Recognition&rft.au=Lee,%20Jeehyun&rft.date=2024-02-29&rft_id=info:doi/10.48550/arxiv.2402.18923&rft_dat=%3Carxiv_GOX%3E2402_18923%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true