TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection

creator Behre, Piyush ; Tan, Sharman ; Shah, Amy ; Kesavamoorthy, Harini ; Chang, Shuangyu ; Zuo, Fei ; Basoglu, Chris ; Pathak, Sayan
description Punctuation and segmentation are key to readability in Automatic Speech Recognition (ASR) output, yet they are typically evaluated with F1 scores, which require high-quality human transcripts and do not reflect readability well. Human evaluation is expensive, time-consuming, and suffers from large inter-observer variability, especially for conversational speech, which lacks strict grammatical structure. Large pre-trained models capture a notion of grammatical structure. We present TRScore, a novel readability measure that uses the GPT model to evaluate different segmentation and punctuation systems. We validate our approach with human experts. Additionally, our approach enables quantitative assessment of the impact of text post-processing techniques, such as capitalization, inverse text normalization (ITN), and disfluency removal, on overall readability, which traditional word error rate (WER) and slot error rate (SER) metrics fail to capture. TRScore is strongly correlated with traditional F1 and human readability scores, with Pearson's correlation coefficients of 0.67 and 0.98, respectively. It also eliminates the need for human transcriptions for model selection.
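For illustration only (not from the paper): the validation step described in the abstract compares TRScore against F1 and mean human readability ratings using Pearson's correlation coefficient. The sketch below shows that comparison in a self-contained form; the per-system scores are invented placeholders, not the authors' data.

```python
# Minimal sketch of the validation described in the abstract: correlating a
# candidate readability metric (e.g., TRScore) with human readability ratings
# via Pearson's r. All scores below are invented placeholders, not paper data.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for five punctuation/segmentation systems.
trscore = [3.1, 3.8, 2.5, 4.2, 3.6]  # GPT-based readability scores
human   = [3.0, 3.9, 2.4, 4.3, 3.5]  # mean human readability ratings

print(f"Pearson's r = {pearson_r(trscore, human):.2f}")
# A value near 1.0 (the paper reports 0.98 against human scores) suggests the
# metric can stand in for human judgments when selecting a model.
```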
doi_str_mv 10.48550/arxiv.2210.15104
format Article
creationdate 2022-10-26
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
identifier DOI: 10.48550/arxiv.2210.15104
language eng
source arXiv.org
subjects Computer Science - Computation and Language
title TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection
url https://arxiv.org/abs/2210.15104