TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection

creator Behre, Piyush ; Tan, Sharman ; Shah, Amy ; Kesavamoorthy, Harini ; Chang, Shuangyu ; Zuo, Fei ; Basoglu, Chris ; Pathak, Sayan
description Punctuation and segmentation are key to readability in Automatic Speech Recognition (ASR) output, yet they are typically evaluated with F1 scores, which require high-quality human transcripts and do not reflect readability well. Human evaluation is expensive, time-consuming, and suffers from large inter-observer variability, especially for conversational speech, which lacks strict grammatical structure. Large pre-trained models capture a notion of grammatical structure. We present TRScore, a novel readability measure that uses the GPT model to evaluate different segmentation and punctuation systems. We validate our approach with human experts. Additionally, our approach enables quantitative assessment of the impact of text post-processing techniques, such as capitalization, inverse text normalization (ITN), and disfluency removal, on overall readability, which traditional word error rate (WER) and slot error rate (SER) metrics fail to capture. TRScore is strongly correlated with traditional F1 and human readability scores, with Pearson's correlation coefficients of 0.67 and 0.98, respectively. It also eliminates the need for human transcriptions for model selection.
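For illustration only (not from the paper): the validation step described in the abstract compares TRScore against F1 and mean human readability ratings using Pearson's correlation coefficient. The sketch below shows that comparison in a self-contained form; the per-system scores are invented placeholders, not the authors' data.

```python
# Minimal sketch of the validation described in the abstract: correlating a
# candidate readability metric (e.g., TRScore) with human readability ratings
# via Pearson's r. All scores below are invented placeholders, not paper data.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for five punctuation/segmentation systems.
trscore = [3.1, 3.8, 2.5, 4.2, 3.6]  # GPT-based readability scores
human   = [3.0, 3.9, 2.4, 4.3, 3.5]  # mean human readability ratings

print(f"Pearson's r = {pearson_r(trscore, human):.2f}")
# A value near 1.0 (the paper reports 0.98 against human scores) suggests the
# metric can stand in for human judgments when selecting a model.
```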
doi_str_mv 10.48550/arxiv.2210.15104
format Article
creationdate 2022-10-26
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
identifier DOI: 10.48550/arxiv.2210.15104
language eng
source arXiv.org
subjects Computer Science - Computation and Language
title TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection
url https://arxiv.org/abs/2210.15104