Large-scale Language Model Rescoring on Long-form Data
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8% re...
Saved in:
Main Authors: | Chen, Tongzhou, Allauzen, Cyril, Huang, Yinghui, Park, Daniel, Rybach, David, Huang, W. Ronny, Cabrera, Rodrigo, Audhkhasi, Kartik, Ramabhadran, Bhuvana, Moreno, Pedro J, Riley, Michael |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computation and Language |
Online Access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
container_volume | |
creator | Chen, Tongzhou; Allauzen, Cyril; Huang, Yinghui; Park, Daniel; Rybach, David; Huang, W. Ronny; Cabrera, Rodrigo; Audhkhasi, Kartik; Ramabhadran, Bhuvana; Moreno, Pedro J; Riley, Michael |
description | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP). In this work, we study the impact of Large-scale Language Models (LLM) on
Automated Speech Recognition (ASR) of YouTube videos, which we use as a source
for long-form ASR. We demonstrate up to 8% relative reduction in Word Error
Rate (WER) on US English (en-us) and code-switched Indian English (en-in)
long-form ASR test sets and a reduction of up to 30% relative on Salient Term
Error Rate (STER) over a strong first-pass baseline that uses a maximum-entropy-based
language model. Improved lattice processing that results in a lattice
with a proper (non-tree) digraph topology and that carries context from the 1-best
hypothesis of the previous segment(s) yields significant wins in rescoring
with LLMs. We also find that the gains in performance from the combination of
LLMs trained on vast quantities of available data (such as C4) and conventional
neural LMs are additive and significantly outperform a strong first-pass
baseline with a maximum-entropy LM.
Copyright 2023 IEEE. Personal use of this material is permitted. Permission
from IEEE must be obtained for all other uses, in any current or future media,
including reprinting/republishing this material for advertising or promotional
purposes, creating new collective works, for resale or redistribution to
servers or lists, or reuse of any copyrighted component of this work in other
works. |
doi_str_mv | 10.48550/arxiv.2306.08133 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2306.08133 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2306_08133 |
source | arXiv.org |
subjects | Computer Science - Computation and Language |
title | Large-scale Language Model Rescoring on Long-form Data |
url | https://arxiv.org/abs/2306.08133 |