Large-scale Language Model Rescoring on Long-form Data

Detailed description

Saved in:
Bibliographic details
Main authors: Chen, Tongzhou, Allauzen, Cyril, Huang, Yinghui, Park, Daniel, Rybach, David, Huang, W. Ronny, Cabrera, Rodrigo, Audhkhasi, Kartik, Ramabhadran, Bhuvana, Moreno, Pedro J, Riley, Michael
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language
creator Chen, Tongzhou
Allauzen, Cyril
Huang, Yinghui
Park, Daniel
Rybach, David
Huang, W. Ronny
Cabrera, Rodrigo
Audhkhasi, Kartik
Ramabhadran, Bhuvana
Moreno, Pedro J
Riley, Michael
description ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). In this work, we study the impact of Large-scale Language Models (LLMs) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets, and a reduction of up to 30% relative in Salient Term Error Rate (STER), over a strong first-pass baseline that uses a maximum-entropy based language model. Improved lattice processing that results in a lattice with a proper (non-tree) digraph topology and carrying context from the 1-best hypothesis of the previous segment(s) results in significant wins in rescoring with LLMs. We also find that the gains in performance from the combination of LLMs trained on vast quantities of available data (such as C4) and conventional neural LMs are additive and significantly outperform a strong first-pass baseline with a maximum-entropy LM. Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
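The description above outlines the core recipe: rescore first-pass ASR hypotheses with an LLM while carrying context from the previous segment's 1-best output. The sketch below is not the authors' implementation; it is a minimal illustration that assumes a hypothetical llm_log_prob(text, context) scoring function and flat n-best lists of (text, first-pass score) pairs, whereas the paper rescores lattices with a proper (non-tree) digraph topology.

# Minimal, illustrative sketch of segment-wise rescoring with an LLM.
# Assumptions (not from the paper): each segment provides an n-best list of
# (text, first_pass_log_score) pairs, and llm_log_prob(text, context) is a
# hypothetical function returning the LLM log-probability of `text` given `context`.
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (hypothesis text, first-pass log score)


def rescore_segments(
    segments: List[List[Hypothesis]],
    llm_log_prob: Callable[[str, str], float],
    llm_weight: float = 0.5,
) -> List[str]:
    """Pick one hypothesis per segment by log-linearly combining scores."""
    best_transcripts: List[str] = []
    context = ""  # 1-best text of the previous segment(s), fed to the LLM as context

    for nbest in segments:
        scored = []
        for text, first_pass_score in nbest:
            # Log-linear combination of the first-pass score and the LLM score.
            total = first_pass_score + llm_weight * llm_log_prob(text, context)
            scored.append((total, text))
        best_text = max(scored)[1]
        best_transcripts.append(best_text)
        # Carry the chosen hypothesis forward as context for the next segment.
        context = (context + " " + best_text).strip()

    return best_transcripts

In the paper, the same kind of score combination is applied to lattice paths rather than flat n-best lists, and the weight on the LLM (and on any additional conventional neural LM) would be tuned on held-out data.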
doi_str_mv 10.48550/arxiv.2306.08133
format Article
identifier DOI: 10.48550/arxiv.2306.08133
language eng
source arXiv.org
subjects Computer Science - Computation and Language
title Large-scale Language Model Rescoring on Long-form Data