Lattice-based lightly-supervised acoustic model training
In the broadcast domain there is an abundance of related text data and partial transcriptions, such as closed captions and subtitles. This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model. Current approaches to light supervision typically filter the data based on matching error rates between the transcriptions and biased decoding hypotheses. In contrast, semi-supervised training does not require matching text data, instead generating a hypothesis using a background language model. State-of-the-art semi-supervised training uses lattice-based supervision with the lattice-free MMI (LF-MMI) objective function. We propose a technique to combine inaccurate transcriptions with the lattices generated for semi-supervised training, thus preserving uncertainty in the lattice where appropriate. We demonstrate that this combined approach reduces the expected error rates over the lattices, and reduces the word error rate (WER) on a broadcast task.
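The approach described in the abstract can be illustrated with a toy sketch: fold the (possibly inaccurate) caption transcript into a decoded lattice as an extra weighted path, keep the lattice's alternatives where they disagree, and compare the expected error rate before and after. The Python below is a minimal sketch under strong simplifying assumptions, not the authors' implementation; it caricatures a lattice as a handful of posterior-weighted word sequences, and every name, weight, and data item in it is hypothetical.

```python
# Toy illustration of combining an inaccurate transcript with a decoded
# lattice. A real system would operate on FST lattices (e.g. for LF-MMI
# supervision); here a "lattice" is just a dict of hypothetical
# posterior-weighted word sequences.
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]


def expected_wer(lattice, reference):
    """Posterior-weighted WER of the lattice's hypotheses against a reference."""
    return sum(p * edit_distance(reference, hyp) / len(reference)
               for hyp, p in lattice.items())


def combine(lattice, transcript, transcript_weight=0.5):
    """Fold the lightly supervised transcript into the lattice as one more
    weighted path and renormalise. Where the transcript matches an existing
    path its mass is boosted; elsewhere the lattice's own alternatives, and
    hence its uncertainty, are preserved."""
    combined = Counter({hyp: p * (1.0 - transcript_weight)
                        for hyp, p in lattice.items()})
    combined[tuple(transcript)] += transcript_weight
    total = sum(combined.values())
    return {hyp: p / total for hyp, p in combined.items()}


# Hypothetical decoded lattice for one utterance, plus a closed caption that
# happens to match the (unseen) gold reference.
lattice = {
    ("the", "cat", "sat"): 0.5,
    ("a", "cat", "sat"): 0.3,
    ("the", "cat", "sang"): 0.2,
}
caption = ("the", "cat", "sat")
reference = ("the", "cat", "sat")

print("expected WER before:", expected_wer(lattice, reference))  # ~0.17
print("expected WER after: ",
      expected_wer(combine(lattice, caption), reference))        # ~0.08
```

In this toy setting the caption simply sharpens the distribution around the correct path, so the expected WER drops; when the caption is wrong, the retained lattice alternatives are what keep the supervision from collapsing onto the error.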
Saved in:

| Field | Value |
|---|---|
| Main authors | Fainberg, Joachim; Klejch, Ondřej; Renals, Steve; Bell, Peter |
| Format | Article |
| Language | eng |
| Subjects | Computer Science - Computation and Language; Computer Science - Sound |
| Online access | Order full text |
| DOI | 10.48550/arxiv.1905.13150 |
| Record ID | cdi_arxiv_primary_1905_13150 |
| Source | arXiv.org |
| Link | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T21%3A18%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Lattice-based%20lightly-supervised%20acoustic%20model%20training&rft.au=Fainberg,%20Joachim&rft.date=2019-05-30&rft_id=info:doi/10.48550/arxiv.1905.13150&rft_dat=%3Carxiv_GOX%3E1905_13150%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |