Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens, and taking noisy labels as ground truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either filtering out the noisiest pseudo-labels or improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to eliminate incorrect tokens from pseudo-labels entirely. In this work, we propose a novel framework named alternative pseudo-labeling that tackles noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. First, a generalized CTC loss function handles noisy pseudo-labels by accepting alternative tokens at the positions of incorrect tokens. Applying this loss function in pseudo-labeling requires detecting the incorrect tokens in the predicted pseudo-labels; we adopt a confidence-based error detection method that identifies incorrect tokens by comparing their confidence scores against a given threshold, which in turn requires the confidence scores to be discriminative. Hence, the second proposed technique is a contrastive CTC loss function that widens the confidence gap between correctly and incorrectly predicted tokens, thereby improving error detection. Finally, because obtaining satisfactory performance with confidence-based error detection typically requires extensive threshold tuning, we propose an automatic thresholding method that uses labeled data as a proxy for determining the threshold, eliminating the need for manual tuning. Experiments demonstrate that alternative pseudo-labeling outperforms existing pseudo-labeling approaches on datasets in various domains and languages.
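The two detection-related components lend themselves to a short illustration. The sketch below assumes per-token confidence scores are available from decoding: `detect_errors` implements the stated rule (flag tokens whose confidence falls below a threshold), while `choose_threshold` sketches the automatic thresholding idea by sweeping candidate thresholds on labeled data, where true error positions are known from alignment with the reference. The function names, the candidate grid, and the use of error-detection F1 as the selection criterion are assumptions; the abstract only states that labeled data serves as a proxy for choosing the threshold.

```python
import numpy as np

def detect_errors(confidences, threshold):
    """Flag pseudo-label tokens as incorrect when their confidence
    falls below the threshold (the confidence-based detection rule)."""
    return np.asarray(confidences, dtype=float) < threshold

def choose_threshold(confidences, is_error, candidates=None):
    """Sweep candidate thresholds on labeled data and keep the one with
    the best error-detection F1. `is_error` marks tokens where the
    decoded hypothesis disagrees with the reference transcript."""
    confidences = np.asarray(confidences, dtype=float)
    is_error = np.asarray(is_error, dtype=bool)
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)
    best_thr, best_f1 = float(candidates[0]), -1.0
    for thr in candidates:
        flagged = confidences < thr
        tp = float(np.sum(flagged & is_error))
        fp = float(np.sum(flagged & ~is_error))
        fn = float(np.sum(~flagged & is_error))
        prec = tp / (tp + fp) if tp + fp > 0 else 0.0
        rec = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
        if f1 > best_f1:
            best_thr, best_f1 = float(thr), f1
    return best_thr
```

In use, the threshold tuned on the labeled set would then be applied to the pseudo-labels, e.g. `mask = detect_errors(pseudo_label_confidences, choose_threshold(dev_confidences, dev_errors))`.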

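The contrastive objective that makes confidences discriminative can likewise only be sketched from the abstract. The paper defines a contrastive CTC loss over alignments; the pairwise hinge below is a simplified stand-in that captures the stated goal, widening the confidence gap between correctly and incorrectly predicted tokens, and the `margin` hyperparameter is hypothetical.

```python
import torch

def confidence_gap_penalty(confidences, is_correct, margin=0.5):
    """Simplified margin penalty: encourage every correct token's
    confidence to exceed every incorrect token's confidence by at
    least `margin`. Not the paper's contrastive CTC loss, which is
    defined over CTC alignments, but the same separation pressure."""
    pos = confidences[is_correct]    # confidences of correct tokens
    neg = confidences[~is_correct]   # confidences of incorrect tokens
    if pos.numel() == 0 or neg.numel() == 0:
        return confidences.new_zeros(())
    gap = pos.unsqueeze(1) - neg.unsqueeze(0)  # all correct/incorrect pairs
    return torch.clamp(margin - gap, min=0.0).mean()
```

Added to the training loss on labeled data, where token correctness is known, a penalty of this shape pushes the two confidence distributions apart, which is what makes a single detection threshold workable.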
Bibliographic Details

Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, Vol. 31, pp. 1-11
Main Authors: Zhu, Han; Gao, Dongji; Cheng, Gaofeng; Povey, Daniel; Zhang, Pengyuan; Yan, Yonghong
Format: Article
Language: English
Publisher: IEEE, Piscataway
DOI: 10.1109/TASLP.2023.3306709
ISSN: 2329-9290
EISSN: 2329-9304
Subjects: Automatic speech recognition; Computational modeling; Data models; Error correction; Error correction & detection; Error detection; Estimation; Filtering; Labeling; Labelling; Labels; Noise; Noise measurement; Performance enhancement; pseudo-labeling; Semi-supervised learning; Speech recognition; Thresholds; Training; Tuning; Voice recognition