Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground-truth in the loss func...
Published in: | IEEE/ACM transactions on audio, speech, and language processing, 2023, Vol.31, p.1-11 |
---|---|
Main authors: | Zhu, Han; Gao, Dongji; Cheng, Gaofeng; Povey, Daniel; Zhang, Pengyuan; Yan, Yonghong |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 11 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE/ACM transactions on audio, speech, and language processing |
container_volume | 31 |
creator | Zhu, Han ; Gao, Dongji ; Cheng, Gaofeng ; Povey, Daniel ; Zhang, Pengyuan ; Yan, Yonghong |
description | When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either filtering out the noisiest pseudo-labels or improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to entirely eliminate incorrect tokens in pseudo-labels. In this work, we propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. First, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens. Applying this loss function in pseudo-labeling requires detecting incorrect tokens in the predicted pseudo-labels. In this work, we adopt a confidence-based error detection method that identifies the incorrect tokens by comparing their confidence scores with a given threshold, which requires the confidence score to be discriminative. Hence, the second proposed technique is the contrastive CTC loss function that widens the confidence gap between the correctly and incorrectly predicted tokens, thereby improving the error detection ability. Additionally, obtaining satisfactory performance with confidence-based error detection typically requires extensive threshold tuning. Instead, we propose an automatic thresholding method that uses labeled data as a proxy for determining the threshold, thus avoiding manual tuning. Experiments demonstrate that alternative pseudo-labeling outperforms existing pseudo-labeling approaches on datasets in various domains and languages. |
doi_str_mv | 10.1109/TASLP.2023.3306709 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2861454356</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10225353</ieee_id><sourcerecordid>2861454356</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2559-4d644eeb585463aa9db4221cde15a00bc61ac03ff2fe213d1ef7ad96af8903423</originalsourceid><addsrcrecordid>eNpNkE1Lw0AQQBdRsNT-AfEQ8Jy6n0n3GIpfELCYel42m9m6Je3G3aTgvze1FTzNHN4bmIfQLcFzQrB8WBdVuZpTTNmcMZzlWF6gCWVUppJhfvm3U4mv0SzGLcaY4FzKnE9QWbQ9hL3u3QGSVYSh8Wmpa2jdfpNYH5IKdi6thg7CwUVokmLo_W7ETVJ1AOYzeQfjN3vXO7-_QVdWtxFm5zlFH0-P6-VLWr49vy6LMjVUCJnyJuMcoBYLwTOmtWxqTikxDRChMa5NRrTBzFpqgRLWELC5bmSm7UJiximbovvT3S74rwFir7Z-GJ9oo6KLjHDBmchGip4oE3yMAazqgtvp8K0IVsdw6jecOoZT53CjdHeSHAD8EygVTDD2A8RSaeA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2861454356</pqid></control><display><type>article</type><title>Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition</title><source>IEEE Electronic Library (IEL)</source><creator>Zhu, Han ; Gao, Dongji ; Cheng, Gaofeng ; Povey, Daniel ; Zhang, Pengyuan ; Yan, Yonghong</creator><creatorcontrib>Zhu, Han ; Gao, Dongji ; Cheng, Gaofeng ; Povey, Daniel ; Zhang, Pengyuan ; Yan, Yonghong</creatorcontrib><description>When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either filtering out the nosiest pseudo-labels or improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to entirely eliminate incorrect tokens in pseudo-labels. 
In this work, we propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens. Applying this loss function in pseudo-labeling requires detecting incorrect tokens in the predicted pseudo-labels. In this work, we adopt a confidence-based error detection method that identifies the incorrect tokens by comparing their confidence scores with a given threshold, thus necessitating the confidence score to be discriminative. Hence, the second proposed technique is the contrastive CTC loss function that widens the confidence gap between the correctly and incorrectly predicted tokens, thereby improving the error detection ability. Additionally, obtaining satisfactory performance with confidence-based error detection typically requires extensive threshold tuning. Instead, we propose an automatic thresholding method that uses labeled data as a proxy for determining the threshold, thus saving the pain of manual tuning. 
Experiments demonstrate that alternative pseudo-labeling outperforms existing pseudo-labeling approaches on datasets in various domains and languages.</description><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASLP.2023.3306709</identifier><identifier>CODEN: ITASFA</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Automatic speech recognition ; Computational modeling ; Data models ; Error correction ; Error correction & detection ; Error detection ; Estimation ; Filtering ; Labeling ; Labelling ; Labels ; Noise ; Noise measurement ; Performance enhancement ; pseudo-labeling ; Semi-supervised learning ; Speech recognition ; Thresholds ; Training ; Tuning ; Voice recognition</subject><ispartof>IEEE/ACM transactions on audio, speech, and language processing, 2023, Vol.31, p.1-11</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c2559-4d644eeb585463aa9db4221cde15a00bc61ac03ff2fe213d1ef7ad96af8903423</citedby><cites>FETCH-LOGICAL-c2559-4d644eeb585463aa9db4221cde15a00bc61ac03ff2fe213d1ef7ad96af8903423</cites><orcidid>0000-0001-6838-5160 ; 0000-0001-6907-5770 ; 0000-0002-2102-6061 ; 0009-0002-5060-4454 ; 0009-0006-8885-3084 ; 0000-0002-0611-3634</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10225353$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,4024,27923,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10225353$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhu, Han</creatorcontrib><creatorcontrib>Gao, 
Dongji</creatorcontrib><creatorcontrib>Cheng, Gaofeng</creatorcontrib><creatorcontrib>Povey, Daniel</creatorcontrib><creatorcontrib>Zhang, Pengyuan</creatorcontrib><creatorcontrib>Yan, Yonghong</creatorcontrib><title>Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition</title><title>IEEE/ACM transactions on audio, speech, and language processing</title><addtitle>TASLP</addtitle><description>When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either filtering out the nosiest pseudo-labels or improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to entirely eliminate incorrect tokens in pseudo-labels. In this work, we propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens. Applying this loss function in pseudo-labeling requires detecting incorrect tokens in the predicted pseudo-labels. In this work, we adopt a confidence-based error detection method that identifies the incorrect tokens by comparing their confidence scores with a given threshold, thus necessitating the confidence score to be discriminative. Hence, the second proposed technique is the contrastive CTC loss function that widens the confidence gap between the correctly and incorrectly predicted tokens, thereby improving the error detection ability. 
Additionally, obtaining satisfactory performance with confidence-based error detection typically requires extensive threshold tuning. Instead, we propose an automatic thresholding method that uses labeled data as a proxy for determining the threshold, thus saving the pain of manual tuning. Experiments demonstrate that alternative pseudo-labeling outperforms existing pseudo-labeling approaches on datasets in various domains and languages.</description><subject>Automatic speech recognition</subject><subject>Computational modeling</subject><subject>Data models</subject><subject>Error correction</subject><subject>Error correction & detection</subject><subject>Error detection</subject><subject>Estimation</subject><subject>Filtering</subject><subject>Labeling</subject><subject>Labelling</subject><subject>Labels</subject><subject>Noise</subject><subject>Noise measurement</subject><subject>Performance enhancement</subject><subject>pseudo-labeling</subject><subject>Semi-supervised learning</subject><subject>Speech recognition</subject><subject>Thresholds</subject><subject>Training</subject><subject>Tuning</subject><subject>Voice recognition</subject><issn>2329-9290</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkE1Lw0AQQBdRsNT-AfEQ8Jy6n0n3GIpfELCYel42m9m6Je3G3aTgvze1FTzNHN4bmIfQLcFzQrB8WBdVuZpTTNmcMZzlWF6gCWVUppJhfvm3U4mv0SzGLcaY4FzKnE9QWbQ9hL3u3QGSVYSh8Wmpa2jdfpNYH5IKdi6thg7CwUVokmLo_W7ETVJ1AOYzeQfjN3vXO7-_QVdWtxFm5zlFH0-P6-VLWr49vy6LMjVUCJnyJuMcoBYLwTOmtWxqTikxDRChMa5NRrTBzFpqgRLWELC5bmSm7UJiximbovvT3S74rwFir7Z-GJ9oo6KLjHDBmchGip4oE3yMAazqgtvp8K0IVsdw6jecOoZT53CjdHeSHAD8EygVTDD2A8RSaeA</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Zhu, Han</creator><creator>Gao, Dongji</creator><creator>Cheng, Gaofeng</creator><creator>Povey, Daniel</creator><creator>Zhang, Pengyuan</creator><creator>Yan, 
Yonghong</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7T9</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-6838-5160</orcidid><orcidid>https://orcid.org/0000-0001-6907-5770</orcidid><orcidid>https://orcid.org/0000-0002-2102-6061</orcidid><orcidid>https://orcid.org/0009-0002-5060-4454</orcidid><orcidid>https://orcid.org/0009-0006-8885-3084</orcidid><orcidid>https://orcid.org/0000-0002-0611-3634</orcidid></search><sort><creationdate>2023</creationdate><title>Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition</title><author>Zhu, Han ; Gao, Dongji ; Cheng, Gaofeng ; Povey, Daniel ; Zhang, Pengyuan ; Yan, Yonghong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2559-4d644eeb585463aa9db4221cde15a00bc61ac03ff2fe213d1ef7ad96af8903423</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Automatic speech recognition</topic><topic>Computational modeling</topic><topic>Data models</topic><topic>Error correction</topic><topic>Error correction & detection</topic><topic>Error detection</topic><topic>Estimation</topic><topic>Filtering</topic><topic>Labeling</topic><topic>Labelling</topic><topic>Labels</topic><topic>Noise</topic><topic>Noise measurement</topic><topic>Performance enhancement</topic><topic>pseudo-labeling</topic><topic>Semi-supervised learning</topic><topic>Speech recognition</topic><topic>Thresholds</topic><topic>Training</topic><topic>Tuning</topic><topic>Voice recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhu, Han</creatorcontrib><creatorcontrib>Gao, Dongji</creatorcontrib><creatorcontrib>Cheng, 
Gaofeng</creatorcontrib><creatorcontrib>Povey, Daniel</creatorcontrib><creatorcontrib>Zhang, Pengyuan</creatorcontrib><creatorcontrib>Yan, Yonghong</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhu, Han</au><au>Gao, Dongji</au><au>Cheng, Gaofeng</au><au>Povey, Daniel</au><au>Zhang, Pengyuan</au><au>Yan, Yonghong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition</atitle><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle><stitle>TASLP</stitle><date>2023</date><risdate>2023</risdate><volume>31</volume><spage>1</spage><epage>11</epage><pages>1-11</pages><issn>2329-9290</issn><eissn>2329-9304</eissn><coden>ITASFA</coden><abstract>When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. 
Taking noisy labels as ground-truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either filtering out the nosiest pseudo-labels or improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to entirely eliminate incorrect tokens in pseudo-labels. In this work, we propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens. Applying this loss function in pseudo-labeling requires detecting incorrect tokens in the predicted pseudo-labels. In this work, we adopt a confidence-based error detection method that identifies the incorrect tokens by comparing their confidence scores with a given threshold, thus necessitating the confidence score to be discriminative. Hence, the second proposed technique is the contrastive CTC loss function that widens the confidence gap between the correctly and incorrectly predicted tokens, thereby improving the error detection ability. Additionally, obtaining satisfactory performance with confidence-based error detection typically requires extensive threshold tuning. Instead, we propose an automatic thresholding method that uses labeled data as a proxy for determining the threshold, thus saving the pain of manual tuning. 
Experiments demonstrate that alternative pseudo-labeling outperforms existing pseudo-labeling approaches on datasets in various domains and languages.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TASLP.2023.3306709</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0001-6838-5160</orcidid><orcidid>https://orcid.org/0000-0001-6907-5770</orcidid><orcidid>https://orcid.org/0000-0002-2102-6061</orcidid><orcidid>https://orcid.org/0009-0002-5060-4454</orcidid><orcidid>https://orcid.org/0009-0006-8885-3084</orcidid><orcidid>https://orcid.org/0000-0002-0611-3634</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2329-9290 |
ispartof | IEEE/ACM transactions on audio, speech, and language processing, 2023, Vol.31, p.1-11 |
issn | 2329-9290 ; 2329-9304 |
language | eng |
recordid | cdi_proquest_journals_2861454356 |
source | IEEE Electronic Library (IEL) |
subjects | Automatic speech recognition ; Computational modeling ; Data models ; Error correction ; Error correction & detection ; Error detection ; Estimation ; Filtering ; Labeling ; Labelling ; Labels ; Noise ; Noise measurement ; Performance enhancement ; pseudo-labeling ; Semi-supervised learning ; Speech recognition ; Thresholds ; Training ; Tuning ; Voice recognition |
title | Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T12%3A22%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Alternative%20Pseudo-Labeling%20for%20Semi-Supervised%20Automatic%20Speech%20Recognition&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Zhu,%20Han&rft.date=2023&rft.volume=31&rft.spage=1&rft.epage=11&rft.pages=1-11&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASFA&rft_id=info:doi/10.1109/TASLP.2023.3306709&rft_dat=%3Cproquest_RIE%3E2861454356%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2861454356&rft_id=info:pmid/&rft_ieee_id=10225353&rfr_iscdi=true |
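
The description above mentions two steps that lend themselves to a small illustration: flagging pseudo-label tokens whose confidence falls below a threshold, and choosing that threshold from labeled data rather than by hand. The following is only a rough sketch under stated assumptions, not the authors' method. The record does not say how the threshold is derived from labeled data, so the selection rule used here (maximize detection accuracy on labeled tokens) is an assumption, and the names `flag_low_confidence` and `pick_threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple


def flag_low_confidence(confidences: Sequence[float], threshold: float) -> List[int]:
    """Return the positions of tokens whose confidence is below the threshold."""
    return [i for i, c in enumerate(confidences) if c < threshold]


def pick_threshold(scored_tokens: Sequence[Tuple[float, bool]]) -> float:
    """Pick a threshold from (confidence, is_correct) pairs on labeled data.

    ASSUMPTION: the rule below (maximize the number of tokens that are
    flagged exactly when they are wrong) is illustrative only; the source
    record does not specify the selection criterion.
    """
    if not scored_tokens:
        raise ValueError("need at least one labeled token")
    # Every observed confidence value is a candidate threshold.
    candidates = sorted({c for c, _ in scored_tokens})
    best_threshold, best_hits = candidates[0], -1
    for t in candidates:
        # A token counts as a hit if it is flagged (c < t) exactly when it is wrong.
        hits = sum((c < t) == (not ok) for c, ok in scored_tokens)
        if hits > best_hits:
            best_threshold, best_hits = t, hits
    return best_threshold


# Hypothetical labeled tokens: (confidence, was the prediction correct?)
labeled = [(0.95, True), (0.90, True), (0.40, False), (0.85, True), (0.30, False)]
threshold = pick_threshold(labeled)

# Hypothetical confidences for one pseudo-labeled utterance.
suspect_positions = flag_low_confidence([0.99, 0.20, 0.88, 0.50], threshold)
```

In a pipeline of the kind the description outlines, the flagged positions would be the ones where the training loss accepts alternative tokens; that part is not sketched here because the record gives no details of the generalized CTC loss.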