Noisy student teacher training for robust keyword spotting

Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base inst...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Zhu, Pai, Moreno, Ignacio Lopez, Subrahmanya, Niranjan, Park, Hyun Jin
Format:	Patent
Sprache:	eng
Schlagworte:	ACOUSTICS CALCULATING COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Zhu, Pai Moreno, Ignacio Lopez Subrahmanya, Niranjan Park, Hyun Jin
description	Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_US12027162B2</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>US12027162B2</sourcerecordid><originalsourceid>FETCH-epo_espacenet_US12027162B23</originalsourceid><addsrcrecordid>eNrjZLDyy88srlQoLilNSc0rUShJTUzOSC1SKClKzMzLzEtXSMsvUijKTyotLlHITq0szy9KUSguyC8pAcrxMLCmJeYUp_JCaW4GRTfXEGcP3dSC_PjU4oLE5NS81JL40GBDIwMjc0MzIycjY2LUAADE2y_T</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Noisy student teacher training for robust keyword spotting</title><source>esp@cenet</source><creator>Zhu, Pai ; Moreno, Ignacio Lopez ; Subrahmanya, Niranjan ; Park, Hyun Jin</creator><creatorcontrib>Zhu, Pai ; Moreno, Ignacio Lopez ; Subrahmanya, Niranjan ; Park, Hyun Jin</creatorcontrib><description>Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.</description><language>eng</language><subject>ACOUSTICS ; CALCULATING ; COMPUTING ; COUNTING ; ELECTRIC DIGITAL DATA PROCESSING ; MUSICAL INSTRUMENTS ; PHYSICS ; SPEECH ANALYSIS OR SYNTHESIS ; SPEECH OR AUDIO CODING OR DECODING ; SPEECH OR VOICE PROCESSING ; SPEECH RECOGNITION</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240702&DB=EPODOC&CC=US&NR=12027162B2$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25564,76547</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240702&DB=EPODOC&CC=US&NR=12027162B2$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Zhu, Pai</creatorcontrib><creatorcontrib>Moreno, Ignacio Lopez</creatorcontrib><creatorcontrib>Subrahmanya, Niranjan</creatorcontrib><creatorcontrib>Park, Hyun Jin</creatorcontrib><title>Noisy student teacher training for robust keyword spotting</title><description>Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.</description><subject>ACOUSTICS</subject><subject>CALCULATING</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>ELECTRIC DIGITAL DATA PROCESSING</subject><subject>MUSICAL INSTRUMENTS</subject><subject>PHYSICS</subject><subject>SPEECH ANALYSIS OR SYNTHESIS</subject><subject>SPEECH OR AUDIO CODING OR DECODING</subject><subject>SPEECH OR VOICE PROCESSING</subject><subject>SPEECH RECOGNITION</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2024</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZLDyy88srlQoLilNSc0rUShJTUzOSC1SKClKzMzLzEtXSMsvUijKTyotLlHITq0szy9KUSguyC8pAcrxMLCmJeYUp_JCaW4GRTfXEGcP3dSC_PjU4oLE5NS81JL40GBDIwMjc0MzIycjY2LUAADE2y_T</recordid><startdate>20240702</startdate><enddate>20240702</enddate><creator>Zhu, Pai</creator><creator>Moreno, Ignacio Lopez</creator><creator>Subrahmanya, Niranjan</creator><creator>Park, Hyun Jin</creator><scope>EVB</scope></search><sort><creationdate>20240702</creationdate><title>Noisy student teacher training for robust keyword spotting</title><author>Zhu, Pai ; Moreno, Ignacio Lopez ; Subrahmanya, Niranjan ; Park, Hyun Jin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_US12027162B23</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2024</creationdate><topic>ACOUSTICS</topic><topic>CALCULATING</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>ELECTRIC DIGITAL DATA PROCESSING</topic><topic>MUSICAL INSTRUMENTS</topic><topic>PHYSICS</topic><topic>SPEECH ANALYSIS OR SYNTHESIS</topic><topic>SPEECH OR AUDIO CODING OR DECODING</topic><topic>SPEECH OR VOICE PROCESSING</topic><topic>SPEECH RECOGNITION</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhu, Pai</creatorcontrib><creatorcontrib>Moreno, Ignacio Lopez</creatorcontrib><creatorcontrib>Subrahmanya, Niranjan</creatorcontrib><creatorcontrib>Park, Hyun Jin</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhu, Pai</au><au>Moreno, Ignacio Lopez</au><au>Subrahmanya, Niranjan</au><au>Park, Hyun Jin</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Noisy student teacher training for robust keyword spotting</title><date>2024-07-02</date><risdate>2024</risdate><abstract>Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	eng
recordid	cdi_epo_espacenet_US12027162B2
source	esp@cenet
subjects	ACOUSTICS CALCULATING COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION
title	Noisy student teacher training for robust keyword spotting
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T14%3A24%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Zhu,%20Pai&rft.date=2024-07-02&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS12027162B2%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true