Text-Aware Adapter for Few-Shot Keyword Spotting
Recent advances in flexible keyword spotting (KWS) with text enrollment allow users to personalize keywords without uttering them during enrollment. However, there is still room for improvement in target keyword performance. In this work, we propose a novel few-shot transfer learning method, called text-aware adapter (TA-adapter), designed to enhance a pre-trained flexible KWS model for specific keywords with limited speech samples. To adapt the acoustic encoder, we leverage a jointly pre-trained text encoder to generate a text embedding that acts as a representative vector for the keyword. By fine-tuning only a small portion of the network while keeping the core components' weights intact, the TA-adapter proves highly efficient for few-shot KWS, enabling a seamless return to the original pre-trained model. In our experiments, the TA-adapter demonstrated significant performance improvements across 35 distinct keywords from the Google Speech Commands V2 dataset, with only a 0.14% increase in the total number of parameters.
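The abstract outlines the mechanism at a high level: a jointly pre-trained text encoder turns the keyword into an embedding, and that embedding conditions a small trainable adapter attached to the frozen acoustic encoder. The PyTorch sketch below illustrates one plausible reading of that description; the class name, the bottleneck design, the FiLM-style conditioning, and all dimensions are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of a text-aware adapter for few-shot KWS, assuming a
# frozen pre-trained acoustic encoder and a jointly pre-trained text
# encoder. All names, sizes, and the FiLM-style conditioning are
# illustrative assumptions; this record does not specify the design.
import torch
import torch.nn as nn


class TextAwareAdapter(nn.Module):
    """Small residual bottleneck adapter conditioned on a keyword text
    embedding; only these parameters are trained during adaptation."""

    def __init__(self, feat_dim: int, text_dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(feat_dim, bottleneck)
        self.up = nn.Linear(bottleneck, feat_dim)
        # Map the text embedding to per-channel scale/shift (assumed FiLM).
        self.film = nn.Linear(text_dim, 2 * bottleneck)

    def forward(self, feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); text_emb: (batch, text_dim)
        h = torch.relu(self.down(feats))
        scale, shift = self.film(text_emb).unsqueeze(1).chunk(2, dim=-1)
        h = h * (1.0 + scale) + shift
        # Residual path: removing the adapter restores the frozen model.
        return feats + self.up(h)


# Stand-ins for the pre-trained encoders (hypothetical shapes).
acoustic_encoder = nn.GRU(input_size=40, hidden_size=128, batch_first=True)
text_encoder = nn.Embedding(num_embeddings=1000, embedding_dim=64)
for module in (acoustic_encoder, text_encoder):
    for p in module.parameters():
        p.requires_grad = False  # core weights stay intact

adapter = TextAwareAdapter(feat_dim=128, text_dim=64)
feats, _ = acoustic_encoder(torch.randn(4, 100, 40))  # (4, 100, 128)
text_emb = text_encoder(torch.tensor([7, 7, 7, 7]))   # one keyword id per sample
adapted = adapter(feats, text_emb)                    # (4, 100, 128)
```

Because the adapter is a residual add-on and only its own parameters receive gradients, dropping it returns the unchanged pre-trained model, consistent with the "seamless return" property the abstract claims; a bottleneck this small would also keep the parameter overhead in the sub-percent range the abstract reports.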
Saved in:
Main authors: | Jung, Youngmoon; Lee, Jinyoung; Lee, Seungjin; Jung, Myunghun; Lee, Yong-Hyeok; Cho, Hoon-Young |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence |
Online access: | Order full text |
creator | Jung, Youngmoon; Lee, Jinyoung; Lee, Seungjin; Jung, Myunghun; Lee, Yong-Hyeok; Cho, Hoon-Young |
doi_str_mv | 10.48550/arxiv.2412.18142 |
format | Article |
creationdate | 2024-12-23 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
oa | free_for_read |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2412.18142 |
language | eng |
recordid | cdi_arxiv_primary_2412_18142 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence |
title | Text-Aware Adapter for Few-Shot Keyword Spotting |
url | https://arxiv.org/abs/2412.18142 |