A novel approach for modeling non-keyword intervals in a keyword spotter exploiting acoustic similarities of languages
Published in: | Speech communication 2005-04, Vol.45 (4), p.373-386 |
---|---|
Main authors: | Heracleous, Panikos; Shimizu, Tohru |
Format: | Article |
Language: | English |
Subjects: | Keyword spotting; Garbage hidden Markov models; Telephone speech; Two-pass |
Online access: | Full text |
container_end_page | 386 |
---|---|
container_issue | 4 |
container_start_page | 373 |
container_title | Speech communication |
container_volume | 45 |
creator | Heracleous, Panikos Shimizu, Tohru |
description | In this paper, we present a new keyword spotting technique. A critical issue in keyword spotting is the explicit modeling of the non-keyword portions. To date, most keyword spotters use a set of Hidden Markov Models (HMMs) to represent the non-keyword portions. A widely used approach is to split the training data into keyword and non-keyword data: the keywords are represented by HMMs trained on the keyword speech, and the garbage models are trained on the non-keyword speech. The main disadvantage of this method is its task dependence. Another approach is to use a common set of acoustic models for both keywords and garbage models. However, this method faces a major problem: in a keyword spotter, the garbage models are usually connected so as to allow any sequence, so the keywords are also included in these sequences. When the same training data are used for the keyword and garbage models, the garbage models therefore also cover the keywords. To overcome these problems, we propose a new method for modeling the non-keyword intervals. In our method, the garbage models are phonemic HMMs trained on a speech corpus of a language other than, but acoustically similar to, the target language. In our work, the target language is Japanese and, owing to the high acoustic similarity, English was chosen as the 'garbage language' for training the garbage models. Using English garbage models instead of Japanese ones, our method achieves higher performance than when Japanese garbage models are used. Moreover, parameter tuning (e.g., the word insertion penalty) does not seriously affect performance when English garbage models are used. Using clean telephone speech test data and a vocabulary of 100 keywords, we achieved a 7.9% equal error rate, which is a very promising result. We also report results obtained with several vocabulary sizes, and we investigate the selection of the most appropriate garbage model set.
In addition to the Japanese keyword spotting system, we also report results for an English keyword spotter. By using Japanese garbage models instead of English ones, we achieved a significant improvement: using telephone speech test data and a vocabulary of 25 keywords, the achieved Figure of Merit (FOM) was 74.7%, compared to 68.9% when English garbage models were used. |
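The abstract reports the Japanese spotter's performance as an equal error rate (EER): the operating point at which the false-rejection rate (true keywords scored below the decision threshold) equals the false-acceptance rate (non-keyword segments scored above it). As a minimal, hypothetical sketch of how such a figure is computed from detector scores (the `equal_error_rate` helper and all score values are illustrative, not from the paper):

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Sweep a decision threshold over every observed score and return the
    EER: the point where false-rejection and false-acceptance rates meet."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a true keyword scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-keyword scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Synthetic detector scores: higher means "more keyword-like".
print(equal_error_rate([0.9, 0.8, 0.7, 0.2], [0.1, 0.3, 0.75, 0.05]))  # 0.25
```

A lower EER means the keyword and garbage score distributions overlap less; the paper's 7.9% EER corresponds to the threshold setting where 7.9% of keywords are missed and 7.9% of non-keyword stretches are falsely accepted.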
doi_str_mv | 10.1016/j.specom.2004.10.016 |
format | Article |
publisher | Amsterdam: Elsevier B.V. |
coden | SCOMDH |
fulltext | fulltext |
identifier | ISSN: 0167-6393 |
ispartof | Speech communication, 2005-04, Vol.45 (4), p.373-386 |
issn | 0167-6393 1872-7182 |
language | eng |
recordid | cdi_proquest_miscellaneous_85616024 |
source | ScienceDirect Journals (5 years ago - present) |
subjects | Applied sciences; Exact sciences and technology; Garbage hidden Markov models; Information, signal and communications theory; Keyword spotting; Signal processing; Speech processing; Telecommunications and information theory; Telephone speech; Two-pass |
title | A novel approach for modeling non-keyword intervals in a keyword spotter exploiting acoustic similarities of languages |