Towards Link Characterization From Content: Recovering Distributions From Classifier Output

Bibliographic details
Published in:	IEEE transactions on audio, speech, and language processing, 2008-05, Vol.16 (4), p.847-858
Main authors:	Grothendieck, John; Gorin, Allen
Format:	Article
Language:	English
Subjects:	Algorithms; Applied sciences; Classifiers; Diseases; Error correction; Errors; Estimates; Exact sciences and technology; Hoses; Humans; Information, signal and communications theory; Knowledge acquisition; Monte Carlo methods; Natural languages; Numerical analysis; Pattern classification; Pattern recognition; Reduction; Signal and communications theory; Signal processing; Signal representation. Spectral analysis; Signal, noise; Speaker recognition; Speech; Speech processing; Streaming media; Telecommunications and information theory; Testing

container_end_page	858
container_issue	4
container_start_page	847
container_title	IEEE transactions on audio, speech, and language processing
container_volume	16
creator	Grothendieck, John; Gorin, Allen
description	In processing large volumes of speech and language data, we are often interested in the distribution of languages, speakers, topics, etc. For large data sets, these distributions are typically estimated at a given point in time using pattern classification technology. It is well known that such estimates can be highly biased, especially for rare classes. While these biases have been addressed in some applications, they have thus far been ignored in the speech and language literature. This neglect causes significant error for low-frequency classes. Correcting this biased distribution involves exploiting uncertain knowledge of the classifier error patterns. We describe a numerical method, the Metropolis-Hastings (M-H) algorithm, which provides a Bayes estimator for the distribution. We experimentally evaluate this algorithm for a speaker recognition task, demonstrating a fivefold reduction in root mean squared error.
doi_str_mv	10.1109/TASL.2008.920060
format	Article
publisher	Piscataway, NJ: IEEE
coden	ITASD8
tpages	12
rights	2008 INIST-CNRS; Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008
toplevel	peer_reviewed
ieee_id	4489998
linktohtml	https://ieeexplore.ieee.org/document/4489998
backlink	http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20297661 (View record in Pascal Francis)
identifier	ISSN: 1558-7916
issn	1558-7916 2329-9290 1558-7924 2329-9304
language	eng
recordid	cdi_crossref_primary_10_1109_TASL_2008_920060
source	IEEE Electronic Library (IEL)
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T12%3A53%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Towards%20Link%20Characterization%20From%20Content:%20Recovering%20Distributions%20From%20Classifier%20Output&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Grothendieck,%20John&rft.date=2008-05-01&rft.volume=16&rft.issue=4&rft.spage=847&rft.epage=858&rft.pages=847-858&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2008.920060&rft_dat=%3Cproquest_RIE%3E1671299654%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=917409615&rft_id=info:pmid/&rft_ieee_id=4489998&rfr_iscdi=true
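The abstract's central point is that raw classifier output frequencies over-count rare classes, and that a Bayes estimate of the true distribution can be obtained by Metropolis-Hastings sampling. The minimal sketch below illustrates that idea under simplifying assumptions. It treats the confusion matrix as fixed and exactly known and places a uniform prior over class distributions; the paper instead works with uncertain knowledge of the classifier's error patterns, so this is not the authors' method. The function names (predicted_distribution, log_likelihood, mh_posterior_mean), the three-class confusion matrix, and the class shares are hypothetical and were chosen only for illustration.

```python
import math
import random


def predicted_distribution(p, confusion):
    """q[j] = sum_i p[i] * confusion[i][j]: the chance the classifier outputs label j
    when true labels follow p and confusion[i][j] = P(output j | true i)."""
    k = len(p)
    return [sum(p[i] * confusion[i][j] for i in range(k)) for j in range(k)]


def log_likelihood(p, confusion, counts):
    """Multinomial log-likelihood of the observed output-label counts, up to a constant."""
    q = predicted_distribution(p, confusion)
    return sum(n * math.log(qj) for n, qj in zip(counts, q) if n > 0)


def mh_posterior_mean(counts, confusion, iters=20000, burn_in=5000, step=1e-3, seed=0):
    """Random-walk Metropolis-Hastings over the probability simplex (illustrative sketch).

    Assumptions: the confusion matrix (all entries > 0) is known exactly, the prior on p
    is uniform (Dirichlet(1, ..., 1)), and each proposal shifts a mass
    d ~ Uniform(-step, step) from one class to another. Returns the posterior mean,
    which is the Bayes estimate under squared-error loss.
    """
    rng = random.Random(seed)
    k = len(counts)
    total = sum(counts)
    # Start from the naive (biased) estimate: the classifier's output frequencies.
    p = [n / total for n in counts]
    ll = log_likelihood(p, confusion, counts)
    sums = [0.0] * k
    kept = 0
    for t in range(iters):
        i, j = rng.sample(range(k), 2)  # random ordered pair of distinct classes
        d = rng.uniform(-step, step)
        prop = list(p)
        prop[i] += d
        prop[j] -= d
        # Proposals leaving the simplex have zero prior density, so they are rejected.
        if prop[i] > 0 and prop[j] > 0:
            prop_ll = log_likelihood(prop, confusion, counts)
            # Symmetric proposal and flat prior: accept with prob min(1, likelihood ratio).
            if rng.random() < math.exp(min(0.0, prop_ll - ll)):
                p, ll = prop, prop_ll
        if t >= burn_in:
            sums = [s + x for s, x in zip(sums, p)]
            kept += 1
    return [s / kept for s in sums]


if __name__ == "__main__":
    # Hypothetical three-class example; rows are true classes, columns are outputs.
    confusion = [
        [0.95, 0.03, 0.02],
        [0.05, 0.90, 0.05],
        [0.05, 0.05, 0.90],
    ]
    p_true = [0.90, 0.09, 0.01]
    # Expected output counts for 100,000 items classified by this classifier.
    counts = [round(100000 * q) for q in predicted_distribution(p_true, confusion)]
    print("naive:", [n / sum(counts) for n in counts])
    print("M-H posterior mean:", mh_posterior_mean(counts, confusion))
```

With these synthetic numbers the classifier's output frequencies put the rarest class at 0.0315, more than three times its true share of 0.01, because false positives from the two frequent classes outnumber its true detections. The posterior mean moves that estimate back toward 0.01. Averaging the retained samples, rather than reporting the single most likely distribution, is what makes the result a Bayes estimate under squared-error loss.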