Human and Machine Speaker Recognition Based on Short Trivial Events

Trivial events are ubiquitous in human to human conversations, e.g., cough, laugh and sniff. Compared to regular speech, these trivial events are usually short and unclear, thus generally regarded as not speaker discriminative and so are largely ignored by present speaker recognition research. Howev...

Bibliographic Details
Main authors:	Zhang, Miao; Kang, Xiaofei; Wang, Yanqing; Li, Lantian; Tang, Zhiyuan; Dai, Haisheng; Wang, Dong
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language; Computer Science - Neural and Evolutionary Computing; Computer Science - Sound

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Zhang, Miao; Kang, Xiaofei; Wang, Yanqing; Li, Lantian; Tang, Zhiyuan; Dai, Haisheng; Wang, Dong
description	Trivial events are ubiquitous in human to human conversations, e.g., cough, laugh and sniff. Compared to regular speech, these trivial events are usually short and unclear, thus generally regarded as not speaker discriminative and so are largely ignored by present speaker recognition research. However, these trivial events are highly valuable in some particular circumstances such as forensic examination, as they are less subjected to intentional change, so can be used to discover the genuine speaker from disguised speech. In this paper, we collect a trivial event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database, by both human listeners and machines. Particularly, the deep feature learning technique recently proposed by our group is utilized to analyze and recognize the trivial events, which leads to acceptable equal error rates (EERs) despite the extremely short durations (0.2-0.5 seconds) of these events. Comparing different types of events, 'hmm' seems more speaker discriminative.
doi_str_mv	10.48550/arxiv.1711.05443
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1711_05443</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1711_05443</sourcerecordid><originalsourceid>FETCH-LOGICAL-a673-c0fa408946cb574ef40f77e888cbba10af2187e852e5aba6c1b8f24c7406b2153</originalsourceid><addsrcrecordid>eNotz8FuwjAQBFBfOFTQD-ip_oGkdrKOzREiWiqBKpXco7VZFwtwkJNG7d-X0p5m5jLSY-xBihyMUuIJ01cYc6mlzIUCKO9Yvf48Y-QY93yL7hAi8d2F8EiJv5PrPmIYQhf5Enva82vZHbo08CaFMeCJr0aKQz9jE4-nnu7_c8qa51VTr7PN28trvdhkWOkyc8IjCDOHylmlgTwIrzUZY5y1KAX6QprrVgUptFg5aY0vwGkQlS2kKqfs8e_2pmgvKZwxfbe_mvamKX8AoG5EVA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Human and Machine Speaker Recognition Based on Short Trivial Events</title><source>arXiv.org</source><creator>Zhang, Miao ; Kang, Xiaofei ; Wang, Yanqing ; Li, Lantian ; Tang, Zhiyuan ; Dai, Haisheng ; Wang, Dong</creator><creatorcontrib>Zhang, Miao ; Kang, Xiaofei ; Wang, Yanqing ; Li, Lantian ; Tang, Zhiyuan ; Dai, Haisheng ; Wang, Dong</creatorcontrib><description>Trivial events are ubiquitous in human to human conversations, e.g., cough, laugh and sniff. Compared to regular speech, these trivial events are usually short and unclear, thus generally regarded as not speaker discriminative and so are largely ignored by present speaker recognition research. However, these trivial events are highly valuable in some particular circumstances such as forensic examination, as they are less subjected to intentional change, so can be used to discover the genuine speaker from disguised speech. In this paper, we collect a trivial event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database, by both human listeners and machines. 
Particularly, the deep feature learning technique recently proposed by our group is utilized to analyze and recognize the trivial events, which leads to acceptable equal error rates (EERs) despite the extremely short durations (0.2-0.5 seconds) of these events. Comparing different types of events, 'hmm' seems more speaker discriminative.</description><identifier>DOI: 10.48550/arxiv.1711.05443</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Neural and Evolutionary Computing ; Computer Science - Sound</subject><creationdate>2017-11</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1711.05443$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1711.05443$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhang, Miao</creatorcontrib><creatorcontrib>Kang, Xiaofei</creatorcontrib><creatorcontrib>Wang, Yanqing</creatorcontrib><creatorcontrib>Li, Lantian</creatorcontrib><creatorcontrib>Tang, Zhiyuan</creatorcontrib><creatorcontrib>Dai, Haisheng</creatorcontrib><creatorcontrib>Wang, Dong</creatorcontrib><title>Human and Machine Speaker Recognition Based on Short Trivial Events</title><description>Trivial events are ubiquitous in human to human conversations, e.g., cough, laugh and sniff. Compared to regular speech, these trivial events are usually short and unclear, thus generally regarded as not speaker discriminative and so are largely ignored by present speaker recognition research. 
However, these trivial events are highly valuable in some particular circumstances such as forensic examination, as they are less subjected to intentional change, so can be used to discover the genuine speaker from disguised speech. In this paper, we collect a trivial event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database, by both human listeners and machines. Particularly, the deep feature learning technique recently proposed by our group is utilized to analyze and recognize the trivial events, which leads to acceptable equal error rates (EERs) despite the extremely short durations (0.2-0.5 seconds) of these events. Comparing different types of events, 'hmm' seems more speaker discriminative.</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Neural and Evolutionary Computing</subject><subject>Computer Science - Sound</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz8FuwjAQBFBfOFTQD-ip_oGkdrKOzREiWiqBKpXco7VZFwtwkJNG7d-X0p5m5jLSY-xBihyMUuIJ01cYc6mlzIUCKO9Yvf48Y-QY93yL7hAi8d2F8EiJv5PrPmIYQhf5Enva82vZHbo08CaFMeCJr0aKQz9jE4-nnu7_c8qa51VTr7PN28trvdhkWOkyc8IjCDOHylmlgTwIrzUZY5y1KAX6QprrVgUptFg5aY0vwGkQlS2kKqfs8e_2pmgvKZwxfbe_mvamKX8AoG5EVA</recordid><startdate>20171115</startdate><enddate>20171115</enddate><creator>Zhang, Miao</creator><creator>Kang, Xiaofei</creator><creator>Wang, Yanqing</creator><creator>Li, Lantian</creator><creator>Tang, Zhiyuan</creator><creator>Dai, Haisheng</creator><creator>Wang, Dong</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20171115</creationdate><title>Human and Machine Speaker Recognition Based on Short Trivial Events</title><author>Zhang, Miao ; Kang, Xiaofei ; Wang, Yanqing ; Li, Lantian ; Tang, Zhiyuan ; Dai, Haisheng ; Wang, 
Dong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a673-c0fa408946cb574ef40f77e888cbba10af2187e852e5aba6c1b8f24c7406b2153</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Neural and Evolutionary Computing</topic><topic>Computer Science - Sound</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Miao</creatorcontrib><creatorcontrib>Kang, Xiaofei</creatorcontrib><creatorcontrib>Wang, Yanqing</creatorcontrib><creatorcontrib>Li, Lantian</creatorcontrib><creatorcontrib>Tang, Zhiyuan</creatorcontrib><creatorcontrib>Dai, Haisheng</creatorcontrib><creatorcontrib>Wang, Dong</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhang, Miao</au><au>Kang, Xiaofei</au><au>Wang, Yanqing</au><au>Li, Lantian</au><au>Tang, Zhiyuan</au><au>Dai, Haisheng</au><au>Wang, Dong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Human and Machine Speaker Recognition Based on Short Trivial Events</atitle><date>2017-11-15</date><risdate>2017</risdate><abstract>Trivial events are ubiquitous in human to human conversations, e.g., cough, laugh and sniff. Compared to regular speech, these trivial events are usually short and unclear, thus generally regarded as not speaker discriminative and so are largely ignored by present speaker recognition research. However, these trivial events are highly valuable in some particular circumstances such as forensic examination, as they are less subjected to intentional change, so can be used to discover the genuine speaker from disguised speech. 
In this paper, we collect a trivial event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database, by both human listeners and machines. Particularly, the deep feature learning technique recently proposed by our group is utilized to analyze and recognize the trivial events, which leads to acceptable equal error rates (EERs) despite the extremely short durations (0.2-0.5 seconds) of these events. Comparing different types of events, 'hmm' seems more speaker discriminative.</abstract><doi>10.48550/arxiv.1711.05443</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1711.05443
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_1711_05443
source	arXiv.org
subjects	Computer Science - Computation and Language; Computer Science - Neural and Evolutionary Computing; Computer Science - Sound
title	Human and Machine Speaker Recognition Based on Short Trivial Events
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T21%3A26%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Human%20and%20Machine%20Speaker%20Recognition%20Based%20on%20Short%20Trivial%20Events&rft.au=Zhang,%20Miao&rft.date=2017-11-15&rft_id=info:doi/10.48550/arxiv.1711.05443&rft_dat=%3Carxiv_GOX%3E1711_05443%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true