A Novel Speech Feature Fusion Algorithm for Text-Independent Speaker Recognition

A novel speech feature fusion algorithm with independent vector analysis (IVA) and parallel convolutional neural network (PCNN) is proposed for text-independent speaker recognition. Firstly, some different feature types, such as the time domain (TD) features and the frequency domain (FD) features, c...

Bibliographic Details
Main Authors:	Ma, Biao; Xu, Chengben; Zhang, Ye
Format:	Article
Language:	eng

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Ma, Biao; Xu, Chengben; Zhang, Ye
description	A novel speech feature fusion algorithm with independent vector analysis (IVA) and parallel convolutional neural network (PCNN) is proposed for text-independent speaker recognition. Firstly, different feature types, such as time domain (TD) features and frequency domain (FD) features, can be extracted from a speaker's speech, and the TD and FD features can be considered as linear mixtures of independent feature components (IFCs) with an unknown mixing system. To estimate the IFCs, the TD and FD features of the speaker's speech are concatenated to build the TD and FD feature matrices, respectively. Then, a feature tensor of the speaker's speech is obtained by stacking the TD and FD feature matrices in parallel. To enhance the dependence between different feature types and remove the redundancies within the same feature type, independent vector analysis (IVA) can be used to estimate the IFC matrices of the TD and FD features from the feature tensor. The IFC matrices are used as the input of the PCNN to extract the deep features of the TD and FD features, respectively. The deep features can be integrated to obtain the fusion feature of the speaker's speech. Finally, the fusion feature of the speaker's speech is employed as the input of a deep convolutional neural network (DCNN) classifier for speaker recognition. The experimental results show the effectiveness and performance of the proposed speaker recognition system.
doi_str_mv	10.48550/arxiv.2212.00329
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2212_00329</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2212_00329</sourcerecordid><originalsourceid>FETCH-LOGICAL-a679-c14728c33a4630bc3394e703933615a12332640e49e7996ea1a41d21bf37c1233</originalsourceid><addsrcrecordid>eNotj0FOwzAURL1hgQoHYFVfIMH2d-x6GVUEKlWAIPvIdX9aq2kcuW5Vbk9S2MyMNJqRHiFPnOVyURTs2carv-RCcJEzBsLck8-SvocLdvR7QHR7WqFN54i0Op986GnZ7UL0aX-kbYi0xmvKVv0WBxylT9PIHjDSL3Rh1_s0Th7IXWu7Ez7--4zU1Uu9fMvWH6-rZbnOrNImc1xqsXAAVipgmzEYiZqBAVC8sFwACCUZSoPaGIWWW8m3gm9a0G5qZ2T-d3tDaobojzb-NBNac0ODX4D4R6g</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A Novel Speech Feature Fusion Algorithm for Text-Independent Speaker Recognition</title><source>arXiv.org</source><creator>Ma, Biao ; Xu, Chengben ; Zhang, Ye</creator><creatorcontrib>Ma, Biao ; Xu, Chengben ; Zhang, Ye</creatorcontrib><description>A novel speech feature fusion algorithm with independent vector analysis (IVA) and parallel convolutional neural network (PCNN) is proposed for text-independent speaker recognition. Firstly, some different feature types, such as the time domain (TD) features and the frequency domain (FD) features, can be extracted from a speaker's speech, and the TD and the FD features can be considered as the linear mixtures of independent feature components (IFCs) with an unknown mixing system. To estimate the IFCs, the TD and the FD features of the speaker's speech are concatenated to build the TD and the FD feature matrix, respectively. Then, a feature tensor of the speaker's speech is obtained by paralleling the TD and the FD feature matrix. 
To enhance the dependence on different feature types and remove the redundancies of the same feature type, the independent vector analysis (IVA) can be used to estimate the IFC matrices of TD and FD features with the feature tensor. The IFC matrices are utilized as the input of the PCNN to extract the deep features of the TD and FD features, respectively. The deep features can be integrated to obtain the fusion feature of the speaker's speech. Finally, the fusion feature of the speaker's speech is employed as the input of a deep convolutional neural network (DCNN) classifier for speaker recognition. The experimental results show the effectiveness and performances of the proposed speaker recognition system.</description><identifier>DOI: 10.48550/arxiv.2212.00329</identifier><language>eng</language><creationdate>2022-12</creationdate><rights>http://creativecommons.org/licenses/by-nc-sa/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2212.00329$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2212.00329$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Ma, Biao</creatorcontrib><creatorcontrib>Xu, Chengben</creatorcontrib><creatorcontrib>Zhang, Ye</creatorcontrib><title>A Novel Speech Feature Fusion Algorithm for Text-Independent Speaker Recognition</title><description>A novel speech feature fusion algorithm with independent vector analysis (IVA) and parallel convolutional neural network (PCNN) is proposed for text-independent speaker recognition. 
Firstly, some different feature types, such as the time domain (TD) features and the frequency domain (FD) features, can be extracted from a speaker's speech, and the TD and the FD features can be considered as the linear mixtures of independent feature components (IFCs) with an unknown mixing system. To estimate the IFCs, the TD and the FD features of the speaker's speech are concatenated to build the TD and the FD feature matrix, respectively. Then, a feature tensor of the speaker's speech is obtained by paralleling the TD and the FD feature matrix. To enhance the dependence on different feature types and remove the redundancies of the same feature type, the independent vector analysis (IVA) can be used to estimate the IFC matrices of TD and FD features with the feature tensor. The IFC matrices are utilized as the input of the PCNN to extract the deep features of the TD and FD features, respectively. The deep features can be integrated to obtain the fusion feature of the speaker's speech. Finally, the fusion feature of the speaker's speech is employed as the input of a deep convolutional neural network (DCNN) classifier for speaker recognition. 
The experimental results show the effectiveness and performances of the proposed speaker recognition system.</description><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj0FOwzAURL1hgQoHYFVfIMH2d-x6GVUEKlWAIPvIdX9aq2kcuW5Vbk9S2MyMNJqRHiFPnOVyURTs2carv-RCcJEzBsLck8-SvocLdvR7QHR7WqFN54i0Op986GnZ7UL0aX-kbYi0xmvKVv0WBxylT9PIHjDSL3Rh1_s0Th7IXWu7Ez7--4zU1Uu9fMvWH6-rZbnOrNImc1xqsXAAVipgmzEYiZqBAVC8sFwACCUZSoPaGIWWW8m3gm9a0G5qZ2T-d3tDaobojzb-NBNac0ODX4D4R6g</recordid><startdate>20221201</startdate><enddate>20221201</enddate><creator>Ma, Biao</creator><creator>Xu, Chengben</creator><creator>Zhang, Ye</creator><scope>GOX</scope></search><sort><creationdate>20221201</creationdate><title>A Novel Speech Feature Fusion Algorithm for Text-Independent Speaker Recognition</title><author>Ma, Biao ; Xu, Chengben ; Zhang, Ye</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a679-c14728c33a4630bc3394e703933615a12332640e49e7996ea1a41d21bf37c1233</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Ma, Biao</creatorcontrib><creatorcontrib>Xu, Chengben</creatorcontrib><creatorcontrib>Zhang, Ye</creatorcontrib><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ma, Biao</au><au>Xu, Chengben</au><au>Zhang, Ye</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Novel Speech Feature Fusion Algorithm for Text-Independent Speaker Recognition</atitle><date>2022-12-01</date><risdate>2022</risdate><abstract>A novel speech feature fusion algorithm with independent vector analysis (IVA) and parallel convolutional neural network (PCNN) is proposed for text-independent speaker recognition. 
Firstly, some different feature types, such as the time domain (TD) features and the frequency domain (FD) features, can be extracted from a speaker's speech, and the TD and the FD features can be considered as the linear mixtures of independent feature components (IFCs) with an unknown mixing system. To estimate the IFCs, the TD and the FD features of the speaker's speech are concatenated to build the TD and the FD feature matrix, respectively. Then, a feature tensor of the speaker's speech is obtained by paralleling the TD and the FD feature matrix. To enhance the dependence on different feature types and remove the redundancies of the same feature type, the independent vector analysis (IVA) can be used to estimate the IFC matrices of TD and FD features with the feature tensor. The IFC matrices are utilized as the input of the PCNN to extract the deep features of the TD and FD features, respectively. The deep features can be integrated to obtain the fusion feature of the speaker's speech. Finally, the fusion feature of the speaker's speech is employed as the input of a deep convolutional neural network (DCNN) classifier for speaker recognition. The experimental results show the effectiveness and performances of the proposed speaker recognition system.</abstract><doi>10.48550/arxiv.2212.00329</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2212.00329
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2212_00329
source	arXiv.org
title	A Novel Speech Feature Fusion Algorithm for Text-Independent Speaker Recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T12%3A29%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Novel%20Speech%20Feature%20Fusion%20Algorithm%20for%20Text-Independent%20Speaker%20Recognition&rft.au=Ma,%20Biao&rft.date=2022-12-01&rft_id=info:doi/10.48550/arxiv.2212.00329&rft_dat=%3Carxiv_GOX%3E2212_00329%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
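
The description field above says that per-frame TD and FD features are concatenated into two feature matrices, which are then stacked in parallel into a feature tensor before IVA is applied. The following is a minimal sketch of that first step only, not the authors' implementation. The record does not say which features are used, so short-time energy, zero-crossing rate, and the first two DFT bin magnitudes are placeholder assumptions, as are all function names, the frame size, and the hop length. IVA, the PCNN, and the DCNN classifier are not shown.

```python
import cmath
import math

def frames(signal, size, hop):
    """Split a signal into overlapping frames of equal length."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def td_features(frame):
    """Placeholder time-domain features (assumed): energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(frame) - 1)]

def fd_features(frame, bins=2):
    """Placeholder frequency-domain features (assumed): first DFT magnitudes, DC skipped."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) / n
        for k in range(1, bins + 1)
    ]

def feature_matrix(frame_list, extractor):
    """Concatenate per-frame feature vectors: rows are features, columns are frames."""
    columns = [extractor(f) for f in frame_list]
    return [list(row) for row in zip(*columns)]

def feature_tensor(signal, size=8, hop=4):
    """Stack the TD and FD feature matrices in parallel: shape (2, features, frames)."""
    frame_list = frames(signal, size, hop)
    td = feature_matrix(frame_list, td_features)
    fd = feature_matrix(frame_list, fd_features)
    return [td, fd]

# Toy input: a sine wave with a period of 8 samples, 32 samples long.
signal = [math.sin(2 * math.pi * t / 8) for t in range(32)]
tensor = feature_tensor(signal)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
print(shape)  # (2, 2, 7)
```

Both matrices are given the same number of rows here so that they stack into a regular tensor. That equal-size choice is an assumption of this sketch, not something the record states.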