Feature Joint-State Posterior Estimation in Factorial Speech Processing Models using Deep Neural Networks

This paper proposes a new method for calculating joint-state posteriors of mixed-audio features using deep neural networks to be used in factorial speech processing models. The joint-state posterior information is required in factorial models to perform joint-decoding. The novelty of this work is its architecture, which enables the network to infer joint-state posteriors from the pairs of state posteriors of stereo features.

Full Description

Saved in:
Bibliographic Details
Main Authors: Khademian, Mahdi; Homayounpour, Mohammad Mehdi
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Khademian, Mahdi; Homayounpour, Mohammad Mehdi
description This paper proposes a new method for calculating joint-state posteriors of mixed-audio features using deep neural networks to be used in factorial speech processing models. The joint-state posterior information is required in factorial models to perform joint-decoding. The novelty of this work is its architecture which enables the network to infer joint-state posteriors from the pairs of state posteriors of stereo features. This paper defines an objective function to solve an underdetermined system of equations, which is used by the network for extracting joint-state posteriors. It develops the required expressions for fine-tuning the network in a unified way. The experiments compare the proposed network decoding results to those of the vector Taylor series method and show 2.3% absolute performance improvement in the monaural speech separation and recognition challenge. This achievement is substantial when we consider the simplicity of joint-state posterior extraction provided by deep neural networks.
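The description above contrasts the paper's learned joint-state posteriors with simpler combination schemes. As a point of reference only, the following sketch shows the naive baseline the factorial setting improves upon: forming a joint posterior over two speakers' states from their marginal state posteriors via an outer product, which implicitly assumes the two sources are independent. The function name and shapes are illustrative assumptions, not the paper's architecture; the paper's contribution is precisely that a deep network infers the joint posterior rather than assuming this factorized form.

```python
import numpy as np

def naive_joint_state_posterior(p1, p2):
    """Baseline illustration: combine two per-speaker state posterior
    vectors into a joint-state posterior matrix by an outer product.
    This assumes the speakers' states are independent given the mixed
    observation, which is exactly the simplification a learned
    joint-state posterior avoids."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    joint = np.outer(p1, p2)       # shape: (n_states_1, n_states_2)
    return joint / joint.sum()     # renormalize to a valid distribution

# Hypothetical example: 3 HMM states for speaker 1, 2 for speaker 2
joint = naive_joint_state_posterior([0.7, 0.2, 0.1], [0.6, 0.4])
```

A joint decoder would then search over entries of this matrix (one per state pair) instead of over each speaker's states separately, which is why factorial models need the joint posterior rather than the two marginals.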
doi_str_mv 10.48550/arxiv.1707.02661
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1707.02661
language eng
recordid cdi_arxiv_primary_1707_02661
source arXiv.org
subjects Computer Science - Sound
title Feature Joint-State Posterior Estimation in Factorial Speech Processing Models using Deep Neural Networks