Feature Joint-State Posterior Estimation in Factorial Speech Processing Models using Deep Neural Networks
Main authors: | , |
---|---|
Format: | Article |
Language: | English |
Abstract: | This paper proposes a new method that uses deep neural networks to
compute joint-state posteriors of mixed-audio features for factorial speech
processing models. Factorial models require this joint-state posterior
information to perform joint decoding. The novelty of this work is its
architecture, which enables the network to infer joint-state posteriors from
pairs of state posteriors of stereo features. The paper defines an objective
function for solving an underdetermined system of equations, and the network
uses this objective to extract joint-state posteriors. It also develops the
expressions required for fine-tuning the network in a unified way. In the
monaural speech separation and recognition challenge, decoding with the
proposed network yields a 2.3% absolute performance improvement over the vector
Taylor series (VTS) method. This gain is substantial considering how simple
joint-state posterior extraction with deep neural networks is. |
---|---|
DOI: | 10.48550/arxiv.1707.02661 |
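The abstract does not spell out the objective function, so the following is only a hedged sketch of why inferring joint-state posteriors from per-source state posteriors is an underdetermined problem. It assumes, hypothetically, that the network has one softmax output per joint state (s1, s2) and is trained so that the row and column sums of that N x M joint distribution reproduce the state posteriors computed from each source's stereo features. An N x M joint has NM unknowns, while its two marginals supply only N + M - 1 independent equations. The cross-entropy objective, its closed-form gradient, the function names (`marginal_loss`, `marginal_grad`, `fit_joint`) and the toy posteriors are illustrative assumptions, not the paper's formulation.

```python
import math

# Hypothetical sketch: the marginal-matching objective below is an illustrative
# assumption, not the objective defined in the paper.


def softmax(z):
    """Numerically stable softmax over a flat list of logits."""
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def marginals(P, n, m):
    """Row sums (source 1) and column sums (source 2) of a flat, row-major n x m joint."""
    rows = [sum(P[i * m + j] for j in range(m)) for i in range(n)]
    cols = [sum(P[i * m + j] for i in range(n)) for j in range(m)]
    return rows, cols


def marginal_loss(P, p1, p2):
    """Cross-entropy between the per-source targets p1, p2 and the marginals of P."""
    q1, q2 = marginals(P, len(p1), len(p2))
    return (-sum(a * math.log(b) for a, b in zip(p1, q1))
            - sum(a * math.log(b) for a, b in zip(p2, q2)))


def marginal_grad(z, p1, p2):
    """Gradient of marginal_loss(softmax(z), p1, p2) with respect to the logits z.

    Assuming p1 and p2 each sum to 1, the softmax Jacobian collapses to
    dL/dz_ij = P_ij * (2 - p1[i] / q1[i] - p2[j] / q2[j]).
    """
    n, m = len(p1), len(p2)
    P = softmax(z)
    q1, q2 = marginals(P, n, m)
    return [P[i * m + j] * (2.0 - p1[i] / q1[i] - p2[j] / q2[j])
            for i in range(n) for j in range(m)]


def fit_joint(p1, p2, lr=0.5, steps=2000):
    """Plain gradient descent on n*m logits until the joint's marginals match p1 and p2."""
    z = [0.0] * (len(p1) * len(p2))
    for _ in range(steps):
        z = [zi - lr * gi for zi, gi in zip(z, marginal_grad(z, p1, p2))]
    return softmax(z)


if __name__ == "__main__":
    p1 = [0.7, 0.2, 0.1]  # toy state posteriors for source 1 (3 states)
    p2 = [0.6, 0.4]       # toy state posteriors for source 2 (2 states)
    n, m = len(p1), len(p2)
    print("unknowns:", n * m, "independent constraints:", n + m - 1)

    P = fit_joint(p1, p2)
    print("fitted marginals:", marginals(P, n, m))

    # Underdetermined: the outer product matches both marginals exactly, and shifting
    # mass around a 2 x 2 cycle keeps both marginals (and hence the loss) unchanged.
    O = [a * b for a in p1 for b in p2]
    eps = 0.05
    Q = list(O)
    Q[0 * m + 0] += eps
    Q[0 * m + 1] -= eps
    Q[1 * m + 0] -= eps
    Q[1 * m + 1] += eps
    print("loss(O) =", marginal_loss(O, p1, p2), "loss(Q) =", marginal_loss(Q, p1, p2))
```

With the toy posteriors, gradient descent drives both marginals of the fitted joint to the targets, yet shifting mass around a 2 x 2 cycle of an exact solution (here the outer product of `p1` and `p2`) leaves the loss unchanged. Marginal matching alone therefore cannot single out one joint-state posterior; some further source of information has to choose among the solutions.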