Encoding navigable speech sources: An analysis by synthesis approach

Bibliographic details
Main authors: Xiguang Zheng, Ritz, C., Jiangtao Xi
Format: Conference proceedings
Language: English
Description
Summary: This paper presents an analysis-by-synthesis coding architecture for compressing navigable speech sources. The proposed coding scheme encodes multiple overlapped speech sources recorded, for example, during a multi-participant meeting or teleconference, into a mono or stereo mixture signal that can be compressed with an existing speech coder. The individual speech sources can be separated from the received compressed mixture, which allows the listener to determine the active sources and their spatial locations at the reproduction site. The approach was applied to the compression of a series of speech soundfields created from multiple clean speech sentences and real meeting recordings, where each soundfield contained four participants with up to three simultaneous speech sources. At a total bit rate of 48 kbps, the perceptual quality of each decoded speech source, as judged by subjective listening tests, was found to be significantly better than that of either a non-analysis-by-synthesis approach or separate encoding of each source at the same total bit rate. Subjective listening tests also confirm that the quality of the spatialised speech scene is maintained.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2012.6287902