Encoding Multiple Audio Objects Using Intra-Object Sparsity

Bibliographic details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015-06, Vol. 23 (6), p. 1082-1095
Main authors: Maoshen Jia, Ziyu Yang, Changchun Bao, Xiguang Zheng, Christian Ritz
Format: Article
Language: English
Description
Abstract: Preserving audio scenes in the form of audio objects has become common in recent years. Object-based audio techniques provide more flexibility for personalized rendering as well as a more accurate audio object trajectory. For encoding and transmitting multiple audio objects in a lossy manner, a new compression framework for multiple simultaneously occurring audio objects is presented in this work. The proposed encoding approach is based on the intra-object sparsity (approximate k-sparsity). After establishing a quantitative measure of approximate k-sparsity, statistical analysis is employed to validate the proposed intra-object sparsity of audio objects. By exploring this intra-object sparsity, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. This downmix signal can be further compressed by legacy audio codecs. Meanwhile, the side information is transmitted in a lossless manner. The objective and subjective evaluations revealed that the proposed compression framework achieved better perceptual quality compared to an existing technique where up to eight audio objects are considered. The subjective evaluations also confirmed that the proposed approach is able to achieve scalable transmission according to the bandwidth while preserving the perceptual quality of both the individual audio objects and the spatial audio scenes.
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2015.2419980
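
The abstract names two ideas without giving formulas: a quantitative measure of approximate k-sparsity, and a mono downmix accompanied by side information. The Python sketch below is only an illustration of how such ideas could look and does not reproduce the paper's actual measure or codec. It assumes that sparsity is measured as the share of a frame's energy held by its k largest-magnitude coefficients, and that the side information is a per-bin record of which object owns each downmix bin. The function names (`energy_ratio_top_k`, `sparse_downmix`, `split_downmix`) and the collision rule are hypothetical.

```python
def energy_ratio_top_k(coeffs, k):
    """Fraction of total energy carried by the k largest-magnitude coefficients.

    Assumed stand-in for an approximate k-sparsity measure: a value close
    to 1.0 means the frame is well approximated by only k coefficients.
    """
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    total = sum(energies)
    if total == 0.0:
        return 1.0  # an all-zero frame is trivially sparse
    return sum(energies[:k]) / total


def sparse_downmix(frames, k):
    """Keep each object's k strongest bins and merge them into one mono frame.

    frames: list of equal-length coefficient lists, one per audio object.
    Returns (downmix, owner); owner[b] is the index of the object whose
    coefficient sits in bin b, or -1 if no object claimed that bin.
    """
    n_bins = len(frames[0])
    downmix = [0.0] * n_bins
    owner = [-1] * n_bins
    for obj, frame in enumerate(frames):
        # Indices of this object's k largest-magnitude coefficients.
        top = sorted(range(n_bins), key=lambda b: abs(frame[b]), reverse=True)[:k]
        for b in top:
            # Assumed collision rule: the larger-magnitude coefficient wins.
            if abs(frame[b]) > abs(downmix[b]):
                downmix[b] = frame[b]
                owner[b] = obj
    return downmix, owner


def split_downmix(downmix, owner, n_objects):
    """Rebuild one sparse frame per object from the downmix and the owner map."""
    frames = [[0.0] * len(downmix) for _ in range(n_objects)]
    for b, obj in enumerate(owner):
        if obj >= 0:
            frames[obj][b] = downmix[b]
    return frames


# Two toy "objects" whose energy is concentrated in different bins.
obj_a = [0.9, 0.0, 0.1, 0.0, 0.05, 0.0]
obj_b = [0.0, 0.05, 0.0, 0.8, 0.0, 0.1]

downmix, owner = sparse_downmix([obj_a, obj_b], k=2)
recovered = split_downmix(downmix, owner, n_objects=2)
```

In this toy case the owner map plays the role of the losslessly transmitted side information, while the single `downmix` frame is what a legacy mono codec would carry. Coefficients outside each object's top k are discarded, which is where the loss in this sketch comes from.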