DRED: Deep REDundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Despite recent advancements in packet loss concealment (PLC) using deep
learning techniques, packet loss remains a significant challenge in real-time
speech communication. Redundancy has been used in the past to recover the
missing information during losses. However, conventional redundancy techniques
are limited in the maximum loss duration they can cover and are often
unsuitable for burst packet loss. We propose a new approach based on a
rate-distortion-optimized variational autoencoder (RDO-VAE), allowing us to
optimize a deep speech compression algorithm for the task of encoding large
amounts of redundancy at very low bitrate. The proposed Deep REDundancy (DRED)
algorithm can transmit up to 50x redundancy using less than 32 kb/s. Results
show that DRED outperforms the redundancy already provided by the Opus codec. We also
demonstrate its benefits when operating in the context of WebRTC. |
---|---|
DOI: | 10.48550/arxiv.2212.04453 |
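The abstract claims up to 50x redundancy in under 32 kb/s. A quick way to read that figure is to divide the budget evenly across the redundant copies. The sketch below does only that division. The helper name `per_copy_bitrate` is hypothetical, and the even split is an assumption made for illustration. It gives an upper bound on the *average* cost of each copy, not a description of how DRED actually allocates bits.

```python
def per_copy_bitrate(total_kbps: float, redundancy_factor: int) -> float:
    """Hypothetical helper: average bitrate (kb/s) per redundant copy,
    assuming the whole budget goes to redundancy and is split evenly.
    This is an illustration only, not DRED's real bit allocation."""
    if redundancy_factor <= 0:
        raise ValueError("redundancy_factor must be positive")
    return total_kbps / redundancy_factor


# Figures taken from the abstract: 50x redundancy, 32 kb/s budget.
budget = per_copy_bitrate(32.0, 50)
print(f"{budget:.2f} kb/s per redundant copy")
```

Because the abstract says "less than 32 kb/s", each redundant copy must average below 0.64 kb/s. That is far below the rates of conventional speech codecs, which is why the paper turns to an encoder optimized specifically for very low bitrates.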
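A rate-distortion-optimized autoencoder is trained to minimize a combined objective D + λ·R. Here D is the reconstruction distortion and R is the number of bits needed to entropy-code the quantized latent variables. The pure-Python sketch below only evaluates that objective; it computes no gradients. It rests on several assumptions that do not come from the abstract: a zero-mean Laplace entropy model discretized to unit-width bins, mean squared error as the distortion measure, and the hypothetical function names `laplace_bits` and `rd_loss`. It shows the general technique, not the DRED paper's exact formulation.

```python
import math


def laplace_bits(q: int, scale: float) -> float:
    """Bits needed to code integer q under an ASSUMED zero-mean Laplace
    density with the given scale, integrated over the bin [q-0.5, q+0.5]."""
    def cdf(x: float) -> float:
        if x < 0:
            return 0.5 * math.exp(x / scale)
        return 1.0 - 0.5 * math.exp(-x / scale)

    p = cdf(q + 0.5) - cdf(q - 0.5)
    return -math.log2(max(p, 1e-12))  # clamp so log2 never sees 0


def rd_loss(target, recon, latents, scale, lam):
    """Evaluate D + lam * R, where (as assumptions for this sketch)
    D is the mean squared error between target and reconstruction and
    R is the total bits for the latents rounded to the nearest integer."""
    distortion = sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)
    quantized = [round(z) for z in latents]
    rate = sum(laplace_bits(q, scale) for q in quantized)
    return distortion + lam * rate, distortion, rate


# Toy usage with made-up numbers: a larger lam weights the rate term more.
loss_lo, d, r = rd_loss([0.1, -0.2, 0.3], [0.1, -0.1, 0.25],
                        [0.2, -1.4, 2.6], scale=1.0, lam=0.01)
loss_hi, _, _ = rd_loss([0.1, -0.2, 0.3], [0.1, -0.1, 0.25],
                        [0.2, -1.4, 2.6], scale=1.0, lam=0.1)
```

Raising λ makes bits more expensive relative to distortion. Training at larger λ therefore pushes the encoder toward latents that cost fewer bits. This trade-off is what lets an encoder be tuned for the very low rates that large amounts of redundancy require. In actual training, the non-differentiable rounding is usually replaced by a differentiable proxy; that step is outside this sketch.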