FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | This paper introduces FlowMAC, a novel neural audio codec for high-quality
general audio compression at low bit rates based on conditional flow matching
(CFM). FlowMAC jointly learns a mel spectrogram encoder, quantizer and decoder.
At inference time, the decoder integrates a continuous normalizing flow via an
ODE solver to generate a high-quality mel spectrogram. This is the first time
that a CFM-based approach has been applied to general audio coding, enabling
scalable, simple and memory-efficient training. Our subjective evaluations show
that FlowMAC at 3 kbps achieves quality similar to that of state-of-the-art
GAN-based and DDPM-based neural audio codecs at double the bit rate. Moreover,
FlowMAC offers a tunable inference pipeline that allows complexity to be traded
off against quality. This enables real-time coding on a CPU while maintaining
high perceptual quality. |
DOI: | 10.48550/arxiv.2409.17635 |
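
The summary mentions that the decoder integrates a continuous normalizing flow with an ODE solver and that the inference pipeline can trade complexity for quality. The following is a minimal sketch of that general idea, not FlowMAC's actual decoder: `toy_vector_field`, `RATE`, and `euler_integrate` are hypothetical names, the vector field is a hand-written stand-in for a trained network, and fixed-step Euler is assumed as the solver. The point it illustrates is only that the number of solver steps sets the cost (one field evaluation per step) and the accuracy of the result.

```python
import math

# Assumption: a simple linear field replaces the trained, conditioned network.
RATE = 2.0


def toy_vector_field(x, t, cond):
    """Hypothetical stand-in for a learned field v(x, t | cond).

    It pulls each coordinate of x toward the conditioning vector.
    The time argument is unused here but kept to mirror a real field.
    """
    return [RATE * (c - xi) for xi, c in zip(x, cond)]


def euler_integrate(vector_field, x0, cond, num_steps):
    """Integrate dx/dt = v(x, t | cond) from t=0 to t=1 with fixed-step Euler."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    dt = 1.0 / num_steps
    x = list(x0)
    for step in range(num_steps):
        t = step * dt
        v = vector_field(x, t, cond)  # one field evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def max_abs_error(a, b):
    """Largest coordinate-wise absolute difference between two vectors."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


cond = [0.5, -1.0, 2.0]  # stands in for the decoded conditioning features
x0 = [0.0, 0.0, 0.0]     # stands in for the initial noise sample

# Closed-form solution of this toy ODE at t=1, used only as a reference.
exact = [c + (x - c) * math.exp(-RATE) for x, c in zip(x0, cond)]

errors = {}
for n in (4, 8, 32):
    approx = euler_integrate(toy_vector_field, x0, cond, n)
    errors[n] = max_abs_error(approx, exact)
    print(f"steps={n:2d}  field evaluations={n:2d}  max error={errors[n]:.4f}")
```

In this toy setting, raising `num_steps` lowers the integration error at a proportional cost in field evaluations, which is the kind of knob a tunable inference pipeline would expose.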