Unsupervised Blind Joint Dereverberation and Room Acoustics Estimation with Diffusion Models
Saved in:
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | This paper presents an unsupervised method for single-channel blind
dereverberation and room impulse response (RIR) estimation, called BUDDy. The
algorithm is rooted in Bayesian posterior sampling: it combines a likelihood
model enforcing fidelity to the reverberant measurement, and an anechoic speech
prior implemented by an unconditional diffusion model. We design a parametric
filter representing the RIR, with exponential decay for each frequency subband.
Room acoustics estimation and speech dereverberation are jointly carried out,
as the filter parameters are iteratively estimated and the speech utterance
refined along the reverse diffusion trajectory. In a blind scenario where the
room impulse response is unknown, BUDDy successfully performs speech
dereverberation in various acoustic scenarios, significantly outperforming
other blind unsupervised baselines. Unlike supervised methods, which often
struggle to generalize, BUDDy seamlessly adapts to different acoustic
conditions. This paper extends our previous work by offering new experimental
results and insights into the algorithm's performance and versatility. We first
investigate the robustness of informed dereverberation methods to RIR
estimation errors, to motivate the joint acoustic estimation and
dereverberation paradigm. Then, we demonstrate the adaptability of our method
to high-resolution singing voice dereverberation, study its performance in RIR
estimation, and conduct subjective evaluation experiments to validate the
perceptual quality of the results, among other contributions. Audio samples and
code can be found online. |
---|---|
DOI: | 10.48550/arxiv.2408.07472 |
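The abstract describes the RIR as a parametric filter with an exponential decay in each frequency subband. The sketch below illustrates that idea only: the band edges, parameter names, and the noise carrier are assumptions for illustration, not BUDDy's exact construction.

```python
import numpy as np

def synthesize_rir(weights, decay_rates, fs=16000, length=4000, seed=0):
    """Sketch of a subband RIR model: each frequency band is band-limited
    noise shaped by an exponential decay envelope. Illustrative only."""
    rng = np.random.default_rng(seed)
    t = np.arange(length) / fs
    n_bands = len(weights)
    # Uniformly spaced band edges up to Nyquist (an assumption of this sketch).
    edges = np.linspace(0.0, fs / 2, n_bands + 1)
    rir = np.zeros(length)
    for b, (w, alpha) in enumerate(zip(weights, decay_rates)):
        # Band-limit white noise by zeroing FFT bins outside the subband.
        spectrum = np.fft.rfft(rng.standard_normal(length))
        freqs = np.fft.rfftfreq(length, d=1.0 / fs)
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        carrier = np.fft.irfft(spectrum * mask, n=length)
        # Per-band weight and exponential energy decay.
        rir += w * np.exp(-alpha * t) * carrier
    return rir

# Hypothetical parameters: three bands, faster decay at higher frequencies.
rir = synthesize_rir(weights=[1.0, 0.6, 0.3], decay_rates=[20.0, 40.0, 80.0])
```

In the joint estimation the paper describes, parameters like the per-band weights and decay rates would be refined iteratively alongside the speech estimate; convolving an anechoic signal with such an RIR gives the reverberant measurement the likelihood term compares against.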