Direct Spatial-Fourier Regression of HRIRs from Multi-Elevation Continuous-Azimuth Recordings


Bibliographic details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, Vol. 28, p. 1129-1142
Main authors: Urbanietz, Christoph; Enzner, Gerald
Format: Article
Language: English
Description
Summary: Individual head-related impulse responses (HRIRs) have been recognized as a key to creating high-fidelity virtual auditory spaces. Thus, fast and comprehensive acquisition of individual HRIRs has been a subject of continued research. Traditional stop-and-go measurement at discrete angles is time consuming and additionally requires spatial interpolation that has been tackled by mapping discrete HRIR tables to a spatial Fourier format. Uniformly continuous-azimuth recording with a moving apparatus, on the other hand, reduces acquisition time, but is noise-limited due to a very short observation time per angle. In the interest of both fast acquisition and high accuracy, in this paper, we propose direct retrieval of a spatial Fourier format from continuous-azimuth recordings at multiple simultaneous discrete elevations. Specifically, we fit a generative continuous-azimuth model of the recorded signal, based on the spatial Fourier representation, to the continuous recordings of individuals by least-squares. In this approach, the model is meant to entirely capture the spatial variation of the HRIR in azimuth, while the duration of the recording then systematically controls the noise rejection. The proposed time-domain treatment is free of block artifacts, but is numerically demanding. We outline how to take the special structure of the involved covariance matrices into account. Experimental results with simulated data and real recordings demonstrate that the HRIR performance in terms of binaural cues and reproducibility benefits from the proposed algorithm. Our method is hence practical in terms of low measurement time and high performance, while benefiting from increased computational power of current computers.
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2020.2982291
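
Illustrative note: the least-squares fit mentioned in the summary can be sketched in a few lines. The sketch below is not the authors' algorithm. It assumes a single ear and a single elevation, a known excitation signal `x`, a known azimuth trajectory `phi` (one angle per sample), and a real-valued azimuthal Fourier series per filter tap. All function names and parameters are hypothetical. It solves a dense least-squares problem with NumPy and does not exploit the covariance-matrix structure that the summary refers to.

```python
import numpy as np

def azimuth_basis(phi, order):
    """Real Fourier basis [1, cos(phi), sin(phi), ..., cos(M*phi), sin(M*phi)]."""
    cols = [np.ones_like(phi)]
    for m in range(1, order + 1):
        cols.append(np.cos(m * phi))
        cols.append(np.sin(m * phi))
    return np.stack(cols, axis=1)  # shape (K, 2*order + 1)

def design_matrix(x, phi, taps, order):
    """Columns are x(k - n) * basis_p(phi_k) for every tap n and basis index p."""
    K = len(x)
    B = azimuth_basis(phi, order)          # (K, P)
    X = np.zeros((K, taps))                # delayed excitation, zero before start
    for n in range(taps):
        X[n:, n] = x[:K - n]
    return (X[:, :, None] * B[:, None, :]).reshape(K, taps * B.shape[1])

def fit_spatial_fourier(x, y, phi, taps, order):
    """Least-squares estimate of the (taps, 2*order + 1) coefficient table."""
    A = design_matrix(x, phi, taps, order)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs.reshape(taps, 2 * order + 1)

def hrir_at(coeffs, phi0):
    """Evaluate the fitted model as an impulse response at azimuth phi0 (radians)."""
    order = (coeffs.shape[1] - 1) // 2
    b = azimuth_basis(np.atleast_1d(float(phi0)), order)[0]
    return coeffs @ b

# Noise-free synthetic check: one full rotation, white-noise excitation.
rng = np.random.default_rng(0)
taps, order, K = 4, 2, 2000
true_coeffs = rng.standard_normal((taps, 2 * order + 1))
x = rng.standard_normal(K)
phi = 2 * np.pi * np.arange(K) / K
y = design_matrix(x, phi, taps, order) @ true_coeffs.ravel()
est_coeffs = fit_spatial_fourier(x, y, phi, taps, order)
```

In this toy setting a longer recording adds rows to the design matrix while the number of unknowns stays fixed, which is the mechanism by which recording duration would average out additive noise.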