Singing voice identification using spectral envelope estimation

In this paper, we present a spectrum-based system for singer identification that operates for the ideal case in which audio samples contain only the singer's voice. Our method begins with the computation of a robust estimate of the spectral envelope called the composite transfer function (CTF)....

Bibliographic details
Published in:	IEEE Transactions on Speech and Audio Processing, 2004-03, Vol. 12 (2), p. 100-109
Main authors:	Bartsch, M.A., Wakefield, G.H.
Format:	Article
Language:	English
Subjects:	Accuracy; Amplitude estimation; Applied sciences; Classifiers; Detection, estimation, filtering, equalization, prediction; Envelopes; Exact sciences and technology; Filtering theory; Filters; Frequency estimation; Frequency ranges; Information, signal and communications theory; Instruments; Music information retrieval; Musicians & conductors; Robustness; Signal and communications theory; Signal processing; Signal, noise; Spectra; Speech processing; Studies; Telecommunications and information theory; Testing; Timbre; Transfer functions; Voice; Vowels

Description
Abstract:	In this paper, we present a spectrum-based system for singer identification that operates for the ideal case in which audio samples contain only the singer's voice. Our method begins with the computation of a robust estimate of the spectral envelope called the composite transfer function (CTF). The CTF is derived from the instantaneous amplitude and frequency of the sinusoidal partials which make up the vocal signal. Unlike traditional source-filter theory, the CTF does not explicitly separate the spectral characteristics of the vocal source and the vocal tract filter. The principal components of the CTFs are used as features for a quadratic classifier to identify singers. The approach is validated on a database containing samples from twelve classically trained singers. In cross validation experiments, test set accuracies of approximately 95% are found for a baseline case. The classifier's performance is not degraded when different vowels are included in classifier training and evaluation. Restricting the frequency range of the CTFs and using a test set containing samples extracted from solo performances of Italian arias reduces the test set accuracy to 70-80%.
ISSN:	1063-6676 2329-9290 1558-2353 2329-9304
DOI:	10.1109/TSA.2003.822637
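
The classification stage named in the abstract (principal components of envelope vectors used as features for a quadratic classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the CTF estimation step is omitted entirely, randomly generated vectors stand in for CTFs, the quadratic classifier is assumed here to be a per-class Gaussian with full covariance, and every function name and parameter value below is hypothetical.

```python
import numpy as np


def make_synthetic_envelopes(n_singers, n_per_singer, n_bins, rng):
    """Synthetic stand-in for envelope vectors: one random template per singer plus noise."""
    templates = rng.normal(size=(n_singers, n_bins))
    X = np.repeat(templates, n_per_singer, axis=0)
    X = X + 0.3 * rng.normal(size=X.shape)
    y = np.repeat(np.arange(n_singers), n_per_singer)
    return X, y


def fit_pca(X, n_components):
    """Return the sample mean and the top principal directions (rows are samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Project samples onto the retained principal directions."""
    return (X - mean) @ components.T


def fit_quadratic(Z, y, reg=1e-6):
    """Fit one Gaussian per class (mean, full covariance, log prior)."""
    params = {}
    for label in np.unique(y):
        Zc = Z[y == label]
        # A small ridge term keeps the covariance invertible (an assumption of this sketch).
        cov = np.cov(Zc, rowvar=False) + reg * np.eye(Z.shape[1])
        _, logdet = np.linalg.slogdet(cov)
        params[label] = (Zc.mean(axis=0), np.linalg.inv(cov), logdet, np.log(len(Zc) / len(Z)))
    return params


def predict_quadratic(Z, params):
    """Assign each sample to the class with the highest Gaussian log-posterior score."""
    labels = list(params)
    scores = []
    for label in labels:
        mu, precision, logdet, log_prior = params[label]
        d = Z - mu
        mahalanobis = np.einsum("ij,jk,ik->i", d, precision, d)
        scores.append(-0.5 * (mahalanobis + logdet) + log_prior)
    return np.array(labels)[np.argmax(np.stack(scores, axis=1), axis=1)]


def run_demo(seed=0):
    """Train on a random two-thirds split of synthetic data and return test accuracy."""
    rng = np.random.default_rng(seed)
    X, y = make_synthetic_envelopes(n_singers=3, n_per_singer=60, n_bins=40, rng=rng)
    order = rng.permutation(len(y))
    train, test = order[:120], order[120:]
    mean, components = fit_pca(X[train], n_components=5)
    params = fit_quadratic(project(X[train], mean, components), y[train])
    predicted = predict_quadratic(project(X[test], mean, components), params)
    return float(np.mean(predicted == y[test]))


if __name__ == "__main__":
    print(f"test accuracy on synthetic data: {run_demo():.2f}")
```

The accuracy printed by this sketch reflects only the synthetic data and says nothing about the results reported in the abstract. The principal directions are fitted on the training split alone so that the held-out samples do not influence the feature space.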