Multi-band summary correlogram-based pitch detection for noisy speech

Bibliographic details
Published in:	Speech communication 2013-09, Vol.55 (7-8), p.841-856
Main authors:	Tan, Lee Ngee; Alwan, Abeer
Format:	Article
Language:	English
Subjects:	Algorithms; Comb-filter; Constants; Correlogram; Error detection; Multi-band; Narrowband; Noise-robust; PDA; Pitch detection; Pitch estimation; Speech; Streams; Summaries
Online access:	Full text

Description
Abstract:
• We propose a multi-band summary correlogram (MBSC)-based pitch detector for noisy speech.
• A set of comb-filters is applied to each subband signal/envelope stream.
• Novel channel and stream-weighting schemes are used to enhance the maximum MBSC peak.
• Noise-robust voiced/unvoiced detection is achieved with a constant threshold scheme.
• The MBSC has the lowest error rates on average among the algorithms evaluated.

A multi-band summary correlogram (MBSC)-based pitch detection algorithm (PDA) is proposed. The PDA performs pitch estimation and voiced/unvoiced (V/UV) detection via novel signal processing schemes that are designed to enhance the MBSC's peaks at the most likely pitch period. These peak-enhancement schemes include comb-filter channel-weighting to yield each individual subband's summary correlogram (SC) stream, and stream-reliability-weighting to combine these SCs into a single MBSC. V/UV detection is performed by applying a constant threshold to the maximum peak of the enhanced MBSC. Narrowband noisy speech signals sampled at 8 kHz are generated from the Keele (development set) and CSTR (Centre for Speech Technology Research; evaluation set) corpora. Both 4-kHz fullband speech and G.712-filtered telephone speech are simulated. When evaluated solely on pitch estimation accuracy, assuming voicing detection is perfect, the proposed algorithm has the lowest gross pitch error for noisy speech in the evaluation set among the algorithms evaluated (RAPT, YIN, etc.). The proposed PDA also achieves the lowest average pitch detection error when both pitch estimation and voicing detection errors are taken into account.
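
The general idea described in the abstract (per-stream correlograms, a weighted combination into one summary, then a constant threshold on its maximum peak) can be illustrated with a much simplified Python sketch. It is not the authors' implementation: the subband filterbank, envelope extraction, and comb-filter channel-weighting are omitted, the streams are assumed to be given already, and the reliability weight (each stream's own maximum autocorrelation peak) is a hypothetical stand-in. The names `normalized_autocorr`, `detect_pitch`, and `harmonic_frame`, the 50-400 Hz search range, and the threshold of 0.5 are likewise assumptions made for illustration.

```python
import math
import random


def normalized_autocorr(x, lags):
    """Biased autocorrelation of x at the given lags, divided by zero-lag energy."""
    mean = sum(x) / len(x)
    x = [v - mean for v in x]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return [0.0 for _ in lags]
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
            for lag in lags]


def detect_pitch(streams, fs, fmin=50.0, fmax=400.0, threshold=0.5):
    """Combine per-stream autocorrelations and apply a constant V/UV threshold.

    Returns (voiced, f0_in_hz_or_None, peak_value).
    Every stream must be longer than fs / fmin samples.
    """
    lags = list(range(int(fs / fmax), int(fs / fmin) + 1))
    corrs = [normalized_autocorr(s, lags) for s in streams]

    # Hypothetical reliability weight: each stream's maximum peak, floored at 0.
    weights = [max(0.0, max(c)) for c in corrs]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / len(corrs)] * len(corrs)
    else:
        weights = [w / total for w in weights]

    # Weighted sum across streams gives a single summary function over lag.
    summary = [sum(w * c[k] for w, c in zip(weights, corrs))
               for k in range(len(lags))]
    peak = max(summary)
    best_lag = lags[summary.index(peak)]

    voiced = peak >= threshold  # constant-threshold voicing decision
    return voiced, (fs / best_lag if voiced else None), peak


def harmonic_frame(f0, fs, harmonics, n):
    """Synthetic stand-in for one subband stream: a sum of cosine harmonics."""
    return [sum(math.cos(2.0 * math.pi * h * f0 * i / fs) for h in harmonics)
            for i in range(n)]


if __name__ == "__main__":
    fs = 8000.0
    # Two synthetic "subband" streams sharing a 200 Hz fundamental.
    voiced_streams = [harmonic_frame(200.0, fs, [1, 2, 3], 400),
                      harmonic_frame(200.0, fs, [4, 5, 6], 400)]
    print(detect_pitch(voiced_streams, fs))

    random.seed(0)
    noise_streams = [[random.gauss(0.0, 1.0) for _ in range(400)]
                     for _ in range(2)]
    print(detect_pitch(noise_streams, fs))
```

On the synthetic harmonic streams the estimate should land near 200 Hz and be labeled voiced, while the white-noise streams should fall below the threshold. The biased autocorrelation tapers with lag, which favors the shortest plausible period over its multiples; a real detector would need the filterbank, envelope streams, and weighting schemes that the paper describes.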
ISSN:	0167-6393 1872-7182
DOI:	10.1016/j.specom.2013.03.001