Subjective speech-to-noise ratio as a measure of speech quality for digital waveform coders

Bibliographic details
Published in: The Journal of the Acoustical Society of America, 1982-10, Vol. 72 (4), p. 1136-1144
Main authors: Nakatsui, M.; Mermelstein, P.
Format: Article
Language: English
Description
Abstract: The ultimate performance measure for evaluating voice communication systems is the subjective quality of the received speech. Modern digital speech-coding techniques achieve high intelligibility and significant transmission economies. A high level of speech intelligibility is a necessary but insufficient condition for user acceptance of these systems; quality must also meet acceptability criteria. However, no adequate single measure of overall speech quality has yet been developed. This work takes a utilitarian approach in attempting to satisfy the urgent requirement for a practical measurement method. It evaluates the subjective speech-to-noise ratio (SNR), derived from a forced-choice pair-comparison test using the psychometric analysis procedure commonly used in the method of constants. A speech signal degraded by varying amounts of multiplicative white noise is selected as the reference system in the test. Seven types of digital speech coders are simulated and evaluated in this study: log-PCM, ADM, ADPCM coders with variable or fixed predictor, APC, and residual-excited and pitch-excited LP coders (RELP and LPC). Thirteen configurations of these coders, covering transmission bit rates of 2.4 to 64 kb/s, are included. Pair-comparison tests were conducted in two separate sessions 14 months apart using different groups of speakers and listeners. The subjective SNR estimated from the 13 coder configurations ranges from 7 to 40 dB and represents overall speech quality well in a single dimension. No significant speaker or listener variation is found for a wide range of waveform coders, and the subjective SNR estimate is found to be highly reproducible with different speakers and listeners. An arbitrary selection of as few as five listeners yields a stable subjective SNR estimate for the waveform coders. On the other hand, highly significant listener variation is found for the narrow-band digital vocoders (RELP and LPC). This listener variability reflects a limitation of the measure that may prevent its extension to vocoded speech whose distortions differ significantly from those of the reference speech.
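The measurement idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the record does not give their formulas, so everything below is an assumption made for illustration. The hypothetical `add_multiplicative_noise` models the reference degradation as y = x(1 + g n) with unit-variance Gaussian n and g = 10^(-SNR/20), a common form for speech-correlated noise. The hypothetical `subjective_snr` applies one classical method-of-constants analysis: preference proportions are converted to z-scores, a straight line is fitted by least squares, and the reference SNR at the 50% preference point is returned.

```python
import math
import random
from statistics import NormalDist


def add_multiplicative_noise(samples, snr_db, rng):
    """Degrade a signal with signal-dependent white noise (assumed model).

    Each sample becomes x * (1 + g * n), where n is unit-variance Gaussian
    noise and g = 10 ** (-snr_db / 20), so the expected speech-to-noise
    ratio is snr_db.
    """
    gain = 10.0 ** (-snr_db / 20.0)
    return [x * (1.0 + gain * rng.gauss(0.0, 1.0)) for x in samples]


def subjective_snr(reference_snrs_db, p_reference_preferred, eps=0.01):
    """Estimate the reference SNR at which preference is 50% (assumed analysis).

    reference_snrs_db: reference SNR levels presented against one coder.
    p_reference_preferred: proportion of trials, per level, in which
        listeners preferred the reference over the coder.
    eps: proportions are clipped to [eps, 1 - eps] so z-scores stay finite.
    """
    normal = NormalDist()
    z_scores = [
        normal.inv_cdf(min(max(p, eps), 1.0 - eps))
        for p in p_reference_preferred
    ]
    n = len(z_scores)
    mean_x = sum(reference_snrs_db) / n
    mean_z = sum(z_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in reference_snrs_db)
    sxz = sum(
        (x - mean_x) * (z - mean_z)
        for x, z in zip(reference_snrs_db, z_scores)
    )
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    # The fitted line crosses z = 0 (50% preference) at this level.
    return -intercept / slope


# Synthetic check only: proportions generated from a known curve centred
# at 20 dB, not data from the study.
levels = [10.0, 15.0, 20.0, 25.0, 30.0]
truth = NormalDist(mu=20.0, sigma=5.0)
proportions = [truth.cdf(level) for level in levels]
print(round(subjective_snr(levels, proportions), 3))
```

With the synthetic proportions above, the estimate should land at approximately 20 dB, the centre of the generating curve. Real listening data would be noisier, and a weighted or maximum-likelihood probit fit may be preferable to the unweighted line used here.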
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/1.388323