Detection of interactive voice response (IVR) in phone call records
Separation of pre-recorded messages (Interactive Voice Response, IVR) from live speech fragments in real-time plays a significant role in speech emotion recognition (SER) systems, unwanted calls filtering, automatic detection of answering machine responses, reduction of stored record sizes, voice ma...
Gespeichert in:
Veröffentlicht in: | International journal of speech technology 2020-12, Vol.23 (4), p.907-915 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Separation of pre-recorded messages (Interactive Voice Response, IVR) from live speech fragments in real-time plays a significant role in speech emotion recognition (SER) systems, unwanted calls filtering, automatic detection of answering machine responses, reduction of stored record sizes, voice mail spam filtration, etc. The problem complexity is that, unlike with silent, music, and noise fragments studied by the conventional voice activity recognition (VAD), IVR usually contains speech. Three classifiers for live speech fragments detection in phone call records are considered: based on the support vector machine (SVM), gradient boosting (XGBoost) and convolutional neural network (CNN). The Geneva Minimalistic Acoustic Parameter Set for XGBoost and SVM, and log-spectrograms and gammatonegrams for CNN were used for feature representation of audio fragments. Experiments with a dataset of phone calls demonstrate comparable quality (around 0.96 according to the F1-averaged measure) of the considered algorithms with CNN having a advantage (0.98). |
---|---|
ISSN: | 1381-2416 1572-8110 |
DOI: | 10.1007/s10772-020-09754-3 |