Fusion Multistyle Training for Speaker Identification of Disguised Speech

Bibliographic details
Published in: Wireless Personal Communications, 2019-02, Vol. 104 (3), pp. 895-905
Main authors: Prasad, Swati; Prasad, Ramjee
Format: Article
Language: English
Description
Abstract: Speaker identification is the task of determining which member of a group of people produced a given speech utterance. When a person disguises their voice, as is commonly seen in crime scenes, a mismatch arises between the training and the test speech data, referred to as the mismatched problem. It markedly decreases the performance of a speaker identification system. To address this mismatched problem, the authors previously studied various multistyle training strategies and a fusion method. This paper further investigates the performance of three multiple-model methods at the decision level for the mismatched problem and compares their performance with that of the previously studied multistyle training strategies. The fusion of the two multistyle training strategies outperformed all other single-style training and multiple-model methods investigated, on average across the different test speech data. This fusion multistyle training technique can be readily employed in a security-conscious organization where monitoring of employees is required.
ISSN: 0929-6212, 1572-834X
DOI: 10.1007/s11277-018-6057-y