Fusion Multistyle Training for Speaker Identification of Disguised Speech
Published in: | Wireless personal communications 2019-02, Vol.104 (3), p.895-905 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Speaker identification is the task of determining which member of a group of people produced a given speech utterance. When a person disguises their voice, as is commonly seen in criminal cases, a mismatch arises between the training and the test speech data, referred to as the mismatch problem. This mismatch markedly decreases the performance of a speaker identification system. To address the problem, the authors previously studied various multistyle training strategies and a fusion method. This paper further investigates the performance of three multiple-model methods at the decision level and compares their performance with that of the previously studied multistyle training strategies. The fusion of the two multistyle training strategies outperformed all other single-style training and multiple-model methods investigated, on average across the different test speech data. This fusion multistyle training technique can be easily employed in a security-conscious organization where monitoring of employees is required. |
---|---|
ISSN: | 0929-6212 1572-834X |
DOI: | 10.1007/s11277-018-6057-y |
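
The summary does not say how the outputs of the two multistyle-trained systems are combined. As a rough illustration only, the sketch below assumes a weighted-sum rule over per-speaker scores followed by an argmax decision. The function names, the `weight_a` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def fuse_scores(
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    weight_a: float = 0.5,
) -> Dict[str, float]:
    """Combine per-speaker scores from two models with a weighted sum.

    Assumption: both models score the same set of enrolled speakers and
    higher scores mean a better match (e.g. log-likelihoods).
    """
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both models must score the same set of speakers")
    return {
        speaker: weight_a * scores_a[speaker] + (1.0 - weight_a) * scores_b[speaker]
        for speaker in scores_a
    }


def identify_speaker(fused: Dict[str, float]) -> str:
    """Return the speaker with the highest fused score."""
    return max(fused, key=fused.get)


# Made-up scores from two hypothetical multistyle-trained models.
model_a = {"spk1": -10.0, "spk2": -12.0, "spk3": -15.0}
model_b = {"spk1": -14.0, "spk2": -9.0, "spk3": -16.0}

decision = identify_speaker(fuse_scores(model_a, model_b))
```

With equal weights, the fused scores in this made-up example favor `spk2`, even though the first model alone would pick `spk1`. In practice the weight would need to be tuned on held-out data.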