On evaluation trials in speaker verification
| Published in: | Applied Intelligence (Dordrecht, Netherlands), 2024, Vol. 54 (1), p. 113-130 |
|---|---|
| Main authors: | , , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Full text |
Abstract: Evaluation trials are crucial for measuring the performance of speaker verification systems. However, designing trials that faithfully reflect system performance and accurately distinguish between different systems remains an open issue. In this paper, we focus on a particular problem: the impact of trials that are easy to solve for the majority of systems. We show that these ‘easy trials’ not only yield over-optimistic absolute performance, but also bias relative performance in system comparisons when they are asymmetrically distributed. This motivates the idea of mining ‘hard trials’, i.e., trials regarded as difficult by current representative techniques. We report three approaches to retrieving hard trials and study the properties of the retrieved trials from the perspectives of both machines and humans. Finally, we propose a novel visualization tool, which we name a Config-Performance (C-P) map. In this map, the value at each location represents the performance under a particular proportion of easy and hard trials, offering a global view of the system across various test conditions. The identified hard trials and the code of the C-P map tool have been released at http://lilt.cslt.org/trials/demo/.
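To make the C-P map idea concrete, the sketch below builds a small map on synthetic verification scores. It is only an illustration of the concept described in the abstract, not the released tool: the Gaussian score distributions, the grid resolution, and the choice of Equal Error Rate (EER) as the performance metric are all assumptions for demonstration. Each cell mixes easy and hard trials in a given proportion and reports the EER of that mixture, so the corner with only easy trials shows over-optimistic performance while the all-hard corner shows a much higher error rate.

```python
# Illustrative C-P map sketch on synthetic scores (hypothetical data and
# metric choices; see the paper's released tool for the actual implementation).
import numpy as np

def eer(target_scores, nontarget_scores):
    """Equal Error Rate via a threshold sweep over the pooled scores."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    fnr = np.array([(target_scores < t).mean() for t in thresholds])   # miss rate
    fpr = np.array([(nontarget_scores >= t).mean() for t in thresholds])  # false alarm
    idx = np.argmin(np.abs(fnr - fpr))  # point where the two rates cross
    return (fnr[idx] + fpr[idx]) / 2

rng = np.random.default_rng(0)
n = 1000
# Synthetic scores: easy trials are well separated, hard trials overlap heavily.
easy_tar, easy_non = rng.normal(3.0, 1.0, n), rng.normal(-3.0, 1.0, n)
hard_tar, hard_non = rng.normal(0.5, 1.0, n), rng.normal(-0.5, 1.0, n)

# Each C-P map cell evaluates a pool with a given proportion of hard trials
# among target trials (rows) and nontarget trials (columns).
props = np.linspace(0.0, 1.0, 5)
cp_map = np.empty((len(props), len(props)))
for i, p_tar in enumerate(props):
    for j, p_non in enumerate(props):
        k_tar, k_non = int(p_tar * n), int(p_non * n)
        tar = np.concatenate([hard_tar[:k_tar], easy_tar[k_tar:]])
        non = np.concatenate([hard_non[:k_non], easy_non[k_non:]])
        cp_map[i, j] = eer(tar, non)

print(np.round(cp_map, 3))  # EER grows as the hard-trial proportion increases
```

Plotting `cp_map` as a heatmap then gives the global view described in the abstract: a single system summarized across the whole range of easy/hard test conditions rather than at one operating point.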
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-023-05071-9