Balancing false alarms and hits in Spoken Term Detection
Main authors: | , , |
---|---|
Format: | Conference paper |
Language: | eng |
Subjects: | |
Abstract: | This paper presents methods to improve retrieval of Out-Of-Vocabulary (OOV) terms in a Spoken Term Detection (STD) system. We demonstrate that automated tagging of OOV regions helps to reduce false alarms, while incorporating phonetic confusability increases the hits. Two additional features also improve overall performance: boosting the probability of a hit according to the number of neighboring hits for the same query, and query-length normalization. We show that these methods can be combined effectively to provide a relative improvement of 21% in Average Term Weighted Value (ATWV) on a 100-hour corpus with 1290 OOV-only queries, and a 2% relative improvement on the NIST 2006 STD task, where only 16 of the 1107 queries were OOV terms. Lastly, we present results showing that the proposed methods are general enough to work well in query-by-example spoken term detection, and in mismatched situations where the representations of the index being searched and of the queries are not generated by the same system. |
---|---|
ISSN: | 1520-6149 2379-190X |
DOI: | 10.1109/ICASSP.2010.5494966 |
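
The abstract reports its results as relative improvements in ATWV. As a rough illustration of how that metric trades misses against false alarms, the sketch below follows the term-weighted value as commonly defined for the NIST 2006 STD evaluation: one minus the average, over terms, of the miss probability plus a weighted false-alarm probability. The weight `beta = 999.9` and the one-trial-per-second convention are assumptions taken from that evaluation, not from this record, and the names `TermCounts`, `atwv`, and `relative_improvement` are hypothetical helpers, not the authors' code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TermCounts:
    """Per-term detection counts at a fixed decision threshold."""
    n_true: int      # reference occurrences of the term
    n_correct: int   # detections that match a reference occurrence (hits)
    n_spurious: int  # detections that match nothing (false alarms)


def atwv(counts: Iterable[TermCounts], speech_seconds: float,
         beta: float = 999.9) -> float:
    """Term-weighted value, assuming one non-target trial per second of speech."""
    penalties = []
    for c in counts:
        if c.n_true == 0:
            continue  # terms with no reference occurrence are left out of the average
        p_miss = 1.0 - c.n_correct / c.n_true
        p_fa = c.n_spurious / (speech_seconds - c.n_true)
        penalties.append(p_miss + beta * p_fa)
    if not penalties:
        raise ValueError("no term has a reference occurrence")
    return 1.0 - sum(penalties) / len(penalties)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative change of `improved` with respect to `baseline`."""
    return (improved - baseline) / baseline
```

Under this definition a system that finds every occurrence with no false alarms scores 1.0 and a system that returns nothing scores 0.0. Because `beta` is large, a single false alarm on a rare term can cost about as much as missing it, which is why reducing false alarms and increasing hits have to be balanced.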