Speech corpus recycling for acoustic cross-domain environments for automatic speech recognition

Bibliographic Details
Published in: Acoustical Science and Technology, 2016, Vol. 37(2), pp. 55-65
Authors: Ichikawa, Osamu; Rennie, Steven J.; Fukuda, Takashi; Willett, Daniel
Format: Article
Language: English
Online access: Full text
Abstract: In recent years, server-based automatic speech recognition (ASR) systems have become ubiquitous, and unprecedented amounts of speech data are now available for system training. The availability of such training data has greatly improved ASR accuracy, but how to maximize ASR performance in new domains, or in domains where ASR systems currently fail (thus limiting data availability), remains an important open question. In this paper, we propose a framework for mapping large speech corpora to different acoustic environments, so that such data can be transformed to build high-quality acoustic models for other acoustic domains. In our experiments using a large corpus, the proposed method reduced errors by 18.6%.
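The paper's specific mapping framework is not detailed in this record, but the general idea of transforming an existing corpus to match a target acoustic environment is commonly illustrated by convolving clean speech with a room impulse response and mixing in environment noise at a chosen SNR. The sketch below is a generic illustration of that standard technique, not the authors' method; the function name `map_to_environment` and all signals are hypothetical, synthetic examples.

```python
import numpy as np

def map_to_environment(speech, rir, noise, snr_db):
    """Generic corpus-transformation sketch (not the paper's method):
    convolve clean speech with a room impulse response (RIR), then
    add environment noise scaled to the requested SNR in dB."""
    # Reverberate the utterance and truncate to the original length.
    reverberant = np.convolve(speech, rir)[: len(speech)]
    # Scale the noise so that reverberant-speech power / noise power
    # equals 10^(snr_db / 10).
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise[: len(reverberant)] ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + scale * noise[: len(reverberant)]

# Synthetic demo: a 1 s "utterance" at 16 kHz, an exponentially
# decaying impulse response, and white noise mixed at 10 dB SNR.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
rir = np.exp(-np.arange(800) / 200.0) * rng.standard_normal(800)
noise = rng.standard_normal(16000)
noisy = map_to_environment(speech, rir, noise, snr_db=10.0)
```

In a real pipeline, `rir` and `noise` would be recorded in (or simulated for) the target domain, and the transformed corpus would then be used to train or adapt the acoustic model for that domain.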
ISSN: 1346-3969, 1347-5177
DOI: 10.1250/ast.37.55