Margin-Based Discriminative Training for String Recognition

Bibliographic Details
Published in: IEEE Journal of Selected Topics in Signal Processing 2010-12, Vol. 4 (6), p. 917-925
Main Authors: Heigold, G., Dreuw, P., Hahn, S., Schlüter, R., Ney, H.
Format: Article
Language: English
Description
Summary: Typical training criteria for string recognition, such as minimum phone error (MPE) and maximum mutual information (MMI) in speech recognition, are based on a (regularized) loss function. In contrast, large-margin classifiers, the de facto standard in machine learning, maximize the separation margin; an additional loss term penalizes misclassified samples. This paper shows how typical training criteria such as MPE or MMI can be extended to incorporate the margin concept, and that such modified training criteria are smooth approximations to support vector machines with the respective loss function. The proposed approach takes advantage of the generalization bounds of large-margin classifiers while keeping the efficient framework for conventional discriminative training. This allows us to directly evaluate the utility of the margin term for string recognition. Experimental results are presented using the proposed modified training criteria for different tasks from speech recognition (including large-vocabulary continuous speech recognition tasks trained on up to 1500 h of audio data), part-of-speech tagging, and handwriting recognition.
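To make the margin idea concrete, the following is a minimal sketch of a margin-extended MMI criterion in the spirit of the abstract, under an n-best-list approximation of the hypothesis space. The function name `margin_mmi_loss` and the margin scale `rho` are illustrative choices, not notation from the paper: each competing hypothesis has its score boosted in proportion to its string error against the reference, so the criterion demands that correct transcriptions win by a margin that grows with the error.

```python
import numpy as np

def margin_mmi_loss(scores, errors, rho=1.0):
    """Margin-modified MMI loss for one utterance over an n-best list (sketch).

    scores : log joint scores log p(X, W_k) for hypotheses W_k;
             index 0 is assumed to be the reference transcription.
    errors : string error counts E(W_k, W_ref), so errors[0] == 0.
    rho    : margin scale; rho = 0 recovers plain MMI, and large rho
             approaches a hinge-style (SVM-like) loss.
    """
    # Boost each competitor's score by rho times its string error,
    # then take the negative log posterior of the reference under
    # the boosted distribution.
    boosted = scores + rho * errors
    return np.logaddexp.reduce(boosted) - scores[0]
```

With `rho = 0` this is the ordinary (negated, per-utterance) MMI objective; increasing `rho` penalizes hypotheses that are both high-scoring and far from the reference, which is the margin effect the paper analyzes.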
ISSN: 1932-4553; 1941-0484
DOI: 10.1109/JSTSP.2010.2076110