An alternative scheme for perplexity estimation and its assessment for the evaluation of language models

Bibliographic details
Published in:	Computer speech & language 2001-01, Vol.15 (1), p.1-13
Main authors:	Bimbot, Frédéric, El-Bèze, Marc, Igounet, Stéphane, Jardino, Michèle, Smaili, Kamel, Zitouni, Imed
Format:	Article
Language:	English
Subjects:	Computer Science; Formal languages and grammars; Linguistics; Mathematics and linguistics; Other

Description
Summary:	Language models are usually evaluated on test texts using the perplexity derived from the model likelihood function computed on these texts (test set perplexity). In order to use this measure in the framework of a comparative evaluation campaign, we have developed an alternative scheme for estimating the test set perplexity. The method is derived from the Shannon game and based on a gambling approach on the next word to come in a truncated sentence. We also study the entropy bounds proposed by Shannon and based on the rank of the correct answer, in order to estimate a perplexity interval for non-probabilistic language models. The relevance of the approach is validated on an example. We then report the results of a preliminary comparative evaluation using the proposed scheme.
ISSN:	0885-2308 1095-8363
DOI:	10.1006/csla.2000.0150
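
The two quantities named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and input conventions are assumptions made for illustration. The first function computes the standard test set perplexity from the probabilities a model assigned to each correct next word. The second applies Shannon's rank-based entropy bounds (as given in his 1951 work on the prediction of printed English) to the ranks at which a system placed the correct word, and converts them to a perplexity interval. The article's exact estimation scheme may differ.

```python
import math
from collections import Counter


def perplexity(probs):
    """Test set perplexity from the probability given to each correct word.

    `probs` is a hypothetical input: one probability (> 0) per test word.
    """
    n = len(probs)
    log_likelihood = sum(math.log2(p) for p in probs)
    return 2 ** (-log_likelihood / n)


def shannon_rank_bounds(ranks):
    """Perplexity interval from the ranks of the correct answers.

    `ranks` is a hypothetical input: for each test word, the position
    (1 = first guess) at which the system proposed the correct word.
    Returns (lower, upper) as 2 ** (entropy bound in bits).
    """
    n = len(ranks)
    counts = Counter(ranks)
    max_rank = max(ranks)
    # q[i] is the relative frequency of rank i; q[max_rank + 1] stays 0.
    q = [0.0] * (max_rank + 2)
    for rank, count in counts.items():
        q[rank] = count / n
    # Upper bound: entropy of the rank distribution.
    upper = -sum(qi * math.log2(qi) for qi in q if qi > 0)
    # Lower bound: sum over i of i * (q_i - q_{i+1}) * log2(i).
    lower = sum(
        i * (q[i] - q[i + 1]) * math.log2(i) for i in range(1, max_rank + 1)
    )
    return 2 ** lower, 2 ** upper
```

For a probabilistic model, `perplexity` applies directly. For a system that only outputs a ranked list of candidate words, `shannon_rank_bounds` gives an interval instead of a single value. The lower bound assumes rank frequencies that do not increase with rank, which empirical counts on small samples may violate.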