A Statistical Corpus-Based Term Extractor

Bibliographic details
Main authors: Pantel, Patrick; Lin, Dekang
Format: Book chapter
Language: English
Online access: Full text
Description
Summary: Term extraction is an important problem in natural language processing. In this paper, we propose a language-independent, statistical, corpus-based term extraction algorithm. In previous approaches, evaluation has been subjective, at best relying on a lexicographer’s judgement. We evaluate our term extractor in two ways. First, we assess its predictiveness on an unseen corpus using perplexity. Second, we measure its precision and recall by comparing the Chinese words in a segmented corpus with the words extracted by our system.
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/3-540-45153-6_4