Hierarchical iterative and self-supervised method for concept-word acquisition from large-scale Chinese corpora

Bibliographic details
Main authors: Guogang Tian, Cungen Cao
Format: Conference proceedings
Language: English
Description
Summary: This paper proposes a hierarchical, iterative, and self-supervised method (HISS) for acquiring concept words from a large-scale, unsegmented Chinese corpus. The method has two levels of iteration: the EM-CLS algorithm and the Viterbi-C/S algorithm constitute the inner iteration, which generates concept words, while concept word validation, together with concept word generation, constitutes the outer iteration. Through multiple iterations, the method integrates concept word generation and validation into a unified acquisition process. During acquisition, HISS can cope with the problems of over-segmentation, over-combination, and data sparseness. The experimental results show that HISS is effective for concept word acquisition and simultaneously improves both precision and recall.
DOI: 10.1109/NLPKE.2005.1598754