Tibetan word segmentation evaluation set construction method based on local word list

Bibliographic Details
First author: CAI RANGZHUOMA
Format: Patent
Language: Chinese; English
Description
Abstract: The invention belongs to the technical field of Tibetan natural language processing and relates to a method for constructing a Tibetan word segmentation evaluation set based on a local word list. Starting from an evaluation set built manually or with the aid of a dictionary, the method first recognizes the compressed words in each Tibetan evaluation sentence using that sentence's local word list. Specifically, it (1) collects the compressed words of a Tibetan evaluation sentence and adds them to the sentence's local word list; (2) automatically builds a word index list from the sentence's word sequence and its local word list; and (3) constructs, from the word sequence and the word index list, all evaluation answers of the sentence at different granularities, for Tibetan word segmentation e…
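The abstract describes the three steps only in outline. What follows is a minimal Python sketch under assumptions that do not come from the source. Each evaluation sentence is represented by its word sequence split into its finest units (Wylie-transliterated syllables here), and a local word list entry is a tuple of consecutive units. A compressed word is treated as just another local-list entry. "All evaluation answers at different granularities" is read as every segmentation of the sentence that can be assembled from single units and local-list words. The function names `extend_local_list`, `build_word_index` and `all_answers` are hypothetical. Real Tibetan compressed words fuse a grammatical particle into the preceding syllable, so handling them faithfully would require splitting below the syllable level, which this sketch does not attempt.

```python
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION (not from the patent): a sentence is its word sequence split into
# finest-grained units (here Wylie syllables); a local-list word is a tuple of
# consecutive units, and compressed words are treated as ordinary entries.
Word = Tuple[str, ...]


def extend_local_list(local_words: Sequence[Word], compressed: Sequence[Word]) -> List[Word]:
    """Step 1 (assumed form): add the sentence's compressed words to its local word list."""
    merged: List[Word] = []
    for word in list(local_words) + list(compressed):
        if word and word not in merged:  # skip empty entries and duplicates
            merged.append(word)
    return merged


def build_word_index(units: Sequence[str], local_words: Sequence[Word]) -> Dict[int, List[int]]:
    """Step 2 (assumed form): map each start position to the end positions of
    local-list words matched there. Every single unit is always indexed, so the
    finest segmentation is always reachable."""
    index: Dict[int, List[int]] = {i: [i + 1] for i in range(len(units))}
    for word in local_words:
        n = len(word)
        if n < 2:
            continue  # single units are already indexed
        for start in range(len(units) - n + 1):
            if tuple(units[start:start + n]) == word and start + n not in index[start]:
                index[start].append(start + n)
    return index


def all_answers(units: Sequence[str], index: Dict[int, List[int]]) -> List[List[Word]]:
    """Step 3 (assumed form): enumerate every segmentation the index permits,
    ordered from coarsest (fewest words) to finest."""
    answers: List[List[Word]] = []

    def walk(pos: int, path: List[Word]) -> None:
        if pos == len(units):
            answers.append(list(path))
            return
        for end in sorted(index[pos]):
            path.append(tuple(units[pos:end]))
            walk(end, path)
            path.pop()

    walk(0, [])
    answers.sort(key=len)  # stable sort keeps DFS order among equal lengths
    return answers


if __name__ == "__main__":
    # Hypothetical sentence "bod skad slob sbyong" with two local-list words.
    units = ["bod", "skad", "slob", "sbyong"]
    local = extend_local_list([("bod", "skad")], [("slob", "sbyong")])
    for answer in all_answers(units, build_word_index(units, local)):
        print(answer)
    # [('bod', 'skad'), ('slob', 'sbyong')]
    # [('bod',), ('skad',), ('slob', 'sbyong')]
    # [('bod', 'skad'), ('slob',), ('sbyong',)]
    # [('bod',), ('skad',), ('slob',), ('sbyong',)]
```

The index works like a lattice: each start position lists where a word may end, and a depth-first walk over it yields every admissible answer. The number of answers can grow exponentially with the number of overlapping local-list entries, so a practical tool would likely need to cap or deduplicate them.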