Character-based method for extracting features of Lao words

Bibliographic details
Main authors: ZHOU LANJIANG, ZHANG JIAN'AN, TANG WEN
Format: Patent
Language: Chinese; English

Abstract: The invention relates to a character-based method for extracting features of Lao words and belongs to the technical field of natural language processing and machine learning. Because Lao corpora are scarce and Lao word forms and structure are complex, word data are sparse and there are many unregistered (out-of-vocabulary) words. Traditional NLP techniques generally build the model's input vectors from words or from a combination of words and characters. When such techniques are applied to Lao, word features are difficult to extract and no vectors exist for unregistered words. To solve these problems, the invention provides a method that applies a convolutional neural network to character vectors to extract word features. An advantage of working from character vectors is that pre-trained word vectors and other information are not required. The method can effectively extract features of Lao words and therefore has some research value.
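The abstract does not describe the network architecture. The following is a minimal sketch, assuming a common character-level CNN design (character embeddings, a 1-D convolution over windows of characters, ReLU, and max-over-time pooling). The class name `CharCNNWordEncoder`, the dimensions, the padding token, and the random initialisation are hypothetical and are not taken from the patent; the weights are untrained, so only the data flow is illustrated, not meaningful features.

```python
import random
from typing import Dict, List

# Hypothetical hyperparameters; the patent abstract does not specify them.
EMBED_DIM = 8      # size of each character vector
NUM_FILTERS = 4    # number of convolution filters (= size of the word feature vector)
KERNEL_WIDTH = 3   # number of characters covered by each filter window
PAD = "<pad>"      # hypothetical padding token for words shorter than a window


class CharCNNWordEncoder:
    """Sketch: maps a word to a fixed-size feature vector using only its characters."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.char_vectors: Dict[str, List[float]] = {}
        # filters[f][k][d]: weight of filter f at window offset k, embedding dimension d
        self.filters = [
            [[self.rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
             for _ in range(KERNEL_WIDTH)]
            for _ in range(NUM_FILTERS)
        ]
        self.bias = [0.0] * NUM_FILTERS

    def embed(self, ch: str) -> List[float]:
        # Character vectors are created on first use (random here; learned in a real
        # system), so no pre-trained word vectors are needed.
        if ch not in self.char_vectors:
            if ch == PAD:
                self.char_vectors[ch] = [0.0] * EMBED_DIM
            else:
                self.char_vectors[ch] = [self.rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
        return self.char_vectors[ch]

    def encode(self, word: str) -> List[float]:
        chars = list(word)  # splits the word into Unicode code points
        # Pad short words so that at least one full window fits.
        if len(chars) < KERNEL_WIDTH:
            chars += [PAD] * (KERNEL_WIDTH - len(chars))
        seq = [self.embed(c) for c in chars]
        features = []
        for f in range(NUM_FILTERS):
            best = float("-inf")
            # Slide filter f over every window of KERNEL_WIDTH consecutive characters.
            for start in range(len(seq) - KERNEL_WIDTH + 1):
                s = self.bias[f]
                for k in range(KERNEL_WIDTH):
                    s += sum(w * x for w, x in zip(self.filters[f][k], seq[start + k]))
                # ReLU on the window score, then keep the maximum over all windows.
                best = max(best, max(0.0, s))
            features.append(best)
        return features


if __name__ == "__main__":
    encoder = CharCNNWordEncoder(seed=42)
    for word in ["ລາວ", "ພາສາລາວ"]:
        print(word, [round(x, 3) for x in encoder.encode(word)])
```

Because every word, including one never seen before, is built from characters, the encoder always returns a vector of the same size, which is the property the abstract relies on for unregistered words. Note that `list(word)` splits Lao text into Unicode code points, so combining vowel and tone marks count as separate characters; the abstract does not say how the patent segments Lao characters.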