Character-based method for extracting features of Lao words
Main authors: | , , |
---|---|
Format: | Patent |
Language: | Chinese ; English |
Abstract: | The invention relates to a character-based method for extracting features of Lao words, and belongs to the technical field of natural language processing and machine learning. Because Lao corpora are scarce and Lao morphology and structure are complex, word data are sparse and out-of-vocabulary (unregistered) words are common. Traditional NLP techniques generally build the model's input vectors from words, or from a combination of words and characters. Applied to Lao, this approach makes word features difficult to extract and leaves out-of-vocabulary words without any vector. To solve these problems, the invention provides a method that uses a convolutional neural network over character vectors to extract word features. The advantage of working from character vectors is that pre-trained word vectors and similar external information are not needed. The method can effectively extract the features of Lao words and therefore has research significance. |
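
The abstract gives no architectural details, so the following is only a minimal sketch of the general idea it describes: look up a vector for each character of a word, slide convolution filters over the character sequence, and keep the strongest response of each filter as a word feature. Every name and hyperparameter here (`build_char_vocab`, `init_params`, `word_features`, the embedding size, filter count and filter width) is a hypothetical choice for illustration, the weights are random rather than trained, and none of it is taken from the patent itself.

```python
import random


def build_char_vocab(words):
    """Map each character seen in `words` to an integer id (0 is reserved for padding/unknown)."""
    chars = sorted({ch for w in words for ch in w})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def init_params(vocab_size, emb_dim=8, num_filters=4, width=3, seed=0):
    """Create random character embeddings and convolution filters (assumed sizes, untrained)."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(emb_dim)]
           for _ in range(vocab_size + 1)]
    emb[0] = [0.0] * emb_dim  # id 0: padding and unknown characters share a zero vector
    filters = [[[rng.uniform(-0.5, 0.5) for _ in range(emb_dim)]
                for _ in range(width)]
               for _ in range(num_filters)]
    return emb, filters


def word_features(word, vocab, emb, filters):
    """Return one feature per filter: ReLU followed by max-over-time pooling."""
    width = len(filters[0])
    ids = [vocab.get(ch, 0) for ch in word]
    ids += [0] * max(0, width - len(ids))  # pad short words so one window always fits
    vecs = [emb[i] for i in ids]
    feats = []
    for filt in filters:
        best = 0.0  # starting at 0 applies the ReLU floor to the pooled maximum
        for start in range(len(vecs) - width + 1):
            act = sum(filt[k][d] * vecs[start + k][d]
                      for k in range(width)
                      for d in range(len(filt[k])))
            best = max(best, act)
        feats.append(best)
    return feats


# Toy character inventory built from two example Lao-script strings.
train_words = ["ພາສາ", "ລາວ"]
vocab = build_char_vocab(train_words)
emb, filters = init_params(len(vocab))

# A word absent from `train_words` still receives a feature vector,
# because the features are computed from its characters alone.
features = word_features("ພາສາລາວ", vocab, emb, filters)
print(len(features), features)
```

In a real system the embeddings and filters would be learned jointly with a downstream task; the sketch only shows why a character-level model needs no pre-trained word vectors and can still produce a vector for an unregistered word.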