CONSOLIDATING VOCABULARY FOR AUTOMATED TEXT PROCESSING
Main authors: | , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | A method includes providing a corpus of text, and using suffix manipulation to obtain a stem for at least some tokens in the corpus. The method also includes using the respective stem for each token of the at least some tokens to form groups of the at least some tokens. In addition, the method includes using the groups of tokens to select lemmas for at least some of the tokens in the groups of tokens. |
---|
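
The three steps named in the summary (stemming by suffix manipulation, grouping tokens by stem, selecting a lemma per group) can be illustrated with a minimal Python sketch. This is not the patented method: the record does not say which suffixes are manipulated or how a lemma is chosen from a group, so the suffix list `SUFFIXES`, the length guard `MIN_STEM_LEN`, and the selection rule (most frequent token, then shortest, then alphabetical) are all assumptions made for illustration, as are the function names.

```python
from collections import Counter, defaultdict

# Hypothetical suffix list (an assumption, not taken from the patent).
# Longer suffixes come first so "ing" is tried before "s".
SUFFIXES = ("ings", "ing", "ed", "es", "s")
MIN_STEM_LEN = 3  # assumed guard so short words such as "is" are left alone


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM_LEN characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LEN:
            return token[: -len(suffix)]
    return token


def group_by_stem(tokens):
    """Map each stem to a Counter of the surface tokens that share it."""
    groups = defaultdict(Counter)
    for token in tokens:
        groups[stem(token)][token] += 1
    return groups


def select_lemmas(groups):
    """Assumed rule: most frequent token in the group, then shortest, then alphabetical."""
    lemmas = {}
    for counts in groups.values():
        lemma = min(counts, key=lambda t: (-counts[t], len(t), t))
        for token in counts:
            lemmas[token] = lemma
    return lemmas


corpus = "she walks while he walked and they keep walking to walk past tokens and a token"
tokens = corpus.lower().split()
lemma_of = select_lemmas(group_by_stem(tokens))

for word in ("walks", "walked", "walking", "tokens"):
    print(word, "->", lemma_of[word])
```

In this sketch the stem serves only as a grouping key, while the lemma is always a token that actually occurs in the corpus, so the consolidated vocabulary contains real words rather than truncated stems. Whether the patent draws the same distinction cannot be confirmed from the summary alone.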