CONSOLIDATING VOCABULARY FOR AUTOMATED TEXT PROCESSING

Bibliographic details
Main authors: SUBRAMANIAN GOPI, DESAI KALPIT VIKRAMBHAI
Format: Patent
Language: English
Description
Summary: A method includes providing a corpus of text, and using suffix manipulation to obtain a stem for at least some tokens in the corpus. The method also includes using the respective stem for each token of the at least some tokens to form groups of the at least some tokens. In addition, the method includes using the groups of tokens to select lemmas for at least some of the tokens in the groups of tokens.
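
The three steps in the summary (stem by suffix manipulation, group tokens by stem, select a lemma per group) can be illustrated with a minimal Python sketch. The suffix list, the minimum stem length, and the lemma-selection rule (most frequent member, ties broken by the shortest token) are assumptions made for illustration only; the record does not state how the patent performs any of these steps.

```python
from collections import Counter, defaultdict

# Hypothetical suffix list, checked in order; the source does not specify one.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def consolidate(tokens):
    """Map each distinct token to a lemma chosen from its stem group."""
    counts = Counter(tokens)

    # Group the distinct tokens that share a stem.
    groups = defaultdict(set)
    for token in counts:
        groups[stem(token)].add(token)

    lemma_of = {}
    for members in groups.values():
        # Assumed rule: most frequent member; ties go to the shortest token,
        # then to alphabetical order so the result is deterministic.
        lemma = min(members, key=lambda t: (-counts[t], len(t), t))
        for token in members:
            lemma_of[token] = lemma
    return lemma_of


corpus = "the pump failed after pumps failed while pumping failed".split()
mapping = consolidate(corpus)
```

Here `mapping` sends "pump", "pumps", and "pumping" to the same lemma, so downstream processing sees one vocabulary entry instead of three. Note that the lemma is always an actual token from the corpus, not the bare stem, which is one plausible reading of selecting lemmas from the groups.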