SYSTEMS AND METHODS FOR WORD FILTERING IN LANGUAGE MODELS

Bibliographic details
Authors: Wolniewicz, Richard H; Peterson, Kelly S
Format: Patent
Language: English
Description
Abstract: At least some aspects of the present disclosure are directed to a system having one or more processors and one or more memories for word filtering. The one or more memories are configured to store a plurality of documents and to store a domain dictionary. The one or more processors are configured to generate a set of tokens for each of the plurality of documents and to separate the set of tokens into a subset of dictionary tokens and a subset of non-dictionary tokens using the domain dictionary. The one or more processors are further configured to filter the subset of non-dictionary tokens to produce a subset of filtered non-dictionary tokens, where each of the filtered non-dictionary tokens has an occurrence frequency greater than a predefined threshold.
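
The three steps named in the abstract (tokenize, split by dictionary, filter by frequency) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the regex tokenizer, the lowercasing, and the choice to count occurrences across the whole document collection rather than per document are all assumptions made for illustration; the abstract does not specify any of them.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def split_tokens(tokens, domain_dictionary):
    # Separate tokens into dictionary and non-dictionary subsets.
    dictionary_tokens = [t for t in tokens if t in domain_dictionary]
    non_dictionary_tokens = [t for t in tokens if t not in domain_dictionary]
    return dictionary_tokens, non_dictionary_tokens


def filter_non_dictionary(documents, domain_dictionary, threshold):
    # Assumption: occurrence frequency is a raw count over all documents.
    counts = Counter()
    for doc in documents:
        _, non_dictionary_tokens = split_tokens(tokenize(doc), domain_dictionary)
        counts.update(non_dictionary_tokens)
    # Keep only tokens whose count is strictly greater than the threshold.
    return {token for token, count in counts.items() if count > threshold}


# Hypothetical example data, not taken from the patent.
documents = [
    "Patient given metoprolol",
    "Metoprolol dose increased",
    "Pateint stable on metoprolol",
]
domain_dictionary = {"patient", "given", "dose", "increased", "stable", "on"}
frequent_unknown = filter_non_dictionary(documents, domain_dictionary, threshold=2)
```

In this made-up example, the out-of-dictionary term "metoprolol" occurs three times and survives a threshold of 2, while the one-off misspelling "pateint" is dropped. This shows how a frequency filter of this kind can separate recurring domain terms from rare noise.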