SYSTEMS AND METHODS FOR WORD FILTERING IN LANGUAGE MODELS
Main authors: | , |
---|---|
Format: | Patent |
Language: | eng |
Summary: | At least some aspects of the present disclosure are directed to a system having one or more processors and one or more memories for word filtering. The one or more memories are configured to store a plurality of documents and a domain dictionary. The one or more processors are configured to generate a set of tokens for each of the plurality of documents and to separate the set of tokens into a subset of dictionary tokens and a subset of non-dictionary tokens using the domain dictionary. The one or more processors are further configured to filter the subset of non-dictionary tokens to produce a subset of filtered non-dictionary tokens, where each of the filtered non-dictionary tokens has an occurrence frequency greater than a predefined threshold. |
---|
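
The pipeline described in the summary (tokenize each document, split the tokens by dictionary membership, then keep only frequent non-dictionary tokens) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names `tokenize` and `filter_tokens`, the regex tokenizer, and the choice to count occurrences across the whole document collection are all assumptions made for illustration; the summary does not specify how tokens are produced or over what scope the occurrence frequency is measured.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_tokens(documents, domain_dictionary, threshold):
    """Split each document's tokens by dictionary membership, then keep
    only non-dictionary tokens whose count exceeds the threshold.

    Assumption: the occurrence frequency is the raw count of the token
    across all documents, not a per-document count.
    """
    token_sets = [tokenize(doc) for doc in documents]

    # Corpus-wide occurrence counts for every token.
    counts = Counter(tok for toks in token_sets for tok in toks)

    results = []
    for toks in token_sets:
        dictionary_tokens = [t for t in toks if t in domain_dictionary]
        non_dictionary_tokens = [t for t in toks if t not in domain_dictionary]
        # Strictly greater than the threshold, as stated in the summary.
        filtered = [t for t in non_dictionary_tokens if counts[t] > threshold]
        results.append((dictionary_tokens, filtered))
    return results
```

With the documents `"Patient takes metoprolol daily"` and `"metoprolol dose"`, a dictionary of `{"patient", "dose"}`, and a threshold of 1, only `metoprolol` survives as a filtered non-dictionary token, since it is the only out-of-dictionary token that occurs more than once.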