Method for generation of an N-word phrase dictionary from a text corpus
Main authors: | , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | A structure and method for automatically creating a dictionary for clustering text documents performs a first pass for each of the documents to determine a frequency of each word in each of the documents, creates a Hashtable of the most frequently occurring words in the documents, performs a second pass for each of the documents to determine a frequency of phrases in each of the documents that contain only words in the Hashtable, adds the most frequently occurring phrases to the Hashtable, and outputs the most frequently occurring words and the most frequently occurring phrases as the dictionary. |
---|
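
The two-pass procedure described in the summary can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The function names (`tokenize`, `build_dictionary`), the regex tokenizer, the `top_words` / `top_phrases` cutoffs, the `max_phrase_len` limit, and the choice to count total occurrences across the corpus are all assumptions, since the record does not specify them. A plain Python `dict` stands in for the Hashtable.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(documents, top_words=5, top_phrases=3, max_phrase_len=2):
    # First pass: count how often each word occurs across the documents.
    word_counts = Counter()
    for doc in documents:
        word_counts.update(tokenize(doc))

    # Keep only the most frequent words (the dict plays the Hashtable role).
    table = dict(word_counts.most_common(top_words))

    # Second pass: count contiguous phrases of 2..max_phrase_len words
    # in which every word is already in the frequent-word table.
    phrase_counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(2, max_phrase_len + 1):
            for i in range(len(tokens) - n + 1):
                window = tokens[i:i + n]
                if all(word in table for word in window):
                    phrase_counts[" ".join(window)] += 1

    # Add the most frequent phrases; the combined table is the dictionary.
    table.update(phrase_counts.most_common(top_phrases))
    return table


if __name__ == "__main__":
    corpus = [
        "data mining finds patterns",
        "data mining on text data",
        "text mining",
    ]
    print(build_dictionary(corpus, top_words=3, top_phrases=1))
```

Restricting the second pass to phrases built only from frequent words keeps the candidate set small, which is presumably why the summary orders the passes this way.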