Method for generation of an N-word phrase dictionary from a text corpus

Bibliographic details
Main authors: KREULEN JEFFREY THOMAS, SPANGLER WILLIAM SCOTT
Format: Patent
Language: eng
Description
Summary: A structure and method for automatically creating a dictionary for clustering text documents performs a first pass for each of the documents to determine a frequency of each word in each of the documents, creates a Hashtable of the most frequently occurring words in the documents, performs a second pass for each of the documents to determine a frequency of phrases in each of the documents that contain only words in the Hashtable, adds the most frequently occurring phrases to the Hashtable, and outputs the most frequently occurring words and the most frequently occurring phrases as the dictionary.
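
The two-pass procedure in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`tokenize`, `build_dictionary`), the cutoff parameters (`n_words`, `n_phrases`, `max_phrase_len`), the regular-expression tokenizer, and the choice to sum counts across the whole corpus are all assumptions made for illustration; the record does not specify them. A Python `dict` stands in for the Hashtable.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_dictionary(documents, n_words=1000, n_phrases=500, max_phrase_len=2):
    """Return {term: corpus count} for frequent words and frequent phrases.

    All cutoffs are illustrative assumptions, not values from the patent.
    """
    # First pass: count every word across all documents.
    word_counts = Counter()
    for doc in documents:
        word_counts.update(tokenize(doc))

    # Keep only the most frequent words (the "Hashtable" of the summary).
    table = dict(word_counts.most_common(n_words))
    frequent_words = set(table)

    # Second pass: count phrases made up only of words already in the table.
    phrase_counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(2, max_phrase_len + 1):
            for i in range(len(tokens) - n + 1):
                window = tokens[i:i + n]
                if all(word in frequent_words for word in window):
                    phrase_counts[" ".join(window)] += 1

    # Add the most frequent phrases to the table and return it as the dictionary.
    table.update(phrase_counts.most_common(n_phrases))
    return table


if __name__ == "__main__":
    docs = [
        "the hard drive failed",
        "replace the hard drive",
        "hard drive error",
    ]
    print(build_dictionary(docs, n_words=2, n_phrases=1))
```

Restricting the second pass to phrases built from already-frequent words keeps the phrase table small, because a phrase containing a rare word is never counted.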