TOPIC EXTRACTION USING CLAUSE SEGMENTATION AND HIGH-FREQUENCY WORDS
Main authors: | , , , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | The disclosed embodiments provide a system for processing data. During operation, the system obtains a set of clauses in a first set of content items comprising unstructured data. Next, the system obtains a set of stop words comprising high-frequency words that occur in a second set of content items. The system then automatically extracts a set of topics from the set of clauses by generating a set of n-grams from the set of clauses and excluding a first n-gram in the set of n-grams from the set of topics when the first n-gram contains a word in the set of stop words in a pre-specified position of the first n-gram. Finally, the system displays the set of topics to a user to improve understanding of the first set of content items by the user without requiring the user to manually analyze the first set of content items. |
---|
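The extraction step described in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the clauses are taken as already segmented, tokenization is a simple regular expression, "high-frequency" means the `top_k` most common words in the reference items, and the "pre-specified position" is taken to be the first and last word of each n-gram. The function names `tokenize`, `high_frequency_words`, `ngrams`, and `extract_topics` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def high_frequency_words(reference_items, top_k):
    """Build a stop-word set from the top_k most frequent words in a
    second (reference) set of content items. The cutoff is an assumption."""
    counts = Counter(w for item in reference_items for w in tokenize(item))
    return {word for word, _ in counts.most_common(top_k)}


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_topics(clauses, stop_words, max_n=3, positions=(0, -1)):
    """Count candidate topics: every n-gram (1..max_n) from each clause,
    excluding any n-gram with a stop word at one of the given positions.
    positions=(0, -1) checks the first and last word (assumed choice)."""
    topics = Counter()
    for clause in clauses:
        tokens = tokenize(clause)
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                if any(gram[p] in stop_words for p in positions):
                    continue  # stop word in a pre-specified position
                topics[" ".join(gram)] += 1
    return topics
```

With stop words such as "the", "of", and "is", the clause "the cost of living is high" keeps "cost of living" (the stop word sits in the middle) but drops "the cost" and "cost of" (a stop word sits at an edge). The resulting counter could then be sorted by count to display the most frequent topics to a user.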