DOCUMENT CLASSIFICATION SYSTEM, DOCUMENT CLASSIFICATION PROGRAM, AND DOCUMENT CLASSIFICATION METHOD

Bibliographic details
Main authors: KURODA TSUYOSHI, KAMIBAYASHI KO, YAJIMA TATSUNOSUKE, MAKI JUNICHIRO, MURATA TERUYUKI
Format: Patent
Language: English
Description
Summary: PROBLEM TO BE SOLVED: To provide a document classification system that classifies text documents into categories by learning a classification rule through machine learning, without requiring keywords or the like to be specified for each category, and that lets a user easily understand why a given classification result was obtained. SOLUTION: The document classification system has: a language processing part 10 that performs language processing on each text document to decompose it into words; a manual classification part 30 that designates text documents to be used as teacher data according to an instruction from the user; a learning part 40 that calculates learning models for each word by machine learning from the teacher data; an automatic classification part 50 that calculates a classification score for each category for a text document to be classified, based on the words included in the learning model and in that text document, and assigns the document to the category with the highest classification score; and an interface part 60 that presents to the user the classification result for each text document and the classification score of each text document for each category. COPYRIGHT: (C)2011,JPO&INPIT
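
The flow in the summary (decompose into words, learn per-word values from user-chosen teacher data, score each category, pick the maximum, show the scores) can be illustrated with a minimal Python sketch. The abstract does not name the learning algorithm, so the smoothed per-word log-probabilities used here are an assumption and not the patented method. The function names `tokenize`, `learn`, and `classify` and the sample data are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Stand-in for the language processing part: lowercase whitespace split."""
    return text.lower().split()


def learn(teacher_data):
    """Stand-in for the learning part: one weight per word and category.

    teacher_data is a list of (text, category) pairs chosen by the user.
    Assumption: the weight is a Laplace-smoothed log-probability of the word
    within the category; the abstract does not specify the actual model.
    """
    counts = defaultdict(Counter)
    vocab = set()
    for text, category in teacher_data:
        words = tokenize(text)
        counts[category].update(words)
        vocab.update(words)
    model = {}
    for category, word_counts in counts.items():
        total = sum(word_counts.values()) + len(vocab)
        model[category] = {
            w: math.log((word_counts[w] + 1) / total) for w in vocab
        }
    return model


def classify(model, text):
    """Stand-in for the automatic classification part.

    Returns the category with the maximum score, the score of every
    category, and the per-word contributions behind each score.
    Words that never appeared in the teacher data are ignored.
    """
    word_counts = Counter(tokenize(text))
    scores, contributions = {}, {}
    for category, weights in model.items():
        per_word = {
            w: n * weights[w] for w, n in word_counts.items() if w in weights
        }
        contributions[category] = per_word
        scores[category] = sum(per_word.values())
    best = max(scores, key=scores.get)
    return best, scores, contributions


# Hypothetical teacher data, as a user might designate it by hand.
teacher_data = [
    ("invoice payment due", "finance"),
    ("payment received invoice", "finance"),
    ("server outage network", "it"),
    ("network latency server restart", "it"),
]
model = learn(teacher_data)
best, scores, contributions = classify(model, "invoice for network payment")

# Stand-in for the interface part: show the result with every category score.
print("assigned category:", best)
for category, score in sorted(scores.items()):
    print(category, round(score, 3), contributions[category])
```

Returning the per-word contributions alongside the category scores is one possible way to support the stated goal of letting the user see why a result was obtained; the abstract itself only states that the scores per category are presented.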