INFORMATION CLASSIFYING SYSTEM
Format: | Patent |
Language: | English |
Abstract: | PROBLEM TO BE SOLVED: To classify information while taking into account both the content and the form of the information to be classified. SOLUTION: The sample documents supplied to a document managing server 1 may be text documents, document files, or document images. Each document is converted into a formatted document by media conversion, and content features and form features are extracted from the formatted document. To extract content features, a weighted word-frequency vector is generated from the types and occurrence frequencies of the words appearing in the text document and is defined as the content feature of the category. To extract form features, information on the common attribute areas in a page is generated and defined as the form feature of the category. The content features and form features are then verified again, and a feature vector is calculated to determine whether the category depends on its content features or on its form features. As in category learning, the document to be classified is converted into a formatted document by media conversion, and its content features and form features are extracted. The content features of each category are then compared with those of each document, and the form features of each category are likewise compared with those of each document. |
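The abstract does not specify the word-weighting scheme, the layout representation, or how the content/form dependence is computed. The Python sketch below is therefore one hypothetical reading of the learn-then-classify pipeline, not the patented method. It rests on several assumptions:

- Word-frequency vectors are weighted with smoothed TF-IDF, and a category's content feature is the sum (centroid) of its samples' vectors.
- Each page layout is reduced to a set of labelled grid cells, and a category's form feature is the set of cells shared by all of its samples.
- The per-category dependence weights come from how well each feature type separates the category's own samples from the closest other category.

All function names, the document dictionary layout, and the sample data are illustrative.

```python
import math
from collections import Counter

# Hypothetical model of a "formatted document" after media conversion:
# {"text": str, "areas": set of (attribute, grid_row, grid_col) cells on the page}.

def tokenize(text):
    # Assumption: a lowercase whitespace split stands in for real word extraction.
    return text.lower().split()

def idf_weights(docs):
    # Assumed weighting: smoothed inverse document frequency over the training set.
    n = len(docs)
    df = Counter(w for d in docs for w in set(tokenize(d["text"])))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def content_vector(text, idf):
    # Weighted word-frequency vector: occurrence count times IDF weight.
    tf = Counter(tokenize(text))
    return {w: c * idf.get(w, 1.0) for w, c in tf.items()}

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def form_similarity(areas, common):
    # Share of a category's common attribute areas that the page also contains.
    return len(areas & common) / len(common) if common else 0.0

def learn(samples):
    # samples: {category: [formatted_doc, ...]}
    idf = idf_weights([d for docs in samples.values() for d in docs])
    model = {}
    for cat, docs in samples.items():
        centroid = Counter()
        for d in docs:
            centroid.update(content_vector(d["text"], idf))
        model[cat] = {"content": dict(centroid),
                      "form": set.intersection(*(d["areas"] for d in docs))}
    # Re-verification (one possible reading): how much better does each feature
    # type separate the category's own samples from the closest other category?
    for cat, docs in samples.items():
        others = [k for k in model if k != cat]
        c_gap = f_gap = 0.0
        for d in docs:
            v = content_vector(d["text"], idf)
            c_gap += cosine(v, model[cat]["content"]) - max(
                (cosine(v, model[k]["content"]) for k in others), default=0.0)
            f_gap += form_similarity(d["areas"], model[cat]["form"]) - max(
                (form_similarity(d["areas"], model[k]["form"]) for k in others), default=0.0)
        c_gap, f_gap = max(c_gap, 0.0), max(f_gap, 0.0)
        total = c_gap + f_gap
        # (content weight, form weight): the category's dependence on each feature type.
        model[cat]["w"] = (c_gap / total, f_gap / total) if total else (0.5, 0.5)
    return idf, model

def classify(doc, idf, model):
    v = content_vector(doc["text"], idf)
    def score(cat):
        wc, wf = model[cat]["w"]
        return (wc * cosine(v, model[cat]["content"])
                + wf * form_similarity(doc["areas"], model[cat]["form"]))
    return max(model, key=score)

samples = {
    "invoice": [
        {"text": "invoice total amount due payment", "areas": {("title", 0, 0), ("table", 1, 0), ("table", 1, 1)}},
        {"text": "invoice amount due tax total", "areas": {("title", 0, 0), ("table", 1, 0), ("logo", 0, 1)}},
    ],
    "memo": [
        {"text": "meeting notes agenda team", "areas": {("title", 0, 0), ("body", 1, 0), ("body", 1, 1)}},
        {"text": "team meeting agenda schedule", "areas": {("title", 0, 0), ("body", 1, 0)}},
    ],
}
idf, model = learn(samples)
print(classify({"text": "payment due invoice", "areas": {("title", 0, 0), ("table", 1, 0)}}, idf, model))  # → invoice
print(classify({"text": "quarterly report", "areas": {("title", 0, 0), ("body", 1, 0), ("body", 1, 1)}}, idf, model))  # → memo
```

The second query shares no words with either category, so only the form similarity decides it. Because the weights are kept per category, a category whose samples share a rigid layout but varied wording would lean on form, while one with consistent vocabulary but free layout would lean on content. Scoring training samples against centroids they helped build is a simplification; a leave-one-out check would be a stricter form of re-verification.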