MULTI-CLASSIFICATION DEVICE AND METHOD USING LSP

Bibliographic details
Main authors: JANG, JUN HWAN; YUN, DO HYUN; LEE, JAE AN; GO, JUN HO; KIM, HYUN TAE
Format: Patent
Language: English; Korean
Description
Summary: The present invention relates to a multiple classification apparatus and method for documents that classify a single document into a plurality of categories by using a lexico-semantic pattern (LSP), which reorganizes the semantic categories of the words constituting a sentence. The apparatus comprises: a preprocessing unit that defines LSPs consisting of morphemes, syllables, and word phrases and stores them in a database, and that defines concepts, each a group of hierarchically structured LSPs, and stores them in the database; an analysis unit that performs morphological analysis on the sentences in a document to be analyzed and matches the result against the LSPs to compute a syntactic analysis result; and a classifying unit that matches the syntactic analysis result against document classification rules to extract at least one classification category for the document. The invention is stated to greatly improve the accuracy of document classification.
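
The three units named in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and function names (`LSP`, `Concept`, `analyze`, `classify`), the whitespace tokenizer standing in for morphological analysis, the exact-sequence matching, and the rule format (a category is assigned when all of its required concepts were matched) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LSP:
    """Hypothetical lexico-semantic pattern: a run of consecutive tokens."""
    name: str
    terms: tuple


@dataclass
class Concept:
    """Hypothetical concept: a group of LSPs that may nest child concepts."""
    name: str
    patterns: list
    children: list = field(default_factory=list)

    def all_patterns(self):
        # Collect this concept's LSPs plus those of every descendant.
        found = list(self.patterns)
        for child in self.children:
            found.extend(child.all_patterns())
        return found


def analyze(sentence):
    # Stand-in for morphological analysis: lowercase whitespace tokens.
    return [tok.strip(".,!?").lower() for tok in sentence.split()]


def matches(lsp, tokens):
    # True if the LSP's terms occur as a contiguous token sequence.
    n = len(lsp.terms)
    return any(tuple(tokens[i:i + n]) == lsp.terms
               for i in range(len(tokens) - n + 1))


def matched_concepts(document, concepts):
    # Analysis step: names of the concepts hit by any sentence.
    hits = set()
    for sentence in document.split("."):
        tokens = analyze(sentence)
        for concept in concepts:
            if any(matches(p, tokens) for p in concept.all_patterns()):
                hits.add(concept.name)
    return hits


def classify(document, concepts, rules):
    # Classification step: keep categories whose required concepts all matched.
    hits = matched_concepts(document, concepts)
    return sorted(cat for cat, required in rules.items() if required <= hits)


# Preprocessing step: define LSPs, concepts, and rules (illustrative data).
delivery = Concept("delivery", [LSP("late_arrival", ("arrived", "late"))])
refund = Concept("refund", [LSP("money_back", ("refund",))])
complaint = Concept("complaint", [], children=[delivery, refund])
concepts = [delivery, refund, complaint]
rules = {
    "Shipping": {"delivery"},
    "Billing": {"refund"},
    "Escalation": {"delivery", "refund"},
}

print(classify("The parcel arrived late. I want a refund.", concepts, rules))
```

Because each rule is checked independently against the set of matched concepts, one document can receive several categories at once, which is the multi-classification behavior the summary describes.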