SIMILAR DOCUMENT RETRIEVAL SYSTEM, SIMILAR DOCUMENT RETRIEVAL METHOD AND PROGRAM
Format: | Patent |
---|---|
Language: | English |
Abstract: | PROBLEM TO BE SOLVED: To provide a technology for accurately detecting, in a document database, a document that is similar to a given document, even when the documents contain words that do not accurately reflect their characteristic content. SOLUTION: When a similar document is retrieved, the IDF value of each word, which indicates how widely the word appears across the documents in a learning database, is compared with a predetermined threshold in order to detect frequently appearing words. No TF-IDF value is calculated for the frequently appearing words. The feature vector of a reference document is then calculated, with the TF-IDF values of the remaining words appearing in the reference document as its components. The similarity between the reference document and each document in a document DB 162 is calculated from the feature vector of the reference document and the feature vector of that document. A document similar to the reference document is then detected and output on the basis of the similarity. COPYRIGHT: (C)2006,JPO&NCIPI |
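
The procedure summarized in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The abstract does not give the IDF formula, the direction of the threshold comparison, or the similarity measure, so the sketch assumes IDF = log(N / document frequency), treats a word as frequently appearing when its IDF falls below the threshold, and uses cosine similarity. All function names (`idf_table`, `feature_vector`, `cosine`, `rank_similar`) are hypothetical, and documents are assumed to be pre-tokenized lists of words.

```python
import math
from collections import Counter


def idf_table(learning_docs):
    """IDF per word over the learning corpus (assumed formula: log(N / df))."""
    n = len(learning_docs)
    # Document frequency: number of learning documents containing each word.
    df = Counter(word for doc in learning_docs for word in set(doc))
    return {word: math.log(n / count) for word, count in df.items()}


def feature_vector(doc, idf, idf_threshold):
    """Sparse TF-IDF vector; frequent words (IDF below threshold) get no component.

    Words absent from the learning corpus are also skipped (an assumption).
    """
    tf = Counter(doc)
    return {
        word: count * idf[word]
        for word, count in tf.items()
        if word in idf and idf[word] >= idf_threshold
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed similarity measure)."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(reference, documents, idf, idf_threshold):
    """Return (similarity, index) pairs for the documents, most similar first."""
    ref_vec = feature_vector(reference, idf, idf_threshold)
    scored = [
        (cosine(ref_vec, feature_vector(doc, idf, idf_threshold)), index)
        for index, doc in enumerate(documents)
    ]
    return sorted(scored, reverse=True)


# Toy data: "the" occurs in every learning document, so its IDF is 0.
learning = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
idf = idf_table(learning)
database = [["the", "dog", "ran"], ["the", "cat", "sat"]]
ranked = rank_similar(["the", "cat", "sat"], database, idf, idf_threshold=0.1)
```

In this toy run, `ranked` lists index 1 first, because that document shares the informative words "cat" and "sat" with the reference, while the only word the reference shares with document 0 is the frequent word "the", which is left out of both vectors.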