Intelligent subject matter classification and retrieval
Format: | Conference paper |
---|---|
Language: | English |
Abstract: | This paper describes a research project entitled Expert Patent Search Assistant (EPSA), carried out under contract with Consumer and Corporate Affairs Canada. The project developed a method of automating the expertise required to navigate the Canadian Patent Classification Scheme and to retrieve classifications appropriate to user information requests. These requests are of the type the Canadian Patent Office receives from the general public, that is, from individuals with no specialized training in either classification or searching. Classification retrieval is inherently a problem of determining the closeness of two attribute vectors. The first vector is the list of words supplied by the user. The second vector describes, for each classification in the scheme, a list of words derived from the patents in that classification. Each word of the second vector can have an associated numeric value that measures the relevance of the word to that classification. The project explored a method of determining the relevance of a list of words to a given classification using machine-extractable information inherent in the patents and the classifications. The information inherent in a patent document comprised the patent title, the abstract, and various word counts. The information inherent in the classification scheme comprised the cross-references between classifications, clusters or groupings of similar classifications, and the relationships of classifications within the classification hierarchy. The project also explored the minimum requirements for obtaining the relevance of all words to a given classification, including the minimum number of patents per classification and the minimum quantity and quality of the derived words. A method was also developed for navigating the table in accordance with the rules for navigating and using patent classification schemes.
The vector of relevant words for classifications was implemented as two tables. The first table represented the closeness between classifications. The second table had rows representing classifications and columns representing words; each cell held a measure of the relevance of that word to that classification. The method determined the closeness of the first vector (the user's word list) to each classification and retrieved appropriate classifications in accordance with the rules for classification retrieval. The method took in … |
DOI: | 10.1109/CCECE.1993.332247 |
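The abstract frames classification retrieval as measuring the closeness of the user's word list to a per-classification vector of word relevance values derived from patent titles, abstracts and word counts. The following is a minimal Python sketch of that idea under stated assumptions, not EPSA's actual method: the TF-IDF weighting, the cosine similarity measure, the hypothetical `MIN_PATENTS` threshold and the function names `build_relevance_table`, `cosine` and `rank` are illustrative choices that do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical threshold: the abstract reports that a minimum number of
# patents per classification was needed but does not state the value.
MIN_PATENTS = 3


def build_relevance_table(patents_by_class):
    """Return {class_id: {word: weight}} from per-patent word lists.

    patents_by_class maps a classification id to a list of word lists,
    one per patent (e.g. words from the title and abstract). The weight
    is TF-IDF, an assumed stand-in for the paper's relevance measure.
    """
    # Pool word counts per classification, skipping sparse classifications.
    counts = {
        cls: Counter(word for doc in docs for word in doc)
        for cls, docs in patents_by_class.items()
        if len(docs) >= MIN_PATENTS
    }
    n_classes = len(counts)
    # Document frequency: number of classifications containing each word.
    df = Counter(word for c in counts.values() for word in c)
    table = {}
    for cls, c in counts.items():
        total = sum(c.values())
        table[cls] = {
            word: (n / total) * math.log(n_classes / df[word])
            for word, n in c.items()
        }
    return table


def cosine(query_words, row):
    """Cosine similarity between a user's word list and one table row."""
    q = Counter(query_words)
    dot = sum(q[word] * row.get(word, 0.0) for word in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    r_norm = math.sqrt(sum(v * v for v in row.values()))
    return dot / (q_norm * r_norm) if q_norm and r_norm else 0.0


def rank(query_words, table, k=3):
    """Return the k classifications whose rows are closest to the query."""
    scores = {cls: cosine(query_words, row) for cls, row in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Called as `rank(["plough", "soil"], build_relevance_table(patents))` on a small hypothetical corpus, the sketch returns classifications ordered by similarity to the query. TF-IDF is used here only as one plausible way to downweight words that occur in many classifications; the paper's own relevance measure is not given in the abstract.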
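The abstract also states that the relevance data were stored as two tables, one of closeness between classifications and one of word relevance per classification, and that retrieval followed the rules for classification retrieval. Those rules are not given, so the sketch below substitutes a simple, hypothetical scheme: it sums each classification's relevance to the query words, then adds a damped share (the hypothetical parameter `alpha`) of the scores of related classifications taken from the closeness table. The functions `direct_scores`, `spread_scores` and `retrieve` are illustrative names, not part of EPSA.

```python
# Hypothetical sketch of the two-table layout described in the abstract:
# `relevance` is the word-by-classification table (rows = classifications,
# columns = words) and `closeness` is the classification-closeness table.


def direct_scores(query_words, relevance):
    """Score each classification by summing its relevance to each query word."""
    return {
        cls: sum(row.get(word, 0.0) for word in query_words)
        for cls, row in relevance.items()
    }


def spread_scores(scores, closeness, alpha=0.5):
    """Add alpha times the closeness-weighted scores of related classifications.

    One step of score spreading; alpha is a hypothetical damping factor,
    not a value from the paper.
    """
    spread = {}
    for cls, score in scores.items():
        neighbours = closeness.get(cls, {})
        spread[cls] = score + alpha * sum(
            weight * scores.get(other, 0.0) for other, weight in neighbours.items()
        )
    return spread


def retrieve(query_words, relevance, closeness, k=2, alpha=0.5):
    """Return up to k (classification, score) pairs with positive scores, best first."""
    final = spread_scores(direct_scores(query_words, relevance), closeness, alpha)
    # Sort by descending score, breaking ties by classification id.
    ranked = sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cls, score) for cls, score in ranked[:k] if score > 0]
```

One step of spreading lets a classification that shares no query words, but is closely related to a strongly matching one, still appear in the results. A faithful implementation would also need the scheme's navigation and retrieval rules, which this sketch does not model.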