Automatic categorisation applications at the European patent office

Bibliographic details
Published in: World patent information 2002, Vol.24 (3), p.187-196
Main authors: Krier, Marc; Zaccà, Francesco
Format: Article
Language: English
Description
Summary: The first major use of natural language processing techniques in the European Patent Office (EPO) is described. This relates to automating the task of initially classifying newly filed applications with sufficient accuracy to enable reliable routing to the examiner(s) who work in the appropriate technical areas. Precision levels of the order of 80% are required. To achieve this, the authors consider recall levels, the problems of rarely occurring technical fields, the options for 'training material' for the software (using existing fully classified documents), the accuracy of OCR scans of the incoming applications, the use of full texts or just abstracts, and confidence levels for the results. The results are presented in terms of the precision and recall achieved at various organisational levels at the EPO, i.e. at the highest (cluster) level, at directorate level, and at technical examiner level. As another measure of applicability, confusion matrices are also presented. The authors also outline some of the other potential uses of categorisation and linguistic techniques within the work of the EPO, such as routing and partial classifying of both patent and non-patent literature, identifying potentially relevant citations, extracting bibliographic data of patents cited in incoming applications, document-relevance ranking systems and the creation of cross-lingual dictionaries.
ISSN: 0172-2190, 1874-690X
DOI: 10.1016/S0172-2190(02)00026-1
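
Note: The summary evaluates the classifier by precision, recall and confusion matrices. As a minimal sketch of how those measures relate, the Python below derives per-class precision and recall from a confusion matrix. The function name, the three class labels and all counts are hypothetical and chosen only for illustration; they are not taken from the article or from EPO data.

```python
from typing import Dict, List


def per_class_precision_recall(
    matrix: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Compute precision and recall per class.

    matrix[i][j] is the number of documents whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    scores: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        true_positive = matrix[k][k]
        predicted_as_k = sum(row[k] for row in matrix)  # column sum
        actually_k = sum(matrix[k])                     # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        scores[label] = {"precision": precision, "recall": recall}
    return scores


# Hypothetical technical areas and made-up counts (illustration only).
labels = ["chemistry", "mechanics", "electronics"]
matrix = [
    [80, 15, 5],   # true chemistry
    [10, 85, 5],   # true mechanics
    [10, 0, 40],   # true electronics
]

scores = per_class_precision_recall(matrix, labels)
for label in labels:
    print(label, scores[label])
```

Reading the matrix by column gives precision (how many documents routed to an area belong there), while reading it by row gives recall (how many documents of an area were routed to it). The off-diagonal cells show which areas are confused with one another.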