Analysis of IPC classification codes frequency in patents concerning "in situ" remediation technologies

Bibliographic details
Main author: Riccardo Priore
Format: Dataset
Language: eng
Description
Summary: The patent dataset analysed here was built with search criteria designed to retrieve patent documents on "in situ" remediation technologies. It was created in the context of the Horizon 2020 funded project "Posidon" (https://www.posidonproject.eu/). According to the European Environment Information and Observation Network for soil (EIONET-SOIL), there are an estimated 2.5 million or more potentially contaminated sites, of which about 14 % (340 000 sites) are highly likely to be contaminated and hence in need of remediation measures. Managing contaminated sites is estimated to cost around 6 billion euros (€) annually. The aim of the project is to foster the development of innovative technical solutions through pre-commercial procurement selection procedures. An initial survey of the prior art, based on an extensive analysis of patent documents, is therefore fundamental. As Patlib centre staff members who also sit on the "monitoring board" of Posidon, we provide evidence that patent documents already disclose a considerable number of decontamination technologies applicable to the "in situ" reclamation of contaminated soil and/or water. Since we are especially interested in identifying the trends of the technologies that are cited most frequently within the patent dataset, we illustrate one way of "unpacking" the dataset by identifying recurrent patterns of IPC classification codes. To this end, the IPC classification codes of each patent family in the dataset are analysed by isolating and clustering, in successive stages, the patent documents that share specific patterns of IPC subgroups, main groups and subclasses.
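The three IPC levels named in the summary (subgroup, main group, subclass) can be derived from a full IPC symbol, and their frequencies counted per patent family. The following is a minimal sketch under stated assumptions, not the author's procedure: the function names `ipc_levels` and `level_frequencies`, the regular expression and the toy family data are all hypothetical and do not come from the dataset.

```python
import re
from collections import Counter

# Assumed layout of a full IPC symbol, e.g. "B09C 1/10":
# subclass (B09C), main group number (1), subgroup number (10).
IPC_PATTERN = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_levels(symbol):
    """Split a full IPC symbol into its subclass, main group and subgroup."""
    match = IPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    subclass, group, sub = match.groups()
    return {
        "subclass": subclass,
        "main_group": f"{subclass} {group}/00",  # main groups end in /00
        "subgroup": f"{subclass} {group}/{sub}",
    }


def level_frequencies(families, level):
    """Count in how many families each code of the given level occurs."""
    counts = Counter()
    for symbols in families.values():
        # The set makes each code count at most once per family.
        counts.update({ipc_levels(s)[level] for s in symbols})
    return counts


# Hypothetical toy data: family id -> IPC symbols (illustrative only).
families = {
    "FAM1": ["B09C 1/10", "C02F 3/34"],
    "FAM2": ["B09C 1/10", "B09C 1/08"],
    "FAM3": ["C02F 1/72", "B09C 1/08"],
}

subgroup_counts = level_frequencies(families, "subgroup")
main_group_counts = level_frequencies(families, "main_group")
subclass_counts = level_frequencies(families, "subclass")
```

Calling `subgroup_counts.most_common(n)` would then give the `n` most frequent subgroups, which is one plausible way to pick the codes that are "most frequent in the dataset".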
During each phase, the t-distributed stochastic neighbor embedding (tSNE) algorithm is applied to an array of patent families that encodes the presence or absence of IPC subgroups, main groups or subclasses, chosen among those most frequent in the dataset. After the first round of clustering, the patent documents sharing specific IPC subgroup patterns are isolated and ready for additional investigation. The remaining patent documents undergo a second tSNE phase, in which those sharing specific IPC main group patterns are isolated and ready for additional investigation. The remaining patent documents undergo the final clustering by means of tSNE in order to separate the patent documents
DOI:10.17632/gk24h42jty.1
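One stage of the procedure described in the summary (a presence/absence array over the most frequent IPC codes, followed by tSNE) might be sketched as below. This is an illustration under assumptions, not the dataset's actual pipeline: the helper names, the toy data, the `top_k` cut-off and the tSNE settings are hypothetical. The summary does not say how clusters are selected after the embedding, so `split_covered` is only a simplified stand-in that passes families carrying none of the selected codes on to the next stage. The embedding step assumes NumPy and scikit-learn are installed.

```python
from collections import Counter


def presence_matrix(family_codes, top_k):
    """Return family ids, the top_k most frequent codes, and a 0/1 matrix."""
    # Count each code once per family.
    counts = Counter(c for codes in family_codes.values() for c in set(codes))
    # Descending frequency, then alphabetical order for determinism.
    selected = sorted(counts, key=lambda c: (-counts[c], c))[:top_k]
    ids = sorted(family_codes)
    rows = [[int(c in family_codes[fid]) for c in selected] for fid in ids]
    return ids, selected, rows


def embed(rows, perplexity=2.0, seed=0):
    """Project the 0/1 matrix to 2-D with scikit-learn's TSNE."""
    # Imported lazily so the rest of the sketch runs without these packages.
    import numpy as np
    from sklearn.manifold import TSNE

    data = np.asarray(rows, dtype=float)
    # perplexity must stay below the number of families.
    model = TSNE(n_components=2, perplexity=perplexity,
                 init="random", random_state=seed)
    return model.fit_transform(data)


def split_covered(ids, rows):
    """Assumed hand-over rule: families with no selected code move on."""
    covered = [fid for fid, row in zip(ids, rows) if any(row)]
    remaining = [fid for fid, row in zip(ids, rows) if not any(row)]
    return covered, remaining


# Hypothetical toy data: family id -> IPC subgroups (illustrative only).
family_codes = {
    "FAM1": ["B09C 1/10", "C02F 3/34"],
    "FAM2": ["B09C 1/10", "B09C 1/08"],
    "FAM3": ["B09C 1/08", "C02F 1/72"],
    "FAM4": ["E02D 3/11"],
}

ids, selected, rows = presence_matrix(family_codes, top_k=2)
covered, remaining = split_covered(ids, rows)
# coords = embed(rows)  # one 2-D point per family; needs scikit-learn
```

Repeating the same steps on `remaining` with main groups and then subclasses in place of subgroups would mirror the three phases outlined in the summary.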