Machine learning and natural language processing on the patent corpus: Data, tools, and new measures


Bibliographic details
Published in: Journal of economics & management strategy 2018-09, Vol.27 (3), p.535-553
Main authors: Balsmeier, Benjamin, Assaf, Mohamad, Chesebro, Tyler, Fierro, Gabe, Johnson, Kevin, Johnson, Scott, Li, Guan‐Cheng, Lück, Sonja, O'Reagan, Doug, Yeh, Bill, Zang, Guangzheng, Fleming, Lee
Format: Article
Language: English
Online access: Full text
Description
Summary: Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate, and build an updated database using U.S. patent data. The tools identify unique inventor, assignee, and location entities mentioned on each granted U.S. patent from 1976 to 2016. We describe data flow, algorithms, user interfaces, descriptive statistics, and a novelty measure based on the first appearance of a word in the patent corpus. We illustrate an automated coinventor network mapping tool and visualize trends in patenting over the last 40 years. Data and documentation can be found at https://console.cloud.google.com/launcher/partners/patents-public-data.
ISSN: 1058-6407, 1530-9134
DOI: 10.1111/jems.12259