Research on Information Extraction of Technical Documents and Construction of Domain Knowledge Graph

Bibliographic details
Published in: IEEE Access 2020, Vol. 8, p. 168087-168098
Main authors: Zhao, Huaxuan; Pan, Yueling; Yang, Feng
Format: Article
Language: English
Description
Summary: With the rapid development of knowledge graph technologies, domain knowledge graphs have become a research hotspot in academia and industry. However, domain knowledge graphs for technical documents are not yet mature, and the semantic information implicit in unstructured technical documents has not been fully exploited. Drawing on the characteristics of technical documents, the paper proposes a TextCNN-based topic information extraction model and constructs a domain knowledge graph for technical documents, using the graph database Neo4j for knowledge storage and visualization. The TextCNN-based extraction model automatically extracts a document's subject information together with summary fields such as title, ID, status, meeting, and organization. Experiments show that the model achieves high accuracy on the technical document dataset, which can effectively reduce the cost of manual annotation and data collation. In addition, knowledge graph visualization helps researchers search, track, and update technical documents and shows the evolution of a technology more clearly.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3024070