A generic hybrid method combining rules and machine learning to automate domain independent ontology population
Published in: | Engineering Applications of Artificial Intelligence, 2024-07, Vol. 133 (Part F), p. 108571, Article 108571 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Keywords: | |
Abstract: | Knowledge management has become a cornerstone of decision support and system engineering. Knowledge acquisition has traditionally been performed manually, and the trend now is to automate knowledge extraction from the vast amount of information contained in the data produced every day. This article contributes to the artificial intelligence domain with a hybrid approach that discovers concept-instance couples in order to populate an ontology. The proposed framework combines automated, domain-independent rule-based extraction for unsupervised relation extraction with semantics-oriented machine learning techniques for knowledge base enrichment. For engineering, a further contribution lies in the generic nature of the framework, which makes it possible to populate ontologies and automatically build knowledge bases in various domains. The case study supporting this framework and its technical implementation show that the proposed method can be applied identically (1) to different data sources and (2) with different ontologies, regardless of the domain or subdomain they describe or their structure. These inputs can be changed without affecting the performance of the rule-based extraction, whose precision is around 60%. Three different matching methods are also presented. Their ability to match new instances to their corresponding ontological class (or concept) is evaluated through a case study on annotated biochemistry text data. The best matching method achieves an average precision of 70% and an average recall of 74%.
• Generic syntactic patterns achieve 60% precision across diverse data sources.
• Coupling rule-based results with semantic models enhances knowledge base enrichment.
• Generic embeddings such as RoBERTa show promise in ontology instance detection tasks. |
---|---|
ISSN: | 0952-1976 (print), 1873-6769 (online) |
DOI: | 10.1016/j.engappai.2024.108571 |
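The abstract attributes the first stage of the pipeline to domain-independent syntactic patterns that pair a concept with its instances. The record does not list those patterns, so what follows is only a minimal sketch of the general idea, assuming classic Hearst-style patterns ("X such as Y", "Y and other X") and a crude one-token noun-phrase regex. The pattern set, the names `PATTERNS`, `SPLIT` and `extract_couples`, and the example sentences are all hypothetical and are not taken from the article.

```python
import re
from typing import List, Tuple

# Crude one-token "noun phrase" (word characters, optionally hyphenated).
# A real system would use a parser or noun-phrase chunker instead.
NP = r"\w+(?:-\w+)*"

# Hypothetical Hearst-style patterns, NOT the rule set used in the article.
PATTERNS = [
    # "<concept> such as <inst>, <inst> and <inst>"
    re.compile(rf"\b(?P<concept>{NP}) such as (?P<instances>{NP}(?:, {NP})*(?:,? (?:and|or) {NP})?)"),
    # "<inst>, <inst> and other <concept>"
    re.compile(rf"\b(?P<instances>{NP}(?:, {NP})*),? and other (?P<concept>{NP})"),
]

# Splits an enumeration such as "a, b, and c" into its items.
SPLIT = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def extract_couples(sentence: str) -> List[Tuple[str, str]]:
    """Return the (concept, instance) couples matched by PATTERNS in one sentence."""
    couples = []
    for pattern in PATTERNS:
        for m in pattern.finditer(sentence):
            concept = m.group("concept").lower()
            for instance in SPLIT.split(m.group("instances")):
                if instance:
                    couples.append((concept, instance.lower()))
    return couples


corpus = [
    "Proteases such as trypsin, pepsin and chymotrypsin cleave peptide bonds.",
    "Glucose, fructose and other monosaccharides are reducing sugars.",
]
for sentence in corpus:
    print(extract_couples(sentence))
# prints:
# [('proteases', 'trypsin'), ('proteases', 'pepsin'), ('proteases', 'chymotrypsin')]
# [('monosaccharides', 'glucose'), ('monosaccharides', 'fructose')]
```

Single-token noun phrases keep the sketch short but miss multi-word terms such as "peptide bonds", and the extracted concept is a raw surface form ("proteases") that still has to be mapped to an ontology class, which is the job of the semantic matching stage.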
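The second stage matches each newly extracted instance to its ontology class. The record states that three matching methods were compared and highlights generic embeddings such as RoBERTa, but it does not describe the methods. The sketch below shows one plausible approach: assign a candidate to the class whose centroid of seed-instance vectors is most cosine-similar, and reject it if the best score falls below a threshold. To keep it runnable without downloading a model, `TOY_EMBEDDINGS` is a hypothetical hand-made 3-dimensional lookup table standing in for a real encoder. The class names, seed instances, threshold of 0.7, and the pair-level precision/recall definition are also assumptions, not values or definitions from the article.

```python
import math
from typing import Dict, Optional, Sequence, Set, Tuple

Vector = Tuple[float, ...]

# Hypothetical 3-d vectors standing in for a pretrained encoder such as RoBERTa.
# Axis 0 loosely means "enzyme-like", axis 1 "sugar-like", axis 2 "other".
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "trypsin": (0.9, 0.1, 0.0),
    "pepsin": (0.8, 0.2, 0.0),
    "glucose": (0.1, 0.9, 0.0),
    "fructose": (0.0, 0.95, 0.05),
    "lipase": (0.85, 0.05, 0.1),
    "ribose": (0.05, 0.8, 0.1),
    "amylase": (0.3, 0.7, 0.0),      # deliberately misleading vector
    "galactose": (0.5, 0.5, 0.5),    # deliberately uninformative vector
}


def embed(term: str) -> Vector:
    """Look up the toy vector for a term (a real system would run an encoder)."""
    return TOY_EMBEDDINGS[term]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_instance(term: str, class_centroids: Dict[str, Vector],
                   threshold: float = 0.7) -> Optional[str]:
    """Return the most similar class, or None if no class reaches the threshold."""
    scores = {cls: cosine(embed(term), c) for cls, c in class_centroids.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def precision_recall(predicted: Set[Tuple[str, str]],
                     gold: Set[Tuple[str, str]]) -> Tuple[float, float]:
    """Precision and recall over (instance, class) pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical ontology classes, each described by a few seed instances.
seeds = {"Enzyme": ["trypsin", "pepsin"], "Monosaccharide": ["glucose", "fructose"]}
centroids = {cls: centroid([embed(t) for t in terms]) for cls, terms in seeds.items()}

candidates = ["lipase", "ribose", "amylase", "galactose"]
gold = {("lipase", "Enzyme"), ("ribose", "Monosaccharide"),
        ("amylase", "Enzyme"), ("galactose", "Monosaccharide")}

predictions = {t: match_instance(t, centroids) for t in candidates}
predicted = {(t, c) for t, c in predictions.items() if c is not None}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # prints precision=0.67 recall=0.50
```

Here `amylase` is given a misleading vector and is matched to the wrong class, which costs precision, while `galactose` gets an uninformative vector that falls below the threshold and is left unmatched, which costs recall. Raising the threshold generally trades recall for precision.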