Automated extraction of concept matcher thesaurus from semi-structured catalogue-like sources of data on the web

Bibliographic Details
Main Author:	Lapaev, Maxim
Format:	Conference Proceeding
Language:	eng
Subjects:	concept matching; Data mining; dataset refinement; heterogeneous data extraction; knowledge acquisition; knowledge extraction; Manuals; Ontologies; ontology learning; ontology population; Semantic Web; semi-structured data extraction; Thesauri; thesaurus-driven concept matching; web-scraping
Online Access:	Full text

Description
Summary:	Ontology design and the process of populating a data-set with knowledge following the chosen or developed ontology to fit the principles of Semantic Web and Linked Open Data is a time-consuming and iterative process, requiring either expert knowledge or a set of tools for data scraping from the web. A valid and consistent ontology and knowledge within the data-set require unification of concepts, which means overcoming the ambiguity and synonymy of terms that become individuals of the ontology. In this paper we focus on techniques used for organising a Russian food product data-set under a light-weight FOOD Ontology, and on concept matching in particular. The main approaches to data-set concept unification, synonymic term matching and ways to collect dictionaries for the matcher are mentioned. A tool for parsing catalogue-like semi-structured resources and extracting a thesaurus is developed and introduced for the task of on-the-fly concept matching.
ISSN:	2305-7254, 2343-0737
DOI:	10.1109/FRUCT-ISPIT.2016.7561521