A domain categorisation of vocabularies based on a deep learning classifier

Bibliographic details
Published in:	Journal of Information Science, 2023-06, Vol. 49 (3), pp. 699-710
Main authors:	Nogales, Alberto; Sicilia, Miguel-Angel; García-Tejedor, Álvaro J
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Classifiers; Deep learning; Domains; Linked Data; Machine learning; Neural networks; Ontology; Open data; Recurrent neural networks; Vocabularies & taxonomies
Online access:	Full text

description	The publication of large amounts of open data is an increasing trend. This is a consequence of initiatives such as Linked Open Data (LOD), which aims to publish and link data sets on the World Wide Web. Linked Data publishers should follow a set of principles, described in a 2011 document that identifies the reuse of vocabularies as key. The Linked Open Vocabularies (LOV) project attempts to collect the vocabularies and ontologies commonly used in LOD. These ontologies have been classified by domain according to the criteria of LOV members, which has the disadvantage of introducing personal biases. This article presents an automatic classifier of ontologies based on the main categories appearing in Wikipedia. To this end, word-embedding models are combined with deep learning techniques. Results show that a hybrid model combining regular Deep Neural Networks (DNNs), a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN) can classify with an accuracy of 93.57%. A further evaluation of the domain matches between LOV and the classifier finds possible matches in 79.8% of the cases.
doi_str_mv	10.1177/01655515211018170
publisher	SAGE Publications, London, England
rights	The Author(s) 2021
orcid	https://orcid.org/0000-0003-4951-8102
status	Peer reviewed; free to read
issn	0165-5515 (print)
eissn	1741-6485 (online)
recordid	cdi_proquest_journals_2823914484
source	SAGE Complete
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T13%3A47%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20domain%20categorisation%20of%20vocabularies%20based%20on%20a%20deep%20learning%20classifier&rft.jtitle=Journal%20of%20information%20science&rft.au=Nogales,%20Alberto&rft.date=2023-06&rft.volume=49&rft.issue=3&rft.spage=699&rft.epage=710&rft.pages=699-710&rft.issn=0165-5515&rft.eissn=1741-6485&rft_id=info:doi/10.1177/01655515211018170&rft_dat=%3Cproquest_cross%3E2823914484%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2823914484&rft_id=info:pmid/&rft_sage_id=10.1177_01655515211018170&rfr_iscdi=true