Evaluation of taxonomic and neural embedding methods for calculating semantic similarity

Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity prediction can also be anticipated in a distributional vector space. Similarity calculation continues to be a challenging task, even with the latest breakthroughs in deep neural language models.

Detailed description

Saved in:
Bibliographic details
Published in: Natural language engineering 2022-11, Vol.28 (6), p.733-761
Main authors: Yang, Dongqiang, Yin, Yanqin
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 761
container_issue 6
container_start_page 733
container_title Natural language engineering
container_volume 28
creator Yang, Dongqiang
Yin, Yanqin
description Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity prediction can also be anticipated in a distributional vector space. Similarity calculation continues to be a challenging task, even with the latest breakthroughs in deep neural language models. We first examined popular methodologies in measuring taxonomic similarity, including edge-counting that solely employs semantic relations in a taxonomy, as well as the complex methods that estimate concept specificity. We further extrapolated three weighting factors in modelling taxonomic similarity. To study the distinct mechanisms between taxonomic and distributional similarity measures, we ran head-to-head comparisons of each measure with human similarity judgements from the perspectives of word frequency, polysemy degree and similarity intensity. Our findings suggest that without fine-tuning the uniform distance, taxonomic similarity measures can depend on the shortest path length as a prime factor to predict semantic similarity; in contrast to distributional semantics, edge-counting is free from sense distribution bias in use and can measure word similarity both literally and metaphorically; the synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend to leverage knowledge bases on transfer learning. It appears that a large gap still exists on computing semantic similarity among different ranges of word frequency, polysemous degree and similarity intensity.
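The two families of measures contrasted in the abstract can be sketched in a few lines: edge-counting similarity, which depends on the shortest path length through a taxonomy, and distributional similarity, which compares dense vectors by cosine. This is a minimal toy illustration, not the paper's implementation; the tiny IS-A taxonomy and the vectors are invented for demonstration purposes.

```python
from collections import deque, defaultdict
import math

# Hypothetical miniature IS-A taxonomy (parent, child) pairs, for illustration only.
edges = [
    ("entity", "animal"), ("entity", "artifact"),
    ("animal", "dog"), ("animal", "cat"),
    ("artifact", "car"),
]

# Undirected adjacency list, so path length can be counted in edges.
adj = defaultdict(set)
for parent, child in edges:
    adj[parent].add(child)
    adj[child].add(parent)

def shortest_path_length(a, b):
    """BFS over the taxonomy graph; returns the number of edges between a and b."""
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == b:
            return d
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None  # disconnected concepts

def edge_counting_similarity(a, b):
    """Path-based taxonomic similarity: inverse of (1 + shortest path length)."""
    d = shortest_path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

def cosine_similarity(u, v):
    """Distributional similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm
```

Under this sketch, "dog" and "cat" are two edges apart (via "animal"), giving a similarity of 1/3, while "dog" and "car" are four edges apart, giving 0.2; the uniform-distance assumption the abstract questions is visible in treating every edge as equal weight.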
doi_str_mv 10.1017/S1351324921000279
format Article
publisher Cambridge, UK: Cambridge University Press
rights © The Author(s), 2021. Published by Cambridge University Press
orcidid https://orcid.org/0000-0002-3053-6610
fulltext fulltext
identifier ISSN: 1351-3249
ispartof Natural language engineering, 2022-11, Vol.28 (6), p.733-761
issn 1351-3249
1469-8110
language eng
recordid cdi_proquest_journals_2738850435
source Cambridge University Press Journals Complete
subjects Knowledge bases (artificial intelligence)
Knowledge management
Language modeling
Learning transfer
Lexical semantics
Modelling
Natural language processing
Ontology
Polysemy
Retrofitting
Semantic relations
Semantics
Similarity
Similarity measures
Taxonomy
Vector spaces
Word frequency
Word sense disambiguation
Words (language)
title Evaluation of taxonomic and neural embedding methods for calculating semantic similarity