Evaluation of taxonomic and neural embedding methods for calculating semantic similarity

Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity prediction can also be anticipated in a distributional vector space. Similarity calculation continues to be a challenging task, even with the latest breakthroughs in deep neural language models.

Detailed description

Saved in:
Bibliographic details
Published in: Natural language engineering 2022-11, Vol.28 (6), p.733-761
Main authors: Yang, Dongqiang, Yin, Yanqin
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 761
container_issue 6
container_start_page 733
container_title Natural language engineering
container_volume 28
creator Yang, Dongqiang
Yin, Yanqin
description Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity prediction can also be anticipated in a distributional vector space. Similarity calculation continues to be a challenging task, even with the latest breakthroughs in deep neural language models. We first examined popular methodologies in measuring taxonomic similarity, including edge-counting that solely employs semantic relations in a taxonomy, as well as the complex methods that estimate concept specificity. We further extrapolated three weighting factors in modelling taxonomic similarity. To study the distinct mechanisms between taxonomic and distributional similarity measures, we ran head-to-head comparisons of each measure with human similarity judgements from the perspectives of word frequency, polysemy degree and similarity intensity. Our findings suggest that without fine-tuning the uniform distance, taxonomic similarity measures can depend on the shortest path length as a prime factor to predict semantic similarity; in contrast to distributional semantics, edge-counting is free from sense distribution bias in use and can measure word similarity both literally and metaphorically; the synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend to leverage knowledge bases on transfer learning. It appears that a large gap still exists on computing semantic similarity among different ranges of word frequency, polysemous degree and similarity intensity.
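The two families of measures contrasted in the abstract can be sketched in a few lines: edge-counting similarity, which depends on the shortest path length through a taxonomy, and distributional similarity, which compares dense vectors by cosine. This is a minimal toy illustration, not the paper's implementation; the tiny IS-A taxonomy and the vectors are invented for demonstration purposes.

```python
from collections import deque, defaultdict
import math

# Hypothetical miniature IS-A taxonomy (parent, child) pairs, for illustration only.
edges = [
    ("entity", "animal"), ("entity", "artifact"),
    ("animal", "dog"), ("animal", "cat"),
    ("artifact", "car"),
]

# Undirected adjacency list, so path length can be counted in edges.
adj = defaultdict(set)
for parent, child in edges:
    adj[parent].add(child)
    adj[child].add(parent)

def shortest_path_length(a, b):
    """BFS over the taxonomy graph; returns the number of edges between a and b."""
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == b:
            return d
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None  # disconnected concepts

def edge_counting_similarity(a, b):
    """Path-based taxonomic similarity: inverse of (1 + shortest path length)."""
    d = shortest_path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

def cosine_similarity(u, v):
    """Distributional similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm
```

Under this sketch, "dog" and "cat" are two edges apart (via "animal"), giving a similarity of 1/3, while "dog" and "car" are four edges apart, giving 0.2; the uniform-distance assumption the abstract questions is visible in treating every edge as equal weight.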
doi_str_mv 10.1017/S1351324921000279
format Article
publisher Cambridge, UK: Cambridge University Press
rights © The Author(s), 2021. Published by Cambridge University Press
orcidid https://orcid.org/0000-0002-3053-6610
fulltext fulltext
identifier ISSN: 1351-3249
ispartof Natural language engineering, 2022-11, Vol.28 (6), p.733-761
issn 1351-3249
1469-8110
language eng
recordid cdi_proquest_journals_2738850435
source Cambridge University Press Journals Complete
subjects Knowledge bases (artificial intelligence)
Knowledge management
Language modeling
Learning transfer
Lexical semantics
Modelling
Natural language processing
Ontology
Polysemy
Retrofitting
Semantic relations
Semantics
Similarity
Similarity measures
Taxonomy
Vector spaces
Word frequency
Word sense disambiguation
Words (language)
title Evaluation of taxonomic and neural embedding methods for calculating semantic similarity