Evaluation of taxonomic and neural embedding methods for calculating semantic similarity
Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity prediction can also be anticipated in a distributional vector space. Similarity calculation continues to be a challenging task, even with the latest breakthroughs in deep neural language models. We first examined popular methodologies in measuring taxonomic similarity, including edge-counting that solely employs semantic relations in a taxonomy, as well as the complex methods that estimate concept specificity. We further extrapolated three weighting factors in modelling taxonomic similarity. To study the distinct mechanisms between taxonomic and distributional similarity measures, we ran head-to-head comparisons of each measure with human similarity judgements from the perspectives of word frequency, polysemy degree and similarity intensity. Our findings suggest that without fine-tuning the uniform distance, taxonomic similarity measures can depend on the shortest path length as a prime factor to predict semantic similarity; in contrast to distributional semantics, edge-counting is free from sense distribution bias in use and can measure word similarity both literally and metaphorically; the synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend to leverage knowledge bases on transfer learning. It appears that a large gap still exists on computing semantic similarity among different ranges of word frequency, polysemous degree and similarity intensity.
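As a concrete, deliberately minimal illustration of the two families of measures the abstract compares, the sketch below scores a word pair with an edge-counting taxonomic measure over WordNet and with a distributional cosine over word vectors. This is not the authors' evaluation code: it assumes NLTK with the WordNet data installed, and the random 300-dimensional vectors are stand-ins for real pretrained embeddings.

```python
# Minimal sketch contrasting taxonomic (edge-counting) and
# distributional (cosine) similarity; NOT the paper's evaluation code.
# Assumes: pip install nltk numpy && python -m nltk.downloader wordnet
from itertools import product

import numpy as np
from nltk.corpus import wordnet as wn


def taxonomic_similarity(word1: str, word2: str) -> float:
    """Edge-counting over WordNet, taking the best-matching sense pair.

    NLTK's path_similarity is 1 / (1 + shortest_path_length), so the
    shortest path between concepts is the prime factor in the score.
    (NLTK also exposes specificity-based variants such as
    wup_similarity and res_similarity.)
    """
    scores = [
        s1.path_similarity(s2) or 0.0
        for s1, s2 in product(wn.synsets(word1), wn.synsets(word2))
    ]
    return max(scores, default=0.0)


def distributional_similarity(v1: np.ndarray, v2: np.ndarray) -> float:
    """Cosine of two word vectors in a distributional space."""
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))


if __name__ == "__main__":
    # Taking the maximum over all sense pairs keeps edge-counting free
    # of sense-distribution bias: rare senses count as much as common ones.
    print(taxonomic_similarity("car", "automobile"))  # 1.0 (shared synset)
    print(taxonomic_similarity("car", "bicycle"))     # lower: longer path

    # Random vectors stand in for real pretrained embeddings here.
    rng = np.random.default_rng(0)
    v1, v2 = rng.normal(size=300), rng.normal(size=300)
    print(distributional_similarity(v1, v2))
```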
Saved in:
Published in: | Natural language engineering 2022-11, Vol.28 (6), p.733-761 |
---|---|
Main Authors: | Yang, Dongqiang; Yin, Yanqin |
Format: | Article |
Language: | English |
Subjects: | Knowledge bases (artificial intelligence); Knowledge management; Language modeling; Learning transfer; Lexical semantics; Modelling; Natural language processing; Ontology; Polysemy; Retrofitting; Semantic relations; Semantics; Similarity; Similarity measures; Taxonomy; Vector spaces; Word frequency; Word sense disambiguation; Words (language) |
Online Access: | Full text |
DOI: | 10.1017/S1351324921000279 |
Publisher: | Cambridge, UK: Cambridge University Press |
ORCID: | 0000-0002-3053-6610 |
ISSN: | 1351-3249 |
EISSN: | 1469-8110 |
Source: | Cambridge University Press Journals Complete |
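The abstract's closing point, that retrofitting neural embeddings with concept relations may signal a new trend in leveraging knowledge bases, can be illustrated with the widely used iterative retrofitting update of Faruqui et al. (2015). The sketch below is a generic rendering of that idea, not the method evaluated in the paper; the tiny synonym lexicon and random vectors are hypothetical stand-ins.

```python
# A sketch of retrofitting word vectors to a semantic lexicon, in the
# style of Faruqui et al. (2015); an illustration of the idea the
# abstract refers to, not the authors' own method. The tiny lexicon
# and random vectors below are hypothetical stand-ins.
import numpy as np


def retrofit(vectors: dict[str, np.ndarray],
             lexicon: dict[str, list[str]],
             iterations: int = 10) -> dict[str, np.ndarray]:
    """Pull each vector toward its neighbours in a semantic network.

    Each update averages the original (pretrained) vector with the
    current vectors of the word's lexicon neighbours, so the result
    stays close to the distributional space while respecting the
    handcrafted concept relations.
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue
            # Weighting: alpha_i = deg(i) on the original vector and
            # beta_ij = 1 per neighbour, one common parameterisation.
            total = len(nbrs) * vectors[word] + sum(new[n] for n in nbrs)
            new[word] = total / (2 * len(nbrs))
    return new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vecs = {w: rng.normal(size=50) for w in ("car", "automobile", "bicycle")}
    synonyms = {"car": ["automobile"], "automobile": ["car"]}
    retro = retrofit(vecs, synonyms)

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cos(vecs["car"], vecs["automobile"]))    # before retrofitting
    print(cos(retro["car"], retro["automobile"]))  # after: much higher
```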