Query translation for CLIR: EWC vs. Google Translate

A new approach to find accurate translation of search engine queries from Japanese into English for the CLIR task is proposed. The Mecab system and online dictionary SPACEALC are utilized to segment Japanese queries and to get all possible English senses for every term detected. To disambiguate term...

Bibliographic details
Main authors: Klyuev, V., Haralambous, Y.
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 711
container_issue
container_start_page 707
container_title
container_volume
creator Klyuev, V.
Haralambous, Y.
description A new approach to find accurate translation of search engine queries from Japanese into English for the CLIR task is proposed. The Mecab system and online dictionary SPACEALC are utilized to segment Japanese queries and to get all possible English senses for every term detected. To disambiguate terms, the idea of the shortest path on an oriented graph is applied. Nodes of this graph symbolize word senses and edges connect nodes representing neighboring Japanese terms. The EWC semantic relatedness measure is used to select the most related meanings for the translation results. This measure combines the Wikipedia-based Explicit Semantic Analysis measure, the WordNet path measure and the mixed collocation index. The proposed technique is tested on the NTCIR data collection. Queries generated by Google Translate were used to evaluate the quality of translation.
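The disambiguation step in the description (one candidate sense per node, edges only between senses of neighboring terms, shortest path selects the translation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pick_senses`, the edge cost `1 - relatedness`, and the toy scores are all assumptions made for illustration, and the EWC measure itself is replaced by a hypothetical lookup table.

```python
from typing import Callable, List, Sequence


def pick_senses(candidates: Sequence[Sequence[str]],
                relatedness: Callable[[str, str], float]) -> List[str]:
    """Pick one sense per term via a shortest path through a layered directed graph.

    candidates[i] holds the candidate English senses of the i-th query term.
    Assumption: relatedness returns a score in [0, 1]; an edge costs 1 - score.
    """
    if not candidates or any(len(layer) == 0 for layer in candidates):
        raise ValueError("every term needs at least one candidate sense")

    # best[j]: cheapest path cost ending at sense j of the current layer.
    best = [0.0] * len(candidates[0])
    # back[i][j]: index in layer i of the predecessor chosen for sense j of layer i + 1.
    back: List[List[int]] = []

    for prev, cur in zip(candidates, candidates[1:]):
        new_best, pointers = [], []
        for sense in cur:
            costs = [best[k] + (1.0 - relatedness(p, sense))
                     for k, p in enumerate(prev)]
            k_min = min(range(len(prev)), key=costs.__getitem__)
            new_best.append(costs[k_min])
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last layer to the first.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [layer[i] for layer, i in zip(candidates, path)]


# Hypothetical stand-in for the EWC measure: hand-made scores, not real data.
TOY_SCORES = {
    frozenset(("bank", "interest")): 0.9,
    frozenset(("shore", "interest")): 0.1,
    frozenset(("bank", "curiosity")): 0.2,
    frozenset(("shore", "curiosity")): 0.1,
}


def toy_relatedness(a: str, b: str) -> float:
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


# Two query terms, each with two candidate senses (invented example).
result = pick_senses([["shore", "bank"], ["curiosity", "interest"]], toy_relatedness)
print(result)
```

Because edges exist only between adjacent layers, a single left-to-right dynamic-programming pass finds the shortest path; a general-purpose algorithm such as Dijkstra's would give the same result on this graph but is not needed.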
doi_str_mv 10.1109/ICIST.2012.6221738
format Conference Proceeding
fullrecord <record><control><sourceid>hal_6IE</sourceid><recordid>TN_cdi_ieee_primary_6221738</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6221738</ieee_id><sourcerecordid>oai_HAL_hal_00959927v1</sourcerecordid><originalsourceid>FETCH-LOGICAL-h1688-c2bfae4aade4613ac2ca9589c6a4696f0452e8ce7c6117d93c21bbd0a61223fe3</originalsourceid><addsrcrecordid>eNo9kE1Lw0AYhFdUsNb8Ab3s1UPivvu93kqobSAgasRjeLPZ2EhsJKmF_nsrrc5lmOFhDkPINbAEgLm7LM1eioQz4InmHIywJ-QSpDKGCancKYmcsX9ZsDMy4aBlLIUyFyQaxw-2l1EgtJ0Q-fQdhh3dDLgeO9y0_Zo2_UDTPHu-p_O3lG7HhC76_r0LtDhC4YqcN9iNITr6lLw-zIt0GeePiyyd5fEKtLWx51WDQSLWQWoQ6LlHp6zzGqV2umFS8WB9MF4DmNoJz6GqaoYaOBdNEFNye9hdYVd-De0nDruyx7ZczvLyt2PMKee42cKevTmwbQjhHz4eJH4A82hU0Q</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Query translation for CLIR: EWC vs. Google Translate</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Klyuev, V. ; Haralambous, Y.</creator><creatorcontrib>Klyuev, V. ; Haralambous, Y.</creatorcontrib><description>A new approach to find accurate translation of search engine queries from Japanese into English for the CLIR task is proposed. The Mecab system and online dictionary SPACEALC are utilized to segment Japanese queries and to get all possible English senses for every term detected. To disambiguate terms, the idea of the shortest path on an oriented graph is applied. Nodes of this graph symbolize word senses and edges connect nodes representing neighboring Japanese terms. The EWC semantic relatedness measure is used to select the most related meanings for the translation results. This measure combines the Wikipedia-based Explicit Semantic Analysis measure, the WordNet path measure and the mixed collocation index. The proposed technique is tested on the NTCIR data collection. 
Queries generated by Google Translate were used to evaluate the quality of translation.</description><identifier>ISSN: 2164-4357</identifier><identifier>ISBN: 9781457703430</identifier><identifier>ISBN: 1457703432</identifier><identifier>EISBN: 1457703459</identifier><identifier>EISBN: 1457703440</identifier><identifier>EISBN: 9781457703447</identifier><identifier>EISBN: 9781457703454</identifier><identifier>DOI: 10.1109/ICIST.2012.6221738</identifier><language>eng</language><publisher>IEEE</publisher><subject>Electronic publishing ; Encyclopedias ; Engineering Sciences ; Google ; Information retrieval ; Internet ; Semantics</subject><ispartof>2012 IEEE International Conference on Information Science and Technology, 2012, p.707-711</ispartof><rights>Attribution</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0003-1443-6115</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6221738$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,309,310,776,780,785,786,881,2052,4036,4037,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6221738$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://hal.science/hal-00959927$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Klyuev, V.</creatorcontrib><creatorcontrib>Haralambous, Y.</creatorcontrib><title>Query translation for CLIR: EWC vs. Google Translate</title><title>2012 IEEE International Conference on Information Science and Technology</title><addtitle>ICIST</addtitle><description>A new approach to find accurate translation of search engine queries from Japanese into English for the CLIR task is proposed. 
The Mecab system and online dictionary SPACEALC are utilized to segment Japanese queries and to get all possible English senses for every term detected. To disambiguate terms, the idea of the shortest path on an oriented graph is applied. Nodes of this graph symbolize word senses and edges connect nodes representing neighboring Japanese terms. The EWC semantic relatedness measure is used to select the most related meanings for the translation results. This measure combines the Wikipedia-based Explicit Semantic Analysis measure, the WordNet path measure and the mixed collocation index. The proposed technique is tested on the NTCIR data collection. Queries generated by Google Translate were used to evaluate the quality of translation.</description><subject>Electronic publishing</subject><subject>Encyclopedias</subject><subject>Engineering Sciences</subject><subject>Google</subject><subject>Information retrieval</subject><subject>Internet</subject><subject>Semantics</subject><issn>2164-4357</issn><isbn>9781457703430</isbn><isbn>1457703432</isbn><isbn>1457703459</isbn><isbn>1457703440</isbn><isbn>9781457703447</isbn><isbn>9781457703454</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2012</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo9kE1Lw0AYhFdUsNb8Ab3s1UPivvu93kqobSAgasRjeLPZ2EhsJKmF_nsrrc5lmOFhDkPINbAEgLm7LM1eioQz4InmHIywJ-QSpDKGCancKYmcsX9ZsDMy4aBlLIUyFyQaxw-2l1EgtJ0Q-fQdhh3dDLgeO9y0_Zo2_UDTPHu-p_O3lG7HhC76_r0LtDhC4YqcN9iNITr6lLw-zIt0GeePiyyd5fEKtLWx51WDQSLWQWoQ6LlHp6zzGqV2umFS8WB9MF4DmNoJz6GqaoYaOBdNEFNye9hdYVd-De0nDruyx7ZczvLyt2PMKee42cKevTmwbQjhHz4eJH4A82hU0Q</recordid><startdate>201203</startdate><enddate>201203</enddate><creator>Klyuev, V.</creator><creator>Haralambous, 
Y.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>1XC</scope><scope>VOOES</scope><orcidid>https://orcid.org/0000-0003-1443-6115</orcidid></search><sort><creationdate>201203</creationdate><title>Query translation for CLIR: EWC vs. Google Translate</title><author>Klyuev, V. ; Haralambous, Y.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-h1688-c2bfae4aade4613ac2ca9589c6a4696f0452e8ce7c6117d93c21bbd0a61223fe3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Electronic publishing</topic><topic>Encyclopedias</topic><topic>Engineering Sciences</topic><topic>Google</topic><topic>Information retrieval</topic><topic>Internet</topic><topic>Semantics</topic><toplevel>online_resources</toplevel><creatorcontrib>Klyuev, V.</creatorcontrib><creatorcontrib>Haralambous, Y.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Klyuev, V.</au><au>Haralambous, Y.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Query translation for CLIR: EWC vs. 
Google Translate</atitle><btitle>2012 IEEE International Conference on Information Science and Technology</btitle><stitle>ICIST</stitle><date>2012-03</date><risdate>2012</risdate><spage>707</spage><epage>711</epage><pages>707-711</pages><issn>2164-4357</issn><isbn>9781457703430</isbn><isbn>1457703432</isbn><eisbn>1457703459</eisbn><eisbn>1457703440</eisbn><eisbn>9781457703447</eisbn><eisbn>9781457703454</eisbn><abstract>A new approach to find accurate translation of search engine queries from Japanese into English for the CLIR task is proposed. The Mecab system and online dictionary SPACEALC are utilized to segment Japanese queries and to get all possible English senses for every term detected. To disambiguate terms, the idea of the shortest path on an oriented graph is applied. Nodes of this graph symbolize word senses and edges connect nodes representing neighboring Japanese terms. The EWC semantic relatedness measure is used to select the most related meanings for the translation results. This measure combines the Wikipedia-based Explicit Semantic Analysis measure, the WordNet path measure and the mixed collocation index. The proposed technique is tested on the NTCIR data collection. Queries generated by Google Translate were used to evaluate the quality of translation.</abstract><pub>IEEE</pub><doi>10.1109/ICIST.2012.6221738</doi><tpages>5</tpages><orcidid>https://orcid.org/0000-0003-1443-6115</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 2164-4357
ispartof 2012 IEEE International Conference on Information Science and Technology, 2012, p.707-711
issn 2164-4357
language eng
recordid cdi_ieee_primary_6221738
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Electronic publishing
Encyclopedias
Engineering Sciences
Google
Information retrieval
Internet
Semantics
title Query translation for CLIR: EWC vs. Google Translate
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T08%3A02%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-hal_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Query%20translation%20for%20CLIR:%20EWC%20vs.%20Google%20Translate&rft.btitle=2012%20IEEE%20International%20Conference%20on%20Information%20Science%20and%20Technology&rft.au=Klyuev,%20V.&rft.date=2012-03&rft.spage=707&rft.epage=711&rft.pages=707-711&rft.issn=2164-4357&rft.isbn=9781457703430&rft.isbn_list=1457703432&rft_id=info:doi/10.1109/ICIST.2012.6221738&rft_dat=%3Chal_6IE%3Eoai_HAL_hal_00959927v1%3C/hal_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1457703459&rft.eisbn_list=1457703440&rft.eisbn_list=9781457703447&rft.eisbn_list=9781457703454&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6221738&rfr_iscdi=true