Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph

A word-aligned bilingual corpus is an important knowledge source for many NLP tasks, especially machine translation. In existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are major factors limiting the recall and...

Bibliographic details
Main authors: Wu, Honglin, Liu, Shaoming
Format: Conference proceeding
Language: eng
Subjects:
Online access: Full text
container_end_page 84
container_issue
container_start_page 75
container_title
container_volume
creator Wu, Honglin
Liu, Shaoming
description A word-aligned bilingual corpus is an important knowledge source for many NLP tasks, especially machine translation. In existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are major factors limiting the recall and precision of alignment results. In this paper, we propose a word alignment model between Chinese and Japanese that measures similarity in terms of morphological similarity, semantic distance, part of speech and co-occurrence, and matches words by maximum weight matching on a bipartite graph. The model partly solves the problems mentioned above. Experiments showed the model to be effective: it achieved an F-score of 80%, compared with 72% for GIZA++.
doi_str_mv 10.1007/11940098_8
format Conference Proceeding
fullrecord <record><control><sourceid>pascalfrancis_sprin</sourceid><recordid>TN_cdi_pascalfrancis_primary_20127831</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>20127831</sourcerecordid><originalsourceid>FETCH-LOGICAL-p1338-28c257dbe4c849341b31c4ea68cd57c24b29e8fa97dff7e5bce9850398e88a4d3</originalsourceid><addsrcrecordid>eNpFULtOwzAUNS-JUrrwBV6QWAJ27MT22FZQQEUsVGVAihznJjE0TmQHAX9PSpF6lqtzz2M4CF1Qck0JETeUKk6Ikpk8QGcs4YSrNJXyEI1oSmnEGFdHe0G8HqMRYSSOlODsFE1CeCcDGB0yyQi9rVtf4OnGVq4B1-MZ9F8ADs9r6yAA1q7Aj7rTf2QVrKvwk_62zWeD12Cruh9ob-rtv3V4Zjvte9sDXnjd1efopNSbAJP_O0aru9uX-X20fF48zKfLqKOMySiWJk5EkQM3kivGac6o4aBTaYpEmJjnsQJZaiWKshSQ5AaUTAhTEqTUvGBjdLnr7XQwelN67YwNWedto_1PFhMaC8no4Lva-cIguQp8lrftR8goybbTZvtp2S8rqGY9</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph</title><source>Springer Books</source><creator>Wu, Honglin ; Liu, Shaoming</creator><contributor>Matsumoto, Yuji ; Sproat, Richard W. ; Zhang, Min ; Wong, Kam-Fai</contributor><creatorcontrib>Wu, Honglin ; Liu, Shaoming ; Matsumoto, Yuji ; Sproat, Richard W. ; Zhang, Min ; Wong, Kam-Fai</creatorcontrib><description>The word-aligned bilingual corpus is an important knowledge source for many tasks in NLP especially in machine translation. Among the existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are very important factors impacting the recall and precision of alignment results. In this paper, we proposed a word alignment model between Chinese and Japanese which measures similarity in terms of morphological similarity, semantic distance, part of speech and co-occurrence, and matches words by maximum weight matching on bipartite graph. 
The model can partly solve the problems mentioned above. The model was proved to be effective by experiments. It achieved 80% as F-Score than 72% of GIZA++.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 354049667X</identifier><identifier>ISBN: 9783540496670</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 3540496688</identifier><identifier>EISBN: 9783540496687</identifier><identifier>DOI: 10.1007/11940098_8</identifier><language>eng</language><publisher>Berlin, Heidelberg: Springer Berlin Heidelberg</publisher><subject>Applied sciences ; Artificial intelligence ; bipartite graph ; Computer science; control theory; systems ; Exact sciences and technology ; matching ; similarity measure ; Speech and sound recognition and synthesis. Linguistics ; word alignment</subject><ispartof>Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead, 2006, p.75-84</ispartof><rights>Springer-Verlag Berlin Heidelberg 2006</rights><rights>2008 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/11940098_8$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/11940098_8$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>309,310,779,780,784,789,790,793,4050,4051,27925,38255,41442,42511</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=20127831$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><contributor>Matsumoto, Yuji</contributor><contributor>Sproat, Richard W.</contributor><contributor>Zhang, Min</contributor><contributor>Wong, Kam-Fai</contributor><creatorcontrib>Wu, Honglin</creatorcontrib><creatorcontrib>Liu, 
Shaoming</creatorcontrib><title>Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph</title><title>Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead</title><description>The word-aligned bilingual corpus is an important knowledge source for many tasks in NLP especially in machine translation. Among the existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are very important factors impacting the recall and precision of alignment results. In this paper, we proposed a word alignment model between Chinese and Japanese which measures similarity in terms of morphological similarity, semantic distance, part of speech and co-occurrence, and matches words by maximum weight matching on bipartite graph. The model can partly solve the problems mentioned above. The model was proved to be effective by experiments. It achieved 80% as F-Score than 72% of GIZA++.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>bipartite graph</subject><subject>Computer science; control theory; systems</subject><subject>Exact sciences and technology</subject><subject>matching</subject><subject>similarity measure</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>word alignment</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>354049667X</isbn><isbn>9783540496670</isbn><isbn>3540496688</isbn><isbn>9783540496687</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2006</creationdate><recordtype>conference_proceeding</recordtype><recordid>eNpFULtOwzAUNS-JUrrwBV6QWAJ27MT22FZQQEUsVGVAihznJjE0TmQHAX9PSpF6lqtzz2M4CF1Qck0JETeUKk6Ikpk8QGcs4YSrNJXyEI1oSmnEGFdHe0G8HqMRYSSOlODsFE1CeCcDGB0yyQi9rVtf4OnGVq4B1-MZ9F8ADs9r6yAA1q7Aj7rTf2QVrKvwk_62zWeD12Cruh9ob-rtv3V4Zjvte9sDXnjd1efopNSbAJP_O0aru9uX-X20fF48zKfLqKOMySiWJk5EkQM3kivGac6o4aBTaYpEmJjnsQJZaiWKshSQ5AaUTAhTEqTUvGBjdLnr7XQwelN67YwNWedto_1PFhMaC8no4Lva-cIguQp8lrftR8goybbTZvtp2S8rqGY9</recordid><startdate>2006</startdate><enddate>2006</enddate><creator>Wu, Honglin</creator><creator>Liu, Shaoming</creator><general>Springer Berlin Heidelberg</general><general>Springer</general><scope>IQODW</scope></search><sort><creationdate>2006</creationdate><title>Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph</title><author>Wu, Honglin ; Liu, Shaoming</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p1338-28c257dbe4c849341b31c4ea68cd57c24b29e8fa97dff7e5bce9850398e88a4d3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>bipartite graph</topic><topic>Computer science; control theory; systems</topic><topic>Exact sciences and technology</topic><topic>matching</topic><topic>similarity measure</topic><topic>Speech and sound recognition and synthesis. 
Linguistics</topic><topic>word alignment</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wu, Honglin</creatorcontrib><creatorcontrib>Liu, Shaoming</creatorcontrib><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wu, Honglin</au><au>Liu, Shaoming</au><au>Matsumoto, Yuji</au><au>Sproat, Richard W.</au><au>Zhang, Min</au><au>Wong, Kam-Fai</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph</atitle><btitle>Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead</btitle><date>2006</date><risdate>2006</risdate><spage>75</spage><epage>84</epage><pages>75-84</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>354049667X</isbn><isbn>9783540496670</isbn><eisbn>3540496688</eisbn><eisbn>9783540496687</eisbn><abstract>The word-aligned bilingual corpus is an important knowledge source for many tasks in NLP especially in machine translation. Among the existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are very important factors impacting the recall and precision of alignment results. In this paper, we proposed a word alignment model between Chinese and Japanese which measures similarity in terms of morphological similarity, semantic distance, part of speech and co-occurrence, and matches words by maximum weight matching on bipartite graph. The model can partly solve the problems mentioned above. The model was proved to be effective by experiments. It achieved 80% as F-Score than 72% of GIZA++.</abstract><cop>Berlin, Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/11940098_8</doi><tpages>10</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0302-9743
ispartof Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead, 2006, p.75-84
issn 0302-9743
1611-3349
language eng
recordid cdi_pascalfrancis_primary_20127831
source Springer Books
subjects Applied sciences
Artificial intelligence
bipartite graph
Computer science; control theory; systems
Exact sciences and technology
matching
similarity measure
Speech and sound recognition and synthesis. Linguistics
word alignment
title Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T12%3A23%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Word%20Alignment%20Between%20Chinese%20and%20Japanese%20Using%20Maximum%20Weight%20Matching%20on%20Bipartite%20Graph&rft.btitle=Computer%20Processing%20of%20Oriental%20Languages.%20Beyond%20the%20Orient:%20The%20Research%20Challenges%20Ahead&rft.au=Wu,%20Honglin&rft.date=2006&rft.spage=75&rft.epage=84&rft.pages=75-84&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=354049667X&rft.isbn_list=9783540496670&rft_id=info:doi/10.1007/11940098_8&rft_dat=%3Cpascalfrancis_sprin%3E20127831%3C/pascalfrancis_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=3540496688&rft.eisbn_list=9783540496687&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
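
The description above says the model matches words by maximum weight matching on a bipartite graph, which addresses the global optimization problem that pairwise greedy choices miss. The record does not give the paper's similarity formula or its matching procedure, so the following is only a minimal sketch under assumptions: the function name `max_weight_matching`, the toy similarity matrix `sim`, and the choice of the Hungarian (Kuhn-Munkres) algorithm are all illustrative and not taken from the paper. Rows stand for Chinese words, columns for Japanese words, and each cell for a combined similarity score.

```python
def max_weight_matching(weights):
    """Return sorted (row, col) pairs of a one-to-one matching that
    maximizes the sum of weights[row][col]. Every node on the smaller
    side is matched. Illustrative Hungarian algorithm, O(n^2 * m)."""
    if not weights or not weights[0]:
        return []
    n, m = len(weights), len(weights[0])
    if n > m:
        # The core loop needs rows <= cols, so transpose and swap back.
        transposed = [list(col) for col in zip(*weights)]
        return sorted((r, c) for c, r in max_weight_matching(transposed))

    INF = float("inf")
    # Negate the weights: maximizing weight equals minimizing cost.
    cost = [[-w for w in row] for row in weights]
    u = [0.0] * (n + 1)   # potentials for rows (1-indexed)
    v = [0.0] * (m + 1)   # potentials for columns (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j
    way = [0] * (m + 1)   # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path that was found.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


# Toy similarity scores (made up for illustration only).
sim = [
    [0.90, 0.80],
    [0.85, 0.10],
]
pairs = max_weight_matching(sim)  # → [(0, 1), (1, 0)]
```

In this toy matrix a greedy aligner would first take the single best cell (0, 0) and then be forced into the weak pair (1, 1), whereas the matching picks (0, 1) and (1, 0), which has the higher total score. In practice one would probably also drop matched pairs whose score falls below some threshold, since this sketch always matches every word on the shorter side.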