Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph
The word-aligned bilingual corpus is an important knowledge source for many NLP tasks, especially machine translation. Among existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are important factors affecting the recall and...
Main authors: | Wu, Honglin ; Liu, Shaoming |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 84 |
---|---|
container_issue | |
container_start_page | 75 |
container_title | |
container_volume | |
creator | Wu, Honglin ; Liu, Shaoming |
description | The word-aligned bilingual corpus is an important knowledge source for many NLP tasks, especially machine translation. Among existing word alignment methods, the unknown word problem, the synonym problem and the global optimization problem are important factors affecting the recall and precision of alignment results. This paper proposes a word alignment model between Chinese and Japanese that measures similarity in terms of morphological similarity, semantic distance, part of speech and co-occurrence, and matches words by maximum weight matching on a bipartite graph. The model partly solves the problems mentioned above. Experiments showed the model to be effective: it achieved an F-score of 80%, compared with 72% for GIZA++. |
doi_str_mv | 10.1007/11940098_8 |
format | Conference Proceeding |
contributor | Matsumoto, Yuji ; Sproat, Richard W. ; Zhang, Min ; Wong, Kam-Fai |
creationdate | 2006 |
eisbn | 3540496688 ; 9783540496687 |
isbn | 354049667X ; 9783540496670 |
linktohtml | https://link.springer.com/10.1007/11940098_8 |
linktopdf | https://link.springer.com/content/pdf/10.1007/11940098_8 |
backlink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20127831 (View record in Pascal Francis) |
publisher | Berlin, Heidelberg: Springer Berlin Heidelberg |
rights | Springer-Verlag Berlin Heidelberg 2006 ; 2008 INIST-CNRS |
toplevel | peer_reviewed ; online_resources |
tpages | 10 |
fulltext | fulltext |
identifier | ISSN: 0302-9743 |
ispartof | Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead, 2006, p.75-84 |
issn | 0302-9743 1611-3349 |
language | eng |
recordid | cdi_pascalfrancis_primary_20127831 |
source | Springer Books |
subjects | Applied sciences ; Artificial intelligence ; bipartite graph ; Computer science; control theory; systems ; Exact sciences and technology ; matching ; similarity measure ; Speech and sound recognition and synthesis. Linguistics ; word alignment |
title | Word Alignment Between Chinese and Japanese Using Maximum Weight Matching on Bipartite Graph |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T12%3A23%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Word%20Alignment%20Between%20Chinese%20and%20Japanese%20Using%20Maximum%20Weight%20Matching%20on%20Bipartite%20Graph&rft.btitle=Computer%20Processing%20of%20Oriental%20Languages.%20Beyond%20the%20Orient:%20The%20Research%20Challenges%20Ahead&rft.au=Wu,%20Honglin&rft.date=2006&rft.spage=75&rft.epage=84&rft.pages=75-84&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=354049667X&rft.isbn_list=9783540496670&rft_id=info:doi/10.1007/11940098_8&rft_dat=%3Cpascalfrancis_sprin%3E20127831%3C/pascalfrancis_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=3540496688&rft.eisbn_list=9783540496687&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
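
The matching step named in the title and abstract can be illustrated with a small sketch. This is not the authors' implementation: the record does not say how the four similarity measures are combined, so the code below simply takes a precomputed similarity matrix as input. The function names (`max_weight_matching`, `align`), the `min_sim` threshold, the example sentence pair and all similarity values are hypothetical. The solver is a textbook Hungarian (Kuhn-Munkres) assignment routine, one standard way to compute a maximum weight matching on a complete bipartite graph.

```python
from typing import List, Sequence, Tuple


def max_weight_matching(weights: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return (row, col) pairs of a maximum-weight bipartite matching.

    Every node on the smaller side is matched. Uses the O(n^2 * m)
    Hungarian algorithm on negated weights (min-cost assignment).
    """
    if not weights or not weights[0]:
        return []
    n, m = len(weights), len(weights[0])
    if n > m:
        # The routine below needs rows <= columns, so solve the transpose.
        transposed = [list(col) for col in zip(*weights)]
        return sorted((r, c) for c, r in max_weight_matching(transposed))

    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    # Reduced cost of edge (i0, j); cost is the negated weight.
                    cur = -weights[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path just found.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


def align(src: Sequence[str], tgt: Sequence[str],
          sim: Sequence[Sequence[float]], min_sim: float = 0.3) -> List[Tuple[str, str]]:
    """Align words by maximum weight matching, then drop weak links.

    min_sim is a hypothetical cutoff, not a value from the paper.
    """
    return [(src[i], tgt[j]) for i, j in max_weight_matching(sim) if sim[i][j] >= min_sim]


# Toy example: the similarity values are invented for illustration only.
zh = ["我", "喜欢", "音乐"]
ja = ["私", "は", "音楽", "が", "好き"]
sim = [
    [0.90, 0.10, 0.00, 0.10, 0.00],
    [0.00, 0.10, 0.10, 0.20, 0.80],
    [0.00, 0.00, 0.95, 0.10, 0.10],
]
print(align(zh, ja, sim))
```

Because the matching maximizes the total weight over the whole sentence pair rather than choosing the best partner for each word independently, it addresses the global optimization problem the abstract mentions. For longer sentences, a library routine such as SciPy's `linear_sum_assignment` with `maximize=True` could replace the hand-written solver.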