Automatic Grader of MT Outputs in Colloquial Style by Using Multiple Edit Distances

Bibliographic details
Published in:	Transactions of the Japanese Society for Artificial Intelligence, 2005, Vol.20(3), pp.139-148
Main authors:	Akiba, Yasuhiro; Imamura, Kenji; Sumita, Eiichiro; Nakaiwa, Hiromi; Yamamoto, Seiichi; Okuno, Hiroshi G.
Format:	Article
Language:	eng ; jpn
Keywords:	BLEU; decision tree; edit distances; machine translation evaluation; mWER; reference translations
Online access:	Full text

container_end_page	148
container_issue	3
container_start_page	139
container_title	Transactions of the Japanese Society for Artificial Intelligence
container_volume	20
creator	Akiba, Yasuhiro; Imamura, Kenji; Sumita, Eiichiro; Nakaiwa, Hiromi; Yamamoto, Seiichi; Okuno, Hiroshi G.
description	This paper addresses the challenging problem of automating the human's intelligent ability to evaluate output from machine translation (MT) systems, which are subsystems of Speech-to-Speech MT (SSMT) systems. Conventional automatic MT evaluation methods include BLEU, which MT researchers have frequently used. BLEU is unsuitable for SSMT evaluation for two reasons. First, BLEU assesses errors lightly at the beginning or end of translations and heavily in the middle, although the assessments should be independent of position. Second, BLEU lacks tolerance in accepting colloquial sentences with small errors, although such errors do not prevent us from continuing conversation. In this paper, the authors report a new evaluation method called RED that automatically grades each MT output by using a decision tree (DT). The DT is learned from training examples that are encoded by using multiple edit distances and their grades. The multiple edit distances are the normal edit distance (ED), defined by insertion, deletion, and replacement, as well as extensions of ED. The use of multiple edit distances allows more tolerance than either ED or BLEU. Each evaluated MT output is assigned a grade by using the DT. RED and BLEU were compared for the task of evaluating SSMT systems, which have various performances, on a spoken language corpus, ATR's Basic Travel Expression Corpus (BTEC). Experimental results showed that RED significantly outperformed BLEU.
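The contrast the abstract draws between BLEU's position-dependent penalties and the normal edit distance (ED) can be illustrated with a small sketch. It assumes whitespace tokenization and an invented example sentence that is not taken from BTEC. The function `edit_distance` is plain word-level Levenshtein distance (insertion, deletion, replacement), which corresponds to the normal ED only; the paper's extended EDs and its decision tree are not reproduced. The function `unmatched_ngrams` is a hypothetical helper that counts hypothesis 1- to 4-grams with no clipped match in the reference, which is the quantity lost in BLEU's modified n-gram precision. It ignores BLEU's geometric mean and brevity penalty.

```python
from collections import Counter
from typing import Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance: the minimum number of insertions,
    deletions, and replacements that turn `hyp` into `ref`."""
    prev = list(range(len(ref) + 1))  # distances from the empty hyp prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete h
                curr[j - 1] + 1,         # insert r
                prev[j - 1] + (h != r),  # replace h by r (free if equal)
            )
        prev = curr
    return prev[-1]


def ngrams(words: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams of `words` (empty if the sentence is shorter than n)."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def unmatched_ngrams(hyp: Sequence[str], ref: Sequence[str], max_n: int = 4) -> int:
    """Hypothetical helper: count hypothesis n-grams (n = 1..max_n) that have no
    clipped match in `ref`, as in BLEU's modified n-gram precision."""
    total = 0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total += sum(hyp_counts.values()) - matched
    return total


# Invented example (not from BTEC); whitespace tokenization assumed.
ref = "could you show me the way to the station".split()
hyp_edge = "can you show me the way to the station".split()  # 1st word replaced
hyp_mid = "could you show me a way to the station".split()   # 5th word replaced

for name, hyp in [("edge", hyp_edge), ("middle", hyp_mid)]:
    print(name, edit_distance(hyp, ref), unmatched_ngrams(hyp, ref))
```

Running it prints `edge 1 4` and `middle 1 10`. Each hypothesis has a single replaced word, so both cost 1 under ED. However, the mid-sentence replacement breaks 10 hypothesis n-grams, compared with 4 for the replacement at the sentence edge. This is the position dependence that the abstract attributes to BLEU.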
doi_str_mv	10.1527/tjsai.20.139
format	Article
publisher	Tokyo: The Japanese Society for Artificial Intelligence
rights	2005 JSAI (The Japanese Society for Artificial Intelligence); Copyright Japan Science and Technology Agency 2005
fulltext	fulltext
identifier	ISSN: 1346-0714
ispartof	Transactions of the Japanese Society for Artificial Intelligence, 2005, Vol.20(3), pp.139-148
issn	1346-0714 1346-8030
language	eng ; jpn
recordid	cdi_proquest_journals_1476944090
source	J-STAGE Free; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	BLEU; decision tree; edit distances; machine translation evaluation; mWER; reference translations
title	Automatic Grader of MT Outputs in Colloquial Style by Using Multiple Edit Distances
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T04%3A17%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20Grader%20of%20MT%20Outputs%20in%20Colloquial%20Style%20by%20Using%20Multiple%20Edit%20Distances&rft.jtitle=Transactions%20of%20the%20Japanese%20Society%20for%20Artificial%20Intelligence&rft.au=Akiba,%20Yasuhiro&rft.date=2005&rft.volume=20&rft.issue=3&rft.spage=139&rft.epage=148&rft.pages=139-148&rft.issn=1346-0714&rft.eissn=1346-8030&rft_id=info:doi/10.1527/tjsai.20.139&rft_dat=%3Cproquest_cross%3E3180058311%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1476944090&rft_id=info:pmid/&rfr_iscdi=true