Automatic Extraction of English-Chinese Translation Templates Based on Deep Learning
Translation templates are an important source of knowledge in machine translation (MT) systems. Their quality and scale directly influence the performance of MT systems. How to obtain high-quality translation templates efficiently from corpora has become a hot topic in recent research. In this paper,...
Published in: | Mathematical problems in engineering 2022-04, Vol.2022, p.1-9 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 9 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | Mathematical problems in engineering |
container_volume | 2022 |
creator | Dong, Zhaofeng |
description | Translation templates are an important source of knowledge in machine translation (MT) systems. Their quality and scale directly influence the performance of MT systems. How to obtain high-quality translation templates efficiently from corpora has become a hot topic in recent research. In this paper, a tree-to-string alignment template (TAT) based on syntactic structure is proposed. This template describes the alignment between the source-language syntax tree and the target-language string. Syntactic structure, a large number of construction tags, and variables are introduced into the template, which enables the syntactic model to handle discontinuous phrases and gives it the ability to generalize. Depending on the decoder, the templates can be used in syntax-based statistical, case-based, and rule-based MT systems. The ATTEBSC algorithm is a basic method for learning translation templates by comparing sentence pairs. It requires sentence pairs to be arranged in a precise comparison structure ahead of time, but gives no strict guidelines on how to do so. In this paper, we propose a method that computes the specific comparison scheme using the longest common subsequence (LCS), uses the normalized LCS distance to screen for sentences with high similarity, and then applies the ATTEBSC algorithm to extract the templates automatically. Experiments show that this method is simple and effective, and many expensive templates can be learned. |
doi_str_mv | 10.1155/2022/9349657 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2653906877</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2653906877</sourcerecordid><originalsourceid>FETCH-LOGICAL-c294t-6af7841006a3ed6dd4ee2dd5cdbde22e4c12e9a46c28c0f1d7ebde613176a8ab3</originalsourceid><addsrcrecordid>eNp9kEtPwzAQhC0EEqVw4wdE4gihtuNHciylPKRKXILEzXLtTeuqdYKdCPj3uLRnTjva-bSrGYSuCb4nhPMJxZROqoJVgssTNCJcFDknTJ4mjSnLCS0-ztFFjBuMKeGkHKF6OvTtTvfOZPPvPmjTu9ZnbZPN_Wrr4jqfrZ2HCFkdtI9b_WfXsOuShJg96Ag2S6tHgC5bgA7e-dUlOmv0NsLVcY7R-9O8nr3ki7fn19l0kRtasT4XupElIxgLXYAV1jIAai03dmmBUmCGUKg0E4aWBjfESkiGIAWRQpd6WYzRzeFuF9rPAWKvNu0QfHqpqOBFhUUpZaLuDpQJbYwBGtUFt9PhRxGs9r2pfW_q2FvCbw94Cm71l_uf_gWGfW2k</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2653906877</pqid></control><display><type>article</type><title>Automatic Extraction of English-Chinese Translation Templates Based on Deep Learning</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Wiley Online Library Open Access</source><source>Alma/SFX Local Collection</source><creator>Dong, Zhaofeng</creator><contributor>Jan, Naeem ; Naeem Jan</contributor><creatorcontrib>Dong, Zhaofeng ; Jan, Naeem ; Naeem Jan</creatorcontrib><description>Translation templates are an important cause of knowledge in machine translation (MT) systems. Their quality and scale directly influence the performance of MT systems. How to obtain high-quality and efficient translation templates from corpora has become a hot topic in recent study. In this paper, a tree to String alignment template (TAT) based on syntactic structure is proposed. This template describes the alignment between the source language syntax tree and the target language string. 
The syntactic structure, a large number of construction tags, and variables are introduced into the template, which enables the syntactic model to deal with discontinuous phrases and has the ability of generalization. Templates can be used in syntactic statistics, case-based, and rule-based MT systems according to different decoders. ATTEBSC algorithm is a basic method to learn translation templates by comparing sentence pairs. It demands that sentence pairs be constructed in a precise comparison structure ahead of time, but there are no strict guidelines on how to do it. In this paper, we propose a method to calculate the specific comparison scheme using the longest common subsequence (LCS) and use the normalized LCS distance to screen sentences with high similarity and then use the ATTEBSC algorithm to automatically remove the template. Experiments show that this method is easy and effective, and many expensive templates can be learned.</description><identifier>ISSN: 1024-123X</identifier><identifier>EISSN: 1563-5147</identifier><identifier>DOI: 10.1155/2022/9349657</identifier><language>eng</language><publisher>New York: Hindawi</publisher><subject>Accuracy ; Algorithms ; Alignment ; Automation ; Bilingualism ; Decoders ; Engineering ; Grammar ; Knowledge ; Language ; Libraries ; Linguistics ; Machine learning ; Machine translation ; Methods ; Natural language processing ; Semantics ; Sentences ; Strings ; Syntax</subject><ispartof>Mathematical problems in engineering, 2022-04, Vol.2022, p.1-9</ispartof><rights>Copyright © 2022 Zhaofeng Dong.</rights><rights>Copyright © 2022 Zhaofeng Dong. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. 
https://creativecommons.org/licenses/by/4.0</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c294t-6af7841006a3ed6dd4ee2dd5cdbde22e4c12e9a46c28c0f1d7ebde613176a8ab3</cites><orcidid>0000-0003-0319-3787</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,777,781,27905,27906</link.rule.ids></links><search><contributor>Jan, Naeem</contributor><contributor>Naeem Jan</contributor><creatorcontrib>Dong, Zhaofeng</creatorcontrib><title>Automatic Extraction of English-Chinese Translation Templates Based on Deep Learning</title><title>Mathematical problems in engineering</title><description>Translation templates are an important cause of knowledge in machine translation (MT) systems. Their quality and scale directly influence the performance of MT systems. How to obtain high-quality and efficient translation templates from corpora has become a hot topic in recent study. In this paper, a tree to String alignment template (TAT) based on syntactic structure is proposed. This template describes the alignment between the source language syntax tree and the target language string. The syntactic structure, a large number of construction tags, and variables are introduced into the template, which enables the syntactic model to deal with discontinuous phrases and has the ability of generalization. Templates can be used in syntactic statistics, case-based, and rule-based MT systems according to different decoders. ATTEBSC algorithm is a basic method to learn translation templates by comparing sentence pairs. It demands that sentence pairs be constructed in a precise comparison structure ahead of time, but there are no strict guidelines on how to do it. 
In this paper, we propose a method to calculate the specific comparison scheme using the longest common subsequence (LCS) and use the normalized LCS distance to screen sentences with high similarity and then use the ATTEBSC algorithm to automatically remove the template. Experiments show that this method is easy and effective, and many expensive templates can be learned.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Alignment</subject><subject>Automation</subject><subject>Bilingualism</subject><subject>Decoders</subject><subject>Engineering</subject><subject>Grammar</subject><subject>Knowledge</subject><subject>Language</subject><subject>Libraries</subject><subject>Linguistics</subject><subject>Machine learning</subject><subject>Machine translation</subject><subject>Methods</subject><subject>Natural language processing</subject><subject>Semantics</subject><subject>Sentences</subject><subject>Strings</subject><subject>Syntax</subject><issn>1024-123X</issn><issn>1563-5147</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RHX</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kEtPwzAQhC0EEqVw4wdE4gihtuNHciylPKRKXILEzXLtTeuqdYKdCPj3uLRnTjva-bSrGYSuCb4nhPMJxZROqoJVgssTNCJcFDknTJ4mjSnLCS0-ztFFjBuMKeGkHKF6OvTtTvfOZPPvPmjTu9ZnbZPN_Wrr4jqfrZ2HCFkdtI9b_WfXsOuShJg96Ag2S6tHgC5bgA7e-dUlOmv0NsLVcY7R-9O8nr3ki7fn19l0kRtasT4XupElIxgLXYAV1jIAai03dmmBUmCGUKg0E4aWBjfESkiGIAWRQpd6WYzRzeFuF9rPAWKvNu0QfHqpqOBFhUUpZaLuDpQJbYwBGtUFt9PhRxGs9r2pfW_q2FvCbw94Cm71l_uf_gWGfW2k</recordid><startdate>20220414</startdate><enddate>20220414</enddate><creator>Dong, Zhaofeng</creator><general>Hindawi</general><general>Hindawi 
Limited</general><scope>RHU</scope><scope>RHW</scope><scope>RHX</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7TB</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CWDGH</scope><scope>DWQXO</scope><scope>FR3</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>KR7</scope><scope>L6V</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><orcidid>https://orcid.org/0000-0003-0319-3787</orcidid></search><sort><creationdate>20220414</creationdate><title>Automatic Extraction of English-Chinese Translation Templates Based on Deep Learning</title><author>Dong, Zhaofeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c294t-6af7841006a3ed6dd4ee2dd5cdbde22e4c12e9a46c28c0f1d7ebde613176a8ab3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Alignment</topic><topic>Automation</topic><topic>Bilingualism</topic><topic>Decoders</topic><topic>Engineering</topic><topic>Grammar</topic><topic>Knowledge</topic><topic>Language</topic><topic>Libraries</topic><topic>Linguistics</topic><topic>Machine learning</topic><topic>Machine translation</topic><topic>Methods</topic><topic>Natural language processing</topic><topic>Semantics</topic><topic>Sentences</topic><topic>Strings</topic><topic>Syntax</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dong, Zhaofeng</creatorcontrib><collection>Hindawi Publishing Complete</collection><collection>Hindawi Publishing Subscription Journals</collection><collection>Hindawi Publishing Open 
Access</collection><collection>CrossRef</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Middle East & Africa Database</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Civil Engineering Abstracts</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><jtitle>Mathematical problems in engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dong, Zhaofeng</au><au>Jan, Naeem</au><au>Naeem 
Jan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Automatic Extraction of English-Chinese Translation Templates Based on Deep Learning</atitle><jtitle>Mathematical problems in engineering</jtitle><date>2022-04-14</date><risdate>2022</risdate><volume>2022</volume><spage>1</spage><epage>9</epage><pages>1-9</pages><issn>1024-123X</issn><eissn>1563-5147</eissn><abstract>Translation templates are an important cause of knowledge in machine translation (MT) systems. Their quality and scale directly influence the performance of MT systems. How to obtain high-quality and efficient translation templates from corpora has become a hot topic in recent study. In this paper, a tree to String alignment template (TAT) based on syntactic structure is proposed. This template describes the alignment between the source language syntax tree and the target language string. The syntactic structure, a large number of construction tags, and variables are introduced into the template, which enables the syntactic model to deal with discontinuous phrases and has the ability of generalization. Templates can be used in syntactic statistics, case-based, and rule-based MT systems according to different decoders. ATTEBSC algorithm is a basic method to learn translation templates by comparing sentence pairs. It demands that sentence pairs be constructed in a precise comparison structure ahead of time, but there are no strict guidelines on how to do it. In this paper, we propose a method to calculate the specific comparison scheme using the longest common subsequence (LCS) and use the normalized LCS distance to screen sentences with high similarity and then use the ATTEBSC algorithm to automatically remove the template. 
Experiments show that this method is easy and effective, and many expensive templates can be learned.</abstract><cop>New York</cop><pub>Hindawi</pub><doi>10.1155/2022/9349657</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0003-0319-3787</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1024-123X |
ispartof | Mathematical problems in engineering, 2022-04, Vol.2022, p.1-9 |
issn | 1024-123X 1563-5147 |
language | eng |
recordid | cdi_proquest_journals_2653906877 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Wiley Online Library Open Access; Alma/SFX Local Collection |
subjects | Accuracy Algorithms Alignment Automation Bilingualism Decoders Engineering Grammar Knowledge Language Libraries Linguistics Machine learning Machine translation Methods Natural language processing Semantics Sentences Strings Syntax |
title | Automatic Extraction of English-Chinese Translation Templates Based on Deep Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T10%3A32%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20Extraction%20of%20English-Chinese%20Translation%20Templates%20Based%20on%20Deep%20Learning&rft.jtitle=Mathematical%20problems%20in%20engineering&rft.au=Dong,%20Zhaofeng&rft.date=2022-04-14&rft.volume=2022&rft.spage=1&rft.epage=9&rft.pages=1-9&rft.issn=1024-123X&rft.eissn=1563-5147&rft_id=info:doi/10.1155/2022/9349657&rft_dat=%3Cproquest_cross%3E2653906877%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2653906877&rft_id=info:pmid/&rfr_iscdi=true |
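
The abstract in this record describes screening sentences by normalized LCS distance before template learning, but gives no formula. The following is a minimal Python sketch of that screening step under stated assumptions: the distance is taken to be 1 - LCS(a, b) / max(|a|, |b|) over whitespace-separated tokens, and the function names and the 0.5 threshold are hypothetical, chosen only for illustration. It is not the paper's implementation and does not cover the ATTEBSC step itself.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two sequences.

    Standard dynamic programming, keeping only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed definition: 1 - LCS / max(len(a), len(b)).

    0.0 means identical sequences, 1.0 means nothing in common.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty sequences are treated as identical
    return 1.0 - lcs_length(a, b) / longest


def screen_pairs(sentences: List[str], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return index pairs of sentences whose distance is at most `threshold`.

    The threshold value is a hypothetical default, not taken from the paper.
    """
    tokenized = [s.split() for s in sentences]
    kept = []
    for i in range(len(tokenized)):
        for j in range(i + 1, len(tokenized)):
            if normalized_lcs_distance(tokenized[i], tokenized[j]) <= threshold:
                kept.append((i, j))
    return kept
```

For example, `screen_pairs(["I bought a book", "I bought a pen", "She sings well"])` keeps only the first two sentences as a pair, since they share three of four tokens in order. Comparing every pair is quadratic in the number of sentences, so a real corpus would likely need some form of pre-filtering.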