On Creating an English-Thai Code-switched Machine Translation in Medical Domain

Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field due to their inability to precisely translate medi...

Bibliographic details
Main authors:	Pengpun, Parinthapat; Tiankanon, Krittamate; Chinkamol, Amrest; Kinchagawat, Jiramet; Chairuengjitjaras, Pitchaya; Supholkhan, Pasit; Aussavavirojekul, Pubordee; Boonnag, Chiraphat; Veerakanjana, Kanyakorn; Phimsiri, Hirunkul; Sae-jia, Boonthicha; Sataudom, Nattawach; Ittichaiwong, Piyalitt; Limkonchotiwat, Peerat
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Pengpun, Parinthapat; Tiankanon, Krittamate; Chinkamol, Amrest; Kinchagawat, Jiramet; Chairuengjitjaras, Pitchaya; Supholkhan, Pasit; Aussavavirojekul, Pubordee; Boonnag, Chiraphat; Veerakanjana, Kanyakorn; Phimsiri, Hirunkul; Sae-jia, Boonthicha; Sataudom, Nattawach; Ittichaiwong, Piyalitt; Limkonchotiwat, Peerat
description	Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field due to their inability to precisely translate medical terminologies. Our research prioritizes not merely improving translation accuracy but also maintaining medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model with this data, and evaluated its performance against strong baselines, such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model demonstrated competitive performance in automatic metrics and was highly favored in human preference evaluations. Our evaluation result also shows that medical professionals significantly prefer CS translations that maintain critical English terms accurately, even if it slightly compromises fluency. Our code and test set are publicly available at https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.
doi_str_mv	10.48550/arxiv.2410.16221
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2410_16221</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2410_16221</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2410_162213</originalsourceid><addsrcrecordid>eNqFjr0OgjAUhbs4GPUBnLwvAEIF444YF8LCTm7aSm9SLqYl_ry9SNydTnJyvpxPiG2axNkpz5M9-hc9YplNRXqUMl2KumYovMGRuANkKLlzFGzUWCQoBm2i8KRRWaOhQmWJDTQeObiJGBiIoTKaFDo4Dz0Sr8Xihi6YzS9XYncpm-Iazdft3VOP_t1-FdpZ4fB_8QE23Ts3</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>On Creating an English-Thai Code-switched Machine Translation in Medical Domain</title><source>arXiv.org</source><creator>Pengpun, Parinthapat ; Tiankanon, Krittamate ; Chinkamol, Amrest ; Kinchagawat, Jiramet ; Chairuengjitjaras, Pitchaya ; Supholkhan, Pasit ; Aussavavirojekul, Pubordee ; Boonnag, Chiraphat ; Veerakanjana, Kanyakorn ; Phimsiri, Hirunkul ; Sae-jia, Boonthicha ; Sataudom, Nattawach ; Ittichaiwong, Piyalitt ; Limkonchotiwat, Peerat</creator><creatorcontrib>Pengpun, Parinthapat ; Tiankanon, Krittamate ; Chinkamol, Amrest ; Kinchagawat, Jiramet ; Chairuengjitjaras, Pitchaya ; Supholkhan, Pasit ; Aussavavirojekul, Pubordee ; Boonnag, Chiraphat ; Veerakanjana, Kanyakorn ; Phimsiri, Hirunkul ; Sae-jia, Boonthicha ; Sataudom, Nattawach ; Ittichaiwong, Piyalitt ; Limkonchotiwat, Peerat</creatorcontrib><description>Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field due to their inability to precisely translate medical terminologies. 
Our research prioritizes not merely improving translation accuracy but also maintaining medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model with this data, and evaluated its performance against strong baselines, such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model demonstrated competitive performance in automatic metrics and was highly favored in human preference evaluations. Our evaluation result also shows that medical professionals significantly prefer CS translations that maintain critical English terms accurately, even if it slightly compromises fluency. Our code and test set are publicly available https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.</description><identifier>DOI: 10.48550/arxiv.2410.16221</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Learning</subject><creationdate>2024-10</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2410.16221$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2410.16221$$DView paper in arXiv$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.18653/v1/2024.findings-emnlp.351$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink></links><search><creatorcontrib>Pengpun, Parinthapat</creatorcontrib><creatorcontrib>Tiankanon, 
Krittamate</creatorcontrib><creatorcontrib>Chinkamol, Amrest</creatorcontrib><creatorcontrib>Kinchagawat, Jiramet</creatorcontrib><creatorcontrib>Chairuengjitjaras, Pitchaya</creatorcontrib><creatorcontrib>Supholkhan, Pasit</creatorcontrib><creatorcontrib>Aussavavirojekul, Pubordee</creatorcontrib><creatorcontrib>Boonnag, Chiraphat</creatorcontrib><creatorcontrib>Veerakanjana, Kanyakorn</creatorcontrib><creatorcontrib>Phimsiri, Hirunkul</creatorcontrib><creatorcontrib>Sae-jia, Boonthicha</creatorcontrib><creatorcontrib>Sataudom, Nattawach</creatorcontrib><creatorcontrib>Ittichaiwong, Piyalitt</creatorcontrib><creatorcontrib>Limkonchotiwat, Peerat</creatorcontrib><title>On Creating an English-Thai Code-switched Machine Translation in Medical Domain</title><description>Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field due to their inability to precisely translate medical terminologies. Our research prioritizes not merely improving translation accuracy but also maintaining medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model with this data, and evaluated its performance against strong baselines, such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model demonstrated competitive performance in automatic metrics and was highly favored in human preference evaluations. Our evaluation result also shows that medical professionals significantly prefer CS translations that maintain critical English terms accurately, even if it slightly compromises fluency. 
Our code and test set are publicly available https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjr0OgjAUhbs4GPUBnLwvAEIF444YF8LCTm7aSm9SLqYl_ry9SNydTnJyvpxPiG2axNkpz5M9-hc9YplNRXqUMl2KumYovMGRuANkKLlzFGzUWCQoBm2i8KRRWaOhQmWJDTQeObiJGBiIoTKaFDo4Dz0Sr8Xihi6YzS9XYncpm-Iazdft3VOP_t1-FdpZ4fB_8QE23Ts3</recordid><startdate>20241021</startdate><enddate>20241021</enddate><creator>Pengpun, Parinthapat</creator><creator>Tiankanon, Krittamate</creator><creator>Chinkamol, Amrest</creator><creator>Kinchagawat, Jiramet</creator><creator>Chairuengjitjaras, Pitchaya</creator><creator>Supholkhan, Pasit</creator><creator>Aussavavirojekul, Pubordee</creator><creator>Boonnag, Chiraphat</creator><creator>Veerakanjana, Kanyakorn</creator><creator>Phimsiri, Hirunkul</creator><creator>Sae-jia, Boonthicha</creator><creator>Sataudom, Nattawach</creator><creator>Ittichaiwong, Piyalitt</creator><creator>Limkonchotiwat, Peerat</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241021</creationdate><title>On Creating an English-Thai Code-switched Machine Translation in Medical Domain</title><author>Pengpun, Parinthapat ; Tiankanon, Krittamate ; Chinkamol, Amrest ; Kinchagawat, Jiramet ; Chairuengjitjaras, Pitchaya ; Supholkhan, Pasit ; Aussavavirojekul, Pubordee ; Boonnag, Chiraphat ; Veerakanjana, Kanyakorn ; Phimsiri, Hirunkul ; Sae-jia, Boonthicha ; Sataudom, Nattawach ; Ittichaiwong, Piyalitt ; Limkonchotiwat, 
Peerat</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2410_162213</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Pengpun, Parinthapat</creatorcontrib><creatorcontrib>Tiankanon, Krittamate</creatorcontrib><creatorcontrib>Chinkamol, Amrest</creatorcontrib><creatorcontrib>Kinchagawat, Jiramet</creatorcontrib><creatorcontrib>Chairuengjitjaras, Pitchaya</creatorcontrib><creatorcontrib>Supholkhan, Pasit</creatorcontrib><creatorcontrib>Aussavavirojekul, Pubordee</creatorcontrib><creatorcontrib>Boonnag, Chiraphat</creatorcontrib><creatorcontrib>Veerakanjana, Kanyakorn</creatorcontrib><creatorcontrib>Phimsiri, Hirunkul</creatorcontrib><creatorcontrib>Sae-jia, Boonthicha</creatorcontrib><creatorcontrib>Sataudom, Nattawach</creatorcontrib><creatorcontrib>Ittichaiwong, Piyalitt</creatorcontrib><creatorcontrib>Limkonchotiwat, Peerat</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Pengpun, Parinthapat</au><au>Tiankanon, Krittamate</au><au>Chinkamol, Amrest</au><au>Kinchagawat, Jiramet</au><au>Chairuengjitjaras, Pitchaya</au><au>Supholkhan, Pasit</au><au>Aussavavirojekul, Pubordee</au><au>Boonnag, Chiraphat</au><au>Veerakanjana, Kanyakorn</au><au>Phimsiri, Hirunkul</au><au>Sae-jia, Boonthicha</au><au>Sataudom, Nattawach</au><au>Ittichaiwong, Piyalitt</au><au>Limkonchotiwat, Peerat</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On Creating an English-Thai Code-switched Machine Translation in Medical 
Domain</atitle><date>2024-10-21</date><risdate>2024</risdate><abstract>Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field due to their inability to precisely translate medical terminologies. Our research prioritizes not merely improving translation accuracy but also maintaining medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model with this data, and evaluated its performance against strong baselines, such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model demonstrated competitive performance in automatic metrics and was highly favored in human preference evaluations. Our evaluation result also shows that medical professionals significantly prefer CS translations that maintain critical English terms accurately, even if it slightly compromises fluency. Our code and test set are publicly available https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.</abstract><doi>10.48550/arxiv.2410.16221</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2410.16221
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2410_16221
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning
title	On Creating an English-Thai Code-switched Machine Translation in Medical Domain
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T11%3A06%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20Creating%20an%20English-Thai%20Code-switched%20Machine%20Translation%20in%20Medical%20Domain&rft.au=Pengpun,%20Parinthapat&rft.date=2024-10-21&rft_id=info:doi/10.48550/arxiv.2410.16221&rft_dat=%3Carxiv_GOX%3E2410_16221%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true