On Creating an English-Thai Code-switched Machine Translation in Medical Domain

Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field due to their inability to precisely translate medi...

Ausführliche Beschreibung

Gespeichert in:
Bibliographic details
Main authors: Pengpun, Parinthapat; Tiankanon, Krittamate; Chinkamol, Amrest; Kinchagawat, Jiramet; Chairuengjitjaras, Pitchaya; Supholkhan, Pasit; Aussavavirojekul, Pubordee; Boonnag, Chiraphat; Veerakanjana, Kanyakorn; Phimsiri, Hirunkul; Sae-jia, Boonthicha; Sataudom, Nattawach; Ittichaiwong, Piyalitt; Limkonchotiwat, Peerat
Format: Article
Language: English
creator Pengpun, Parinthapat
Tiankanon, Krittamate
Chinkamol, Amrest
Kinchagawat, Jiramet
Chairuengjitjaras, Pitchaya
Supholkhan, Pasit
Aussavavirojekul, Pubordee
Boonnag, Chiraphat
Veerakanjana, Kanyakorn
Phimsiri, Hirunkul
Sae-jia, Boonthicha
Sataudom, Nattawach
Ittichaiwong, Piyalitt
Limkonchotiwat, Peerat
description Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field because they cannot precisely translate medical terminology. Our research prioritizes not only improving translation accuracy but also preserving medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model on this data, and evaluated its performance against strong baselines such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model demonstrated competitive performance on automatic metrics and was highly favored in human preference evaluations. Our evaluation results also show that medical professionals significantly prefer CS translations that accurately preserve critical English terms, even at a slight cost to fluency. Our code and test set are publicly available at https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.
doi_str_mv 10.48550/arxiv.2410.16221
format Article
date 2024-10-21
rights http://creativecommons.org/licenses/by-nc-nd/4.0
links https://arxiv.org/abs/2410.16221
https://doi.org/10.18653/v1/2024.findings-emnlp.351 (published paper; access to full text may be restricted)
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2410.16221
language eng
recordid cdi_arxiv_primary_2410_16221
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Learning
title On Creating an English-Thai Code-switched Machine Translation in Medical Domain
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T11%3A06%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20Creating%20an%20English-Thai%20Code-switched%20Machine%20Translation%20in%20Medical%20Domain&rft.au=Pengpun,%20Parinthapat&rft.date=2024-10-21&rft_id=info:doi/10.48550/arxiv.2410.16221&rft_dat=%3Carxiv_GOX%3E2410_16221%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true