On Creating an English-Thai Code-switched Machine Translation in Medical Domain
Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field due to their inability to precisely translate medi...
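Main authors: | Pengpun, Parinthapat; Tiankanon, Krittamate; Chinkamol, Amrest; Kinchagawat, Jiramet; Chairuengjitjaras, Pitchaya; Supholkhan, Pasit; Aussavavirojekul, Pubordee; Boonnag, Chiraphat; Veerakanjana, Kanyakorn; Phimsiri, Hirunkul; Sae-jia, Boonthicha; Sataudom, Nattawach; Ittichaiwong, Piyalitt; Limkonchotiwat, Peerat |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning |
Online access: | Order full text |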
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Pengpun, Parinthapat; Tiankanon, Krittamate; Chinkamol, Amrest; Kinchagawat, Jiramet; Chairuengjitjaras, Pitchaya; Supholkhan, Pasit; Aussavavirojekul, Pubordee; Boonnag, Chiraphat; Veerakanjana, Kanyakorn; Phimsiri, Hirunkul; Sae-jia, Boonthicha; Sataudom, Nattawach; Ittichaiwong, Piyalitt; Limkonchotiwat, Peerat |
description | Machine translation (MT) in the medical domain plays a pivotal role in
enhancing healthcare quality and disseminating medical knowledge. Despite
advancements in English-Thai MT technology, common MT approaches often
underperform in the medical field due to their inability to precisely translate
medical terminologies. Our research prioritizes not merely improving
translation accuracy but also maintaining medical terminology in English within
the translated text through code-switched (CS) translation. We developed a
method to produce CS medical translation data, fine-tuned a CS translation
model with this data, and evaluated its performance against strong baselines,
such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model
demonstrated competitive performance in automatic metrics and was highly
favored in human preference evaluations. Our evaluation result also shows that
medical professionals significantly prefer CS translations that maintain
critical English terms accurately, even if it slightly compromises fluency. Our
code and test set are publicly available at
https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024. |
doi_str_mv | 10.48550/arxiv.2410.16221 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2410_16221</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2410_16221</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2410_162213</originalsourceid><addsrcrecordid>eNqFjr0OgjAUhbs4GPUBnLwvAEIF444YF8LCTm7aSm9SLqYl_ry9SNydTnJyvpxPiG2axNkpz5M9-hc9YplNRXqUMl2KumYovMGRuANkKLlzFGzUWCQoBm2i8KRRWaOhQmWJDTQeObiJGBiIoTKaFDo4Dz0Sr8Xihi6YzS9XYncpm-Iazdft3VOP_t1-FdpZ4fB_8QE23Ts3</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>On Creating an English-Thai Code-switched Machine Translation in Medical Domain</title><source>arXiv.org</source><creator>Pengpun, Parinthapat ; Tiankanon, Krittamate ; Chinkamol, Amrest ; Kinchagawat, Jiramet ; Chairuengjitjaras, Pitchaya ; Supholkhan, Pasit ; Aussavavirojekul, Pubordee ; Boonnag, Chiraphat ; Veerakanjana, Kanyakorn ; Phimsiri, Hirunkul ; Sae-jia, Boonthicha ; Sataudom, Nattawach ; Ittichaiwong, Piyalitt ; Limkonchotiwat, Peerat</creator><creatorcontrib>Pengpun, Parinthapat ; Tiankanon, Krittamate ; Chinkamol, Amrest ; Kinchagawat, Jiramet ; Chairuengjitjaras, Pitchaya ; Supholkhan, Pasit ; Aussavavirojekul, Pubordee ; Boonnag, Chiraphat ; Veerakanjana, Kanyakorn ; Phimsiri, Hirunkul ; Sae-jia, Boonthicha ; Sataudom, Nattawach ; Ittichaiwong, Piyalitt ; Limkonchotiwat, Peerat</creatorcontrib><description>Machine translation (MT) in the medical domain plays a pivotal role in
enhancing healthcare quality and disseminating medical knowledge. Despite
advancements in English-Thai MT technology, common MT approaches often
underperform in the medical field due to their inability to precisely translate
medical terminologies. Our research prioritizes not merely improving
translation accuracy but also maintaining medical terminology in English within
the translated text through code-switched (CS) translation. We developed a
method to produce CS medical translation data, fine-tuned a CS translation
model with this data, and evaluated its performance against strong baselines,
such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model
demonstrated competitive performance in automatic metrics and was highly
favored in human preference evaluations. Our evaluation result also shows that
medical professionals significantly prefer CS translations that maintain
critical English terms accurately, even if it slightly compromises fluency. Our
code and test set are publicly available
https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.</description><identifier>DOI: 10.48550/arxiv.2410.16221</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Learning</subject><creationdate>2024-10</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2410.16221$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2410.16221$$DView paper in arXiv$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.18653/v1/2024.findings-emnlp.351$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink></links><search><creatorcontrib>Pengpun, Parinthapat</creatorcontrib><creatorcontrib>Tiankanon, Krittamate</creatorcontrib><creatorcontrib>Chinkamol, Amrest</creatorcontrib><creatorcontrib>Kinchagawat, Jiramet</creatorcontrib><creatorcontrib>Chairuengjitjaras, Pitchaya</creatorcontrib><creatorcontrib>Supholkhan, Pasit</creatorcontrib><creatorcontrib>Aussavavirojekul, Pubordee</creatorcontrib><creatorcontrib>Boonnag, Chiraphat</creatorcontrib><creatorcontrib>Veerakanjana, Kanyakorn</creatorcontrib><creatorcontrib>Phimsiri, Hirunkul</creatorcontrib><creatorcontrib>Sae-jia, Boonthicha</creatorcontrib><creatorcontrib>Sataudom, Nattawach</creatorcontrib><creatorcontrib>Ittichaiwong, Piyalitt</creatorcontrib><creatorcontrib>Limkonchotiwat, Peerat</creatorcontrib><title>On Creating an English-Thai Code-switched Machine Translation in Medical Domain</title><description>Machine translation (MT) in the medical 
domain plays a pivotal role in
enhancing healthcare quality and disseminating medical knowledge. Despite
advancements in English-Thai MT technology, common MT approaches often
underperform in the medical field due to their inability to precisely translate
medical terminologies. Our research prioritizes not merely improving
translation accuracy but also maintaining medical terminology in English within
the translated text through code-switched (CS) translation. We developed a
method to produce CS medical translation data, fine-tuned a CS translation
model with this data, and evaluated its performance against strong baselines,
such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model
demonstrated competitive performance in automatic metrics and was highly
favored in human preference evaluations. Our evaluation result also shows that
medical professionals significantly prefer CS translations that maintain
critical English terms accurately, even if it slightly compromises fluency. Our
code and test set are publicly available
https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjr0OgjAUhbs4GPUBnLwvAEIF444YF8LCTm7aSm9SLqYl_ry9SNydTnJyvpxPiG2axNkpz5M9-hc9YplNRXqUMl2KumYovMGRuANkKLlzFGzUWCQoBm2i8KRRWaOhQmWJDTQeObiJGBiIoTKaFDo4Dz0Sr8Xihi6YzS9XYncpm-Iazdft3VOP_t1-FdpZ4fB_8QE23Ts3</recordid><startdate>20241021</startdate><enddate>20241021</enddate><creator>Pengpun, Parinthapat</creator><creator>Tiankanon, Krittamate</creator><creator>Chinkamol, Amrest</creator><creator>Kinchagawat, Jiramet</creator><creator>Chairuengjitjaras, Pitchaya</creator><creator>Supholkhan, Pasit</creator><creator>Aussavavirojekul, Pubordee</creator><creator>Boonnag, Chiraphat</creator><creator>Veerakanjana, Kanyakorn</creator><creator>Phimsiri, Hirunkul</creator><creator>Sae-jia, Boonthicha</creator><creator>Sataudom, Nattawach</creator><creator>Ittichaiwong, Piyalitt</creator><creator>Limkonchotiwat, Peerat</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241021</creationdate><title>On Creating an English-Thai Code-switched Machine Translation in Medical Domain</title><author>Pengpun, Parinthapat ; Tiankanon, Krittamate ; Chinkamol, Amrest ; Kinchagawat, Jiramet ; Chairuengjitjaras, Pitchaya ; Supholkhan, Pasit ; Aussavavirojekul, Pubordee ; Boonnag, Chiraphat ; Veerakanjana, Kanyakorn ; Phimsiri, Hirunkul ; Sae-jia, Boonthicha ; Sataudom, Nattawach ; Ittichaiwong, Piyalitt ; Limkonchotiwat, Peerat</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2410_162213</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial 
Intelligence</topic><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Pengpun, Parinthapat</creatorcontrib><creatorcontrib>Tiankanon, Krittamate</creatorcontrib><creatorcontrib>Chinkamol, Amrest</creatorcontrib><creatorcontrib>Kinchagawat, Jiramet</creatorcontrib><creatorcontrib>Chairuengjitjaras, Pitchaya</creatorcontrib><creatorcontrib>Supholkhan, Pasit</creatorcontrib><creatorcontrib>Aussavavirojekul, Pubordee</creatorcontrib><creatorcontrib>Boonnag, Chiraphat</creatorcontrib><creatorcontrib>Veerakanjana, Kanyakorn</creatorcontrib><creatorcontrib>Phimsiri, Hirunkul</creatorcontrib><creatorcontrib>Sae-jia, Boonthicha</creatorcontrib><creatorcontrib>Sataudom, Nattawach</creatorcontrib><creatorcontrib>Ittichaiwong, Piyalitt</creatorcontrib><creatorcontrib>Limkonchotiwat, Peerat</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Pengpun, Parinthapat</au><au>Tiankanon, Krittamate</au><au>Chinkamol, Amrest</au><au>Kinchagawat, Jiramet</au><au>Chairuengjitjaras, Pitchaya</au><au>Supholkhan, Pasit</au><au>Aussavavirojekul, Pubordee</au><au>Boonnag, Chiraphat</au><au>Veerakanjana, Kanyakorn</au><au>Phimsiri, Hirunkul</au><au>Sae-jia, Boonthicha</au><au>Sataudom, Nattawach</au><au>Ittichaiwong, Piyalitt</au><au>Limkonchotiwat, Peerat</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On Creating an English-Thai Code-switched Machine Translation in Medical Domain</atitle><date>2024-10-21</date><risdate>2024</risdate><abstract>Machine translation (MT) in the medical domain plays a pivotal role in
enhancing healthcare quality and disseminating medical knowledge. Despite
advancements in English-Thai MT technology, common MT approaches often
underperform in the medical field due to their inability to precisely translate
medical terminologies. Our research prioritizes not merely improving
translation accuracy but also maintaining medical terminology in English within
the translated text through code-switched (CS) translation. We developed a
method to produce CS medical translation data, fine-tuned a CS translation
model with this data, and evaluated its performance against strong baselines,
such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model
demonstrated competitive performance in automatic metrics and was highly
favored in human preference evaluations. Our evaluation result also shows that
medical professionals significantly prefer CS translations that maintain
critical English terms accurately, even if it slightly compromises fluency. Our
code and test set are publicly available
https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.</abstract><doi>10.48550/arxiv.2410.16221</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2410.16221 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2410_16221 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning |
title | On Creating an English-Thai Code-switched Machine Translation in Medical Domain |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T11%3A06%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20Creating%20an%20English-Thai%20Code-switched%20Machine%20Translation%20in%20Medical%20Domain&rft.au=Pengpun,%20Parinthapat&rft.date=2024-10-21&rft_id=info:doi/10.48550/arxiv.2410.16221&rft_dat=%3Carxiv_GOX%3E2410_16221%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |