Machine Translation for Nko: Tools, Corpora and Baseline Results
Currently, there is no usable machine translation system for Nko, a language spoken by tens of millions of people across multiple West African countries, which holds significant cultural and educational value. To address this issue, we present a set of tools, resources, and baseline results aimed to...
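Published in: | arXiv.org 2023-11 |
---|---|
Main authors: | Moussa Koulako Bala Doumbouya ; Baba, Mamadi Diané ; Cissé, Solo Farabado ; Diané, Djibrila ; Sow, Abdoulaye ; Séré Moussa Doumbouya ; Bangoura, Daouda ; Bayo, Fodé Moriba ; Ibrahima Sory 2 Condé ; Kalo, Mory Diané ; Piech, Chris ; Manning, Christopher |
Format: | Article |
Language: | eng |
Subjects: | Bilingualism ; Languages ; Machine translation ; Quality control |
Online access: | Full text |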
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Moussa Koulako Bala Doumbouya ; Baba, Mamadi Diané ; Cissé, Solo Farabado ; Diané, Djibrila ; Sow, Abdoulaye ; Séré Moussa Doumbouya ; Bangoura, Daouda ; Bayo, Fodé Moriba ; Ibrahima Sory 2 Condé ; Kalo, Mory Diané ; Piech, Chris ; Manning, Christopher |
description | Currently, there is no usable machine translation system for Nko, a language spoken by tens of millions of people across multiple West African countries, which holds significant cultural and educational value. To address this issue, we present a set of tools, resources, and baseline results aimed towards the development of usable machine translation systems for Nko and other languages that do not currently have sufficiently large parallel text corpora available. (1) Fria\(\parallel\)el: A novel collaborative parallel text curation software that incorporates quality control through copyedit-based workflows. (2) Expansion of the FLoRes-200 and NLLB-Seed corpora with 2,009 and 6,193 high-quality Nko translations in parallel with 204 and 40 other languages. (3) nicolingua-0005: A collection of trilingual and bilingual corpora with 130,850 parallel segments and monolingual corpora containing over 3 million Nko words. (4) Baseline bilingual and multilingual neural machine translation results with the best model scoring 30.83 English-Nko chrF++ on FLoRes-devtest. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2881541704</recordid><sourceformat>XML</sourceformat><sourcetype>Aggregation Database</sourcetype><recordtype>article</recordtype><pqid>2881541704</pqid></control><display><type>article</type><title>Machine Translation for Nko: Tools, Corpora and Baseline Results</title><source>Freely Accessible Journals</source><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><ispartof>arXiv.org, 2023-11</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa></display><addata><atitle>Machine Translation for Nko: Tools, Corpora and Baseline Results</atitle><jtitle>arXiv.org</jtitle><date>2023-11-15</date><eissn>2331-8422</eissn><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2881541704 |
source | Freely Accessible Journals |
subjects | Bilingualism ; Languages ; Machine translation ; Quality control |
title | Machine Translation for Nko: Tools, Corpora and Baseline Results |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T13%3A51%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Machine%20Translation%20for%20Nko:%20Tools,%20Corpora%20and%20Baseline%20Results&rft.jtitle=arXiv.org&rft.au=Moussa%20Koulako%20Bala%20Doumbouya&rft.date=2023-11-15&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2881541704%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2881541704&rft_id=info:pmid/&rfr_iscdi=true |