Machine Translation for Nko: Tools, Corpora and Baseline Results

Bibliographic details
Published in: arXiv.org 2023-11
Main authors: Moussa Koulako Bala Doumbouya; Baba, Mamadi Diané; Cissé, Solo Farabado; Diané, Djibrila; Sow, Abdoulaye; Séré Moussa Doumbouya; Bangoura, Daouda; Bayo, Fodé Moriba; Ibrahima Sory 2 Condé; Kalo, Mory Diané; Piech, Chris; Manning, Christopher
Format: Article
Language: English
Subjects: Bilingualism; Languages; Machine translation; Quality control
Online access: Full text
container_title arXiv.org
creator Moussa Koulako Bala Doumbouya
Baba, Mamadi Diané
Cissé, Solo Farabado
Diané, Djibrila
Sow, Abdoulaye
Séré Moussa Doumbouya
Bangoura, Daouda
Bayo, Fodé Moriba
Ibrahima Sory 2 Condé
Kalo, Mory Diané
Piech, Chris
Manning, Christopher
description Currently, there is no usable machine translation system for Nko, a language of significant cultural and educational value that is spoken by tens of millions of people across multiple West African countries. To address this gap, we present a set of tools, resources, and baseline results aimed at developing usable machine translation systems for Nko and other languages that currently lack sufficiently large parallel text corpora. (1) Fria∥el: novel collaborative parallel text curation software that incorporates quality control through copyedit-based workflows. (2) An expansion of the FLoRes-200 and NLLB-Seed corpora with 2,009 and 6,193 high-quality Nko translations, in parallel with 204 and 40 other languages, respectively. (3) nicolingua-0005: a collection of trilingual and bilingual corpora with 130,850 parallel segments, plus monolingual corpora containing over 3 million Nko words. (4) Baseline bilingual and multilingual neural machine translation results, with the best model scoring 30.83 English-Nko chrF++ on FLoRes-devtest.
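The abstract reports its baseline results in chrF++, a character n-gram F-score extended with word unigrams and bigrams. As a rough illustration of what such a score measures, the following Python sketch computes a simplified sentence-level chrF++, assuming character n-grams of orders 1 to 6 (spaces ignored), word n-grams of orders 1 to 2, and β = 2, with precision and recall averaged across orders before the F-score is taken. The function names are hypothetical, and this is not the evaluation code used in the paper; reference implementations such as sacreBLEU treat whitespace, punctuation, smoothing, and corpus-level aggregation differently, so scores will not match digit for digit.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Count whitespace-delimited word n-grams."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def chrf_pp(hyp: str, ref: str, char_order: int = 6,
            word_order: int = 2, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF++ on a 0-100 scale (illustrative only)."""
    # Pair up hypothesis/reference n-gram counts for every order used.
    pairs = [(char_ngrams(hyp, n), char_ngrams(ref, n))
             for n in range(1, char_order + 1)]
    pairs += [(word_ngrams(hyp, n), word_ngrams(ref, n))
              for n in range(1, word_order + 1)]

    precisions, recalls = [], []
    for hyp_counts, ref_counts in pairs:
        n_hyp, n_ref = sum(hyp_counts.values()), sum(ref_counts.values())
        if n_hyp == 0 or n_ref == 0:
            continue  # skip orders longer than one of the two strings
        n_match = sum((hyp_counts & ref_counts).values())  # clipped matches
        precisions.append(n_match / n_hyp)
        recalls.append(n_match / n_ref)

    if not precisions:
        return 0.0
    # Average precision and recall over all n-gram orders.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    b2 = beta ** 2
    return 100.0 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(round(chrf_pp("the cat sat", "the cat sits"), 2))
```

Because chrF++ works mostly at the character level, it gives partial credit for near-miss word forms, which is one reason it is commonly preferred over word-level BLEU for morphologically rich or low-resource target languages.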
format Article
publisher Cornell University Library, arXiv.org (Ithaca)
date 2023-11-15
rights 2023. This work is published under http://creativecommons.org/licenses/by-sa/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_2881541704
source Freely Accessible Journals
subjects Bilingualism
Languages
Machine translation
Quality control
title Machine Translation for Nko: Tools, Corpora and Baseline Results
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T13%3A51%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Machine%20Translation%20for%20Nko:%20Tools,%20Corpora%20and%20Baseline%20Results&rft.jtitle=arXiv.org&rft.au=Moussa%20Koulako%20Bala%20Doumbouya&rft.date=2023-11-15&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2881541704%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2881541704&rft_id=info:pmid/&rfr_iscdi=true