ENGLISH-AKUAPEM TWI PARALLEL CORPUS

This dataset (verified_data.csv) is a bilingual machine translation training corpus of 25,421 English and Akuapem Twi sentence pairs. A transformer-based machine translator was used to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native s...

Main authors: Azunre, Paul; Adu-Gyamfi, Lawrence; Appiah, Esther; Akwerh, Felix; Osei, Salomey; Amoaba, Cynthia; Addo, Salomey Afua; Buabeng-Munkoh, Edwin; Boateng, Nana; Adjei, Franklin; Adabankah, Bernard
Format: Dataset
Language: twi
creator Azunre, Paul
Adu-Gyamfi, Lawrence
Appiah, Esther
Akwerh, Felix
Osei, Salomey
Amoaba, Cynthia
Addo, Salomey Afua
Buabeng-Munkoh, Edwin
Boateng, Nana
Adjei, Franklin
Adabankah, Bernard
description This dataset (verified_data.csv) is a bilingual machine translation training corpus of 25,421 English and Akuapem Twi sentence pairs. A transformer-based machine translator was used to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers. The typical use case for the dataset is further training of machine translation models for Akuapem Twi. The data can also be used for other downstream NLP tasks such as Named Entity Recognition and POS tagging, with appropriate additional annotations. Another potential application is training unsupervised embeddings for the Akuapem Twi language. In addition, 697 higher-quality crowdsourced sentences (crowdsourced_data.csv) are provided for use as an evaluation set for the tasks highlighted above. This set is recommended as a test dataset for English-to-Twi and Twi-to-English machine translation models. Acknowledgement: This project was supported by the AI4D language dataset fellowship through K4all and Zindi Africa.
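As a rough illustration of the training and evaluation split described above, the two files could be read into sentence pairs along these lines. This is only a sketch: the record does not state the CSV column names or order, so the helper `load_pairs` (a hypothetical name) assumes the first two columns hold the English and Akuapem Twi sentences and that the first row is a header. Both assumptions should be checked against the actual files.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def load_pairs(path: str, has_header: bool = True) -> List[Tuple[str, str]]:
    """Read rows from a CSV file as (source, target) sentence pairs.

    Assumption (not confirmed by the record): the first two columns hold
    the English and Akuapem Twi sentences, in that order.
    """
    pairs = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the assumed header row
        for row in reader:
            if len(row) < 2:
                continue  # skip rows without two columns
            source, target = row[0].strip(), row[1].strip()
            if source and target:  # drop pairs with an empty side
                pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    # File names are taken from the dataset description.
    train_path = Path("verified_data.csv")
    test_path = Path("crowdsourced_data.csv")
    if train_path.exists() and test_path.exists():
        train = load_pairs(str(train_path))
        test = load_pairs(str(test_path))
        print(len(train), "training pairs;", len(test), "evaluation pairs")
```

Keeping crowdsourced_data.csv strictly out of training, as the description recommends, lets it serve as a held-out test set for both translation directions.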
doi_str_mv 10.5281/zenodo.4430880
format Dataset
publisher Zenodo
creationdate 2021-01-10
oa free_for_read
collection DataCite (Open Access)
linktorsrc https://commons.datacite.org/doi.org/10.5281/zenodo.4430880
fulltext fulltext_linktorsrc
identifier DOI: 10.5281/zenodo.4430880
language twi
recordid cdi_datacite_primary_10_5281_zenodo_4430880
source DataCite
subjects Akuapem Twi
Ghana
Machine Translation
title ENGLISH-AKUAPEM TWI PARALLEL CORPUS
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-16T07%3A16%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Azunre,%20Paul&rft.date=2021-01-10&rft_id=info:doi/10.5281/zenodo.4430880&rft_dat=%3Cdatacite_PQ8%3E10_5281_zenodo_4430880%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true