ENGLISH-AKUAPEM TWI PARALLEL CORPUS

This dataset (verified_data.csv) is a bilingual machine translation training corpus for English and Akuapem Twi comprising 25,421 sentence pairs. A transformer-based machine translator was used to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native s...

Bibliographic details
Main authors:	Azunre, Paul; Adu-Gyamfi, Lawrence; Appiah, Esther; Akwerh, Felix; Osei, Salomey; Amoaba, Cynthia; Addo, Salomey Afua; Buabeng-Munkoh, Edwin; Boateng, Nana; Adjei, Franklin; Adabankah, Bernard
Format:	Dataset
Language:	twi
Subjects:	Akuapem Twi; Ghana; Machine Translation
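
The record's description names two CSV files, verified_data.csv for training and crowdsourced_data.csv for evaluation, but does not document their column layout. The following is a minimal sketch of reading sentence pairs from such a file, assuming (this is not confirmed by the record) a header row followed by the source sentence in the first column and the target sentence in the second. The function name read_pairs and its parameters are hypothetical, and the sample rows are placeholders rather than real corpus data.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_pairs(handle: TextIO, src_col: int = 0, tgt_col: int = 1,
               has_header: bool = True) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from an open CSV file.

    The column positions and the header row are ASSUMPTIONS; adjust them
    after inspecting the actual files.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) <= max(src_col, tgt_col):
            continue  # skip rows that are too short to hold a pair
        src, tgt = row[src_col].strip(), row[tgt_col].strip()
        if src and tgt:  # drop pairs with an empty side
            yield src, tgt


# In-memory stand-in for verified_data.csv (placeholder text only).
sample = io.StringIO(
    "source,target\n"
    "english sentence 1,twi sentence 1\n"
    '"english, with comma",twi sentence 2\n'
    ",missing source\n"
)
pairs = list(read_pairs(sample))
print(len(pairs))
```

With the real files, the handle would come from something like open("verified_data.csv", newline="", encoding="utf-8"). Comparing the number of pairs read against the figures stated in the description (25,421 verified pairs, 697 crowdsourced sentences) is a cheap check that the assumed layout is right.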

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Azunre, Paul; Adu-Gyamfi, Lawrence; Appiah, Esther; Akwerh, Felix; Osei, Salomey; Amoaba, Cynthia; Addo, Salomey Afua; Buabeng-Munkoh, Edwin; Boateng, Nana; Adjei, Franklin; Adabankah, Bernard
description	This dataset (verified_data.csv) is a bilingual machine translation training corpus for English and Akuapem Twi comprising 25,421 sentence pairs. A transformer-based machine translator was used to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers. The dataset is primarily intended for further training of machine translation models for Akuapem Twi. The data can also be used for other downstream NLP tasks such as Named Entity Recognition and POS tagging, with appropriate additional annotations. Another potential application is training unsupervised embeddings for the Akuapem Twi language. In addition, 697 higher-quality crowdsourced sentences (crowdsourced_data.csv) are provided as an evaluation set for the tasks highlighted above. This set is recommended as a test set for English-to-Twi and Twi-to-English machine translation models. Acknowledgement: This project was supported by the AI4D language dataset fellowship through K4all and Zindi Africa.
doi_str_mv	10.5281/zenodo.4430880
format	Dataset
fullrecord	<record><control><sourceid>datacite_PQ8</sourceid><recordid>TN_cdi_datacite_primary_10_5281_zenodo_4430880</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_5281_zenodo_4430880</sourcerecordid><originalsourceid>FETCH-LOGICAL-d790-88e4cb0c08f928ac530d39cfbf60fbe4ceaace0cc7f0319bd21fa6d8e9723e983</originalsourceid><addsrcrecordid>eNotzjsLwjAYheEsDqKuzgXn1i9Na7-MoVQtRlu0xTGkuYDgDXXRX6-i0xleODyEjClEaYx0-nLni71EScIAEfpkUmwWstwtQ7FqRV2sg2ZfBrXYCikLGeTVtm53Q9Lz-nh3o_8OSDMvmnwZympR5kKGNuMQIrrEdGAAPY9Rm5SBZdz4zs_Ad5_mtDYOjMk8MMo7G1OvZxYdz2LmOLIBiX63Vj-0OTycut4OJ317Kgrqi1c_vPrj2Ru6yzuS</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>dataset</recordtype></control><display><type>dataset</type><title>ENGLISH-AKUAPEM TWI PARALLEL CORPUS</title><source>DataCite</source><creator>Azunre, Paul ; Adu-Gyamfi, Lawrence ; Appiah, Esther ; Akwerh, Felix ; Osei, Salomey ; Amoaba, Cynthia ; Addo, Salomey Afua ; Buabeng-Munkoh, Edwin ; Boateng, Nana ; Adjei, Franklin ; Adabankah, Bernard</creator><creatorcontrib>Azunre, Paul ; Adu-Gyamfi, Lawrence ; Appiah, Esther ; Akwerh, Felix ; Osei, Salomey ; Amoaba, Cynthia ; Addo, Salomey Afua ; Buabeng-Munkoh, Edwin ; Boateng, Nana ; Adjei, Franklin ; Adabankah, Bernard</creatorcontrib><description>This dataset (verified_data.csv) is bilingual machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. A transformer-based machine translator was used to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers. The main idea of a typical use case for the dataset is for further training of machine translation models in Akuapem Twi. The data can also be used for other downstream NLP tasks such as Named Entity Recognition and POS tagging, with appropriate additional annotations. Another potential application is training unsupervised embeddings for the Akuapem Twi language. 
In addition a higher quality 697 crowdsourced sentences (crowdsourced_data.csv) are provided for use as an evaluation set for the tasks highlighted above. It is recommended as a testing dataset for machine translation English to Twi and Twi to English models. Acknowledgement: This project was supported by the AI4D language dataset fellowship through K4all and Zindi Africa</description><identifier>DOI: 10.5281/zenodo.4430880</identifier><language>twi</language><publisher>Zenodo</publisher><subject>Akuapem Twi ; Ghana ; Machine Translation</subject><creationdate>2021</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,1888</link.rule.ids><linktorsrc>$$Uhttps://commons.datacite.org/doi.org/10.5281/zenodo.4430880$$EView_record_in_DataCite.org$$FView_record_in_$$GDataCite.org$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Azunre, Paul</creatorcontrib><creatorcontrib>Adu-Gyamfi, Lawrence</creatorcontrib><creatorcontrib>Appiah, Esther</creatorcontrib><creatorcontrib>Akwerh, Felix</creatorcontrib><creatorcontrib>Osei, Salomey</creatorcontrib><creatorcontrib>Amoaba, Cynthia</creatorcontrib><creatorcontrib>Addo, Salomey Afua</creatorcontrib><creatorcontrib>Buabeng-Munkoh, Edwin</creatorcontrib><creatorcontrib>Boateng, Nana</creatorcontrib><creatorcontrib>Adjei, Franklin</creatorcontrib><creatorcontrib>Adabankah, Bernard</creatorcontrib><title>ENGLISH-AKUAPEM TWI PARALLEL CORPUS</title><description>This dataset (verified_data.csv) is bilingual machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. A transformer-based machine translator was used to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers. 
The main idea of a typical use case for the dataset is for further training of machine translation models in Akuapem Twi. The data can also be used for other downstream NLP tasks such as Named Entity Recognition and POS tagging, with appropriate additional annotations. Another potential application is training unsupervised embeddings for the Akuapem Twi language. In addition a higher quality 697 crowdsourced sentences (crowdsourced_data.csv) are provided for use as an evaluation set for the tasks highlighted above. It is recommended as a testing dataset for machine translation English to Twi and Twi to English models. Acknowledgement: This project was supported by the AI4D language dataset fellowship through K4all and Zindi Africa</description><subject>Akuapem Twi</subject><subject>Ghana</subject><subject>Machine Translation</subject><fulltext>true</fulltext><rsrctype>dataset</rsrctype><creationdate>2021</creationdate><recordtype>dataset</recordtype><sourceid>PQ8</sourceid><recordid>eNotzjsLwjAYheEsDqKuzgXn1i9Na7-MoVQtRlu0xTGkuYDgDXXRX6-i0xleODyEjClEaYx0-nLni71EScIAEfpkUmwWstwtQ7FqRV2sg2ZfBrXYCikLGeTVtm53Q9Lz-nh3o_8OSDMvmnwZympR5kKGNuMQIrrEdGAAPY9Rm5SBZdz4zs_Ad5_mtDYOjMk8MMo7G1OvZxYdz2LmOLIBiX63Vj-0OTycut4OJ317Kgrqi1c_vPrj2Ru6yzuS</recordid><startdate>20210110</startdate><enddate>20210110</enddate><creator>Azunre, Paul</creator><creator>Adu-Gyamfi, Lawrence</creator><creator>Appiah, Esther</creator><creator>Akwerh, Felix</creator><creator>Osei, Salomey</creator><creator>Amoaba, Cynthia</creator><creator>Addo, Salomey Afua</creator><creator>Buabeng-Munkoh, Edwin</creator><creator>Boateng, Nana</creator><creator>Adjei, Franklin</creator><creator>Adabankah, Bernard</creator><general>Zenodo</general><scope>DYCCY</scope><scope>PQ8</scope></search><sort><creationdate>20210110</creationdate><title>ENGLISH-AKUAPEM TWI PARALLEL CORPUS</title><author>Azunre, Paul ; Adu-Gyamfi, Lawrence ; Appiah, Esther ; Akwerh, Felix ; Osei, Salomey ; Amoaba, Cynthia ; Addo, Salomey Afua ; 
Buabeng-Munkoh, Edwin ; Boateng, Nana ; Adjei, Franklin ; Adabankah, Bernard</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-d790-88e4cb0c08f928ac530d39cfbf60fbe4ceaace0cc7f0319bd21fa6d8e9723e983</frbrgroupid><rsrctype>datasets</rsrctype><prefilter>datasets</prefilter><language>twi</language><creationdate>2021</creationdate><topic>Akuapem Twi</topic><topic>Ghana</topic><topic>Machine Translation</topic><toplevel>online_resources</toplevel><creatorcontrib>Azunre, Paul</creatorcontrib><creatorcontrib>Adu-Gyamfi, Lawrence</creatorcontrib><creatorcontrib>Appiah, Esther</creatorcontrib><creatorcontrib>Akwerh, Felix</creatorcontrib><creatorcontrib>Osei, Salomey</creatorcontrib><creatorcontrib>Amoaba, Cynthia</creatorcontrib><creatorcontrib>Addo, Salomey Afua</creatorcontrib><creatorcontrib>Buabeng-Munkoh, Edwin</creatorcontrib><creatorcontrib>Boateng, Nana</creatorcontrib><creatorcontrib>Adjei, Franklin</creatorcontrib><creatorcontrib>Adabankah, Bernard</creatorcontrib><collection>DataCite (Open Access)</collection><collection>DataCite</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Azunre, Paul</au><au>Adu-Gyamfi, Lawrence</au><au>Appiah, Esther</au><au>Akwerh, Felix</au><au>Osei, Salomey</au><au>Amoaba, Cynthia</au><au>Addo, Salomey Afua</au><au>Buabeng-Munkoh, Edwin</au><au>Boateng, Nana</au><au>Adjei, Franklin</au><au>Adabankah, Bernard</au><format>book</format><genre>unknown</genre><ristype>DATA</ristype><title>ENGLISH-AKUAPEM TWI PARALLEL CORPUS</title><date>2021-01-10</date><risdate>2021</risdate><abstract>This dataset (verified_data.csv) is bilingual machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. A transformer-based machine translator was used to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers. 
The main idea of a typical use case for the dataset is for further training of machine translation models in Akuapem Twi. The data can also be used for other downstream NLP tasks such as Named Entity Recognition and POS tagging, with appropriate additional annotations. Another potential application is training unsupervised embeddings for the Akuapem Twi language. In addition a higher quality 697 crowdsourced sentences (crowdsourced_data.csv) are provided for use as an evaluation set for the tasks highlighted above. It is recommended as a testing dataset for machine translation English to Twi and Twi to English models. Acknowledgement: This project was supported by the AI4D language dataset fellowship through K4all and Zindi Africa</abstract><pub>Zenodo</pub><doi>10.5281/zenodo.4430880</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.5281/zenodo.4430880
ispartof
issn
language	twi
recordid	cdi_datacite_primary_10_5281_zenodo_4430880
source	DataCite
subjects	Akuapem Twi; Ghana; Machine Translation
title	ENGLISH-AKUAPEM TWI PARALLEL CORPUS
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-16T07%3A16%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Azunre,%20Paul&rft.date=2021-01-10&rft_id=info:doi/10.5281/zenodo.4430880&rft_dat=%3Cdatacite_PQ8%3E10_5281_zenodo_4430880%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
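
The url field is an OpenURL query (its ctx_ver parameter reads Z39.88-2004) in which the dataset's DOI appears as one of several rft_id values. As a minimal sketch, the DOI can be recovered with Python's standard urllib.parse module. The helper name doi_from_openurl is hypothetical, and the example URL is an abbreviated copy of the one in this record, keeping only a few of its parameters.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def doi_from_openurl(url: str) -> Optional[str]:
    """Return the DOI carried in an OpenURL's rft_id parameter, if any."""
    params = parse_qs(urlsplit(url).query)  # repeated keys become lists
    for value in params.get("rft_id", []):
        if value.startswith("info:doi/"):
            return value[len("info:doi/"):]
    return None


# Abbreviated version of the record's url field (most parameters omitted).
example_url = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=unknown&rft.au=Azunre,%20Paul&rft.date=2021-01-10"
    "&rft_id=info:doi/10.5281/zenodo.4430880"
    "&rft_id=info:oai/&rft_id=info:pmid/"
)
example_params = parse_qs(urlsplit(example_url).query)
print(doi_from_openurl(example_url))
```

parse_qs also percent-decodes values, so the rft.au parameter comes back as "Azunre, Paul". The extracted DOI can be compared with the identifier and doi_str_mv fields of the record as a consistency check.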