Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language


Bibliographic details
Main author: Bin Moin, Mukaffi
Format: Dataset
Language: English
Description
Summary: The Vashantor dataset consists of 32,500 sentences from five regions: Chittagong, Noakhali, Sylhet, Barishal, and Mymensingh. It is categorized into two language formats, "Bangla" and "Banglish." Each region and language combination has a specified number of training, test, and validation samples. The dataset details are as follows:

Specifics of the Core Data:
  Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
  Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  English: Train 1875, Test 375, Validation 250 (Total 2500)

Specifics of the Regional Data:
  Chittagong:
    Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  Noakhali:
    Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  Sylhet:
    Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  Barishal:
    Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
  Mymensingh:
    Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
    Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
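The split counts listed in the summary can be cross-checked against the stated total of 32,500 sentences with a few lines of arithmetic. The sketch below is illustrative only: the names `SPLIT` and `SUBSETS` and the dictionary layout are assumptions made for this example and do not reflect the dataset's actual file or folder structure. Only the counts come from the summary.

```python
# Per-subset split sizes, as stated in the summary.
SPLIT = {"train": 1875, "test": 375, "validation": 250}

# Subsets named in the summary. The grouping keys are hypothetical labels
# for this sketch, not the dataset's real directory names.
SUBSETS = {
    "Core": ["Bangla", "Banglish", "English"],
    "Chittagong": ["Bangla", "Banglish"],
    "Noakhali": ["Bangla", "Banglish"],
    "Sylhet": ["Bangla", "Banglish"],
    "Barishal": ["Bangla", "Banglish"],
    "Mymensingh": ["Bangla", "Banglish"],
}

# Sentences in one subset (train + test + validation).
per_subset = sum(SPLIT.values())

# Number of (group, format) subsets: 3 core + 5 regions x 2 formats.
n_subsets = sum(len(formats) for formats in SUBSETS.values())

# Total sentences across the whole dataset.
total = per_subset * n_subsets

# Share of each split within a subset.
fractions = {name: count / per_subset for name, count in SPLIT.items()}

print(f"per subset: {per_subset}, subsets: {n_subsets}, total: {total}")
for name, share in fractions.items():
    print(f"{name}: {share:.0%}")
```

Under these counts, each subset of 2,500 sentences follows a 75% / 15% / 10% train/test/validation split, and the 13 subsets (3 core plus 10 regional) together account for the stated 32,500 sentences.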
DOI: 10.17632/bj5jgk878b