Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language
Main Author: | |
---|---|
Format: | Dataset |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | The Vashantor dataset consists of 32,500 sentences from different regions, including Chittagong, Noakhali, Sylhet, Barishal, and Mymensingh. It is categorized into two language formats: "Bangla" and "Banglish." Each region and language combination has a fixed number of training, testing, and validation samples. The dataset details are as follows:
Specifics of the Core Data:
-----
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
English: Train 1875, Test 375, Validation 250 (Total 2500)
Specifics of the Regional Data:
-----
Chittagong:
-----
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Noakhali:
-----
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Sylhet:
-----
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Barishal:
-----
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500)
Mymensingh:
-----
Bangla: Train 1875, Test 375, Validation 250 (Total 2500)
Banglish: Train 1875, Test 375, Validation 250 (Total 2500) |
---|---|
DOI: | 10.17632/bj5jgk878b |
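
The split sizes listed in the summary can be cross-checked against the stated total of 32,500 sentences. The following is a minimal Python sketch of that arithmetic only; the constant and function names (`SPLIT`, `CORE_FORMATS`, `subset_total`, `dataset_total`) are illustrative assumptions and do not reflect the dataset's actual file layout, which this record does not describe.

```python
# Split sizes as stated in the summary. The data structures here are
# hypothetical and exist only to check the arithmetic.
SPLIT = {"train": 1875, "test": 375, "validation": 250}

CORE_FORMATS = ["Bangla", "Banglish", "English"]
REGIONS = ["Chittagong", "Noakhali", "Sylhet", "Barishal", "Mymensingh"]
REGIONAL_FORMATS = ["Bangla", "Banglish"]


def subset_total(split=SPLIT):
    """Number of sentences in one subset (train + test + validation)."""
    return sum(split.values())


def dataset_total():
    """Sentences across the core subsets and all regional subsets."""
    core = len(CORE_FORMATS) * subset_total()
    regional = len(REGIONS) * len(REGIONAL_FORMATS) * subset_total()
    return core + regional


if __name__ == "__main__":
    print(subset_total())   # 2500
    print(dataset_total())  # 32500
```

Under these figures each subset follows a 75% / 15% / 10% train, test, and validation split, and the three core subsets plus ten regional subsets account for the full 32,500 sentences.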