Amharic text dataset extracted from memes for hate speech detection or classification
the dataset is collected from social media such as facebook and telegram. the dataset is further processed. the collection are orginal_cleaned: this dataset is neither stemed nor stopword are remove: stopword_removed: in this dataset stopwords are removed but not stemmed and in stemed datset is stem...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Dataset |
Sprache: | eng |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | the dataset is collected from social media such as facebook and telegram.
the dataset is further processed. the collection are orginal_cleaned: this dataset is neither stemed nor stopword are remove: stopword_removed: in this dataset stopwords are removed but not stemmed and in stemed datset is stemmed and stopwords are removed. stemming is done using hornmorpho developed by Michael Gesser( available at https://github.com/hltdi/HornMorpho)
all datasets are normalized and free from noise such as punctuation marks and emojs. |
---|---|
DOI: | 10.17632/gw3fdtw5v7.2 |