Analysis of the Ethiopic Twitter Dataset for Abusive Speech in Amharic
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | In this paper, we present an analysis of the first Ethiopic Twitter Dataset
for the Amharic language, targeted at recognizing abusive speech. The dataset
has been collected since 2014 and consists of tweets written in the Fidel script.
Since several languages can be written using the Fidel script, we used existing
Amharic, Tigrinya and Ge'ez corpora to retain only the Amharic tweets. We
analyzed the tweets for abusive speech content with two goals: to analyze the
distribution and trend of abusive speech content over time, and to compare
abusive speech content between the Twitter corpus and a general reference
Amharic corpus. |
---|---|
DOI: | 10.48550/arxiv.1912.04419 |