HOCON34k: A Corpus of Hate speech in Online Comments from German Newspapers
Main authors: | , , , , , , , , |
---|---|
Format: | Dataset |
Language: | German (ger) |
Abstract: | We have compiled a dataset of 34,223 German-language comments written by users of online platforms associated with public discourse in German newspapers. Each comment was annotated by a group of 29 volunteers for hate speech and for the adequacy of its contextual information, using a binary annotation approach. Inter-rater reliability for hate speech, measured by Fleiss’ kappa, is 0.4428 across all annotators and rises to 0.6078 for an optimized subset of 12 annotators. We also present a baseline text classification using BERT, which achieves an MCC of up to 0.32 and an F2 score of up to 0.64 in our initial experiment on this new corpus. The dataset, named HOCON34k, comprises German newspaper comments annotated for hate speech and is publicly available for research purposes. |
---|---|
DOI: | 10.5281/zenodo.12665947 |
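The abstract reports annotator agreement as Fleiss' kappa and the BERT baseline as MCC and F2 score. As a rough aid for reading those figures, the following is a minimal Python sketch of the standard textbook definitions. It is not the HOCON34k authors' evaluation code, and the function names (`fleiss_kappa`, `mcc`, `f_beta`) are hypothetical. The kappa function assumes every item was rated by the same number of annotators, which this record does not state for HOCON34k.

```python
import math
from collections.abc import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters who put item i in category j.

    Sketch assumption: every item has the same total number of ratings.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement P_i per item: share of rater pairs that agree on item i.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from the overall proportion of ratings per category.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1 - p_e)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 gives the F2 score."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Two binary items, three raters each, full agreement on both.
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
print(mcc(tp=5, fp=0, fn=0, tn=5))     # 1.0
print(f_beta(tp=1, fp=0, fn=1))        # 5/9, about 0.556
```

With beta = 2, the F2 score treats recall as more important than precision, so a missed hate-speech comment (false negative) lowers the score more than a false alarm does.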