Mind Your Language: Abuse and Offense Detection for Code-Switched Languages
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Abstract: | In multilingual societies such as those of the Indian subcontinent, code-switched
languages are very popular and convenient for users. In this paper, we study
offense and abuse detection in the code-switched pair of Hindi and English
(i.e., Hinglish), the most widely spoken such pair. The task is made difficult
by the non-fixed grammar, vocabulary, semantics, and spellings of Hinglish.
We apply transfer learning and build an LSTM-based model for hate
speech classification. This model surpasses the performance of the
current best models, establishing itself as the state of the art in the
unexplored domain of Hinglish offensive text classification. We also release our
model and the trained embeddings for research purposes. |
DOI: | 10.48550/arxiv.1809.08652 |