Machine learning techniques for hate speech classification of twitter data: State-of-the-art, future challenges and research directions

Bibliographic details
Published in:	Computer science review 2020-11, Vol.38, p.100311, Article 100311
Main authors:	Ayo, Femi Emmanuel; Folorunso, Olusegun; Ibharalu, Friday Thomas; Osinuga, Idowu Ademola
Format:	Article
Language:	English
Subjects:	Bayesian network; Combinatorial algorithm; Detection; Fuzzy logic; Hate speech; Twitter data stream
Online access:	Full text

Description
Abstract:	Twitter is a microblogging tool that allows the creation of big data through short digital content. This study provides a survey of machine learning techniques for hate speech classification from Twitter data streams. Hate speech classification in Twitter data streams has remained a vibrant research focus, but little research effort has been devoted to the design of a generic metadata architecture, threshold settings and fragmentation issues. Hate speech classification techniques presented in the literature address some of the challenges inherent in Twitter data streams but are limited with respect to the aforementioned issues. This study presented a collection of hate speech benchmark datasets suitable for testing the efficiency of classification models. This study also presented the pros and cons of single and hybrid machine learning methods in hate speech classification. A summary of performance evaluation for the surveyed machine learning methods was also presented. The study also presented a generic metadata architecture for hate speech classification in Twitter to tackle issues with Twitter data streams. The developed generic metadata architecture was observed to perform better than similar methods across all evaluation metrics for hate speech detection, with 0.95, 0.93, 0.92 and 0.93 for accuracy, precision, recall and F1-score respectively. Similarly, the developed generic metadata architecture for hate speech sentiment classification performed better than related methods, with an F1-score of 91.5%. The developed generic metadata architecture also achieved an AUC of 0.97, indicating a closer-to-perfect test than similar methods. The statistical validation of the results points to the efficiency of the developed system. Finally, the results also showed that the developed system is very good for automatic topic detection and categorization.
• This study presented a collection of hate speech benchmark datasets.
• This study also presented the pros and cons of single and hybrid machine learning methods in hate speech classification.
• A summary of performance evaluation for the surveyed machine learning methods was also presented.
• The study also presented a generic metadata architecture for hate speech classification in Twitter data.
• The results showed that the developed generic metadata model is good for topic detection and categorization.
ISSN:	1574-0137, 1876-7745
DOI:	10.1016/j.cosrev.2020.100311