Detecting abusive language using character N-gram features

Methods and apparatus for detecting abusive language are disclosed. In one embodiment, a set of character N-grams is ascertained for a set of text. Feature values for a plurality of features of the set of text are determined, based, at least in part, on the set of character N-grams. A computer-gener...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Mehdad, Yashar, Tetreault, Joel
Format:	Patent
Sprache:	eng
Schlagworte:	CALCULATING COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING PHYSICS
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Methods and apparatus for detecting abusive language are disclosed. In one embodiment, a set of character N-grams is ascertained for a set of text. Feature values for a plurality of features of the set of text are determined, based, at least in part, on the set of character N-grams. A computer-generated model is applied to the feature values for the plurality of features to generate a score for the set of text, where the model includes a plurality of weights, each of the weights corresponding to one of the features. It may then be determined whether the set of text includes abusive language based, at least in part, on the score.