Detecting abusive language using character N-gram features
Methods and apparatus for detecting abusive language are disclosed. In one embodiment, a set of character N-grams is ascertained for a set of text. Feature values for a plurality of features of the set of text are determined, based, at least in part, on the set of character N-grams. A computer-gener...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Methods and apparatus for detecting abusive language are disclosed. In one embodiment, a set of character N-grams is ascertained for a set of text. Feature values for a plurality of features of the set of text are determined, based, at least in part, on the set of character N-grams. A computer-generated model is applied to the feature values for the plurality of features to generate a score for the set of text, where the model includes a plurality of weights, each of the weights corresponding to one of the features. It may then be determined whether the set of text includes abusive language based, at least in part, on the score. |
---|