Anomaly Detection in Text Data Sets using Character-Level Representation

Bibliographic details
Published in: Journal of Physics: Conference Series, 2021-04, Vol. 1880 (1), p. 12028
Authors: Mohaghegh, Mahsa; Abdurakhmanov, Amantay
Format: Article
Language: English
Description
Abstract: This paper proposes a character-level representation of unlabeled text data sets for unsupervised anomaly detection. An empirical evaluation of the character-level representation was conducted to show that it can separate outlying records from normal ones using an ensemble of several classic numerical anomaly classifiers. Experimental results on two different data sets confirmed that the developed unsupervised model can detect outlying instances in various real-world scenarios, making it possible to quickly assess large amounts of textual data for information consistency and conformity without knowledge of the data content itself.
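The abstract does not specify which character-level features or which numerical detectors the paper uses. The following is therefore only a minimal illustrative sketch of the general idea, under explicit assumptions: each record is mapped to a fixed-length vector of relative character frequencies over a hypothetical alphabet (plus an "other" bucket and the log of the text length), the features are standardised, and three simple stand-in detectors (distance from the centroid, mean distance to the k nearest neighbours, and largest absolute z-score) are combined by averaging their rank-normalised scores. None of the names below (`char_features`, `ensemble_scores`, `ALPHABET`, etc.) come from the paper; they are hypothetical.

```python
import math
import string
from statistics import mean, pstdev

# Hypothetical alphabet (an assumption, not from the paper); any character
# outside it is counted in one shared "other" bucket.
ALPHABET = string.ascii_lowercase + string.digits + " .,;:!?-'\"()"
_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def char_features(text: str) -> list[float]:
    """Relative character frequencies (plus an 'other' bucket) and log text length."""
    text = text.lower()
    counts = [0] * (len(ALPHABET) + 1)
    for ch in text:
        counts[_INDEX.get(ch, len(ALPHABET))] += 1
    n = max(len(text), 1)
    return [c / n for c in counts] + [math.log1p(len(text))]


def standardise(X: list[list[float]]) -> list[list[float]]:
    """Column-wise z-scores; constant columns get a unit scale so they map to 0."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in X]


# Three simple stand-in numerical detectors; a higher score means "more anomalous".
def centroid_distance(Z):
    return [math.sqrt(sum(z * z for z in row)) for row in Z]


def knn_distance(Z, k=3):
    scores = []
    for i, a in enumerate(Z):
        dists = sorted(math.dist(a, b) for j, b in enumerate(Z) if j != i)
        scores.append(mean(dists[:k]))
    return scores


def max_abs_z(Z):
    return [max(abs(z) for z in row) for row in Z]


def to_rank(scores):
    """Fraction of records with a strictly lower score, so ties share a rank."""
    denom = max(len(scores) - 1, 1)
    return [sum(s < x for s in scores) / denom for x in scores]


def ensemble_scores(texts, k=3):
    """Average of rank-normalised detector scores, one value in [0, 1] per text."""
    Z = standardise([char_features(t) for t in texts])
    detectors = [centroid_distance(Z), knn_distance(Z, k), max_abs_z(Z)]
    ranked = [to_rank(s) for s in detectors]
    return [mean(per_record) for per_record in zip(*ranked)]


# Toy data (invented for illustration): six similar records and one odd one.
records = [f"invoice {n} paid in full" for n in ("1042", "2041", "4120", "1204", "2410", "4201")]
records.append("0xDEADBEEF %%##@@!! ~~~ ^^^")
scores = ensemble_scores(records)
for text, score in sorted(zip(records, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.2f}  {text}")
```

In this toy run the six invoice lines share the same character multiset, so they receive identical feature vectors and the last record is ranked most anomalous by every detector. Averaging ranks rather than raw scores is one simple way to combine detectors whose outputs live on different scales; the paper's actual combination rule may differ.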
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/1880/1/012028