Handling nominal features in anomaly intrusion detection problems

Bibliographic details
Main authors: Mei-Ling Shyu, K. Sarinnapakorn, I. Kuruppu-Appuhamilage, Shu-Ching Chen, LiWu Chang, T. Goldring
Format: Conference paper
Language: English
Description
Abstract: Computer network data streams used in intrusion detection usually involve many data types. A common data type is that of symbolic or nominal features. Whether or not they are coded as numerical values, nominal features need to be treated differently from numeric features. This paper studies the effectiveness of two approaches to handling nominal features: a simple coding scheme using indicator variables and a scaling method based on multiple correspondence analysis (MCA). In particular, we apply the techniques with two anomaly detection methods: the principal component classifier (PCC) and the Canberra metric. Experiments with the KDD 1999 data demonstrate that MCA works better than the indicator-variable approach for both detection methods, with PCC performing much better than the Canberra metric.
ISSN: 1097-8585, 2332-6476
DOI: 10.1109/RIDE.2005.10
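As a rough illustration of the simpler of the two approaches named in the abstract, the sketch below codes each nominal feature as a block of 0/1 indicator variables and compares coded records with the Canberra metric. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `indicator_coding` and `canberra` and the sample records are hypothetical (the values only resemble the KDD 1999 `protocol_type`, `service` and `flag` fields), and the paper's MCA scaling, its PCC detector and any detection thresholds are not reproduced.

```python
from typing import Dict, List, Sequence


def indicator_coding(rows: Sequence[Sequence[str]]) -> List[List[int]]:
    """Hypothetical helper: replace each nominal column with 0/1 indicator variables.

    Categories are learned from `rows` in first-seen order; values unseen here
    are not handled (an assumption made to keep the sketch short).
    """
    n_cols = len(rows[0])
    # One category -> position map per nominal column.
    categories: List[Dict[str, int]] = [{} for _ in range(n_cols)]
    for row in rows:
        for j, value in enumerate(row):
            categories[j].setdefault(value, len(categories[j]))

    coded: List[List[int]] = []
    for row in rows:
        vector: List[int] = []
        for j, value in enumerate(row):
            block = [0] * len(categories[j])
            block[categories[j][value]] = 1  # exactly one indicator set per feature
            vector.extend(block)
        coded.append(vector)
    return coded


def canberra(x: Sequence[float], y: Sequence[float]) -> float:
    """Canberra metric: sum of |x_i - y_i| / (|x_i| + |y_i|); 0/0 terms count as 0."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


# Illustrative records with three nominal features (protocol, service, flag).
records = [
    ("tcp", "http", "SF"),
    ("udp", "private", "SF"),
    ("tcp", "smtp", "REJ"),
]
coded = indicator_coding(records)

print(coded[0])                       # [1, 0, 1, 0, 0, 1, 0]
print(canberra(coded[0], coded[1]))   # 4.0
print(canberra(coded[0], coded[2]))   # 4.0
```

With this coding, every nominal feature on which two records differ adds exactly 2 to the Canberra distance (two indicator positions each contribute 1), and positions where both records hold 0 are skipped instead of dividing by zero. Because every mismatch is weighted equally regardless of the categories involved, a data-driven scaling such as MCA may give a more informative representation, which is consistent with the comparison the abstract reports.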