Improved spam e-mail filtering based on committee machines and information theoretic feature extraction

Bibliographic details
Main authors:	Zorkadis, V.; Panayotou, M.; Karras, D.A.
Format:	Conference paper
Language:	English
Subjects:	Boosting; Classification tree analysis; Electronic mail; Feature extraction; Information filtering; Information filters; Multi-layer neural network; Neural networks; Nonlinear filters; Regression tree analysis

Description
Abstract:	A novel approach to spam e-mail filtering is considered, based on committee machine neural network models and on information-theoretic feature extraction. An extensive experimental study, the most extensive so far in the literature, is carried out on widely accepted benchmark e-mail data sets. It compares the proposed methodology with the naive Bayes spam filter, the boosted tree methodology, linear-model-based classification (classification via regression), and nonlinear-model-based classification using simple neural network models, including multilayer perceptrons. Several feature extraction approaches based on information theory are also evaluated. It is shown that the mail categorization performance of committee machines compares very favorably with that of the rival methods, including the Bayes spam filter, which is the most widely used approach in the e-mail services market. It is also found that the proposed information-theoretic Boolean features achieve remarkably high spam categorization performance compared with their analog counterparts.
ISSN:	2161-4393, 2161-4407
DOI:	10.1109/IJCNN.2005.1555826
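The abstract names two ingredients without defining them: information-theoretic Boolean features and a committee machine. The record does not give the paper's exact feature definitions or combination rule, so the following is only a minimal sketch under stated assumptions. It assumes features are Boolean term-presence indicators ranked by mutual information with the spam/ham label (a common information-theoretic criterion), and that the committee combines members by simply averaging their spam scores. The function names (`boolean_features`, `mutual_information`, `select_features`, `committee_predict`) are hypothetical, and the committee members here are placeholder scoring functions rather than the neural networks the paper uses.

```python
import math
from collections import Counter


def boolean_features(messages, vocabulary):
    """Map each message to 0/1 term-presence indicators (hypothetical tokenizer: lowercase + whitespace split)."""
    rows = []
    for msg in messages:
        tokens = set(msg.lower().split())
        rows.append([1 if term in tokens else 0 for term in vocabulary])
    return rows


def mutual_information(feature, labels):
    """I(X;Y) in bits between a Boolean feature X and a binary label Y, from empirical counts."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_features(X, labels, k):
    """Return the indices of the k features with the highest mutual information, plus all scores."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


def committee_predict(members, x, threshold=0.5):
    """Average the spam scores (each in [0, 1]) of the committee members; 1 = spam, 0 = ham."""
    avg = sum(member(x) for member in members) / len(members)
    return 1 if avg >= threshold else 0


# Toy usage with invented data (1 = spam, 0 = ham).
messages = ["win money now", "meeting at noon", "win a free prize", "lunch meeting tomorrow"]
labels = [1, 0, 1, 0]
vocabulary = ["win", "meeting", "free", "now"]

X = boolean_features(messages, vocabulary)
selected, scores = select_features(X, labels, k=2)

# Placeholder members standing in for trained networks.
members = [
    lambda x: 0.9 if x[0] else 0.1,  # reacts to "win"
    lambda x: 0.8 if x[2] else 0.3,  # reacts to "free"
]
print(selected, [round(s, 3) for s in scores])
print(committee_predict(members, X[0]), committee_predict(members, X[1]))
```

In this toy set, "win" and "meeting" each determine the label exactly, so both score 1 bit and are selected first. Averaging is only one possible combination rule; weighted averaging or majority voting are common alternatives, and the record does not say which one the paper adopts.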