Spam: A Big Data Challenge
Published in: | International Journal of Advanced Research in Computer Science, 2017-01, Vol. 8 (1) |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Spam comes in many forms, including text, images, embedded HTML, and MIME attachments, and the volume of spam mail sent per day is massive. Handling this high volume, high velocity, and wide variety of spam requires a scalable spam filtering solution. Scalable solutions available for machine learning and statistical studies can also be used to implement scalable spam filtering. In the big data analytics domain, Mahout is an open-source library from Apache for building scalable machine learning solutions. This paper uses the Mahout framework to analyse the time efficiency and accuracy of two Naïve Bayes classification algorithms. |
---|---|
ISSN: | 0976-5697 |