A case study on phishing detection with a machine learning net

Bibliographic details
Published in: International journal of data science and analytics 2024-06
Main authors: Bezerra, Ana, Pereira, Ivo, Rebelo, Miguel Ângelo, Coelho, Duarte, Oliveira, Daniel Alves de, Costa, Joaquim F. Pinto, Cruz, Ricardo P. M.
Format: Article
Language: English
Online access: Full text
Description
Summary: Phishing attacks aim to steal sensitive information and, unfortunately, are becoming common practice on the web. Email phishing is one of the most common types of attack and can have a major impact on individuals and enterprises. There is still a gap in prevention when it comes to detecting phishing emails, as new attacks usually go undetected. The goal of this work was to develop a model capable of identifying phishing emails using machine learning approaches. The work was performed in collaboration with E-goi, a multi-channel marketing automation company. The data consisted of emails collected from the E-goi servers in the electronic mail format. The task was a classification problem with imbalanced classes, in which the minority class, the phishing emails, accounted for less than 1% of all emails. Several models were evaluated after careful data selection and feature extraction based on the email content and on the literature on this type of problem. Because of the class imbalance, several under-sampling methods were tested to assess their impact on the model's ability to detect phishing emails. The final model was a neural network able to detect more than 80% of phishing emails without compromising the remaining emails sent by E-goi clients.
ISSN: 2364-415X, 2364-4168
DOI: 10.1007/s41060-024-00579-w
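
The summary mentions under-sampling as a way to handle a phishing class that makes up less than 1% of the emails, but it does not say which methods were tested. As a rough illustration only, the sketch below shows plain random under-sampling, one of the simplest such techniques. The function name `random_undersample`, the `ratio` parameter, the label strings, and the toy data are all assumptions made for this example and are not taken from the article.

```python
import random
from collections import Counter


def random_undersample(rows, labels, majority_label, ratio=1.0, seed=0):
    """Keep every minority row and a random subset of the majority rows.

    `ratio` is the desired number of majority rows per minority row
    (a hypothetical parameter chosen for this sketch).
    """
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    minority_idx = [i for i, y in enumerate(labels) if y != majority_label]

    # Never ask for more majority rows than actually exist.
    n_keep = min(len(majority_idx), int(round(ratio * len(minority_idx))))

    # Sort the kept indices so the original ordering of emails is preserved.
    kept = sorted(minority_idx + rng.sample(majority_idx, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 1,000 emails, 8 of them phishing (under 1%).
labels = ["phishing"] * 8 + ["legitimate"] * 992
rows = [{"email_id": i} for i in range(len(labels))]

sampled_rows, sampled_labels = random_undersample(
    rows, labels, majority_label="legitimate", ratio=3.0
)
print(Counter(sampled_labels))
```

Here all 8 phishing emails are kept and 24 legitimate emails are drawn at random, so a classifier trained on the sample sees a 1:3 class ratio instead of roughly 1:124. In practice only the training split would be under-sampled, so that evaluation still reflects the real class distribution.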