Classifying Swahili Smishing Attacks for Mobile Money Users: A Machine-Learning Approach



Bibliographic details
Published in:	IEEE Access, 2022, Vol. 10, pp. 83061-83074
Main authors:	Mambina, Iddi S.; Ndibwile, Jema D.; Michael, Kisangiri F.
Format:	Article
Language:	English
Subjects:	Africa; African languages; Classification; Classifiers; Electronic commerce; Languages; Machine learning; Malware; Messages; mobile money; Model accuracy; Natural language processing; Performance evaluation; Phishing; Short message service; smishing; SMS; social engineering; Sociology; Statistics; Sub-Saharan Africa; Text messaging; Unsolicited e-mail
Online access:	Full text

Description
Summary:	Due to the massive adoption of mobile money in Sub-Saharan countries, the global transaction value of mobile money exceeded $2 billion in 2021. Projections show transaction values will exceed $3 billion by the end of 2022, and Sub-Saharan Africa contributes half of the daily transactions. SMS (Short Message Service) phishing costs corporations and individuals millions of dollars annually. Spammers use Smishing (SMS Phishing) messages to trick a mobile money user into sending electronic cash to an unintended mobile wallet. Though Smishing is an incarnation of phishing, the two differ in the information available and in attack strategy. As a result, detecting Smishing is difficult. Numerous models and techniques to detect Smishing attacks have been introduced for high-resource languages, yet few target low-resource languages such as Swahili. This study proposes a machine-learning-based model to classify Swahili Smishing text messages targeting mobile money users. Experimental results show that a hybrid model combining Extratree classifier feature selection with a Random Forest classifier, using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization, performs best, with an accuracy of 99.86%. Results are measured against a baseline Multinomial Naïve Bayes model and are also compared with a set of other classic classifiers. The model returns the lowest false positive and false negative counts, 2 and 4 respectively, with a Log-Loss of 0.04. A Swahili dataset of 32,259 messages is used for performance evaluation.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2022.3196464
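
The summary reports accuracy, false positive and false negative counts, and Log-Loss. The record does not say how the authors computed these, so the following is only a minimal sketch using the standard definitions for a binary classifier. The function name `evaluate`, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration; none of them come from the article.

```python
import math


def evaluate(y_true, y_prob, threshold=0.5, eps=1e-15):
    """Return accuracy, false positives, false negatives and mean log-loss.

    y_true: 0/1 labels (1 = smishing, 0 = legitimate).
    y_prob: predicted probability that each message is smishing.
    The threshold of 0.5 is an assumption, not a value from the article.
    """
    correct = fp = fn = 0
    total_loss = 0.0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == y:
            correct += 1
        elif pred == 1:
            fp += 1  # legitimate message flagged as smishing
        else:
            fn += 1  # smishing message that slipped through
        # Clip so that log(0) can never occur.
        p = min(max(p, eps), 1 - eps)
        total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    n = len(y_true)
    return {
        "accuracy": correct / n,
        "false_positives": fp,
        "false_negatives": fn,
        "log_loss": total_loss / n,
    }


# Invented toy data, unrelated to the 32,259-message Swahili dataset.
y_true = [1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.8, 0.1]
metrics = evaluate(y_true, y_prob)
print(metrics)
```

Log-Loss penalizes confident wrong probabilities more heavily than hesitant ones, so it complements the raw false positive and false negative counts, which only reflect the thresholded decisions.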