Reliable Malware Analysis and Detection using Topology Data Analysis
Increasingly, malwares are becoming complex and they are spreading on networks targeting different infrastructures and personal-end devices to collect, modify, and destroy victim information. Malware behaviors are polymorphic, metamorphic, persistent, able to hide to bypass detectors and adapt to ne...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Increasingly, malwares are becoming complex and they are spreading on
networks targeting different infrastructures and personal-end devices to
collect, modify, and destroy victim information. Malware behaviors are
polymorphic, metamorphic, persistent, able to hide to bypass detectors and
adapt to new environments, and even leverage machine learning techniques to
better damage targets. Thus, it makes them difficult to analyze and detect with
traditional endpoint detection and response, intrusion detection and prevention
systems. To defend against malwares, recent work has proposed different
techniques based on signatures and machine learning. In this paper, we propose
to use an algebraic topological approach called topological-based data analysis
(TDA) to efficiently analyze and detect complex malware patterns. Next, we
compare the different TDA techniques (i.e., persistence homology, tomato, TDA
Mapper) and existing techniques (i.e., PCA, UMAP, t-SNE) using different
classifiers including random forest, decision tree, xgboost, and lightgbm. We
also propose some recommendations to deploy the best-identified models for
malware detection at scale. Results show that TDA Mapper (combined with PCA) is
better for clustering and for identifying hidden relationships between malware
clusters compared to PCA. Persistent diagrams are better to identify
overlapping malware clusters with low execution time compared to UMAP and
t-SNE. For malware detection, malware analysts can use Random Forest and
Decision Tree with t-SNE and Persistent Diagram to achieve better performance
and robustness on noised data. |
---|---|
DOI: | 10.48550/arxiv.2211.01535 |