SENSE: An unsupervised semantic learning model for cross-platform vulnerability search

Bibliographic details
Published in: Computers & security 2023-12, Vol.135, p.103500, Article 103500
Main authors: Li, Munan; Liu, Hongbo; Jiang, Xiangdong; Zhao, Zheng; Zhang, Tianhao
Format: Article
Language: English
Description
Summary: Binary Similarity Analysis (BSA) has emerged as a vital approach for identifying homologous vulnerabilities. However, it is constrained by semantic incompleteness, structural differences, and false positives arising from variations in compilation environments. In this paper, we propose a novel Unsupervised Semantic Learning Model named SENSE for cross-platform vulnerability search. The model comprises two main components: a semantic learner and a graph learner. The semantic learner is pre-trained with a masked language task on a well-normalized binary corpus, enabling it to capture contextual semantic relations and generate block embeddings that effectively encode the semantic features. In the graph learner, a gated graph neural network with a self-gating layer is adopted to eliminate redundant features, and an adversarial loss is incorporated to enhance the robustness of function embeddings across different compiler environments. Finally, SENSE is trained in an unsupervised manner using a batch-wise sampling strategy along with a maximum mutual information loss. This encourages semantically similar functions to have closer embedding representations, thereby reducing false positives and improving search efficiency. Through extensive experiments, we demonstrate that SENSE outperforms state-of-the-art methods in terms of binary search accuracy. Our results also reveal that SENSE is capable of generating robust function embeddings that mitigate the differences arising from diverse architectures and optimization options.
ISSN:0167-4048
DOI:10.1016/j.cose.2023.103500
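
Illustrative note: the summary mentions training with batch-wise sampling and a maximum mutual information loss but does not give the formula. The sketch below is only one plausible reading, not the authors' formulation: it assumes an InfoNCE-style contrastive objective, a commonly used lower bound on mutual information, in which two embeddings of the same function (for example, compiled for different architectures) form a positive pair and the other functions in the batch serve as negatives. The names `cosine` and `info_nce_loss` and the `temperature` parameter are hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss with in-batch negatives (an assumed stand-in).

    anchors[i] and positives[i] are treated as two embeddings of the same
    function; positives[j] for j != i act as negatives for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Scaled similarities between anchor i and every candidate in the batch.
        logits = [cosine(anchors[i], positives[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true pair.
        total += log_sum - logits[i]
    return total / n


if __name__ == "__main__":
    # Toy 2-D "function embeddings"; real embeddings would come from a model.
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    mismatched = [[0.1, 0.9], [0.9, 0.1]]
    print("matched pairs:", info_nce_loss(anchors, matched))
    print("mismatched pairs:", info_nce_loss(anchors, mismatched))
```

Under this assumption, minimizing the loss pulls embeddings of the same function closer together and pushes different functions in the batch apart, which is consistent with the summary's stated goal of tighter representations for semantically similar functions.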