BinEnhance: An Enhancement Framework Based on External Environment Semantics for Binary Code Search
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Binary code search plays a crucial role in applications like software reuse
detection. Existing models are typically based on either internal
code semantics or a combination of function call graphs (CG) and internal code
semantics. However, these models have limitations. Internal code semantic
models only consider the semantics within the function, ignoring the
inter-function semantics, making it difficult to handle situations such as
function inlining. The combination of CG and internal code semantics is
insufficient for addressing complex real-world scenarios. To address these
limitations, we propose BinEnhance, a novel framework designed to leverage the
inter-function semantics to enhance the expression of internal code semantics
for binary code search. Specifically, BinEnhance constructs an External
Environment Semantic Graph (EESG), which establishes a stable and analogous
external environment for homologous functions by using different inter-function
semantic relations (e.g., call, location, data-co-use). After the construction
of EESG, we utilize the embeddings generated by existing internal code semantic
models to initialize nodes of EESG. Finally, we design a Semantic Enhancement
Model (SEM) that uses Relational Graph Convolutional Networks (RGCNs) and a
residual block to learn valuable external semantics on the EESG for generating
the enhanced semantics embedding. In addition, BinEnhance utilizes data feature
similarity to refine the cosine similarity of semantic embeddings. We conduct
experiments on six different tasks (e.g., under a function inlining scenario)
and the results illustrate the performance and robustness of BinEnhance. The
application of BinEnhance to HermesSim, Asm2vec, TREX, Gemini, and Asteria on
two public datasets results in an improvement of Mean Average Precision (MAP)
from 53.6% to 69.7%. Moreover, the efficiency increases fourfold. |
---|---|
DOI: | 10.48550/arxiv.2411.01102 |
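
The summary describes a Semantic Enhancement Model that applies relational graph convolutions and a residual block over a graph whose edges carry different relation types. The following is a minimal sketch of that general idea, not the authors' implementation: it uses the standard R-GCN update (a per-relation weight matrix, mean-normalised neighbour messages, a self-loop weight, ReLU) followed by a plain additive skip connection. The function names, the toy graph, the identity weights, and the choice of normalisation are all assumptions made for illustration; the record does not specify how SEM is configured or how the data-feature similarity refinement is computed, so the latter is left out.

```python
import math

# Relation types named in the summary; everything else here is illustrative.
RELATIONS = ("call", "location", "data-co-use")


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rgcn_residual_layer(embeddings, edges, rel_weights, self_weight):
    """One relational graph convolution step followed by a residual add.

    embeddings:  {node: vector}, stand-ins for embeddings produced by an
                 internal code semantic model (assumed, not from the paper)
    edges:       {relation: [(src, dst), ...]} directed edges per relation
    rel_weights: {relation: matrix}, one weight matrix per relation
    self_weight: matrix applied to the node's own embedding
    """
    updated = {}
    for node, h in embeddings.items():
        acc = matvec(self_weight, h)
        for rel in RELATIONS:
            neighbours = [src for src, dst in edges.get(rel, []) if dst == node]
            for src in neighbours:
                msg = matvec(rel_weights[rel], embeddings[src])
                # Mean-normalise messages within each relation.
                acc = [a + m / len(neighbours) for a, m in zip(acc, msg)]
        activated = [max(0.0, a) for a in acc]                  # ReLU
        updated[node] = [x + y for x, y in zip(h, activated)]   # residual add
    return updated


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy graph: function f is called by g and is located next to h (hypothetical).
identity = [[1.0, 0.0], [0.0, 1.0]]
embeddings = {"f": [1.0, 0.0], "g": [0.0, 1.0], "h": [1.0, 1.0]}
edges = {"call": [("g", "f")], "location": [("h", "f")]}
weights = {rel: identity for rel in RELATIONS}

enhanced = rgcn_residual_layer(embeddings, edges, weights, identity)
print(enhanced["f"])                          # embedding of f after one layer
print(cosine(enhanced["f"], enhanced["h"]))   # similarity used for search
```

In this toy run, f absorbs information from both of its neighbours while g and h, which have no incoming edges, only pass through the self-loop and the skip connection. A trained model would learn the weight matrices rather than fix them to the identity.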