Software gene-based anti-obfuscation binary code clone detection method
Main authors: | , , , , , , , , , |
---|---|
Format: | Patent |
Language: | Chinese; English |
Summary: | The invention discloses a software gene-based, anti-obfuscation method for detecting binary code clones. A source program is compiled with the O-LLVM compiler to obtain an assembly program, a control flow graph (CFG) is extracted from it, and the assembly program is divided into software gene blocks, following the concept of a software gene. After each CFG node has been segmented into independent software gene blocks and the instructions have been normalized, a random walk algorithm traverses the CFG nodes to produce software gene sequences, which serve as the training set. A machine learning algorithm is then trained on this set: a natural language processing method (Word2Vec) embeds the assembly instructions as word vectors, and Doc2Vec embeds the software gene sequences semantically to extract the semantic information of a function. The method finally obtains a good effect in anti-obfuscation code clone detection b... |
---|
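The summary describes normalizing the instructions in each CFG node and then collecting software gene sequences by random walk. The following is a minimal Python sketch of those two steps only, not the patented implementation. The normalization rules (`REG`, `MEM`, `IMM`, `SYM` placeholders), the toy CFG, the function names, and the parameters `walks_per_node` and `max_len` are all assumptions made for illustration; the record does not specify them.

```python
import random
import re

# Hypothetical normalization rules; the record does not state the real ones.
_REG = re.compile(r"^(e?[a-d]x|[re]?[sd]i|[re]?[sb]p|r\d+[dwb]?|[a-d][lh])$")
_IMM = re.compile(r"^(0x[0-9a-f]+|-?\d+)$")


def normalize_instruction(ins):
    """Replace concrete operands with coarse placeholder tokens."""
    mnemonic, _, rest = ins.strip().lower().partition(" ")
    tokens = []
    for op in (o.strip() for o in rest.split(",")):
        if not op:
            continue
        if "[" in op:
            tokens.append("MEM")   # memory reference
        elif _REG.match(op):
            tokens.append("REG")   # general-purpose register
        elif _IMM.match(op):
            tokens.append("IMM")   # immediate constant
        else:
            tokens.append("SYM")   # label or other symbol
    return mnemonic if not tokens else mnemonic + "_" + "_".join(tokens)


def random_walk(cfg, genes, start, max_len, rng):
    """Walk the CFG from `start`, concatenating the genes of each visited node."""
    sequence, node = [], start
    for _ in range(max_len):
        sequence.extend(genes[node])
        successors = cfg.get(node, [])
        if not successors:
            break  # reached an exit node
        node = rng.choice(successors)
    return sequence


def build_training_set(cfg, blocks, walks_per_node=2, max_len=4, seed=0):
    """Normalize every block, then start several walks at every node."""
    rng = random.Random(seed)
    genes = {n: [normalize_instruction(i) for i in ins] for n, ins in blocks.items()}
    return [
        random_walk(cfg, genes, node, max_len, rng)
        for node in sorted(cfg)
        for _ in range(walks_per_node)
    ]


# Toy CFG (adjacency lists) and basic blocks, invented for this example.
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
blocks = {
    "entry": ["cmp eax, 0", "je else_label"],
    "then": ["add eax, 1"],
    "else": ["mov eax, [rbp-8]"],
    "exit": ["ret"],
}
training_set = build_training_set(cfg, blocks)
```

Each element of `training_set` is a list of normalized instruction tokens. In a pipeline like the one summarized above, such lists could serve as the "sentences" passed to a Word2Vec model and the "documents" passed to a Doc2Vec model, for example with the gensim library; that stage is omitted here.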