Software gene-based anti-obfuscation binary code clone detection method

Bibliographic details
Main authors: SONG ZHIHUI, SHAN ZHENG, TANG KE, QIAO MENG, LIU FUDONG, XIONG QIBING, ZHANG CHUNYAN, XU LIANQIU, HUANG YIZHAO, GUI HAIREN
Format: Patent
Language: chi; eng
Description
Summary: The invention discloses a software gene-based anti-obfuscation method for binary code clone detection. The method comprises the following steps: a source program is compiled with the O-LLVM compiler to obtain an assembly program; a control flow graph (CFG) is extracted from the assembly program, and the assembly program is divided into a plurality of software gene blocks using the concept of a software gene; after each node in the CFG has been segmented into independent software gene blocks and its instructions normalized, a random walk algorithm traverses the nodes of the CFG to obtain software gene sequences, which serve as the training set; a machine learning algorithm is then trained on this set, with word embedding of the assembly instructions performed mainly by a natural language processing method (Word2Vec); Doc2Vec is then used to semantically embed the software gene sequences and extract the semantic information of a function, finally obtaining a good effect in anti-obfuscation code clone detection b
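
The normalization and random-walk steps named in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the normalization rules (registers to `REG`, memory operands to `MEM`, immediates to `IMM`), the function names `normalize`, `random_walk` and `build_training_set`, and the toy CFG are all assumptions made for illustration. Each basic block is treated here as one "software gene"; the resulting gene sequences are the kind of input that Word2Vec and Doc2Vec models could then be trained on, which this sketch does not show.

```python
import random
import re

# Hypothetical normalization rules (assumed, not taken from the patent):
# memory operands -> MEM, x86 registers -> REG, immediates -> IMM.
_MEM = re.compile(r"\[[^\]]*\]")
_REG = re.compile(
    r"\b(?:[er]?[abcd]x|[er]?[sd]i|[er]?[sb]p|r(?:8|9|1[0-5])[dwb]?|[abcd][lh])\b"
)
_IMM = re.compile(r"\b(?:0x[0-9a-f]+|\d+)\b")


def normalize(instruction):
    """Replace concrete operands with placeholder tokens."""
    text = instruction.lower().strip()
    text = _MEM.sub("MEM", text)
    text = _REG.sub("REG", text)
    text = _IMM.sub("IMM", text)
    return text


def random_walk(cfg, genes, start, max_len, rng):
    """Walk the CFG from `start`, collecting one gene per visited node."""
    sequence = []
    node = start
    for _ in range(max_len):
        sequence.append(genes[node])
        successors = cfg.get(node, [])
        if not successors:
            break  # reached a node with no outgoing edge
        node = rng.choice(successors)
    return sequence


def build_training_set(cfg, raw_blocks, start, num_walks, max_len, seed=0):
    """Normalize every basic block, then sample gene sequences by random walks."""
    genes = {
        node: tuple(normalize(ins) for ins in instructions)
        for node, instructions in raw_blocks.items()
    }
    rng = random.Random(seed)
    return [random_walk(cfg, genes, start, max_len, rng) for _ in range(num_walks)]


# Toy CFG (assumed example data): an if/else diamond.
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
raw_blocks = {
    "entry": ["cmp eax, 0", "jne loc_else"],
    "then": ["mov eax, [ebp-8]", "add eax, 0x10"],
    "else": ["mov eax, [ebp-12]", "sub eax, 1"],
    "exit": ["ret"],
}

walks = build_training_set(cfg, raw_blocks, "entry", num_walks=5, max_len=10)
for walk in walks:
    print(walk)
```

Each printed walk is a list of genes, and each gene is a tuple of normalized instructions. Flattening a walk gives a token sequence suitable for word-level embedding, while the walk as a whole plays the role of a document for sequence-level embedding.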