Bin2vec: Learning Representations of Binary Executable Programs for Security Tasks
Format: | Article |
Language: | English |
Abstract: | Tackling binary program analysis problems has traditionally required manually
defining rules and heuristics, a tedious and time-consuming task for human
analysts. To improve automation and scalability, we propose an
alternative direction based on distributed representations of binary programs
that apply to a number of downstream tasks. We introduce Bin2vec, a new
approach that leverages Graph Convolutional Networks (GCNs) together with computational
program graphs to learn a high-dimensional representation of binary
executable programs. We demonstrate the versatility of this approach by using
our representations to solve two semantically different binary analysis tasks:
functional algorithm classification and vulnerability discovery. We compare the
proposed approach to our own strong baseline as well as to published results, and
demonstrate improvement over state-of-the-art methods for both tasks. We
evaluated Bin2vec on 49,191 binaries for the functional algorithm classification
task, and on 30 different CWE-IDs, each with at least 100 CVE entries, for
the vulnerability discovery task. We set a new state-of-the-art result by
reducing the classification error by 40% compared to the source-code-based
inst2vec approach, while working on binary code. For almost every vulnerability
class in our dataset, our prediction accuracy is over 80% (and over 90% in
multiple classes). |
DOI: | 10.48550/arxiv.2002.03388 |
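The abstract says that Bin2vec runs Graph Convolutional Networks over program graphs to get a fixed-length representation of a binary. The sketch below shows roughly what that can look like. It is a generic GCN in the style of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by a mean-pooling readout, written in plain Python. It is not the authors' implementation. The graph construction, node features, layer count, and pooling choice are all assumptions, and the helper names (`normalized_adjacency`, `gcn_layer`, `graph_embedding`) are hypothetical. The toy graph has three nodes, which could stand for basic blocks in a control-flow graph, with the edges treated as undirected.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b using nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (Kipf & Welling style).

    Self-loops are added so each node keeps its own features; the degree
    vector is computed on A + I, so every degree is at least 1.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One propagation step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, x) for x in row] for row in out]


def graph_embedding(adj: Matrix, features: Matrix, layers: List[Matrix]) -> List[float]:
    """Hypothetical whole-program embedding: stacked GCN layers, then mean pooling.

    Mean pooling over nodes is an assumed readout that turns per-node
    vectors into one fixed-length vector for the whole graph.
    """
    adj_norm = normalized_adjacency(adj)
    h = features
    for w in layers:
        h = gcn_layer(adj_norm, h, w)
    n = len(h)
    return [sum(h[i][j] for i in range(n)) / n for j in range(len(h[0]))]


if __name__ == "__main__":
    # Toy undirected graph: node 0 connects to nodes 1 and 2.
    adj = [[0.0, 1.0, 1.0],
           [1.0, 0.0, 0.0],
           [1.0, 0.0, 0.0]]
    features = [[1.0], [1.0], [1.0]]  # hypothetical 1-dimensional node features
    layers = [[[1.0]]]                # a single 1x1 weight matrix
    print(graph_embedding(adj, features, layers))
```

In a learned model the weight matrices would be trained against a downstream objective, such as algorithm classification or vulnerability prediction, and the node features would encode instruction-level information. The pooled vector is then the per-binary representation that a classifier consumes.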