Hunting Vulnerable Smart Contracts via Graph Embedding Based Bytecode Matching

Bibliographic details
Published in: IEEE Transactions on Information Forensics and Security, 2021, Vol. 16, pp. 2144-2156
Main authors: Huang, Jianjun; Han, Songming; You, Wei; Shi, Wenchang; Liang, Bin; Wu, Jingzheng; Wu, Yanjun
Format: Article
Language: English
Description
Abstract: Smart contract vulnerabilities have attracted considerable concern because of the financial losses they cause. Matching-based detection methods, which extrapolate from known vulnerabilities to unknown ones, have proven effective on other platforms. However, directly applying the technique to smart contracts is obstructed by two issues: the diversity of generated bytecode resulting from the rapid evolution of compilers, and interference from noise code, which arises easily from homogeneous business logic. To address these problems, we propose contract bytecode-oriented normalization and slicing techniques to augment bytecode matching. Specifically, we conduct data- and instruction-level normalization to unify the bytecode generated by different compilers, and enforce contract-specific slicing by tracking data and control flows with simulated bytecode execution to prune noise code as far as possible. Building on these techniques, we design an unsupervised graph embedding algorithm that encodes code graphs into quantitatively comparable vectors. Potentially vulnerable smart contracts can then be identified by measuring the similarity between their vectors and those of known vulnerable contracts. Our evaluation shows efficiency (0.47 seconds per contract on average), effectiveness (160 verified true positives), and high precision (91.95% for the top-ranked results). Notably, we also identify dozens of honeypot contracts, further demonstrating the capability of our method.
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2021.3050051
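
The final matching step described in the abstract, comparing a contract's embedding vector against vectors of known vulnerable contracts, can be illustrated with a minimal sketch. The abstract does not say which similarity measure or threshold the authors use, so cosine similarity, the `threshold` value, and the names `cosine_similarity` and `rank_candidates` are assumptions made purely for illustration. The vectors are assumed to come from some graph embedding step that is not shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector is treated as matching nothing
    return dot / (norm_u * norm_v)


def rank_candidates(known_vulnerable, candidates, threshold=0.9):
    """Return (candidate, best_matching_known, score), highest score first.

    known_vulnerable: dict mapping a label to its embedding vector
    candidates:       dict mapping a contract name to its embedding vector
    threshold:        hypothetical cut-off, not a value from the paper
    """
    hits = []
    for name, vec in candidates.items():
        best_label, best_score = None, -1.0
        for label, ref in known_vulnerable.items():
            score = cosine_similarity(vec, ref)
            if score > best_score:
                best_label, best_score = label, score
        if best_score >= threshold:
            hits.append((name, best_label, best_score))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


# Toy three-dimensional vectors, invented for the example.
known = {"reentrancy": [1.0, 0.0, 0.0]}
contracts = {
    "A": [0.9, 0.1, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [2.0, 0.0, 0.0],
}
ranked = rank_candidates(known, contracts)
for name, label, score in ranked:
    print(f"{name} resembles {label}: {score:.3f}")
```

Ranking by score rather than returning a plain yes/no mirrors the abstract's reporting of precision for top-ranked results, although the actual ranking procedure in the paper may differ.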