VulHunter: Hunting Vulnerable Smart Contracts at EVM Bytecode-Level via Multiple Instance Learning

With the economic development of Ethereum, the frequent security incidents involving smart contracts running on this platform have caused billions of dollars in losses. Consequently, there is a pressing need to identify the vulnerabilities in contracts, while the state-of-the-art (SOTA) detection me...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on software engineering 2023-11, Vol.49 (11), p.4886-4916
Hauptverfasser: Li, Zhaoxuan, Lu, Siqi, Zhang, Rui, Zhao, Ziming, Liang, Rujin, Xue, Rui, Li, Wenhao, Zhang, Fan, Gao, Sheng
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:With the economic development of Ethereum, the frequent security incidents involving smart contracts running on this platform have caused billions of dollars in losses. Consequently, there is a pressing need to identify the vulnerabilities in contracts, while the state-of-the-art (SOTA) detection methods have been limited in this regard as they cannot overcome three challenges at the same time. (i) Meet the requirements of detecting the source code, bytecode, and opcode of contracts simultaneously; (ii) reduce the reliance on manual pre-defined rules/patterns and expert involvement; (iii) assist contract developers in completing the contract lifecycle more safely, e.g., vulnerability repair and abnormal monitoring. With the development of machine learning (ML), using it to detect the contract runtime execution sequences (called instances) has made it possible to address these challenges. However, the lack of datasets with fine-grained sequence labels poses a significant obstacle, given the unreadability of bytecode/opcode. To this end, we propose a method named VulHunter that extracts the instances by traversing the Control Flow Graph built from contract opcodes. Based on the hybrid attention and multi-instance learning mechanisms, VulHunter reasons the instance labels and designs an optional classifier to automatically capture the subtle features of both normal and defective contracts, thereby identifying the vulnerable instances. Then, it combines the symbolic execution to construct and solve symbolic constraints to validate their feasibility. Finally, we implement a prototype of VulHunter with 15K lines of code and compare it with 9 SOTA methods on five open source datasets including 52,042 source codes and 184,289 bytecodes. The results indicate that VulHunter can detect contract vulnerabilities more accurately (90.04% accuracy and 85.60% F1 score), efficiently (only 4.4 seconds per contract), and robustly (0% analysis failure rate) than SOTA methods. Also, it can focus on specific metrics such as precision and recall by employing different baseline models and hyperparameters to meet the various user requirements, e.g., vulnerability discovery and misreport mitigation. More importantly, compared with the previous ML-based arts, it can not only provide classification results, defective contract source code statements, key opcode fragments, and vulnerable execution paths, but also eliminate misreports and facilitate more operations such as vulnerability
ISSN:0098-5589
1939-3520
DOI:10.1109/TSE.2023.3317209