Automating Comment Generation for Smart Contract from Bytecode

Recently, smart contracts have played a vital role in automatic financial and business transactions. To help end users without programming background to better understand the logic of smart contracts, previous studies have proposed models for automatically translating smart contract source code into...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:ACM transactions on software engineering and methodology 2024-10
Hauptverfasser: Xiang, Jianhang, Gao, Zhipeng, Bao, Lingfeng, Hu, Xing, Chen, Jiayuan, Xia, Xin
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Recently, smart contracts have played a vital role in automatic financial and business transactions. To help end users without programming background to better understand the logic of smart contracts, previous studies have proposed models for automatically translating smart contract source code into their corresponding code summaries. However, in practice, only 13% of smart contracts deployed on the Ethereum blockchain are associated with source code. The practical usage of these existing tools is significantly restricted. Considering that bytecode is always necessary when deploying smart contracts, in this paper, we first introduce the task of automatically generating smart contract code summaries from bytecode. We propose a novel approach, named SmartBT (Smart contract Bytecode Translator) for automatically translating smart contract bytecode into fine-grained natural language description directly. Two key challenges are posed for this task: structural code logic hidden in bytecode and the huge semantic gap between bytecode and natural language descriptions. To address the first challenge, we transform bytecode into CFG (Control-Flow Graph) to learn code structural and logic details. Regarding the second challenge, we introduce an information retrieval component to fetch similar comments for filling the semantic gap. Then the structural input and semantic input are used to build an attentional sequence-to-sequence neural network model. The copy mechanism is employed to copy rare words directly from similar comments and the coverage mechanism is employed to eliminate repetitive outputs. The automatic evaluation results show that SmartBT outperforms a set of baselines by a large margin, and the human evaluation results show the effectiveness and potential of SmartBT in producing meaningful and accurate comments for smart contract code from bytecode directly.
ISSN:1049-331X
1557-7392
DOI:10.1145/3699597