SolBench: A Dataset and Benchmark for Evaluating Functional Correctness in Solidity Code Completion and Repair

Chen, Zaoyu; Qin, Haoran; Chen, Nuo; Zhao, Xiangyu; Xue, Lei; Luo, Xiapu; Wu, Xiao-Ming

Computer Science > Software Engineering

arXiv:2503.01098 (cs)

[Submitted on 3 Mar 2025]

Abstract: Smart contracts are crucial programs on blockchains, and their immutability post-deployment makes functional correctness vital. Despite progress in code completion models, benchmarks for Solidity, the primary smart contract language, are lacking. Existing metrics like BLEU do not adequately assess the functional correctness of generated smart contracts. To fill this gap, we introduce SolBench, a benchmark for evaluating the functional correctness of Solidity smart contracts generated by code completion models. SolBench includes 4,178 functions from 1,155 Ethereum-deployed contracts. Testing advanced models revealed challenges in generating correct code without context, as Solidity functions rely on context-defined variables and interfaces. To address this, we propose a Retrieval-Augmented Code Repair framework. In this framework, an executor verifies functional correctness, and if necessary, an LLM repairs the code using retrieved snippets informed by executor traces. We conduct a comprehensive evaluation of both closed-source and open-source LLMs across various model sizes and series to assess their performance in smart contract completion. The results show that code repair and retrieval techniques effectively enhance the correctness of smart contract completion while reducing computational costs.
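The execute, retrieve, repair cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `repair_loop`, the `RepairResult` type, the `max_rounds` parameter, and the three toy callables are all hypothetical names introduced here for illustration. In a real setup the executor would presumably compile and run the contract against tests, and the repair step would call an LLM; the toy stand-ins below only mimic that shape with string checks.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RepairResult:
    code: str     # final candidate code
    passed: bool  # whether the executor accepted it
    rounds: int   # number of repair calls that were made


def repair_loop(
    completion: str,
    context: List[str],
    execute: Callable[[str], Tuple[bool, str]],
    retrieve: Callable[[str, List[str]], List[str]],
    repair: Callable[[str, str, List[str]], str],
    max_rounds: int = 3,
) -> RepairResult:
    """Verify a completion; on failure, repair it using trace-guided retrieval."""
    code = completion
    for rounds in range(max_rounds + 1):
        passed, trace = execute(code)
        if passed:
            return RepairResult(code, True, rounds)
        if rounds == max_rounds:
            break  # repair budget exhausted
        snippets = retrieve(trace, context)    # the trace guides retrieval
        code = repair(code, trace, snippets)   # stand-in for an LLM call
    return RepairResult(code, False, max_rounds)


# --- Toy stand-ins (assumptions, not real tooling) ---

def toy_execute(code: str) -> Tuple[bool, str]:
    # Pretend the executor rejects any use of the undeclared name `owner_`.
    if "owner_" in code:
        return False, "DeclarationError: Undeclared identifier 'owner_'"
    return True, ""


def _tokens(text: str) -> set:
    # Identifier-like tokens, lowercased, with surrounding underscores removed.
    return {t.strip("_").lower() for t in re.findall(r"[A-Za-z_]\w*", text)}


def toy_retrieve(trace: str, context: List[str]) -> List[str]:
    # Keep context snippets that share at least one token with the trace.
    wanted = _tokens(trace)
    return [s for s in context if wanted & _tokens(s)]


def toy_repair(code: str, trace: str, snippets: List[str]) -> str:
    # A real system would prompt an LLM; here we apply one hard-coded fix.
    if any("_owner" in s for s in snippets):
        return code.replace("owner_", "_owner")
    return code


context = ["address private _owner;", "uint256 public totalSupply;"]
completion = "function owner() public view returns (address) { return owner_; }"
result = repair_loop(completion, context, toy_execute, toy_retrieve, toy_repair)
print(result.passed, result.rounds)
```

Passing the executor, retriever, and repairer as callables keeps the loop independent of any particular compiler, retrieval method, or model, which is a design choice of this sketch rather than something stated in the abstract.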

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2503.01098 [cs.SE]
	(or arXiv:2503.01098v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2503.01098

Submission history

From: Zaoyu Chen
[v1] Mon, 3 Mar 2025 01:55:20 UTC (10,013 KB)
