-
Depermissioning Web3: a Permissionless Accountable RPC Protocol for Blockchain Networks
Authors:
Weihong Wang,
Tom Van Cutsem
Abstract:
In blockchain networks, so-called "full nodes" serve data to and relay transactions from clients through an RPC interface. This serving layer enables integration of "Web3" data, stored on blockchains, with "Web2" mobile or web applications that cannot directly participate as peers in a blockchain network. In practice, the serving layer is dominated by a small number of centralized services ("node providers") that offer permissioned access to RPC endpoints. Clients register with these providers because they offer reliable and convenient access to blockchain data: operating a full node themselves requires significant computational and storage resources, and public (permissionless) RPC nodes lack financial incentives to serve large numbers of clients with consistent performance.
Permissioned access to an otherwise permissionless blockchain network raises concerns regarding the privacy, integrity, and availability of data access. To address this, we propose a Permissionless Accountable RPC Protocol (PARP). It enables clients and full nodes to interact pseudonymously while keeping both parties accountable. PARP leverages "light client" schemes for essential data integrity checks, combined with fraud proofs, to keep full nodes honest and accountable. It integrates payment channels to facilitate micro-payments, holding clients accountable for the resources they consume and providing an economic incentive for full nodes to serve. Our prototype implementation for Ethereum demonstrates the feasibility of PARP, and we quantify its overhead compared to the base RPC protocol.
Submitted 6 June, 2025; v1 submitted 4 June, 2025;
originally announced June 2025.
-
A Scalable State Sharing Protocol for Low-Resource Validator Nodes in Blockchain Networks
Authors:
Ruben Hias,
Weihong Wang,
Jan Vanhoof,
Tom Van Cutsem
Abstract:
The perpetual growth of data stored on popular blockchains such as Ethereum leads to significant scalability challenges and substantial storage costs for operators of full nodes. Increasing costs may lead to fewer independently operated nodes in the network, which poses risks to decentralization (and hence network security), but also pushes decentralized app developers towards centrally hosted API services.
This paper introduces a new protocol that allows validator nodes to participate in a blockchain network without the need to store the full state of the network on each node. The key idea is to use the blockchain network as both a replicated state machine and as a distributed storage system. By distributing states across nodes and enabling efficient data retrieval through a Kademlia-inspired routing protocol, we reduce storage costs for validators. Cryptographic proofs (such as Merkle proofs) are used to allow nodes to verify data stored by other nodes without having to trust those nodes directly. While the protocol trades off data storage for increased network bandwidth, we show how gossiping and caching can minimize the increased bandwidth needs.
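The Merkle-proof check mentioned above can be illustrated with a minimal sketch. It assumes a plain binary SHA-256 tree, which is a simplification: Ethereum itself uses Merkle Patricia tries with Keccak-256, and the abstract does not specify the proof format used by the protocol. The function names `merkle_root_and_proof` and `verify` are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest (stand-in hash; an assumption, not Ethereum's Keccak-256)."""
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Build a binary Merkle tree over a non-empty list of leaves.

    Returns (root, proof), where proof is a list of
    (sibling_hash, sibling_is_left) pairs for leaves[index].
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        sibling = index ^ 1  # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the root from a leaf and its proof; no trust in the sender needed."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"account-a", b"account-b", b"account-c"]
root, proof = merkle_root_and_proof(leaves, 2)
print(verify(b"account-c", proof, root))  # genuine leaf
print(verify(b"tampered", proof, root))   # altered leaf
```

A node that knows only the root can thus check a value served by an untrusted peer, at the cost of transferring one sibling hash per tree level.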
To validate our state sharing protocol, we conduct an extensive quantitative analysis of Ethereum's data storage and data access patterns. Our findings indicate that while our protocol significantly lowers storage needs, it comes with an increased bandwidth usage ranging from 1.5 MB to 5 MB per block, translating to an additional monthly bandwidth of 319 GB to 1,065 GB. Despite this, the size remains small enough such that it can be passed to all nodes and validated within Ethereum's 12-second block validation window. Further analysis shows that Merkle proofs are the most significant contributor to the additional bandwidth. To address this concern, we also analyze the impact of switching to the more space-efficient Verkle Proofs.
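As a rough sanity check of the per-block to monthly conversion, the sketch below assumes one block every 12 seconds with no missed slots, a 30-day month, and decimal units (1 GB = 1000 MB). These are assumptions, not values stated in the abstract. Under them the result is about 324 GB and 1,080 GB, within roughly 2% of the reported 319 GB and 1,065 GB; the gap presumably stems from rounding of the per-block sizes or different conventions.

```python
SECONDS_PER_BLOCK = 12  # Ethereum block interval, as stated in the abstract
DAYS_PER_MONTH = 30     # assumption: the abstract does not state the month length


def monthly_bandwidth_gb(mb_per_block: float) -> float:
    """Extra monthly bandwidth in GB for a given extra per-block size in MB."""
    blocks_per_month = DAYS_PER_MONTH * 24 * 60 * 60 // SECONDS_PER_BLOCK
    return mb_per_block * blocks_per_month / 1000  # assumption: 1 GB = 1000 MB


for size_mb in (1.5, 5.0):
    print(f"{size_mb} MB/block -> {monthly_bandwidth_gb(size_mb):.0f} GB/month")
```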
Submitted 8 October, 2024;
originally announced October 2024.
-
Static Application Security Testing of Consensus-Critical Code in the Cosmos Network
Authors:
Jasper Surmont,
Weihong Wang,
Tom Van Cutsem
Abstract:
Blockchains require deterministic execution in order to reach consensus. This is often guaranteed in languages designed to write smart contracts, such as Solidity. Application-specific blockchains or "appchains" allow the blockchain application logic to be written using general-purpose programming languages, giving developers more flexibility but also additional responsibilities. In particular, developers must ensure that their blockchain application logic does not contain any sources of non-determinism. Any source of non-determinism may be a potential source of vulnerabilities.
This paper focuses on the use of Static Application Security Testing (SAST) tools to detect such sources of non-determinism at development time. We focus on Cosmos, a prominent open-source project that lets developers build interconnected networks of application-specific blockchains. Cosmos provides a Software Development Kit (SDK) that allows these chains to be implemented in the Go programming language. We create a corpus of 11 representative Cosmos-based appchains to analyze for sources of non-determinism in Go.
As part of our study, we identified cosmos-sdk-codeql, a set of CodeQL code analysis rules for Cosmos applications. We find that these rules generate many false positives and propose a refactored set of rules that more precisely detects sources of non-determinism only in code that runs as part of the blockchain logic. We demonstrate a significant increase in the precision of the rules, making the SAST tool more effective and hence potentially contributing to enhanced security for Cosmos-based blockchains.
Submitted 21 August, 2023;
originally announced August 2023.
-
Natural Language-Guided Programming
Authors:
Geert Heyman,
Rafael Huysegems,
Pascal Justen,
Tom Van Cutsem
Abstract:
In today's software world with its cornucopia of reusable software libraries, when a programmer is faced with a programming task that they suspect can be completed through the use of a library, they often look for code examples using a search engine and then manually adapt found examples to their specific context of use. We put forward a vision based on a new breed of developer tools that have the potential to largely automate this process. The key idea is to adapt code autocompletion tools such that they take into account not only the developer's already-written code but also the intent of the task the developer is trying to achieve next, formulated in plain natural language. We call this practice of enriching the code with natural language intent to facilitate its completion natural language-guided programming.
To show that this idea is feasible we design, implement and benchmark a tool that solves this problem in the context of a specific domain (data science) and a specific programming language (Python). Central to the tool is the use of language models trained on a large corpus of documented code. Our initial experiments confirm the feasibility of the idea but also make it clear that we have only scratched the surface of what may become possible in the future. We end the paper with a comprehensive research agenda to stimulate additional research in the budding area of natural language-guided programming.
Submitted 7 October, 2021; v1 submitted 11 August, 2021;
originally announced August 2021.
-
Neural Code Search Revisited: Enhancing Code Snippet Retrieval through Natural Language Intent
Authors:
Geert Heyman,
Tom Van Cutsem
Abstract:
In this work, we propose and study annotated code search: the retrieval of code snippets paired with brief descriptions of their intent using natural language queries. On three benchmark datasets, we investigate how code retrieval systems can be improved by leveraging descriptions to better capture the intents of code snippets. Building on recent progress in transfer learning and natural language processing, we create a domain-specific retrieval model for code annotated with a natural language description. We find that our model yields significantly more relevant search results (with absolute gains up to 20.6% in mean reciprocal rank) compared to state-of-the-art code retrieval methods that do not use descriptions but attempt to compute the intent of snippets solely from unannotated code.
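For readers unfamiliar with the metric, mean reciprocal rank (MRR) averages 1/rank of the first relevant result over all queries. The sketch below is a generic illustration with made-up toy data; the function name and the single-relevant-item-per-query setup are assumptions, not the paper's evaluation code.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average of 1/rank of the relevant item per query (0 if it is not retrieved).

    ranked_results: one ranked list of result ids per query.
    relevant: the single relevant result id for each query.
    """
    total = 0.0
    for results, target in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item == target:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Toy data: hit at rank 1, hit at rank 2, and a miss.
ranked = [["s1", "s2", "s3"], ["s7", "s4", "s9"], ["s5", "s6"]]
gold = ["s1", "s4", "s8"]
print(mean_reciprocal_rank(ranked, gold))  # (1 + 1/2 + 0) / 3
```

An absolute gain of 20.6% in this metric means the MRR value itself rises by 0.206, for example from 0.40 to 0.606.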
Submitted 27 August, 2020;
originally announced August 2020.
-
Import2vec - Learning Embeddings for Software Libraries
Authors:
Bart Theeten,
Frederik Vandeputte,
Tom Van Cutsem
Abstract:
We consider the problem of developing suitable learning representations (embeddings) for library packages that capture semantic similarity among libraries. Such representations are known to improve the performance of downstream learning tasks (e.g. classification) or applications such as contextual search and analogical reasoning.
We apply word embedding techniques from natural language processing (NLP) to train embeddings for library packages ("library vectors"). Library vectors represent libraries by similar context of use as determined by import statements present in source code. Experimental results obtained from training such embeddings on three large open source software corpora reveal that library vectors capture semantically meaningful relationships among software libraries, such as the relationship between frameworks and their plug-ins and libraries commonly used together within ecosystems such as big data infrastructure projects (in Java), front-end and back-end web development frameworks (in JavaScript) and data science toolkits (in Python).
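The notion of "context of use as determined by import statements" can be sketched as follows for Python sources: libraries imported by the same file are treated as co-occurring, much like words sharing a sentence. This shows only a plausible context-extraction step, not the embedding training itself, and it is not the authors' pipeline; the helper names are hypothetical.

```python
import ast
from collections import Counter
from itertools import combinations


def imported_libraries(source: str) -> set:
    """Top-level package names imported by a Python source string."""
    libs = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                libs.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import x".
            if node.module and node.level == 0:
                libs.add(node.module.split(".")[0])
    return libs


def cooccurrence_counts(sources) -> Counter:
    """Count how often each pair of libraries is imported by the same file."""
    counts = Counter()
    for src in sources:
        counts.update(combinations(sorted(imported_libraries(src)), 2))
    return counts


files = [
    "import numpy as np\nimport pandas as pd\n",
    "from numpy.linalg import norm\nimport pandas\nfrom . import local\n",
]
print(cooccurrence_counts(files))
```

Pairs with high counts would serve as positive training examples for a word2vec-style model, so that libraries used in similar contexts end up with nearby vectors.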
Submitted 27 March, 2019;
originally announced April 2019.
-
Towards Composable Concurrency Abstractions
Authors:
Janwillem Swalens,
Stefan Marr,
Joeri De Koster,
Tom Van Cutsem
Abstract:
In the past decades, many different programming models for managing concurrency in applications have been proposed, such as the actor model, Communicating Sequential Processes, and Software Transactional Memory. The ubiquity of multi-core processors has made harnessing concurrency even more important. We observe that modern languages, such as Scala, Clojure, or F#, provide not one, but multiple concurrency models that help developers manage concurrency. Large end-user applications are rarely built using just a single concurrency model. Programmers need to manage a responsive UI, deal with file or network I/O, asynchronous workflows, and shared resources. Different concurrency models facilitate different requirements. This raises the issue of how these concurrency models interact, and whether they are composable. After all, combining different concurrency models may lead to subtle bugs or inconsistencies.
In this paper, we perform an in-depth study of the concurrency abstractions provided by the Clojure language. We study all pairwise combinations of the abstractions, noting which ones compose without issues, and which do not. We make an attempt to abstract from the specifics of Clojure, identifying the general properties of concurrency models that facilitate or hinder composition.
Submitted 13 June, 2014;
originally announced June 2014.
-
Proceedings First International Workshop on Decentralized Coordination of Distributed Processes
Authors:
Tom Van Cutsem,
Mark Miller
Abstract:
This volume contains the papers presented at the 1st International Workshop on "Decentralized Coordination of Distributed Processes", DCDP 2010, held in Amsterdam, The Netherlands on June 10th, 2010 in conjunction with the 5th International Federated Conferences on Distributed Computing Techniques, DisCoTec 2010. The central theme of the workshop is the decentralized coordination of distributed processes. Decentralized: there is no single authority in the network that everything is vulnerable to. Coordinated: processes need to cooperate to achieve meaningful results, potentially in the face of mutual suspicion. Distributed: processes are separated by a potentially unreliable network.
Submitted 8 June, 2010;
originally announced June 2010.