-
The Secrets Must Not Flow: Scaling Security Verification to Large Codebases (extended version)
Authors:
Linard Arquint,
Samarth Kishor,
Jason R. Koenig,
Joey Dodds,
Daniel Kroening,
Peter Müller
Abstract:
Existing program verifiers can prove advanced properties about security protocol implementations, but are difficult to scale to large codebases because of the manual effort required. We develop a novel methodology called *Diodon* that addresses this challenge by splitting the codebase into the protocol implementation (the *Core*) and the remainder (the *Application*). This split allows us to apply…
▽ More
Existing program verifiers can prove advanced properties about security protocol implementations, but are difficult to scale to large codebases because of the manual effort required. We develop a novel methodology called *Diodon* that addresses this challenge by splitting the codebase into the protocol implementation (the *Core*) and the remainder (the *Application*). This split allows us to apply powerful semi-automated verification techniques to the security-critical Core, while fully-automatic static analyses scale the verification to the entire codebase by ensuring that the Application cannot invalidate the security properties proved for the Core. The static analyses achieve that by proving *I/O independence*, i.e., that the I/O operations within the Application are independent of the Core's security-relevant data (such as keys), and that the Application meets the Core's requirements. We have proved Diodon sound by first showing that we can safely allow the Application to perform I/O independent of the security protocol, and second that manual verification and static analyses soundly compose. We evaluate Diodon on two case studies: an implementation of the signed Diffie-Hellman key exchange and a large (100k+ LoC) production Go codebase implementing a key exchange protocol for which we obtained secrecy and injective agreement guarantees by verifying a Core of about 1% of the code with the auto-active program verifier Gobra in less than three person months.
△ Less
Submitted 1 July, 2025;
originally announced July 2025.
-
Program Synthesis from Partial Traces
Authors:
Margarida Ferreira,
Victor Nicolet,
Joey Dodds,
Daniel Kroening
Abstract:
We present the first technique to synthesize programs that compose side-effecting functions, pure functions, and control flow, from partial traces containing records of only the side-effecting functions. This technique can be applied to synthesize API composing scripts from logs of calls made to those APIs, or a script from traces of system calls made by a workload, for example. All of the provide…
▽ More
We present the first technique to synthesize programs that compose side-effecting functions, pure functions, and control flow, from partial traces containing records of only the side-effecting functions. This technique can be applied to synthesize API composing scripts from logs of calls made to those APIs, or a script from traces of system calls made by a workload, for example. All of the provided traces are positive examples, meaning that they describe desired behavior. Our approach does not require negative examples. Instead, it generalizes over the examples and uses cost metrics to prevent over-generalization. Because the problem is too complex for traditional monolithic program synthesis techniques, we propose a new combination of optimizing rewrites and syntax-guided program synthesis. The resulting program is correct by construction, so its output will always be able to reproduce the input traces. We evaluate the quality of the programs synthesized when considering various optimization metrics and the synthesizer's efficiency on real-world benchmarks. The results show that our approach can generate useful real-world programs.
△ Less
Submitted 10 June, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
-
Towards Translating Real-World Code with LLMs: A Study of Translating to Rust
Authors:
Hasan Ferit Eniser,
Hanliang Zhang,
Cristina David,
Meng Wang,
Maria Christakis,
Brandon Paulsen,
Joey Dodds,
Daniel Kroening
Abstract:
Large language models (LLMs) show promise in code translation - the task of translating code written in one programming language to another language - due to their ability to write code in most programming languages. However, LLM's effectiveness on translating real-world code remains largely unstudied. In this work, we perform the first substantial study on LLM-based translation to Rust by assessi…
▽ More
Large language models (LLMs) show promise in code translation - the task of translating code written in one programming language to another language - due to their ability to write code in most programming languages. However, LLM's effectiveness on translating real-world code remains largely unstudied. In this work, we perform the first substantial study on LLM-based translation to Rust by assessing the ability of five state-of-the-art LLMs, GPT4, Claude 3, Claude 2.1, Gemini Pro, and Mixtral. We conduct our study on code extracted from real-world open source projects. To enable our study, we develop FLOURINE, an end-to-end code translation tool that uses differential fuzzing to check if a Rust translation is I/O equivalent to the original source program, eliminating the need for pre-existing test cases. As part of our investigation, we assess both the LLM's ability to produce an initially successful translation, as well as their capacity to fix a previously generated buggy one. If the original and the translated programs are not I/O equivalent, we apply a set of automated feedback strategies, including feedback to the LLM with counterexamples. Our results show that the most successful LLM can translate 47% of our benchmarks, and also provides insights into next steps for improvements.
△ Less
Submitted 17 April, 2025; v1 submitted 19 May, 2024;
originally announced May 2024.
-
VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners
Authors:
Aidan Z. H. Yang,
Yoshiki Takashima,
Brandon Paulsen,
Josiah Dodds,
Daniel Kroening
Abstract:
Rust is a programming language that combines memory safety and low-level control, providing C-like performance while guaranteeing the absence of undefined behaviors by default. Rust's growing popularity has prompted research on safe and correct transpiling of existing code-bases to Rust. Existing work falls into two categories: rule-based and large language model (LLM)-based. While rule-based appr…
▽ More
Rust is a programming language that combines memory safety and low-level control, providing C-like performance while guaranteeing the absence of undefined behaviors by default. Rust's growing popularity has prompted research on safe and correct transpiling of existing code-bases to Rust. Existing work falls into two categories: rule-based and large language model (LLM)-based. While rule-based approaches can theoretically produce correct transpilations that maintain input-output equivalence to the original, they often yield unreadable Rust code that uses unsafe subsets of the Rust language. On the other hand, while LLM-based approaches typically produce more readable, maintainable, and safe code, they do not provide any guarantees about correctness. In this work, we present VERT, a tool that can produce readable Rust transpilations with formal guarantees of correctness. VERT's only requirement is that there is Web Assembly compiler for the source language, which is true for most major languages. VERT first uses the Web Assembly compiler to obtain an oracle Rust program. In parallel, VERT uses an LLM to generate a readable candidate Rust program. This candidate is verified against the oracle, and if verification fails, we regenerate a new candidate transpilation until verification succeeds. We evaluate VERT by transpiling a suite of 1,394 programs taken from competitive programming style benchmarks. Combining Anthropic's Claude-2 and VERT increases Rust transpilations passing property-based testing from 31% to 54% and bounded model-checking from 1% to 42% compared to using Claude alone. In addition, we evaluate VERT's ability to generate non-trivial safe Rust on programs taken from real-world C projects that make significant use of pointers. Our results provide insights into the limitations of LLMs to write safe Rust.
△ Less
Submitted 25 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.