-
Comprehensive Formal Verification of Observational Correctness for the CHERIoT-Ibex Processor
Authors:
Louis-Emile Ploix,
Alasdair Armstrong,
Tom Melham,
Ray Lin,
Haolong Wang,
Anastasia Courtney
Abstract:
The CHERI architecture equips conventional RISC ISAs with significant architectural extensions that provide a hardware-enforced mechanism for memory protection and software compartmentalisation. Architectural capabilities replace conventional integer pointers with memory addresses bound to permissions constraining their use. We present the first comprehensive formal verification of a capability ex…
▽ More
The CHERI architecture equips conventional RISC ISAs with significant architectural extensions that provide a hardware-enforced mechanism for memory protection and software compartmentalisation. Architectural capabilities replace conventional integer pointers with memory addresses bound to permissions constraining their use. We present the first comprehensive formal verification of a capability extended RISC-V processor with internally 'compressed' capabilities - a concise encoding of capabilities with some resemblance to floating point number representations.
The reference model for RTL correctness is a minor variant of the full and definitive ISA description written in the Sail ISA specification language. This is made accessible to formal verification tools by a prototype flow for translation of Sail into SystemVerilog. Our verification demonstrates a methodology for establishing that the processor always produces a stream of interactions with memory that is identical to that specified in Sail, when started in the same initial state. We additionally establish liveness. This abstract, microarchitecture-independent observational correctness property provides a comprehensive and clear assurance of functional correctness for the CHERIoT-Ibex processor's observable interactions with memory.
△ Less
Submitted 7 February, 2025;
originally announced February 2025.
-
A Formal CHERI-C Semantics for Verification
Authors:
Seung Hoon Park,
Rekha Pai,
Tom Melham
Abstract:
CHERI-C extends the C programming language by adding hardware capabilities, ensuring a certain degree of memory safety while remaining efficient. Capabilities can also be employed for higher-level security measures, such as software compartmentalization, that have to be used correctly to achieve the desired security guarantees. As the extension changes the semantics of C, new theories and tooling…
▽ More
CHERI-C extends the C programming language by adding hardware capabilities, ensuring a certain degree of memory safety while remaining efficient. Capabilities can also be employed for higher-level security measures, such as software compartmentalization, that have to be used correctly to achieve the desired security guarantees. As the extension changes the semantics of C, new theories and tooling are required to reason about CHERI-C code and verify correctness. In this work, we present a formal memory model that provides a memory semantics for CHERI-C programs. We present a generalised theory with rich properties suitable for verification and potentially other types of analyses. Our theory is backed by an Isabelle/HOL formalisation that also generates an OCaml executable instance of the memory model. The verified and extracted code is then used to instantiate the parametric Gillian program analysis framework, with which we can perform concrete execution of CHERI-C programs. The tool can run a CHERI-C test suite, demonstrating the correctness of our tool, and catch a good class of safety violations that the CHERI hardware might miss.
△ Less
Submitted 26 January, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Active Learning of Abstract System Models from Traces using Model Checking [Extended]
Authors:
Natasha Yogananda Jeppu,
Tom Melham,
Daniel Kroening
Abstract:
We present a new active model-learning approach to generating abstractions of a system implementation, as finite state automata (FSAs), from execution traces. Given an implementation and a set of observable system variables, the generated automata admit all system behaviours over the given variables and provide useful insight in the form of invariants that hold on the implementation. To achieve th…
▽ More
We present a new active model-learning approach to generating abstractions of a system implementation, as finite state automata (FSAs), from execution traces. Given an implementation and a set of observable system variables, the generated automata admit all system behaviours over the given variables and provide useful insight in the form of invariants that hold on the implementation. To achieve this, the proposed approach uses a pluggable model learning component that can generate an FSA from a given set of traces. Conditions that encode a completeness hypothesis are then extracted from the FSA under construction and used to evaluate its degree of completeness by checking their truth value against the system using software model checking. This generates new traces that express any missing behaviours. The new trace data is used to iteratively refine the abstraction, until all system behaviours are admitted by the learned abstraction. To evaluate the approach, we reverse-engineer a set of publicly available Simulink Stateflow models from their C implementations.
△ Less
Submitted 11 December, 2021;
originally announced December 2021.
-
Exposing Previously Undetectable Faults in Deep Neural Networks
Authors:
Isaac Dunn,
Hadrien Pouget,
Daniel Kroening,
Tom Melham
Abstract:
Existing methods for testing DNNs solve the oracle problem by constraining the raw features (e.g. image pixel values) to be within a small distance of a dataset example for which the desired DNN output is known. But this limits the kinds of faults these approaches are able to detect. In this paper, we introduce a novel DNN testing method that is able to find faults in DNNs that other methods canno…
▽ More
Existing methods for testing DNNs solve the oracle problem by constraining the raw features (e.g. image pixel values) to be within a small distance of a dataset example for which the desired DNN output is known. But this limits the kinds of faults these approaches are able to detect. In this paper, we introduce a novel DNN testing method that is able to find faults in DNNs that other methods cannot. The crux is that, by leveraging generative machine learning, we can generate fresh test inputs that vary in their high-level features (for images, these include object shape, location, texture, and colour). We demonstrate that our approach is capable of detecting deliberately injected faults as well as new faults in state-of-the-art DNNs, and that in both cases, existing methods are unable to find these faults.
△ Less
Submitted 1 June, 2021;
originally announced June 2021.
-
Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities
Authors:
Isaac Dunn,
Laura Hanu,
Hadrien Pouget,
Daniel Kroening,
Tom Melham
Abstract:
We cannot guarantee that training datasets are representative of the distribution of inputs that will be encountered during deployment. So we must have confidence that our models do not over-rely on this assumption. To this end, we introduce a new method that identifies context-sensitive feature perturbations (e.g. shape, location, texture, colour) to the inputs of image classifiers. We produce th…
▽ More
We cannot guarantee that training datasets are representative of the distribution of inputs that will be encountered during deployment. So we must have confidence that our models do not over-rely on this assumption. To this end, we introduce a new method that identifies context-sensitive feature perturbations (e.g. shape, location, texture, colour) to the inputs of image classifiers. We produce these changes by performing small adjustments to the activation values of different layers of a trained generative neural network. Perturbing at layers earlier in the generator causes changes to coarser-grained features; perturbations further on cause finer-grained changes. Unsurprisingly, we find that state-of-the-art classifiers are not robust to any such changes. More surprisingly, when it comes to coarse-grained feature changes, we find that adversarial training against pixel-space perturbations is not just unhelpful: it is counterproductive.
△ Less
Submitted 23 October, 2020; v1 submitted 29 January, 2020;
originally announced January 2020.
-
Learning Concise Models from Long Execution Traces
Authors:
Natasha Yogananda Jeppu,
Tom Melham,
Daniel Kroening,
John O'Leary
Abstract:
Abstract models of system-level behaviour have applications in design exploration, analysis, testing and verification. We describe a new algorithm for automatically extracting useful models, as automata, from execution traces of a HW/SW system driven by software exercising a use-case of interest. Our algorithm leverages modern program synthesis techniques to generate predicates on automaton edges,…
▽ More
Abstract models of system-level behaviour have applications in design exploration, analysis, testing and verification. We describe a new algorithm for automatically extracting useful models, as automata, from execution traces of a HW/SW system driven by software exercising a use-case of interest. Our algorithm leverages modern program synthesis techniques to generate predicates on automaton edges, succinctly describing system behaviour. It employs trace segmentation to tackle complexity for long traces. We learn concise models capturing transaction-level, system-wide behaviour--experimentally demonstrating the approach using traces from a variety of sources, including the x86 QEMU virtual platform and the Real-Time Linux kernel.
△ Less
Submitted 5 May, 2020; v1 submitted 15 January, 2020;
originally announced January 2020.
-
Hardware/Software Co-verification Using Path-based Symbolic Execution
Authors:
Rajdeep Mukherjee,
Saurabh Joshi,
John O'Leary,
Daniel Kroening,
Tom Melham
Abstract:
Conventional tools for formal hardware/software co-verification use bounded model checking techniques to construct a single monolithic propositional formula. Formulas generated in this way are extremely complex and contain a great deal of irrelevant logic, hence are difficult to solve even by the state-of-the-art Satis ability (SAT) solvers. In a typical hardware/software co-design the firmware on…
▽ More
Conventional tools for formal hardware/software co-verification use bounded model checking techniques to construct a single monolithic propositional formula. Formulas generated in this way are extremely complex and contain a great deal of irrelevant logic, hence are difficult to solve even by the state-of-the-art Satis ability (SAT) solvers. In a typical hardware/software co-design the firmware only exercises a fraction of the hardware state-space, and we can use this observation to generate simpler and more concise formulas. In this paper, we present a novel verification algorithm for hardware/software co-designs that identify partitions of the firmware and the hardware logic pertaining to the feasible execution paths by means of path-based symbolic simulation with custom path-pruning, property-guided slicing and incremental SAT solving. We have implemented this approach in our tool COVERIF. We have experimentally compared COVERIF with HW-CBMC, a monolithic BMC based co-verification tool, and observed an average speed-up of 5X over HW-CBMC for proving safety properties as well as detecting critical co-design bugs in an open-source Universal Asynchronous Receiver Transmitter design and a large SoC design.
△ Less
Submitted 5 January, 2020;
originally announced January 2020.
-
DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning
Authors:
Mohammadhosein Hasanbeig,
Natasha Yogananda Jeppu,
Alessandro Abate,
Tom Melham,
Daniel Kroening
Abstract:
This paper proposes DeepSynth, a method for effective training of deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact automata to uncover this sequential structure automatically. We synthesis…
▽ More
This paper proposes DeepSynth, a method for effective training of deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton so that the generation of a control policy by deep RL is guided by the discovered structure encoded in the automaton. The proposed approach is able to cope with both high-dimensional, low-level features and unknown sparse non-Markovian rewards. We have evaluated DeepSynth's performance in a set of experiments that includes the Atari game Montezuma's Revenge. Compared to existing approaches, we obtain a reduction of two orders of magnitude in the number of iterations required for policy synthesis, and also a significant improvement in scalability.
△ Less
Submitted 6 March, 2021; v1 submitted 22 November, 2019;
originally announced November 2019.
-
CREST: Hardware Formal Verification with ANSI-C Reference Specifications
Authors:
Andreas Tiemeyer,
Tom Melham,
Daniel Kroening,
John O'Leary
Abstract:
This paper presents CREST, a prototype front-end tool intended as an add-on to commercial EDA formal verifcation environments. CREST is an adaptation of the CBMC bounded model checker for C, an academic tool widely used in industry for software analysis and property verification. It leverages the capabilities of CBMC to process hardware datapath specifications written in arbitrary ANSI-C, without…
▽ More
This paper presents CREST, a prototype front-end tool intended as an add-on to commercial EDA formal verifcation environments. CREST is an adaptation of the CBMC bounded model checker for C, an academic tool widely used in industry for software analysis and property verification. It leverages the capabilities of CBMC to process hardware datapath specifications written in arbitrary ANSI-C, without limiting restrictions to a synthesizable subset. We briefly sketch the architecture of our tool and show its use in a range of verification case studies.
△ Less
Submitted 4 August, 2019;
originally announced August 2019.
-
Adaptive Generation of Unrestricted Adversarial Inputs
Authors:
Isaac Dunn,
Hadrien Pouget,
Tom Melham,
Daniel Kroening
Abstract:
Neural networks are vulnerable to adversarially-constructed perturbations of their inputs. Most research so far has considered perturbations of a fixed magnitude under some $l_p$ norm. Although studying these attacks is valuable, there has been increasing interest in the construction of (and robustness to) unrestricted attacks, which are not constrained to a small and rather artificial subset of a…
▽ More
Neural networks are vulnerable to adversarially-constructed perturbations of their inputs. Most research so far has considered perturbations of a fixed magnitude under some $l_p$ norm. Although studying these attacks is valuable, there has been increasing interest in the construction of (and robustness to) unrestricted attacks, which are not constrained to a small and rather artificial subset of all possible adversarial inputs. We introduce a novel algorithm for generating such unrestricted adversarial inputs which, unlike prior work, is adaptive: it is able to tune its attacks to the classifier being targeted. It also offers a 400-2,000x speedup over the existing state of the art. We demonstrate our approach by generating unrestricted adversarial inputs that fool classifiers robust to perturbation-based attacks. We also show that, by virtue of being adaptive and unrestricted, our attack is able to defeat adversarial training against it.
△ Less
Submitted 1 October, 2019; v1 submitted 7 May, 2019;
originally announced May 2019.
-
Automatic Heap Layout Manipulation for Exploitation
Authors:
Sean Heelan,
Tom Melham,
Daniel Kroening
Abstract:
Heap layout manipulation is integral to exploiting heap-based memory corruption vulnerabilities. In this paper we present the first automatic approach to the problem, based on pseudo-random black-box search. Our approach searches for the inputs required to place the source of a heap-based buffer overflow or underflow next to heap-allocated objects that an exploit developer, or automatic exploit ge…
▽ More
Heap layout manipulation is integral to exploiting heap-based memory corruption vulnerabilities. In this paper we present the first automatic approach to the problem, based on pseudo-random black-box search. Our approach searches for the inputs required to place the source of a heap-based buffer overflow or underflow next to heap-allocated objects that an exploit developer, or automatic exploit generation system, wishes to read or corrupt. We present a framework for benchmarking heap layout manipulation algorithms, and use it to evaluate our approach on several real-world allocators, showing that pseudo-random black box search can be highly effective. We then present SHRIKE, a novel system that can perform automatic heap layout manipulation on the PHP interpreter and can be used in the construction of control-flow hijacking exploits. Starting from PHP's regression tests, SHRIKE discovers fragments of PHP code that interact with the interpreter's heap in useful ways, such as making allocations and deallocations of particular sizes, or allocating objects containing sensitive data, such as pointers. SHRIKE then uses our search algorithm to piece together these fragments into programs, searching for one that achieves a desired heap layout. SHRIKE allows an exploit developer to focus on the higher level concepts in an exploit, and to defer the resolution of heap layout constraints to SHRIKE. We demonstrate this by using SHRIKE in the construction of a control-flow hijacking exploit for the PHP interpreter.
△ Less
Submitted 3 September, 2018; v1 submitted 23 April, 2018;
originally announced April 2018.
-
Lifting CDCL to Template-based Abstract Domains for Program Verification
Authors:
Rajdeep Mukherjee,
Peter Schrammel,
Leopold Haller,
Daniel Kroening,
Tom Melham
Abstract:
The success of Conflict Driven Clause Learning (CDCL) for Boolean satisfiability has inspired adoption in other domains. We present a novel lifting of CDCL to program analysis called Abstract Conflict Driven Learning for Programs (ACDLP). ACDLP alternates between model search, which performs over-approximate deduction with constraint propagation, and conflict analysis, which performs under-approxi…
▽ More
The success of Conflict Driven Clause Learning (CDCL) for Boolean satisfiability has inspired adoption in other domains. We present a novel lifting of CDCL to program analysis called Abstract Conflict Driven Learning for Programs (ACDLP). ACDLP alternates between model search, which performs over-approximate deduction with constraint propagation, and conflict analysis, which performs under-approximate abduction with heuristic choice. We instantiate the model search and conflict analysis algorithms to an abstract domain of template polyhedra, strictly generalizing CDCL from the Boolean lattice to a richer lattice structure. Our template polyhedra can express intervals, octagons and restricted polyhedral constraints over program variables. We have imple- mented ACDLP for automatic bounded safety verification of C programs. We evaluate the performance of our analyser by comparing with CBMC, which uses CDCL, and Astree, a commercial abstract interpretation tool. We observe two orders of magnitude reduction in the number of decisions, propagations, and conflicts as well as a 1.5x speedup in runtime compared to CBMC. Compared to Astree, ACDLP solves twice as many benchmarks and has much higher precision. This is the first instantiation of CDCL with a template polyhedra abstract domain.
△ Less
Submitted 6 July, 2017;
originally announced July 2017.
-
Verification of the Tree-Based Hierarchical Read-Copy Update in the Linux Kernel
Authors:
Lihao Liang,
Paul E. McKenney,
Daniel Kroening,
Tom Melham
Abstract:
Read-Copy Update (RCU) is a scalable, high-performance Linux-kernel synchronization mechanism that runs low-overhead readers concurrently with updaters. Production-quality RCU implementations for multi-core systems are decidedly non-trivial. Giving the ubiquity of Linux, a rare "million-year" bug can occur several times per day across the installed base. Stringent validation of RCU's complex behav…
▽ More
Read-Copy Update (RCU) is a scalable, high-performance Linux-kernel synchronization mechanism that runs low-overhead readers concurrently with updaters. Production-quality RCU implementations for multi-core systems are decidedly non-trivial. Giving the ubiquity of Linux, a rare "million-year" bug can occur several times per day across the installed base. Stringent validation of RCU's complex behaviors is thus critically important. Exhaustive testing is infeasible due to the exponential number of possible executions, which suggests use of formal verification.
Previous verification efforts on RCU either focus on simple implementations or use modeling languages, the latter requiring error-prone manual translation that must be repeated frequently due to regular changes in the Linux kernel's RCU implementation. In this paper, we first describe the implementation of Tree RCU in the Linux kernel. We then discuss how to construct a model directly from Tree RCU's source code in C, and use the CBMC model checker to verify its safety and liveness properties. To our best knowledge, this is the first verification of a significant part of RCU's source code, and is an important step towards integration of formal verification into the Linux kernel's regression test suite.
△ Less
Submitted 22 November, 2018; v1 submitted 10 October, 2016;
originally announced October 2016.
-
Equivalence Checking a Floating-point Unit against a High-level C Model (Extended Version)
Authors:
Rajdeep Mukherjee,
Saurabh Joshi,
Andreas Griesmayer,
Daniel Kroening,
Tom Melham
Abstract:
Semiconductor companies have increasingly adopted a methodology that starts with a system-level design specification in C/C++/SystemC. This model is extensively simulated to ensure correct functionality and performance. Later, a Register Transfer Level (RTL) implementation is created in Verilog, either manually by a designer or automatically by a high-level synthesis tool. It is essential to check…
▽ More
Semiconductor companies have increasingly adopted a methodology that starts with a system-level design specification in C/C++/SystemC. This model is extensively simulated to ensure correct functionality and performance. Later, a Register Transfer Level (RTL) implementation is created in Verilog, either manually by a designer or automatically by a high-level synthesis tool. It is essential to check that the C and Verilog programs are consistent. In this paper, we present a two-step approach, embodied in two equivalence checking tools, VERIFOX and HW-CBMC, to validate designs at the software and RTL levels, respectively. VERIFOX is used for equivalence checking of an untimed software model in C against a high-level reference model in C. HW-CBMC verifies the equivalence of a Verilog RTL implementation against an untimed software model in C. To evaluate our tools, we applied them to a commercial floating-point arithmetic unit (FPU) from ARM and an open-source dual-path floating-point adder.
△ Less
Submitted 1 September, 2016;
originally announced September 2016.
-
On the Semantics of ReFLect as a Basis for a Reflective Theorem Prover
Authors:
Tom Melham,
Raphael Cohn,
Ian Childs
Abstract:
This paper explores the semantics of a combinatory fragment of reFLect, the lambda-calculus underlying a functional language used by Intel Corporation for hardware design and verification. ReFLect is similar to ML, but has a primitive data type whose elements are the abstract syntax trees of reFLect expressions themselves. Following the LCF paradigm, this is intended to serve as the object languag…
▽ More
This paper explores the semantics of a combinatory fragment of reFLect, the lambda-calculus underlying a functional language used by Intel Corporation for hardware design and verification. ReFLect is similar to ML, but has a primitive data type whose elements are the abstract syntax trees of reFLect expressions themselves. Following the LCF paradigm, this is intended to serve as the object language of a higher-order logic theorem prover for specification and reasoning - but one in which object- and meta-languages are unified. The aim is to intermix program evaluation and logical deduction through reflection mechanisms. We identify some difficulties with the semantics of reFLect as currently defined, and propose a minimal modification of the type system that avoids these problems.
△ Less
Submitted 23 September, 2013;
originally announced September 2013.
-
Chaining Test Cases for Reactive System Testing (extended version)
Authors:
Peter Schrammel,
Tom Melham,
Daniel Kroening
Abstract:
Testing of synchronous reactive systems is challenging because long input sequences are often needed to drive them into a state at which a desired feature can be tested. This is particularly problematic in on-target testing, where a system is tested in its real-life application environment and the time required for resetting is high. This paper presents an approach to discovering a test case chain…
▽ More
Testing of synchronous reactive systems is challenging because long input sequences are often needed to drive them into a state at which a desired feature can be tested. This is particularly problematic in on-target testing, where a system is tested in its real-life application environment and the time required for resetting is high. This paper presents an approach to discovering a test case chain---a single software execution that covers a group of test goals and minimises overall test execution time. Our technique targets the scenario in which test goals for the requirements are given as safety properties. We give conditions for the existence and minimality of a single test case chain and minimise the number of test chains if a single test chain is infeasible. We report experimental results with a prototype tool for C code generated from Simulink models and compare it to state-of-the-art test suite generators.
△ Less
Submitted 6 November, 2013; v1 submitted 14 June, 2013;
originally announced June 2013.