-
DeTRAP: RISC-V Return Address Protection With Debug Triggers
Authors:
Isaac Richter,
Jie Zhou,
John Criswell
Abstract:
Modern microcontroller software is often written in C/C++ and suffers from control-flow hijacking vulnerabilities. Previous mitigations suffer from high performance and memory overheads and require either the presence of memory protection hardware or sophisticated program analysis in the compiler.
This paper presents DeTRAP (Debug Trigger Return Address Protection). DeTRAP utilizes a full implem…
▽ More
Modern microcontroller software is often written in C/C++ and suffers from control-flow hijacking vulnerabilities. Previous mitigations suffer from high performance and memory overheads and require either the presence of memory protection hardware or sophisticated program analysis in the compiler.
This paper presents DeTRAP (Debug Trigger Return Address Protection). DeTRAP utilizes a full implementation of the RISC-V debug hardware specification to provide a write-protected shadow stack for return addresses. Unlike previous work, DeTRAP requires no memory protection hardware and only minor changes to the compiler toolchain.
We tested DeTRAP on an FPGA running a 32-bit RISC-V microcontroller core and found average execution time overheads to be between 0.5% and 1.9% on evaluated benchmark suites with code size overheads averaging 7.9% or less.
△ Less
Submitted 30 August, 2024;
originally announced August 2024.
-
Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust
Authors:
Jie Zhou,
Mingshen Sun,
John Criswell
Abstract:
Rust is one of the most promising systems programming languages to fundamentally solve the memory safety issues that have plagued low-level software for over forty years. However, to accommodate the scenarios where Rust's type rules might be too restrictive for certain systems programming and where programmers opt for performance over security checks, Rust opens security escape hatches allowing wr…
▽ More
Rust is one of the most promising systems programming languages to fundamentally solve the memory safety issues that have plagued low-level software for over forty years. However, to accommodate the scenarios where Rust's type rules might be too restrictive for certain systems programming and where programmers opt for performance over security checks, Rust opens security escape hatches allowing writing unsafe source code or calling unsafe libraries. Consequently, unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program as they run in the same monolithic address space as the safe Rust.
This problem can be mitigated by isolating unsafe memory objects (those accessed by unsafe code) and sandboxing memory accesses to the unsafe memory. One category of prior work utilizes existing program analysis frameworks on LLVM IR to identify unsafe memory objects and accesses. However, they suffer the limitations of prolonged analysis time and low precision. In this paper, we tackled these two challenges using summary-based whole-program analysis on Rust's MIR. The summary-based analysis computes information on demand so as to save analysis time. Performing analysis on Rust's MIR exploits the rich high-level type information inherent to Rust, which is unavailable in LLVM IR. This manuscript is a preliminary study of ongoing research. We have prototyped a whole-program analysis for identifying both unsafe heap allocations and memory accesses to those unsafe heap objects. We reported the overhead and the efficacy of the analysis in this paper.
△ Less
Submitted 26 May, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
InversOS: Efficient Control-Flow Protection for AArch64 Applications with Privilege Inversion
Authors:
Zhuojia Shen,
John Criswell
Abstract:
With the increasing popularity of AArch64 processors in general-purpose computing, securing software running on AArch64 systems against control-flow hijacking attacks has become a critical part toward secure computation. Shadow stacks keep shadow copies of function return addresses and, when protected from illegal modifications and coupled with forward-edge control-flow integrity, form an effectiv…
▽ More
With the increasing popularity of AArch64 processors in general-purpose computing, securing software running on AArch64 systems against control-flow hijacking attacks has become a critical part toward secure computation. Shadow stacks keep shadow copies of function return addresses and, when protected from illegal modifications and coupled with forward-edge control-flow integrity, form an effective and proven defense against such attacks. However, AArch64 lacks native support for write-protected shadow stacks, while software alternatives either incur prohibitive performance overhead or provide weak security guarantees.
We present InversOS, the first hardware-assisted write-protected shadow stacks for AArch64 user-space applications, utilizing commonly available features of AArch64 to achieve efficient intra-address space isolation (called Privilege Inversion) required to protect shadow stacks. Privilege Inversion adopts unconventional design choices that run protected applications in the kernel mode and mark operating system (OS) kernel memory as user-accessible; InversOS therefore uses a novel combination of OS kernel modifications, compiler transformations, and another AArch64 feature to ensure the safety of doing so and to support legacy applications. We show that InversOS is secure by design, effective against various control-flow hijacking attacks, and performant on selected benchmarks and applications (incurring overhead of 7.0% on LMBench, 7.1% on SPEC CPU 2017, and 3.0% on Nginx web server).
△ Less
Submitted 19 July, 2023; v1 submitted 17 April, 2023;
originally announced April 2023.
-
Fat Pointers for Temporal Memory Safety of C
Authors:
Jie Zhou,
John Criswell,
Michael Hicks
Abstract:
Temporal memory safety bugs, especially use-after-free and double free bugs, pose a major security threat to C programs. Real-world exploits utilizing these bugs enable attackers to read and write arbitrary memory locations, causing disastrous violations of confidentiality, integrity, and availability. Many previous solutions retrofit temporal memory safety to C, but they all either incur high per…
▽ More
Temporal memory safety bugs, especially use-after-free and double free bugs, pose a major security threat to C programs. Real-world exploits utilizing these bugs enable attackers to read and write arbitrary memory locations, causing disastrous violations of confidentiality, integrity, and availability. Many previous solutions retrofit temporal memory safety to C, but they all either incur high performance overhead and/or miss detecting certain types of temporal memory safety bugs.
In this paper, we propose a temporal memory safety solution that is both efficient and comprehensive. Specifically, we extend Checked C, a spatially-safe extension to C, with temporally-safe pointers. These are implemented by combining two techniques: fat pointers and dynamic key-lock checks. We show that the fat-pointer solution significantly improves running time and memory overhead compared to the disjoint-metadata approach that provides the same level of protection. With empirical program data and hands-on experience porting real-world applications, we also show that our solution is practical in terms of backward compatibility -- one of the major complaints about fat pointers.
△ Less
Submitted 19 March, 2023; v1 submitted 26 August, 2022;
originally announced August 2022.
-
Fast Execute-Only Memory for Embedded Systems
Authors:
Zhuojia Shen,
Komail Dharsee,
John Criswell
Abstract:
Remote code disclosure attacks threaten embedded systems as they allow attackers to steal intellectual property or to find reusable code for use in control-flow hijacking attacks. Execute-only memory (XOM) prevents remote code disclosures, but existing XOM solutions either require a memory management unit that is not available on ARM embedded systems or incur significant overhead.
We present Pic…
▽ More
Remote code disclosure attacks threaten embedded systems as they allow attackers to steal intellectual property or to find reusable code for use in control-flow hijacking attacks. Execute-only memory (XOM) prevents remote code disclosures, but existing XOM solutions either require a memory management unit that is not available on ARM embedded systems or incur significant overhead.
We present PicoXOM: a fast and novel XOM system for ARMv7-M and ARMv8-M devices which leverages ARM's Data Watchpoint and Tracing unit along with the processor's simplified memory protection hardware. On average, PicoXOM incurs 0.33% performance overhead and 5.89% code size overhead on two benchmark suites and five real-world applications.
△ Less
Submitted 4 September, 2020; v1 submitted 29 May, 2020;
originally announced June 2020.
-
Silhouette: Efficient Protected Shadow Stacks for Embedded Systems
Authors:
Jie Zhou,
Yufei Du,
Zhuojia Shen,
Lele Ma,
John Criswell,
Robert J. Walls
Abstract:
Microcontroller-based embedded systems are increasingly used for applications that can have serious and immediate consequences if compromised---including automobile control systems, smart locks, drones, and implantable medical devices. Due to resource and execution-time constraints, C is the primary language used for programming these devices. Unfortunately, C is neither type-safe nor memory-safe,…
▽ More
Microcontroller-based embedded systems are increasingly used for applications that can have serious and immediate consequences if compromised---including automobile control systems, smart locks, drones, and implantable medical devices. Due to resource and execution-time constraints, C is the primary language used for programming these devices. Unfortunately, C is neither type-safe nor memory-safe, and control-flow hijacking remains a prevalent threat.
This paper presents Silhouette: a compiler-based defense that efficiently guarantees the integrity of return addresses, significantly reducing the attack surface for control-flow hijacking. Silhouette combines an incorruptible shadow stack for return addresses with checks on forward control flow and memory protection to ensure that all functions return to the correct dynamic caller. To protect its shadow stack, Silhouette uses store hardening, an efficient intra-address space isolation technique targeting various ARM architectures that leverages special store instructions found on ARM processors.
We implemented Silhouette for the ARMv7-M architecture, but our techniques are applicable to other common embedded ARM architectures. Our evaluation shows that Silhouette incurs a geometric mean of 1.3% and 3.4% performance overhead on two benchmark suites. Furthermore, we prototyped Silhouette-Invert, an alternative implementation of Silhouette, which incurs just 0.3% and 1.9% performance overhead, at the cost of a minor hardware change.
△ Less
Submitted 25 June, 2020; v1 submitted 26 October, 2019;
originally announced October 2019.
-
Restricting Control Flow During Speculative Execution with Venkman
Authors:
Zhuojia Shen,
Jie Zhou,
Divya Ojha,
John Criswell
Abstract:
Side-channel attacks such as Spectre that utilize speculative execution to steal application secrets pose a significant threat to modern computing systems. While program transformations can mitigate some Spectre attacks, more advanced attacks can divert control flow speculatively to bypass these protective instructions, rendering existing defenses useless.
In this paper, we present Venkman: a sy…
▽ More
Side-channel attacks such as Spectre that utilize speculative execution to steal application secrets pose a significant threat to modern computing systems. While program transformations can mitigate some Spectre attacks, more advanced attacks can divert control flow speculatively to bypass these protective instructions, rendering existing defenses useless.
In this paper, we present Venkman: a system that employs program transformation to completely thwart Spectre attacks that poison entries in the Branch Target Buffer (BTB) and the Return Stack Buffer (RSB). Venkman transforms code so that all valid targets of a control-flow transfer have an identical alignment in the virtual address space; it further transforms all branches to ensure that all entries added to the BTB and RSB are properly aligned. By transforming all code this way, Venkman ensures that, in any program wanting Spectre defenses, all control-flow transfers, including speculative ones, do not skip over protective instructions Venkman adds to the code segment to mitigate Spectre attacks. Unlike existing defenses, Venkman does not reduce sharing of the BTB and RSB and does not flush these structures, allowing safe sharing and reuse among programs while maintaining strong protection against Spectre attacks. We built a prototype of Venkman on an IBM POWER8 machine. Our evaluation on the SPEC benchmarks and selected applications shows that Venkman increases execution time to 3.47$\times$ on average and increases code size to 1.94$\times$ on average when it is used to ensure that fences are executed to mitigate Spectre attacks. Our evaluation also shows that Spectre-resistant Software Fault Isolation (SFI) built using Venkman incurs a geometric mean of 2.42$\times$ space overhead and 1.68$\times$ performance overhead.
△ Less
Submitted 25 March, 2019;
originally announced March 2019.
-
Fast Intra-kernel Isolation and Security with IskiOS
Authors:
Spyridoula Gravani,
Mohammad Hedayati,
John Criswell,
Michael L. Scott
Abstract:
The kernels of operating systems such as Windows, Linux, and MacOS are vulnerable to control-flow hijacking. Defenses exist, but many require efficient intra-address-space isolation. Execute-only memory, for example, requires read protection on code segments, and shadow stacks require protection from buffer overwrites. Intel's Protection Keys for Userspace (PKU) could, in principle, provide the in…
▽ More
The kernels of operating systems such as Windows, Linux, and MacOS are vulnerable to control-flow hijacking. Defenses exist, but many require efficient intra-address-space isolation. Execute-only memory, for example, requires read protection on code segments, and shadow stacks require protection from buffer overwrites. Intel's Protection Keys for Userspace (PKU) could, in principle, provide the intra-kernel isolation needed by such defenses, but, when used as designed, it applies only to user-mode application code. This paper presents an unconventional approach to memory protection, allowing PKU to be used within the operating system kernel on existing Intel hardware, replacing the traditional user/supervisor isolation mechanism and, simultaneously, enabling efficient intra-kernel isolation. We call the resulting mechanism Protection Keys for Kernelspace (PKK). To demonstrate its utility and efficiency, we present a system we call IskiOS: a Linux variant featuring execute-only memory (XOM) and the first-ever race-free shadow stacks for x86-64. Experiments with the LMBench kernel microbenchmarks display a geometric mean overhead of about 11% for PKK and no additional overhead for XOM. IskiOS's shadow stacks bring the total to 22%. For full applications, experiments with the system benchmarks of the Phoronix test suite display negligible overhead for PKK and XOM, and less than 5% geometric mean overhead for shadow stacks.
△ Less
Submitted 2 August, 2021; v1 submitted 11 March, 2019;
originally announced March 2019.
-
Codestitcher: Inter-Procedural Basic Block Layout Optimization
Authors:
Rahman Lavaee,
John Criswell,
Chen Ding
Abstract:
Modern software executes a large amount of code. Previous techniques of code layout optimization were developed one or two decades ago and have become inadequate to cope with the scale and complexity of new types of applications such as compilers, browsers, interpreters, language VMs and shared libraries.
This paper presents Codestitcher, an inter-procedural basic block code layout optimizer whi…
▽ More
Modern software executes a large amount of code. Previous techniques of code layout optimization were developed one or two decades ago and have become inadequate to cope with the scale and complexity of new types of applications such as compilers, browsers, interpreters, language VMs and shared libraries.
This paper presents Codestitcher, an inter-procedural basic block code layout optimizer which reorders basic blocks in an executable to benefit from better cache and TLB performance. Codestitcher provides a hierarchical framework which can be used to improve locality in various layers of the memory hierarchy. Our evaluation shows that Codestitcher improves the performance of the original program by 3\% to 25\% (on average, by 10\%) on 5 widely used applications with large code sizes: MySQL, Clang, Firefox, Apache, and Python. It gives an additional improvement of 4\% over LLVM's PGO and 3\% over PGO combined with the best function reordering technique.
△ Less
Submitted 1 October, 2018;
originally announced October 2018.