-
VeriFence: Lightweight and Precise Spectre Defenses for Untrusted Linux Kernel Extensions
Authors:
Luis Gerhorst,
Henriette Herzog,
Peter Wägemann,
Maximilian Ott,
Rüdiger Kapitza,
Timo Hönig
Abstract:
High-performance IO demands low-overhead communication between user- and kernel space. This demand can no longer be fulfilled by traditional system calls. Linux's extended Berkeley Packet Filter (BPF) avoids user-/kernel transitions by just-in-time compiling user-provided bytecode and executing it in kernel mode with near-native speed. To still isolate BPF programs from the kernel, they are statically analyzed for memory- and type-safety, which imposes some restrictions but allows for good expressiveness and high performance. However, to mitigate the Spectre vulnerabilities disclosed in 2018, defenses which reject potentially dangerous programs had to be deployed. We find that this affects 31% to 54% of programs in a dataset with 844 real-world BPF programs from popular open-source projects. To work around this, users are forced to disable the defenses to continue using the programs, which puts the entire system at risk.
To enable secure and expressive untrusted Linux kernel extensions, we propose VeriFence, an enhancement to the kernel's Spectre defenses that reduces the number of BPF application programs rejected from 54% to zero. We measure VeriFence's overhead for all mainstream performance-sensitive applications of BPF (i.e., event tracing, profiling, and packet processing) and find that it improves significantly upon the status quo where affected BPF programs are either unusable or enable transient execution attacks on the kernel.
Submitted 8 January, 2025; v1 submitted 30 April, 2024;
originally announced May 2024.
-
Migration-Based Synchronization
Authors:
Stefan Reif,
Phillip Raffeck,
Luis Gerhorst,
Wolfgang Schröder-Preikschat,
Timo Hönig
Abstract:
A fundamental challenge in multi- and many-core systems is the correct execution of concurrent access to shared data. A common drawback of existing synchronization mechanisms is the loss of data locality as the shared data is transferred between the accessing cores. In real-time systems, this is especially important as knowledge about data access times is crucial to establish bounds on execution times and guarantee the meeting of deadlines. We propose in this paper a refinement of our previously sketched approach of Migration-Based Synchronization (MBS) as well as its first practical implementation. The core concept of MBS is the replacement of data migration with control-flow migration to achieve synchronized memory accesses with guaranteed data locality. This leads to both shorter and more predictable execution times for critical sections. As MBS can be used as a substitute for classical locks, it can be employed in legacy applications without code alterations. We further examine how the gained data locality improves the results of worst-case timing analyses and results in tighter bounds on execution and response time. We reason about the similarity of MBS to existing synchronization approaches and how it enables us to reuse existing analysis techniques. Finally, we evaluate our prototype implementation, showing that MBS can exploit data locality with similar overheads as traditional locking mechanisms.
Submitted 18 February, 2022;
originally announced February 2022.
-
AnyCall: Fast and Flexible System-Call Aggregation
Authors:
Luis Gerhorst,
Benedict Herzog,
Stefan Reif,
Wolfgang Schröder-Preikschat,
Timo Hönig
Abstract:
Operating systems rely on system calls to allow the controlled communication of isolated processes with the kernel and other processes. Every system call includes a processor mode switch from the unprivileged user mode to the privileged kernel mode. Although processor mode switches are the essential isolation mechanism to guarantee the system's integrity, they induce direct and indirect performance costs as they invalidate parts of the processor state. In recent years, high-performance networks and storage hardware have made the user/kernel transition overhead the bottleneck for IO-heavy applications. To make matters worse, security vulnerabilities in modern processors (e.g., Meltdown) have prompted kernel mitigations that further increase the transition overhead. To decouple system calls from user/kernel transitions, we propose AnyCall, which uses an in-kernel compiler to execute safety-checked user bytecode in kernel mode. This allows for very fast system calls interleaved with error checking and processing logic using only a single user/kernel transition. We have implemented AnyCall based on the Linux kernel's eBPF subsystem. Our evaluation demonstrates that system call bursts are up to 55 times faster using AnyCall and that real-world applications can be sped up by 24% even if only a minimal part of their code is run by AnyCall.
Submitted 31 January, 2022;
originally announced January 2022.
-
$Δ$elta: Differential Energy-Efficiency, Latency, and Timing Analysis for Real-Time Networks
Authors:
Stefan Reif,
Andreas Schmidt,
Timo Hönig,
Thorsten Herfet,
Wolfgang Schröder-Preikschat
Abstract:
The continuously increasing degree of automation in many areas (e.g. manufacturing engineering, public infrastructure) leads to the construction of cyber-physical systems and cyber-physical networks. For both, time and energy are the most critical operating resources. Considering for instance the Tactile Internet specification, end-to-end latencies in these systems must be below 1 ms, which means that both communication and system latencies are in the same order of magnitude and must be predictably low. As control loops are commonly handled over different variants of network infrastructure (e.g. mobile and fibre links), particular attention must be paid to the design of reliable, yet fast and energy-efficient data-transmission channels that are robust towards unexpected transmission failures. As design goals are often conflicting (e.g. high performance vs. low energy), it is necessary to analyze and investigate trade-offs with regard to design decisions during the construction of cyber-physical networks. In this paper, we present $Δ$elta, an approach towards a tool-supported construction process for cyber-physical networks. $Δ$elta extends the previously presented X-Lap tool by new analysis features, but keeps the original measurement facilities unchanged. $Δ$elta jointly analyzes and correlates the runtime behavior (i.e. performance, latency) and energy demand of individual system components. It provides an automated analysis with precise thread-local time interpolation, control-flow extraction, and examination of latency criticality. We further demonstrate the applicability of $Δ$elta with an evaluation of a prototypical implementation.
Submitted 28 May, 2019;
originally announced May 2019.
-
X-Lap: A Systems Approach for Cross-Layer Profiling and Latency Analysis for Cyber-Physical Networks
Authors:
Stefan Reif,
Andreas Schmidt,
Timo Hönig,
Thorsten Herfet,
Wolfgang Schröder-Preikschat
Abstract:
Networked control applications for cyber-physical networks demand predictable and reliable real-time communication. Applications of this domain have to cooperate with network protocols, the operating system, and the hardware to improve safety properties and increase resource efficiency. In consequence, a cross-layer approach is necessary for the design and holistic optimisation of cyber-physical systems and networks. This paper presents X-Lap, a cross-layer, inter-host timing analysis tool tailored to the needs of real-time communication. We use X-Lap to evaluate the timing behaviour of a reliable real-time communication protocol. Our analysis identifies parts of the protocol which are responsible for unwanted jitter. To system designers, X-Lap provides useful support for the design and evaluation of networked real-time systems.
Submitted 20 August, 2018;
originally announced August 2018.