-
Automatic Integration of BFT State-Machine Replication into IoT Systems
Authors:
Christian Berger,
Hans P. Reiser,
Franz J. Hauck,
Florian Held,
Jörg Domaschka
Abstract:
Byzantine fault tolerance (BFT) can preserve the availability and integrity of IoT systems where single components may suffer from random data corruption or attacks that can expose them to malicious behavior. While state-of-the-art BFT state-machine replication (SMR) libraries are often tailored to fit a standard request-response interaction model with dedicated client-server roles, in our design,…
▽ More
Byzantine fault tolerance (BFT) can preserve the availability and integrity of IoT systems where single components may suffer from random data corruption or attacks that can expose them to malicious behavior. While state-of-the-art BFT state-machine replication (SMR) libraries are often tailored to fit a standard request-response interaction model with dedicated client-server roles, in our design, we employ an IoT-fit interaction model that assumes a loosly-coupled, event-driven interaction between arbitrarily wired IoT components. In this paper, we explore the possibility of automating and streamlining the complete process of integrating BFT SMR into a component-based IoT execution environment. Our main goal is providing simplicity for the developer: We strive to decouple the specification of a logical application architecture from the difficulty of incorporating BFT replication mechanisms into it. Thus, our contributions address the automated configuration, re-wiring and deployment of IoT components, and their replicas, within a component-based, event-driven IoT platform.
△ Less
Submitted 6 July, 2022; v1 submitted 1 July, 2022;
originally announced July 2022.
-
A Survey on Resilience in the IoT: Taxonomy, Classification and Discussion of Resilience Mechanisms
Authors:
Christian Berger,
Philipp Eichhammer,
Hans P. Reiser,
Jörg Domaschka,
Franz J. Hauck,
Gerhard Habiger
Abstract:
Internet-of-Things (IoT) ecosystems tend to grow both in scale and complexity as they consist of a variety of heterogeneous devices, which span over multiple architectural IoT layers (e.g., cloud, edge, sensors). Further, IoT systems increasingly demand the resilient operability of services as they become part of critical infrastructures. This leads to a broad variety of research works that aim to…
▽ More
Internet-of-Things (IoT) ecosystems tend to grow both in scale and complexity as they consist of a variety of heterogeneous devices, which span over multiple architectural IoT layers (e.g., cloud, edge, sensors). Further, IoT systems increasingly demand the resilient operability of services as they become part of critical infrastructures. This leads to a broad variety of research works that aim to increase the resilience of these systems. In this paper, we create a systematization of knowledge about existing scientific efforts of making IoT systems resilient. In particular, we first discuss the taxonomy and classification of resilience and resilience mechanisms and subsequently survey state-of-the-art resilience mechanisms that have been proposed by research work and are applicable to IoT. As part of the survey, we also discuss questions that focus on the practical aspects of resilience, e.g., which constraints resilience mechanisms impose on developers when designing resilient systems by incorporating a specific mechanism into IoT systems.
△ Less
Submitted 6 September, 2021;
originally announced September 2021.
-
A Case for Data Centre Traffic Management on Software Programmable Ethernet Switches
Authors:
Kamil Tokmakov,
Mitalee Sarker,
Jörg Domaschka,
Stefan Wesner
Abstract:
Virtualisation first and cloud computing later has led to a consolidation of workload in data centres that also comprises latency-sensitive application domains such as High Performance Computing and telecommunication. These types of applications require strict latency guarantees to maintain their Quality of Service. In virtualised environments with their churn, this demands for adaptability and fl…
▽ More
Virtualisation first and cloud computing later has led to a consolidation of workload in data centres that also comprises latency-sensitive application domains such as High Performance Computing and telecommunication. These types of applications require strict latency guarantees to maintain their Quality of Service. In virtualised environments with their churn, this demands for adaptability and flexibility to satisfy. At the same time, the mere scale of the infrastructures favours commodity (Ethernet) over specialised (Infiniband) hardware. For that purpose, this paper introduces a novel traffic management algorithm that combines Rate-limited Strict Priority and Deficit round-robin for latency-aware and fair scheduling respectively. In addition, we present an implementation of this algorithm on the bmv2 P4 software switch by evaluating it against standard priority-based and best-effort scheduling.
△ Less
Submitted 17 February, 2020;
originally announced February 2020.
-
Rapid Testing of IaaS Resource Management Algorithms via Cloud Middleware Simulation
Authors:
Christian Stier,
Jörg Domaschka,
Anne Koziolek,
Sebastian Krach,
Jakub Krzywda,
Ralf Reussner
Abstract:
Infrastructure as a Service (IaaS) Cloud services allow users to deploy distributed applications in a virtualized environment without having to customize their applications to a specific Platform as a Service (PaaS) stack. It is common practice to host multiple Virtual Machines (VMs) on the same server to save resources. Traditionally, IaaS data center management required manual effort for optimiz…
▽ More
Infrastructure as a Service (IaaS) Cloud services allow users to deploy distributed applications in a virtualized environment without having to customize their applications to a specific Platform as a Service (PaaS) stack. It is common practice to host multiple Virtual Machines (VMs) on the same server to save resources. Traditionally, IaaS data center management required manual effort for optimization, e.g. by consolidating VM placement based on changes in usage patterns. Many resource management algorithms and frameworks have been developed to automate this process. Resource management algorithms are typically tested via experimentation or using simulation. The main drawback of both approaches is the high effort required to conduct the testing. Existing Cloud or IaaS simulators require the algorithm engineer to reimplement their algorithm against the simulator's API. Furthermore, the engineer manually needs to define the workload model used for algorithm testing. We propose an approach for the simulative analysis of IaaS Cloud infrastructure that allows algorithm engineers and data center operators to eval- uate optimization algorithms without investing additional effort to reimplement them in a simulation environment. By leveraging runtime monitoring data, we automatically construct the simula- tion models used to test the algorithms. Our validation shows that algorithm tests conducted using our IaaS Cloud simulator match the measured behavior on actual hardware.
△ Less
Submitted 29 January, 2018;
originally announced January 2018.