-
Out-of-Things Debugging: A Live Debugging Approach for Internet of Things
Authors:
Carlos Rojas Castillo,
Matteo Marra,
Jim Bauwens,
Elisa Gonzalez Boix
Abstract:
Context: Internet of Things (IoT) has become an important kind of distributed systems thanks to the wide-spread of cheap embedded devices equipped with different networking technologies. Although ubiquitous, developing IoT systems remains challenging.
Inquiry: A recent field study with 194 IoT developers identifies debugging as one of the main challenges faced when developing IoT systems. This c…
▽ More
Context: Internet of Things (IoT) has become an important kind of distributed systems thanks to the wide-spread of cheap embedded devices equipped with different networking technologies. Although ubiquitous, developing IoT systems remains challenging.
Inquiry: A recent field study with 194 IoT developers identifies debugging as one of the main challenges faced when developing IoT systems. This comes from the lack of debugging tools taking into account the unique properties of IoT systems such as non-deterministic data, and hardware restricted devices. On the one hand, offline debuggers allow developers to analyse post-failure recorded program information, but impose too much overhead on the devices while generating such information.
Furthermore, the analysis process is also time-consuming and might miss contextual information relevant to find the root cause of bugs. On the other hand, online debuggers do allow debugging a program upon a failure while providing contextual information (e.g., stack trace). In particular, remote online debuggers enable debugging of devices without physical access to them. However, they experience debugging interference due to network delays which complicates bug reproducibility, and have limited support for dynamic software updates on remote devices.
Approach: This paper proposes out-of-things debugging, an online debugging approach especially designed for IoT systems. The debugger is always-on as it ensures constant availability to for instance debug post-deployment situations. Upon a failure or breakpoint, out-of-things debugging moves the state of a deployed application to the developer's machine. Developers can then debug the application locally by applying operations (e.g., step commands) to the retrieved state. Once debugging is finished, developers can commit bug fixes to the device through live update capabilities. Finally, by means of a fine-grained flexible interface for accessing remote resources, developers have full control over the debugging overhead imposed on the device, and the access to device hardware resources (e.g., sensors) needed during local debugging.
Knowledge: Out-of-things debugging maintains good properties of remote debugging as it does not require physical access to the device to debug it, while reducing debugging interference since there are no network delays on operations (e.g., stepping) issued on the debugger since those happen locally. Furthermore, device resources are only accessed when requested by the user which further mitigates overhead and opens avenues for mocking or simulation of non-accessed resources.
Grounding: We implemented an out-of-things debugger as an extension to a WebAssembly Virtual Machine and benchmarked its suitability for IoT. In particular, we compared our solution to remote debugging alternatives based on metrics such as network overhead, memory usage, scalability, and usability in production settings. From the benchmarks, we conclude that our debugger exhibits competitive performance in addition to confining overhead without sacrificing debugging convenience and flexibility.
Importance: Out-of-things debugging enables debugging of IoT systems by means of classical online operations (e.g., stepwise execution) while addressing IoT-specific concerns (e.g., hardware limitations). We show that having the debugger always-on does not have to come at cost of performance loss or increased overhead but instead can enforce a smooth-going and flexible debugging experience of IoT systems.
△ Less
Submitted 3 November, 2022;
originally announced November 2022.
-
VeriFx: Correct Replicated Data Types for the Masses
Authors:
Kevin De Porre,
Carla Ferreira,
Elisa Gonzalez Boix
Abstract:
Distributed systems adopt weak consistency to ensure high availability and low latency, but state convergence is hard to guarantee due to conflicts. Experts carefully design replicated data types (RDTs) that resemble sequential data types and embed conflict resolution mechanisms that ensure convergence. Designing RDTs is challenging as their correctness depends on subtleties such as the ordering o…
▽ More
Distributed systems adopt weak consistency to ensure high availability and low latency, but state convergence is hard to guarantee due to conflicts. Experts carefully design replicated data types (RDTs) that resemble sequential data types and embed conflict resolution mechanisms that ensure convergence. Designing RDTs is challenging as their correctness depends on subtleties such as the ordering of concurrent operations. Currently, researchers manually verify RDTs, either by paper proofs or using proof assistants. Unfortunately, paper proofs are subject to reasoning flaws and mechanized proofs verify a formalisation instead of a real-world implementation. Furthermore, writing mechanized proofs is reserved to verification experts and is extremely time consuming. To simplify the design, implementation, and verification of RDTs, we propose VeriFx, a high-level programming language with automated proof capabilities. VeriFx lets programmers implement RDTs atop functional collections and express correctness properties that are verified automatically. Verified RDTs can be transpiled to mainstream languages (currently Scala or JavaScript). VeriFx also provides libraries for implementing and verifying Conflict-free Replicated Data Types (CRDTs) and Operational Transformation (OT) functions. These libraries implement the general execution model of those approaches and define their correctness properties. We use the libraries to implement and verify an extensive portfolio of 35 CRDTs and reproduce a study on the correctness of OT functions.
△ Less
Submitted 6 December, 2024; v1 submitted 6 July, 2022;
originally announced July 2022.
-
Deriving Static Security Testing from Runtime Security Protection for Web Applications
Authors:
Angel Luis Scull Pupo,
Jens Nicolay,
Elisa Gonzalez Boix
Abstract:
Context: Static Application Security Testing (SAST) and Runtime Application Security Protection (RASP) are important and complementary techniques used for detecting and enforcing application-level security policies in web applications.
Inquiry: The current state of the art, however, does not allow a safe and efficient combination of SAST and RASP based on a shared set of security policies, forci…
▽ More
Context: Static Application Security Testing (SAST) and Runtime Application Security Protection (RASP) are important and complementary techniques used for detecting and enforcing application-level security policies in web applications.
Inquiry: The current state of the art, however, does not allow a safe and efficient combination of SAST and RASP based on a shared set of security policies, forcing developers to reimplement and maintain the same policies and their enforcement code in both tools.
Approach: In this work, we present a novel technique for deriving SAST from an existing RASP mechanism by using a two-phase abstract interpretation approach in the SAST component that avoids duplicating the effort of specifying security policies and implementing their semantics. The RASP mechanism enforces security policies by instrumenting a base program to trap security-relevant operations and execute the required policy enforcement code. The static analysis of security policies is then obtained from the RASP mechanism by first statically analyzing the base program without any traps. The results of this first phase are used in a second phase to detect trapped operations and abstractly execute the associated and unaltered RASP policy enforcement code.
Knowledge: Splitting the analysis into two phases enables running each phase with a specific analysis configuration, rendering the static analysis approach tractable while maintaining sufficient precision.
Grounding: We validate the applicability of our two-phase analysis approach by using it to both dynamically enforce and statically detect a range of security policies found in related work. Our experiments suggest that our two-phase analysis can enable faster and more precise policy violation detection compared to analyzing the full instrumented application under a single analysis configuration.
Importance: Deriving a SAST component from a RASP mechanism enables equivalent semantics for the security policies across the static and dynamic contexts in which policies are verified during the software development lifecycle. Moreover, our two-phase abstract interpretation approach does not require RASP developers to reimplement the enforcement code for static analysis.
△ Less
Submitted 15 July, 2021;
originally announced July 2021.
-
Capturing High-level Nondeterminism in Concurrent Programs for Practical Concurrency Model Agnostic Record & Replay
Authors:
Dominik Aumayr,
Stefan Marr,
Sophie Kaleba,
Elisa Gonzalez Boix,
Hanspeter Mössenböck
Abstract:
With concurrency being integral to most software systems, developers combine high-level concurrency models in the same application to tackle each problem with appropriate abstractions. While languages and libraries offer a wide range of concurrency models, debugging support for applications that combine them has not yet gained much attention. Record & replay aids debugging by deterministically rep…
▽ More
With concurrency being integral to most software systems, developers combine high-level concurrency models in the same application to tackle each problem with appropriate abstractions. While languages and libraries offer a wide range of concurrency models, debugging support for applications that combine them has not yet gained much attention. Record & replay aids debugging by deterministically reproducing recorded bugs, but is typically designed for a single concurrency model only.
This paper proposes a practical concurrency-model-agnostic record & replay approach for multi-paradigm concurrent programs, i.e. applications that combine concurrency models. Our approach traces high-level nondeterministic events by using a uniform model-agnostic trace format and infrastructure. This enables orderingbased record & replay support for a wide range of concurrency models, and thereby enables debugging of applications that combine them. In addition, it allows language implementors to add new concurrency models and reuse the model-agnostic record & replay support.
We argue that a concurrency-model-agnostic record & replay is practical and enables advanced debugging support for a wide range of concurrency models. The evaluation shows that our approach is expressive and flexible enough to support record & replay of applications using threads & locks, communicating event loops, communicating sequential processes, software transactional memory and combinations of those concurrency models. For the actor model, we reach recording performance competitive with an optimized special-purpose record & replay solution. The average recording overhead on the Savina actor benchmark suite is 10% (min. 0%, max. 23%). The performance for other concurrency models and combinations thereof is at a similar level.
We believe our concurrency-model-agnostic approach helps developers of applications that mix and match concurrency models. We hope that this substrate inspires new tools and languages making building and maintaining of multi-paradigm concurrent applications simpler and safer.
△ Less
Submitted 26 February, 2021;
originally announced March 2021.
-
Asynchronous Snapshots of Actor Systems for Latency-Sensitive Applications
Authors:
Dominik Aumayr,
Stefan Marr,
Elisa Gonzalez Boix,
Hanspeter Mössenböck
Abstract:
The actor model is popular for many types of server applications. Efficient snapshotting of applications is crucial in the deployment of pre-initialized applications or moving running applications to different machines, e.g for debugging purposes. A key issue is that snapshotting blocks all other operations. In modern latency-sensitive applications, stopping the application to persist its state ne…
▽ More
The actor model is popular for many types of server applications. Efficient snapshotting of applications is crucial in the deployment of pre-initialized applications or moving running applications to different machines, e.g for debugging purposes. A key issue is that snapshotting blocks all other operations. In modern latency-sensitive applications, stopping the application to persist its state needs to be avoided, because users may not tolerate the increased request latency. In order to minimize the impact of snapshotting on request latency, our approach persists the application's state asynchronously by capturing partial heaps, completing snapshots step by step. Additionally, our solution is transparent and supports arbitrary object graphs. We prototyped our snapshotting approach on top of the Truffle/Graal platform and evaluated it with the Savina benchmarks and the Acme Air microservice application. When performing a snapshot every thousand Acme Air requests, the number of slow requests ( 0.007% of all requests) with latency above 100ms increases by 5.43%. Our Savina microbenchmark results detail how different utilization patterns impact snapshotting cost. To the best of our knowledge, this is the first system that enables asynchronous snapshotting of actor applications, i.e. without stop-the-world synchronization, and thereby minimizes the impact on latency. We thus believe it enables new deployment and debugging options for actor systems.
△ Less
Submitted 18 September, 2019; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Out-Of-Place debugging: a debugging architecture to reduce debugging interference
Authors:
Matteo Marra,
Guillermo Polito,
Elisa Gonzalez Boix
Abstract:
Context. Recent studies show that developers spend most of their programming time testing, verifying and debugging software. As applications become more and more complex, developers demand more advanced debugging support to ease the software development process.
Inquiry. Since the 70's many debugging solutions were introduced. Amongst them, online debuggers provide a good insight on the conditio…
▽ More
Context. Recent studies show that developers spend most of their programming time testing, verifying and debugging software. As applications become more and more complex, developers demand more advanced debugging support to ease the software development process.
Inquiry. Since the 70's many debugging solutions were introduced. Amongst them, online debuggers provide a good insight on the conditions that led to a bug, allowing inspection and interaction with the variables of the program. However, most of the online debugging solutions introduce \textit{debugging interference} to the execution of the program, i.e. pauses, latency, and evaluation of code containing side-effects.
Approach. This paper investigates a novel debugging technique called \outofplace debugging. The goal is to minimize the debugging interference characteristic of online debugging while allowing online remote capabilities. An \outofplace debugger transfers the program execution and application state from the debugged application to the debugger application, both running in different processes.
Knowledge. On the one hand, \outofplace debugging allows developers to debug applications remotely, overcoming the need of physical access to the machine where the debugged application is running. On the other hand, debugging happens locally on the remote machine avoiding latency. That makes it suitable to be deployed on a distributed system and handle the debugging of several processes running in parallel.
Grounding. We implemented a concrete out-of-place debugger for the Pharo Smalltalk programming language. We show that our approach is practical by performing several benchmarks, comparing our approach with a classic remote online debugger. We show that our prototype debugger outperforms by a 1000 times a traditional remote debugger in several scenarios. Moreover, we show that the presence of our debugger does not impact the overall performance of an application.
Importance. This work combines remote debugging with the debugging experience of a local online debugger. Out-of-place debugging is the first online debugging technique that can minimize debugging interference while debugging a remote application. Yet, it still keeps the benefits of online debugging ( e.g. step-by-step execution). This makes the technique suitable for modern applications which are increasingly parallel, distributed and reactive to streams of data from various sources like sensors, UI, network, etc.
△ Less
Submitted 5 November, 2018;
originally announced November 2018.
-
Efficient and Deterministic Record & Replay for Actor Languages
Authors:
Dominik Aumayr,
Stefan Marr,
Clément Béra,
Elisa Gonzalez Boix,
Hanspeter Mössenböck
Abstract:
With the ubiquity of parallel commodity hardware, developers turn to high-level concurrency models such as the actor model to lower the complexity of concurrent software. However, debugging concurrent software is hard, especially for concurrency models with a limited set of supporting tools. Such tools often deal only with the underlying threads and locks, which is at the wrong abstraction level a…
▽ More
With the ubiquity of parallel commodity hardware, developers turn to high-level concurrency models such as the actor model to lower the complexity of concurrent software. However, debugging concurrent software is hard, especially for concurrency models with a limited set of supporting tools. Such tools often deal only with the underlying threads and locks, which is at the wrong abstraction level and may even introduce additional complexity. To improve on this situation, we present a low-overhead record & replay approach for actor languages. It allows one to debug concurrency issues deterministically based on a previously recorded trace. Our evaluation shows that the average run-time overhead for tracing on benchmarks from the Savina suite is 10% (min. 0%, max. 20%). For Acme-Air, a modern web application, we see a maximum increase of 1% in latency for HTTP requests and about 1.4 MB/s of trace data. These results are a first step towards deterministic replay debugging of actor systems in production.
△ Less
Submitted 20 August, 2018; v1 submitted 16 May, 2018;
originally announced May 2018.
-
A Study of Concurrency Bugs and Advanced Development Support for Actor-based Programs
Authors:
Carmen Torres Lopez,
Stefan Marr,
Hanspeter Mössenböck,
Elisa Gonzalez Boix
Abstract:
The actor model is an attractive foundation for developing concurrent applications because actors are isolated concurrent entities that communicate through asynchronous messages and do not share state. Thereby, they avoid concurrency bugs such as data races, but are not immune to concurrency bugs in general. This study taxonomizes concurrency bugs in actor-based programs reported in literature. Fu…
▽ More
The actor model is an attractive foundation for developing concurrent applications because actors are isolated concurrent entities that communicate through asynchronous messages and do not share state. Thereby, they avoid concurrency bugs such as data races, but are not immune to concurrency bugs in general. This study taxonomizes concurrency bugs in actor-based programs reported in literature. Furthermore, it analyzes the bugs to identify the patterns causing them as well as their observable behavior. Based on this taxonomy, we further analyze the literature and find that current approaches to static analysis and testing focus on communication deadlocks and message protocol violations. However, they do not provide solutions to identify livelocks and behavioral deadlocks. The insights obtained in this study can be used to improve debugging support for actor-based programs with new debugging techniques to identify the root cause of complex concurrency bugs.
△ Less
Submitted 24 April, 2018; v1 submitted 22 June, 2017;
originally announced June 2017.
-
A Concurrency-Agnostic Protocol for Multi-Paradigm Concurrent Debugging Tools
Authors:
Stefan Marr,
Carmen Torres Lopez,
Dominik Aumayr,
Elisa Gonzalez Boix,
Hanspeter Mössenböck
Abstract:
Today's complex software systems combine high-level concurrency models. Each model is used to solve a specific set of problems. Unfortunately, debuggers support only the low-level notions of threads and shared memory, forcing developers to reason about these notions instead of the high-level concurrency models they chose.
This paper proposes a concurrency-agnostic debugger protocol that decouple…
▽ More
Today's complex software systems combine high-level concurrency models. Each model is used to solve a specific set of problems. Unfortunately, debuggers support only the low-level notions of threads and shared memory, forcing developers to reason about these notions instead of the high-level concurrency models they chose.
This paper proposes a concurrency-agnostic debugger protocol that decouples the debugger from the concurrency models employed by the target application. As a result, the underlying language runtime can define custom breakpoints, stepping operations, and execution events for each concurrency model it supports, and a debugger can expose them without having to be specifically adapted.
We evaluated the generality of the protocol by applying it to SOMns, a Newspeak implementation, which supports a diversity of concurrency models including communicating sequential processes, communicating event loops, threads and locks, fork/join parallelism, and software transactional memory. We implemented 21 breakpoints and 20 stepping operations for these concurrency models. For none of these, the debugger needed to be changed. Furthermore, we visualize all concurrent interactions independently of a specific concurrency model. To show that tooling for a specific concurrency model is possible, we visualize actor turns and message sends separately.
△ Less
Submitted 29 October, 2017; v1 submitted 1 June, 2017;
originally announced June 2017.