-
Live Objects All The Way Down: Removing the Barriers between Applications and Virtual Machines
Authors:
Javier E. Pimás,
Stefan Marr,
Diego Garbervetsky
Abstract:
Object-oriented languages often use virtual machines (VMs) that provide mechanisms such as just-in-time (JIT) compilation and garbage collection (GC). These VM components are typically implemented in a separate layer, isolating them from the application. While this approach brings the software engineering benefits of clear separation and decoupling, it introduces barriers for both understanding VM…
▽ More
Object-oriented languages often use virtual machines (VMs) that provide mechanisms such as just-in-time (JIT) compilation and garbage collection (GC). These VM components are typically implemented in a separate layer, isolating them from the application. While this approach brings the software engineering benefits of clear separation and decoupling, it introduces barriers for both understanding VM behavior and evolving the VM implementation. For example, the GC and JIT compiler are typically fixed at VM build time, limiting arbitrary adaptation at run time. Furthermore, because of this separation, the implementation of the VM cannot typically be inspected and debugged in the same way as application code, enshrining a distinction in easy-to-work-with application and hard-to-work-with VM code. These characteristics pose a barrier for application developers to understand the engine on top of which their own code runs, and fosters a knowledge gap that prevents application developers to change the VM.
We propose Live Metacircular Runtimes (LMRs) to overcome this problem. LMRs are language runtime systems that seamlessly integrate the VM into the application in live programming environments. Unlike classic metacircular approaches, we propose to completely remove
the separation between application and VM. By systematically applying object-oriented design to VM components, we can build live runtime systems that are small and flexible enough, where VM engineers can benefit of live programming features such as short feedback loops, and application developers with fewer VM expertise can benefit of the stronger causal connections between their programs and the VM implementation.
To evaluate our proposal, we implemented Bee/LMR, a live VM for a Smalltalk-derivative environment in 22057 lines of code. We analyze case studies on tuning the garbage collector, avoiding recompilations by the just-in-time compiler, and adding support to optimize code with vector instructions to demonstrate the trade-offs of extending exploratory programming to VM development in the context of an industrial application used in production. Based on the case studies, we illustrate how our approach facilitates the daily development work of a small team of application developers.
Our approach enables VM developers to gain access to live programming tools traditionally reserved for application developers, while application developers can interact with the VM and modify it using the high-level tools they use every day. Both application and VM developers can seamlessly inspect, debug, understand, and modify the different parts of the VM with shorter feedback loops and higher-level tools.
△ Less
Submitted 28 December, 2023;
originally announced December 2023.
-
Dynamic Slicing by On-demand Re-execution
Authors:
Ivan Postolski,
Victor Braberman,
Diego Garbervetsky,
Sebastian Uchitel
Abstract:
In this paper, we propose a novel approach that aims to offer an alternative to the prevalent paradigm to dynamic slicing construction. Dynamic slicing requires dynamic data and control dependencies that arise in an execution. During a single execution, memory reference information is recorded and then traversed to extract dependencies. Execute-once approaches and tools are challenged even by exec…
▽ More
In this paper, we propose a novel approach that aims to offer an alternative to the prevalent paradigm to dynamic slicing construction. Dynamic slicing requires dynamic data and control dependencies that arise in an execution. During a single execution, memory reference information is recorded and then traversed to extract dependencies. Execute-once approaches and tools are challenged even by executions of moderate size of simple and short programs. We propose to shift practical time complexity from execution size to slice size. In particular, our approach executes the program multiple times while tracking targeted information at each execution. We present a concrete algorithm that follows an on-demand re-execution paradigm that uses a novel concept of frontier dependency to incrementally build a dynamic slice. To focus dependency tracking, the algorithm relies on static analysis. We show results of an evaluation on the SV-COMP benchmark and Antrl4 unit tests that provide evidence that on-demand re-execution can provide performance gains particularly when slice size is small and execution size is large.
△ Less
Submitted 8 November, 2022;
originally announced November 2022.
-
Focused Dynamic Slicing for Large Applications using an Abstract Memory-Model
Authors:
Alexis Soifer,
Diego Garbervetsky,
Victor Braberman,
Sebastian Uchitel
Abstract:
Dynamic slicing techniques compute program dependencies to find all statements that affect the value of a variable at a program point for a specific execution. Despite their many potential uses, applicability is limited by the fact that they typically cannot scale beyond small-sized applications. We believe that at the heart of this limitation is the use of memory references to identify data-depen…
▽ More
Dynamic slicing techniques compute program dependencies to find all statements that affect the value of a variable at a program point for a specific execution. Despite their many potential uses, applicability is limited by the fact that they typically cannot scale beyond small-sized applications. We believe that at the heart of this limitation is the use of memory references to identify data-dependencies. Particularly, working with memory references hinders distinct treatment of the code-to-be-sliced (e.g., classes the user has an interest in) from the rest of the code (including libraries and frameworks). The ability to perform a coarser-grained analysis for the code that is not under focus may provide performance gains and could become one avenue toward scalability. In this paper, we propose a novel approach that completely replaces memory reference registering and processing with a memory analysis model that works with program symbols (i.e., terms). In fact, this approach enables the alternative of not instrumenting -- thus, not generating any trace -- for code that is not part of the code-to-be-sliced. We report on an implementation of an abstract dynamic slicer for C\#, \textit{DynAbs}, and an evaluation that shows how large and relevant parts of Roslyn and Powershell -- two of the largest and modern C\# applications that can be found in GitHub -- can be sliced for their test cases assertions in at most a few minutes. We also show how reducing the code-to-be-sliced focus can bring important speedups with marginal relative precision loss.
△ Less
Submitted 8 November, 2022;
originally announced November 2022.
-
InspectJS: Leveraging Code Similarity and User-Feedback for Effective Taint Specification Inference for JavaScript
Authors:
Saikat Dutta,
Diego Garbervetsky,
Shuvendu Lahiri,
Max Schäfer
Abstract:
Static analysis has established itself as a weapon of choice for detecting security vulnerabilities. Taint analysis in particular is a very general and powerful technique, where security policies are expressed in terms of forbidden flows, either from untrusted input sources to sensitive sinks (in integrity policies) or from sensitive sources to untrusted sinks (in confidentiality policies). The ap…
▽ More
Static analysis has established itself as a weapon of choice for detecting security vulnerabilities. Taint analysis in particular is a very general and powerful technique, where security policies are expressed in terms of forbidden flows, either from untrusted input sources to sensitive sinks (in integrity policies) or from sensitive sources to untrusted sinks (in confidentiality policies). The appeal of this approach is that the taint-tracking mechanism has to be implemented only once, and can then be parameterized with different taint specifications (that is, sets of sources and sinks, as well as any sanitizers that render otherwise problematic flows innocuous) to detect many different kinds of vulnerabilities.
But while techniques for implementing scalable inter-procedural static taint tracking are fairly well established, crafting taint specifications is still more of an art than a science, and in practice tends to involve a lot of manual effort.
Past work has focussed on automated techniques for inferring taint specifications for libraries either from their implementation or from the way they tend to be used in client code. Among the latter, machine learning-based approaches have shown great promise.
In this work we present our experience combining an existing machine-learning approach to mining sink specifications for JavaScript libraries with manual taint modelling in the context of GitHub's CodeQL analysis framework. We show that the machine-learning component can successfully infer many new taint sinks that either are not part of the manual modelling or are not detected due to analysis incompleteness. Moreover, we present techniques for organizing sink predictions using automated ranking and code-similarity metrics that allow an analysis engineer to efficiently sift through large numbers of predictions to identify true positives.
△ Less
Submitted 18 November, 2021;
originally announced November 2021.
-
Verification Coverage
Authors:
Rodrigo Castaño,
Victor Braberman,
Diego Garbervetsky,
Sebastian Uchitel
Abstract:
Software Model Checkers have shown outstanding performance improvements in recent times. Moreover, for specific use cases, formal verification techniques have shown to be highly effective, leading to a number of high-profile success stories. However, widespread adoption remains unlikely in the short term and one of the remaining obstacles in that direction is the vast number of instances which sof…
▽ More
Software Model Checkers have shown outstanding performance improvements in recent times. Moreover, for specific use cases, formal verification techniques have shown to be highly effective, leading to a number of high-profile success stories. However, widespread adoption remains unlikely in the short term and one of the remaining obstacles in that direction is the vast number of instances which software model checkers cannot fully analyze within reasonable memory and CPU bounds. The majority of verification tools fail to provide a measure of progress or any intermediate verification result when such situations occur.
Inspired in the success that coverage metrics have achieved in industry, we propose to adapt the definition of coverage to the context of verification. We discuss some of the challenges in pinning down a definition that resembles the deeply rooted semantics of test coverage. Subsequently we propose a definition for a broad family of verification techniques: those based on Abstract Reachability Trees. Moreover, we discuss a general approach to computing an under-approximation of such metric and a specific heuristic to improve the performance.
Finally, we conduct an empirical evaluation to assess the viability of our approach.
△ Less
Submitted 12 June, 2017;
originally announced June 2017.
-
Model Checker Execution Reports
Authors:
Rodrigo Castaño,
Victor Braberman,
Diego Garbervetsky,
Sebastian Uchitel
Abstract:
Software model checking constitutes an undecidable problem and, as such, even an ideal tool will in some cases fail to give a conclusive answer. In practice, software model checkers fail often and usually do not provide any information on what was effectively checked. The purpose of this work is to provide a conceptual framing to extend software model checkers in a way that allows users to access…
▽ More
Software model checking constitutes an undecidable problem and, as such, even an ideal tool will in some cases fail to give a conclusive answer. In practice, software model checkers fail often and usually do not provide any information on what was effectively checked. The purpose of this work is to provide a conceptual framing to extend software model checkers in a way that allows users to access information about incomplete checks. We characterize the information that model checkers themselves can provide, in terms of analyzed traces, i.e. sequences of statements, and safe cones, and present the notion of execution reports, which we also formalize. We instantiate these concepts for a family of techniques based on Abstract Reachability Trees and implement the approach using the software model checker CPAchecker. We evaluate our approach empirically and provide examples to illustrate the execution reports produced and the information that can be extracted.
△ Less
Submitted 17 August, 2017; v1 submitted 22 July, 2016;
originally announced July 2016.
-
The DynAlloy Visualizer
Authors:
Pablo Bendersky,
Juan Pablo Galeotti,
Diego Garbervetsky
Abstract:
We present an extension to the DynAlloy tool to navigate DynAlloy counterexamples: the DynAlloy Visualizer. The user interface mimics the functionality of a programming language debugger. Without this tool, a DynAlloy user is forced to deal with the internals of the Alloy intermediate representation in order to debug a flaw in her model.
We present an extension to the DynAlloy tool to navigate DynAlloy counterexamples: the DynAlloy Visualizer. The user interface mimics the functionality of a programming language debugger. Without this tool, a DynAlloy user is forced to deal with the internals of the Alloy intermediate representation in order to debug a flaw in her model.
△ Less
Submitted 5 January, 2014;
originally announced January 2014.
-
On Verifying Resource Contracts using Code Contracts
Authors:
Rodrigo Castaño,
Juan Pablo Galeotti,
Diego Garbervetsky,
Jonathan Tapicer,
Edgardo Zoppi
Abstract:
In this paper we present an approach to check resource consumption contracts using an off-the-shelf static analyzer.
We propose a set of annotations to support resource usage specifications, in particular, dynamic memory consumption constraints. Since dynamic memory may be recycled by a memory manager, the consumption of this resource is not monotone. The specification language can express both…
▽ More
In this paper we present an approach to check resource consumption contracts using an off-the-shelf static analyzer.
We propose a set of annotations to support resource usage specifications, in particular, dynamic memory consumption constraints. Since dynamic memory may be recycled by a memory manager, the consumption of this resource is not monotone. The specification language can express both memory consumption and lifetime properties in a modular fashion.
We develop a proof-of-concept implementation by extending Code Contracts' specification language. To verify the correctness of these annotations we rely on the Code Contracts static verifier and a points-to analysis. We also briefly discuss possible extensions of our approach to deal with non-linear expressions.
△ Less
Submitted 5 January, 2014;
originally announced January 2014.
-
Waterfall: Primitives Generation on the Fly
Authors:
Guido Chari,
Diego Garbervetsky,
Camillo Bruni,
Marcus Denker,
Stéphane Ducasse
Abstract:
Modern languages are typically supported by managed runtimes (Virtual Machines). Since VMs have to deal with many concepts such as memory management, abstract execution model and scheduling, they tend to be very complex. Additionally, VMs have to meet strong performance requirements. This demand of performance is one of the main reasons why many VMs are built statically. Thus, design decisions are…
▽ More
Modern languages are typically supported by managed runtimes (Virtual Machines). Since VMs have to deal with many concepts such as memory management, abstract execution model and scheduling, they tend to be very complex. Additionally, VMs have to meet strong performance requirements. This demand of performance is one of the main reasons why many VMs are built statically. Thus, design decisions are frozen at compile time preventing changes at runtime. One clear example is the impossibility to dynamically adapt or change primitives of the VM once it has been compiled. In this work we present a toolchain that allows for altering and configuring components such as primitives and plug-ins at runtime. The main contribution is Waterfall, a dynamic and reflective translator from Slang, a restricted subset of Smalltalk, to native code. Waterfall generates primitives on demand and executes them on the fly. We validate our approach by implementing dynamic primitive modification and runtime customization of VM plug-ins.
△ Less
Submitted 10 October, 2013;
originally announced October 2013.
-
Reducing the Number of Annotations in a Verification-oriented Imperative Language
Authors:
Guido de Caso,
Diego Garbervetsky,
Daniel Gorín
Abstract:
Automated software verification is a very active field of research which has made enormous progress both in theoretical and practical aspects. Recently, an important amount of research effort has been put into applying these techniques on top of mainstream programming languages. These languages typically provide powerful features such as reflection, aliasing and polymorphism which are handy for pr…
▽ More
Automated software verification is a very active field of research which has made enormous progress both in theoretical and practical aspects. Recently, an important amount of research effort has been put into applying these techniques on top of mainstream programming languages. These languages typically provide powerful features such as reflection, aliasing and polymorphism which are handy for practitioners but, in contrast, make verification a real challenge. In this work we present Pest, a simple experimental, while-style, multiprocedural, imperative programming language which was conceived with verifiability as one of its main goals. This language forces developers to concurrently think about both the statements needed to implement an algorithm and the assertions required to prove its correctness. In order to aid programmers, we propose several techniques to reduce the number and complexity of annotations required to successfully verify their programs. In particular, we show that high-level iteration constructs may alleviate the need for providing complex loop annotations.
△ Less
Submitted 15 November, 2010;
originally announced November 2010.