Complete the Cycle: Reachability Types with Expressive Cyclic References

Authors: Haotian Deng, Siyuan He, Songlin Jia, Yuyan Bao, Tiark Rompf

Abstract: Reachability Types (RT) are a qualified type system for tracking aliasing and separation in functional and higher-order programming. By formalizing resource reachability with a sound static type system, RT enable higher-order programming patterns with runtime safety and non-interference guarantees. However, previous RT systems have been based on calculi that restrict cyclic dependencies and are shown to be terminating in the absence of built-in recursive constructs. While termination is sometimes a desirable property, simplifying reasoning and ensuring predictable behavior, it implies an inability to encode expressive programs involving non-termination and advanced recursive patterns, such as mutual recursion and various fixed-point combinators. In this paper, we address this limitation by extending RT with an expressive cyclic reference type that permits the formation of cyclic dependencies through the store, thereby allowing the system to encode recursive programming patterns without relying on extra built-in constructs. In addition, we redesign qualifier typing in the reference introduction rule, allowing separate references to point to a shared and tracked referent. We formalize the system as the $λ^{\circ}_{<:}$-calculus, with a mechanized soundness proof via the standard progress and preservation lemmas. As a demonstration, we implement a well-typed fixpoint operator, proving that recursive patterns can be encoded using the novel cyclic reference type.

Submitted 10 March, 2025; originally announced March 2025.

arXiv:2404.08217 [pdf, other]

Escape with Your Self: A Solution to the Avoidance Problem with Decidable Bidirectional Typing for Reachability Types

Authors: Songlin Jia, Guannan Wei, Siyuan He, Yuyan Bao, Tiark Rompf

Abstract: Despite Rust's success in system programming, its "shared XOR mutable" principle significantly restricts how mutable values can be used, precluding many useful functional programming idioms. Reachability types are a recent proposal to address the key limitations of Rust-style approaches by tracking, rather than prohibiting, shared, escaping, and mutable data, even in the presence of higher-order functions and polymorphic types. The key to enabling tracking in the presence of avoidance is their notion of self-references. Similar to this pointers in OO languages, self-references expose the reachability of enclosing objects to internal components. While they help track escaped data, they present major challenges in designing expressive subtyping and decidable typing algorithms, as they involve subtle interactions with bounds and variance. This lack of an effective type checking algorithm is a key impediment toward making reachability types truly practical and leveraging them to bring the benefits of programming with lifetimes to practical higher-level languages. In this paper, we investigate the issues of subtyping and type checking of self-references, to fully enable this avoidance solution. We address key gaps in previous work by proposing a refined notion of subtyping, which supports encoding datatypes without resorting to term-level coercions, making the overall system more expressive. We also develop a sound and decidable bidirectional typing algorithm, formally verified in Coq.

Submitted 20 November, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2311.02781 [pdf, other]

Architecting Intermediate Layers for Efficient Composition of Data Management and Machine Learning Systems

Authors: Supun Abeysinghe, Fei Wang, Gregory Essertel, Tiark Rompf

Abstract: Modern data analytics workloads combine relational data processing with machine learning (ML). Most DBMS handle these workloads by offloading these ML operations to external specialized ML systems. While both DBMS and ML systems go to great lengths to optimize performance for their specific workloads, significant performance is lost when used in combination, due to data movement across system boundaries, conversions between incompatible internal data formats, and the lack of cross system optimizations. A key idea to remove these bottlenecks is to integrate existing data manipulation systems with ML systems by building a common intermediate layer (IR). Although this idea has been explored before (Weld, Delite), previous such attempts require significant re-engineering of prior systems and still fall short in achieving best-of-breed performance for individual tasks (e.g., SQL, Deep Learning). Specifically, they rely on re-implementing existing systems using a generic set of operators and fail to match best-of-breed individual performance due to the inability to recover high-level optimizations from this generic IR through compiler analysis. We present Flern, the first intermediate-layer integration between DB and ML systems that are best-of-breed individually, competitive with the best compiled query engines such as HyPer on comprehensive relational benchmarks (TPC-H) and competitive with TensorFlow and PyTorch in state-of-the-art ML models (e.g., DeepSpeech, SqueezeNet, Transformers) and also represents a new state-of-the-art for integration.
A key realization is to architect intermediate layers based on generative programming capabilities, which preserves high-level contextual information for cross optimizations and enables the construction of a variety of complex structures and cross system optimizations with minimal effort.

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2309.08118 [pdf, ps, other]

Graph IRs for Impure Higher-Order Languages (Technical Report)

Authors: Oliver Bračevac, Guannan Wei, Songlin Jia, Supun Abeysinghe, Yuxuan Jiang, Yuyan Bao, Tiark Rompf

Abstract: This is a companion report for the OOPSLA 2023 paper of the same title, presenting a detailed end-to-end account of the $λ^*_{\mathsf{G}}$ graph IR, at a level of detail beyond a regular conference paper. Our first concern is adequacy and soundness of $λ^*_{\mathsf{G}}$, which we derive from a direct-style imperative functional language (a variant of Bao et al.'s $λ^*$-calculus with reachability types and a simple effect system) by a series of type-preserving translations into a calculus in monadic normal form (MNF). Static reachability types and effects entirely inform $λ^*_{\mathsf{G}}$'s dependency synthesis. We argue for its adequacy by proving its functional properties along with dependency safety via progress and preservation lemmas with respect to a notion of call-by-value (CBV) reduction that checks the observed order of effects. Our second concern is establishing the correctness of $λ^*_{\mathsf{G}}$'s equational rules that drive compiler optimizations (e.g., DCE, $λ$-hoisting, etc.), by proving contextual equivalence using logical relations. A key insight is that the functional properties of dependency synthesis permit a logical relation on $λ^*_{\mathsf{G}}$ in MNF in terms of previously developed logical relations for the direct-style $λ^*$-calculus. Finally, we also include a longer version of the conference paper's section on code generation and code motion for $λ^*_{\mathsf{G}}$ as implemented in Scala LMS.

Submitted 14 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2309.05885

arXiv:2309.05885 [pdf, other]

Modeling Reachability Types with Logical Relations

Authors: Yuyan Bao, Songlin Jia, Guannan Wei, Oliver Bračevac, Tiark Rompf

Abstract: Reachability types are a recent proposal to bring Rust-style reasoning about memory properties to higher-level languages, with a focus on higher-order functions, parametric types, and shared mutable state -- features that are only partially supported by current techniques as employed in Rust. While prior work has established key type soundness results for reachability types using the usual syntactic techniques of progress and preservation, stronger metatheoretic properties have so far been unexplored. This paper presents an alternative semantic model of reachability types using logical relations, providing a framework in which we study key properties of interest: (1) semantic type soundness, including of not syntactically well-typed code fragments, (2) termination, especially in the presence of higher-order state, (3) effect safety, especially the absence of observable mutation, and, finally, (4) program equivalence, especially reordering of non-interfering expressions for parallelization or compiler optimization.

Submitted 21 February, 2025; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2307.13844 [pdf, other]

Polymorphic Reachability Types: Tracking Freshness, Aliasing, and Separation in Higher-Order Generic Programs

Authors: Guannan Wei, Oliver Bračevac, Songlin Jia, Yuyan Bao, Tiark Rompf

Abstract: Reachability types are a recent proposal that has shown promise in scaling to higher-order but monomorphic settings, tracking aliasing and separation on top of a substrate inspired by separation logic. The prior $λ^*$ reachability type system qualifies types with sets of reachable variables and guarantees separation if two terms have disjoint qualifiers. However, naive extensions with type polymorphism and/or precise reachability polymorphism are unsound, making $λ^*$ unsuitable for adoption in real languages. Combining reachability and type polymorphism that is precise, sound, and parametric remains an open challenge. This paper presents a rethinking of the design of reachability tracking and proposes a solution to the key challenge of reachability polymorphism. Instead of always tracking the transitive closure of reachable variables as in the original design, we only track variables reachable in a single step and compute transitive closures only when necessary, thus preserving chains of reachability over known variables that can be refined using substitution. To enable this property, we introduce a new freshness qualifier, which indicates variables whose reachability sets may grow during evaluation steps. These ideas yield the simply-typed $λ^\diamond$-calculus with precise lightweight, i.e., quantifier-free, reachability polymorphism, and the $\mathsf{F}_{<:}^\diamond$-calculus with bounded parametric polymorphism over types and reachability qualifiers. We prove type soundness and a preservation of separation property in Coq.

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2207.11649 [pdf, other]

OCTAL: Graph Representation Learning for LTL Model Checking

Authors: Prasita Mukherjee, Haoteng Yin, Susheel Suresh, Tiark Rompf

Abstract: Model Checking is widely applied in verifying the correctness of complex and concurrent systems against a specification. Pure symbolic approaches, while popular, still suffer from the state space explosion problem that makes them impractical for large scale systems and/or specifications. In this paper, we propose to use graph representation learning (GRL) for solving linear temporal logic (LTL) model checking, where the system and the specification are expressed by a Büchi automaton and an LTL formula respectively. A novel GRL-based framework, OCTAL, is designed to learn the representation of the graph-structured system and specification, which reduces the model checking problem to binary classification in the latent space. The empirical experiments show that OCTAL achieves comparable accuracy against canonical SOTA model checkers on three different datasets, with up to $5\times$ overall speedup and above $63\times$ for satisfiability checking alone.

Submitted 26 July, 2022; v1 submitted 23 July, 2022; originally announced July 2022.

Comments: change the style of bibliography

arXiv:2110.14824

A GNN Based Approach to LTL Model Checking

Authors: Prasita Mukherjee, Tiark Rompf

Abstract: Model Checking is widely applied in verifying complicated and especially concurrent systems. Despite its popularity, model checking suffers from the state space explosion problem that restricts it from being applied to certain systems, or specifications. Many works have been proposed in the past to address the state space explosion problem, and they have achieved some success, but the inherent complexity still remains an obstacle for purely symbolic approaches. In this paper, we propose a Graph Neural Network (GNN) based approach for model checking, where the model is expressed using a Büchi automaton and the property to be verified is expressed using Linear Temporal Logic (LTL). We express the model as a GNN, and propose a novel node embedding framework that encodes the LTL property and characteristics of the model. We reduce the LTL model checking problem to a graph classification problem, where there are two classes, 1 (if the model satisfies the specification) and 0 (if the model does not satisfy the specification). The experimental results show that our framework is up to 17 times faster than state-of-the-art tools. Our approach is particularly useful when dealing with very large LTL formulae and small to moderate sized models.

Submitted 7 September, 2023; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: This is an older submission. The updated submissions exist as a separate paper

arXiv:2009.01489 [pdf, other]

HACCLE: Metaprogramming for Secure Multi-Party Computation -- Extended Version

Authors: Yuyan Bao, Kirshanthan Sundararajah, Raghav Malik, Qianchuan Ye, Christopher Wagner, Nouraldin Jaber, Fei Wang, Mohammad Hassan Ameri, Donghang Lu, Alexander Seto, Benjamin Delaware, Roopsha Samanta, Aniket Kate, Christina Garman, Jeremiah Blocki, Pierre-David Letourneau, Benoit Meister, Jonathan Springer, Tiark Rompf, Milind Kulkarni

Abstract: Cryptographic techniques have the potential to enable distrusting parties to collaborate in fundamentally new ways, but their practical implementation poses numerous challenges. An important class of such cryptographic techniques is known as Secure Multi-Party Computation (MPC). Developing Secure MPC applications in realistic scenarios requires extensive knowledge spanning multiple areas of cryptography and systems. And while the steps to arrive at a solution for a particular application are often straightforward, it remains difficult to make the implementation efficient, and tedious to apply those same steps to a slightly different application from scratch. Hence, it is an important problem to design platforms for implementing Secure MPC applications with minimum effort and using techniques accessible to non-experts in cryptography. In this paper, we present the HACCLE (High Assurance Compositional Cryptography: Languages and Environments) toolchain, specifically targeted to MPC applications. HACCLE contains an embedded domain-specific language, Harpoon, for software developers without cryptographic expertise to write MPC-based programs, and uses Lightweight Modular Staging (LMS) for code generation. Harpoon programs are compiled into acyclic circuits represented in HACCLE's Intermediate Representation (HIR) that serves as an abstraction over different cryptographic protocols such as secret sharing, homomorphic encryption, or garbled circuits. Implementations of different cryptographic protocols serve as different backends of our toolchain.
The extensible design of HIR allows cryptographic experts to plug in new primitives and protocols to realize computation. And the use of standard metaprogramming techniques lowers the development effort significantly.

Submitted 30 September, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

arXiv:2001.00090 [pdf, other]

Resilient Cyberphysical Systems and their Application Drivers: A Technology Roadmap

Authors: Somali Chaterji, Parinaz Naghizadeh, Muhammad Ashraful Alam, Saurabh Bagchi, Mung Chiang, David Corman, Brian Henz, Suman Jana, Na Li, Shaoshuai Mou, Meeko Oishi, Chunyi Peng, Tiark Rompf, Ashutosh Sabharwal, Shreyas Sundaram, James Weimer, Jennifer Weller

Abstract: Cyberphysical systems (CPS) are ubiquitous in our personal and professional lives, and they promise to dramatically improve micro-communities (e.g., urban farms, hospitals), macro-communities (e.g., cities and metropolises), urban structures (e.g., smart homes and cars), and living structures (e.g., human bodies, synthetic genomes). The question that we address in this article pertains to designing these CPS systems to be resilient-from-the-ground-up, and through progressive learning, resilient-by-reaction. An optimally designed system is resilient to both unique attacks and recurrent attacks, the latter with a lower overhead. Overall, the notion of resilience can be thought of in the light of three main sources of lack of resilience, as follows: exogenous factors, such as natural variations and attack scenarios; mismatch between engineered designs and exogenous factors ranging from DDoS (distributed denial-of-service) attacks or other cybersecurity nightmares, so called "black swan" events, disabling critical services of the municipal electrical grids and other connected infrastructures, data breaches, and network failures; and the fragility of engineered designs themselves encompassing bugs, human-computer interactions (HCI), and the overall complexity of real-world systems. In the paper, our focus is on design and deployment innovations that are broadly applicable across a range of CPS application areas.

Submitted 19 December, 2019; originally announced January 2020.

Comments: 36 pages, 2 figures, NSF-supported workshop on Grand Challenges in Resilience, held at Purdue, March 20-21, 2019

MSC Class: C.5.3; D.4.5; H.4.0 ACM Class: C.5.3; D.4.5; H.4.0

arXiv:1904.12084 [pdf, ps, other]

Graph Neural Reasoning for 2-Quantified Boolean Formula Solvers

Authors: Zhanfu Yang, Fei Wang, Ziliang Chen, Guannan Wei, Tiark Rompf

Abstract: In this paper, we investigate the feasibility of learning GNN (Graph Neural Network) based solvers and GNN-based heuristics for specified QBF (Quantified Boolean Formula) problems. We design and evaluate several GNN architectures for 2QBF formulae, and conjecture that GNN has limitations in learning 2QBF solvers. Then we show how to learn a heuristic CEGAR 2QBF solver. We further explore generalizing GNN-based heuristics to larger unseen instances, and uncover some interesting challenges. In summary, this paper provides a comprehensive surveying view of applying GNN-embeddings to specified QBF solvers, and aims to offer guidance in applying ML to more complicated symbolic reasoning problems.

Submitted 26 April, 2019; originally announced April 2019.

Comments: 5 Pages

arXiv:1810.08061 [pdf, ps, other]

AutoGraph: Imperative-style Coding with Graph-based Performance

Authors: Dan Moldovan, James M Decker, Fei Wang, Andrew A Johnson, Brian K Lee, Zachary Nado, D Sculley, Tiark Rompf, Alexander B Wiltschko

Abstract: There is a perceived trade-off between machine learning code that is easy to write, and machine learning code that is scalable or fast to execute. In machine learning, imperative style libraries like Autograd and PyTorch are easy to write, but suffer from high interpretive overhead and are not easily deployable in production or mobile settings. Graph-based libraries like TensorFlow and Theano benefit from whole-program optimization and can be deployed broadly, but make expressing complex models more cumbersome. We describe how the use of staged programming in Python, via source code transformation, offers a midpoint between these two library design patterns, capturing the benefits of both. A key insight is to delay all type-dependent decisions until runtime, via dynamic dispatch. We instantiate these principles in AutoGraph, a software system that improves the programming experience of the TensorFlow library, and demonstrate usability improvements with no loss in performance compared to native TensorFlow graphs. We also show that our system is backend agnostic, and demonstrate targeting an alternate IR with characteristics not found in TensorFlow graphs.

Submitted 26 March, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

arXiv:1803.10228 [pdf, other]

Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator

Authors: Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Grégory M. Essertel, Tiark Rompf

Abstract: Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural network by backpropagating observed errors. However, neural network architectures are growing increasingly sophisticated and diverse, which motivates an emerging quest for even more general forms of differentiable programming, where arbitrary parameterized computations can be trained by gradient descent. In this paper, we take a fresh look at automatic differentiation (AD) techniques, and especially aim to demystify the reverse-mode form of AD that generalizes backpropagation in neural networks. We uncover a tight connection between reverse-mode AD and delimited continuations, which permits implementing reverse-mode AD purely via operator overloading and without any auxiliary data structures. We further show how this formulation of AD can be fruitfully combined with multi-stage programming (staging), leading to a highly efficient implementation that combines the performance benefits of deep learning frameworks based on explicit reified computation graphs (e.g., TensorFlow) with the expressiveness of pure library approaches (e.g., PyTorch).

Submitted 28 August, 2019; v1 submitted 27 March, 2018; originally announced March 2018.

arXiv:1802.05340 [pdf, other]

From Gameplay to Symbolic Reasoning: Learning SAT Solver Heuristics in the Style of Alpha(Go) Zero

Authors: Fei Wang, Tiark Rompf

Abstract: Despite the recent successes of deep neural networks in various fields such as image and speech recognition, natural language processing, and reinforcement learning, we still face big challenges in bringing the power of numeric optimization to symbolic reasoning. Researchers have proposed different avenues such as neural machine translation for proof synthesis, vectorization of symbols and expressions for representing symbolic patterns, and coupling of neural back-ends for dimensionality reduction with symbolic front-ends for decision making. However, these initial explorations are still only point solutions, and bear other shortcomings such as lack of correctness guarantees. In this paper, we present our approach of casting symbolic reasoning as games, and directly harnessing the power of deep reinforcement learning in the style of Alpha(Go) Zero on symbolic problems. Using the Boolean Satisfiability (SAT) problem as showcase, we demonstrate the feasibility of our method, and the advantages of modularity, efficiency, and correctness guarantees.

Submitted 14 February, 2018; originally announced February 2018.

arXiv:1703.08219 [pdf, other]

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

Authors: Grégory M. Essertel, Ruby Y. Tahboub, James M. Decker, Kevin J. Brown, Kunle Olukotun, Tiark Rompf

Abstract: The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an increase in expressiveness and performance. But how good are these extensions at extracting high performance from modern hardware platforms? While Spark has made impressive progress, we show that for relational workloads, there is still a significant gap compared with best-of-breed query engines. And when stepping outside of the relational world, query optimization techniques are ineffective if large parts of a computation have to be treated as user-defined functions (UDFs). We present Flare: a new back-end for Spark that brings performance closer to the best SQL engines, without giving up the added expressiveness of Spark. We demonstrate order of magnitude speedups both for relational workloads such as TPC-H, as well as for a range of machine learning kernels that combine relational and iterative functional processing. Flare achieves these results through (1) compilation to native code, (2) replacing parts of the Spark runtime system, and (3) extending the scope of optimization and code generation to large classes of UDFs.

Submitted 23 March, 2017; originally announced March 2017.

arXiv:1510.05216 [pdf, other]

From F to DOT: Type Soundness Proofs with Definitional Interpreters

Authors: Tiark Rompf, Nada Amin

Abstract: Scala's type system unifies ML modules, object-oriented, and functional programming. The Dependent Object Types (DOT) family of calculi has been proposed as a new foundation for Scala and similar languages. Unfortunately, it is not clear how DOT relates to any well-known type systems, and type soundness has only been established for very restricted subsets. In fact, important Scala features are known to break at least one key metatheoretic property such as environment narrowing or subtyping transitivity, which are usually required for a type soundness proof. First, and, perhaps surprisingly, we show how rich DOT calculi can still be proved sound. The key insight is that narrowing and subtyping transitivity only need to hold for runtime objects, but not for code that is never executed. Alas, the dominant method of proving type soundness, Wright and Felleisen's syntactic approach, is based on term rewriting, which does not a priori make a distinction between runtime and type assignment time. Second, we demonstrate how type soundness can be proved for advanced, polymorphic, type systems with respect to high-level, definitional interpreters, implemented in Coq. We present the first mechanized soundness proof in this style for System F<: and several extensions, including mutable references. Our proofs use only simple induction: another surprising result, as the combination of big-step semantics, mutable references, and polymorphism is commonly believed to require co-inductive proof techniques.
Third, we show how DOT-like calculi emerge as generalizations of F<:, exposing a rich design space of calculi with path-dependent types which we collectively call System D. Armed with insights from the definitional interpreter semantics, we also show how equivalent small-step semantics and soundness proofs in Wright-Felleisen-style can be derived for these systems.

Submitted 4 February, 2016; v1 submitted 18 October, 2015; originally announced October 2015.

arXiv:1109.0778 [pdf, other]

doi 10.4204/EPTCS.66.5

Building-Blocks for Performance Oriented DSLs

Authors: Tiark Rompf, Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Hassan Chafi, Martin Odersky, Kunle Olukotun

Abstract: Domain-specific languages raise the level of abstraction in software development. While it is evident that programmers can more easily reason about very high-level programs, the same holds for compilers only if the compiler has an accurate model of the application domain and the underlying target platform. Since mapping high-level, general-purpose languages to modern, heterogeneous hardware is becoming increasingly difficult, DSLs are an attractive way to capitalize on improved hardware performance, precisely by making the compiler reason on a higher level. Implementing efficient DSL compilers is a daunting task however, and support for building performance-oriented DSLs is urgently needed. To this end, we present the Delite Framework, an extensible toolkit that drastically simplifies building embedded DSLs and compiling DSL programs for execution on heterogeneous hardware. We discuss several building blocks in some detail and present experimental results for the OptiML machine-learning DSL implemented on top of Delite.

Submitted 4 September, 2011; originally announced September 2011.

Comments: In Proceedings DSL 2011, arXiv:1109.0323

Journal ref: EPTCS 66, 2011, pp. 93-117

Showing 1–17 of 17 results for author: Rompf, T