-
Incremental Bidirectional Typing via Order Maintenance
Authors:
Thomas J. Porter,
Marisa Kirisame,
Ivan Wei,
Pavel Panchekha,
Cyrus Omar
Abstract:
Live programming environments provide various semantic services, including type checking and evaluation, continuously as the user is editing the program. The live paradigm promises to improve the developer experience, but liveness is an implementation challenge particularly when working with large programs. This paper specifies and efficiently implements a system the is able to incrementally updat…
▽ More
Live programming environments provide various semantic services, including type checking and evaluation, continuously as the user is editing the program. The live paradigm promises to improve the developer experience, but liveness is an implementation challenge particularly when working with large programs. This paper specifies and efficiently implements a system the is able to incrementally update type information for a live program in response to fine-grained program edits. This information includes type error marks and information about the expected and actual type on every expression. The system is specified type-theoretically as a small-step dynamics that propagates updates through the marked and annotated program. Most updates flow according to a base bidirectional type system. Additional pointers are maintained to connect bound variables to their binding locations, with edits traversing these pointers directly. Order maintenance data structures are employed to efficiently maintain these pointers and to prioritize the order of update propagation. We prove this system is equivalent to naive re-analysis in the Agda theorem prover, along with other important metatheoretic properties. We then implement it efficiently in OCaml, detailing a number of impactful optimizations. We evaluate this implementation's performance with a large stress-test and find that it is able to achieve dramatic speed-ups of 275.96$\times$ compared to from-scratch reanalysis.
△ Less
Submitted 11 April, 2025;
originally announced April 2025.
-
Spineless Traversal for Layout Invalidation
Authors:
Marisa Kirisame,
Tiezhi Wang,
Pavel Panchekha
Abstract:
Latency is a major concern for web rendering engines like those in Chrome, Safari, and Firefox. These engines reduce latency by using an incremental layout algorithm to redraw the page when the user interacts with it. In such an algorithm, elements that change frame-to-frame are marked dirty, and only those elements are processed to draw the next frame, dramatically reducing latency. However, the…
▽ More
Latency is a major concern for web rendering engines like those in Chrome, Safari, and Firefox. These engines reduce latency by using an incremental layout algorithm to redraw the page when the user interacts with it. In such an algorithm, elements that change frame-to-frame are marked dirty, and only those elements are processed to draw the next frame, dramatically reducing latency. However, the standard incremental layout algorithm must search the page for dirty elements, accessing auxiliary elements in the process. These auxiliary elements add cache misses and stalled cycles, and are responsible for a sizable fraction of all layout latency. We introduce a new, faster incremental layout algorithm called Spineless Traversal. Spineless Traversal uses a cache-friendlier priority queue algorithm that avoids accessing auxiliary nodes and thus reduces cache traffic and stalls. This leads to dramatic speedups on the most latency-critical interactions such as hovering, typing, and animation. Moreover, thanks to numerous low-level optimizations, Spineless Traversal is competitive across the whole spectrum of incremental layout workloads. Spineless Traversal is faster than the standard approach on 83.0% of 2216 benchmarks, with a mean speedup of 1.80x concentrated in the most latency-critical interactions.
△ Less
Submitted 14 May, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
Optimal Heap Limits for Reducing Browser Memory Use
Authors:
Marisa Kirisame,
Pranav Shenoy,
Pavel Panchekha
Abstract:
Garbage-collected language runtimes carefully tune heap limits to reduce garbage collection time and memory usage. However, there's a trade-off: a lower heap limit reduces memory use but increases garbage collection time. Classic methods for setting heap limits include manually tuned heap limits and multiple-of-live-size rules of thumb, but it is not clear when one rule is better than another or h…
▽ More
Garbage-collected language runtimes carefully tune heap limits to reduce garbage collection time and memory usage. However, there's a trade-off: a lower heap limit reduces memory use but increases garbage collection time. Classic methods for setting heap limits include manually tuned heap limits and multiple-of-live-size rules of thumb, but it is not clear when one rule is better than another or how to compare them.
We address this problem with a new framework where heap limits are set for multiple heaps at once. Our key insight is that every heap limit rule induces a particular allocation of memory across multiple processes, and this allocation can be sub-optimal. We use our framework to derive an optimal "square-root" heap limit rule, which minimizes total memory usage for any amount of total garbage collection time. Paradoxically, the square-root heap limit rule achieves this coordination without communication: it allocates memory optimally across multiple heaps without requiring any communication between those heaps.
To demonstrate that this heap limit rule is effective, we prototype it for V8, the JavaScript runtime used in Google Chrome, Microsoft Edge, and other browsers, as well as in server-side frameworks like node.js and Deno. On real-world web pages, our prototype achieves reductions of approximately 16.0% of memory usage while keeping garbage collection time constant. On memory-intensive benchmarks, reductions of up to 30.0% of garbage collection time are possible with no change in total memory usage.
△ Less
Submitted 25 September, 2022; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Dynamic Tensor Rematerialization
Authors:
Marisa Kirisame,
Steven Lyubomirsky,
Altan Haan,
Jennifer Brennan,
Mike He,
Jared Roesch,
Tianqi Chen,
Zachary Tatlock
Abstract:
Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Re…
▽ More
Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Rematerialization (DTR), a greedy online algorithm for checkpointing that is extensible and general, is parameterized by eviction policy, and supports dynamic models. We prove that DTR can train an $N$-layer linear feedforward network on an $Ω(\sqrt{N})$ memory budget with only $\mathcal{O}(N)$ tensor operations. DTR closely matches the performance of optimal static checkpointing in simulated experiments. We incorporate a DTR prototype into PyTorch merely by interposing on tensor allocations and operator calls and collecting lightweight metadata on tensors.
△ Less
Submitted 18 March, 2021; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Relay: A High-Level Compiler for Deep Learning
Authors:
Jared Roesch,
Steven Lyubomirsky,
Marisa Kirisame,
Logan Weber,
Josh Pollock,
Luis Vega,
Ziheng Jiang,
Tianqi Chen,
Thierry Moreau,
Zachary Tatlock
Abstract:
Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of DL models and hardware platforms presents challenging tradeoffs between expressivity, composability, and portability. We present Relay, a new compiler…
▽ More
Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of DL models and hardware platforms presents challenging tradeoffs between expressivity, composability, and portability. We present Relay, a new compiler framework for DL. Relay's functional, statically typed intermediate representation (IR) unifies and generalizes existing DL IRs to express state-of-the-art models. The introduction of Relay's expressive IR requires careful design of domain-specific optimizations, addressed via Relay's extension mechanisms. Using these extension mechanisms, Relay supports a unified compiler that can target a variety of hardware platforms. Our evaluation demonstrates Relay's competitive performance for a broad class of models and devices (CPUs, GPUs, and emerging accelerators). Relay's design demonstrates how a unified IR can provide expressivity, composability, and portability without compromising performance.
△ Less
Submitted 24 August, 2019; v1 submitted 17 April, 2019;
originally announced April 2019.
-
Relay: A New IR for Machine Learning Frameworks
Authors:
Jared Roesch,
Steven Lyubomirsky,
Logan Weber,
Josh Pollock,
Marisa Kirisame,
Tianqi Chen,
Zachary Tatlock
Abstract:
Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation…
▽ More
Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon's deep learning framework MxNet.
△ Less
Submitted 25 September, 2018;
originally announced October 2018.