Search | arXiv e-print repository

Partial Evaluation, Whole-Program Compilation

Authors: Chris Fallin, Maxwell Bernstein

Abstract: There is a tension in dynamic language runtime design between speed and correctness: state-of-the-art JIT compilation, the result of enormous industrial investment and significant research, achieves heroic speedups at the cost of complexity that can result in serious correctness bugs. Much of this complexity comes from the existence of multiple tiers and the need to maintain correspondence between… ▽ More There is a tension in dynamic language runtime design between speed and correctness: state-of-the-art JIT compilation, the result of enormous industrial investment and significant research, achieves heroic speedups at the cost of complexity that can result in serious correctness bugs. Much of this complexity comes from the existence of multiple tiers and the need to maintain correspondence between these separate definitions of the language's semantics; also, from the indirect nature of the semantics implicitly encoded in a compiler backend. One way to address this complexity is to automatically derive, as much as possible, the compiled code from a single source-of-truth; for example, the interpreter tier. In this work, we introduce a partial evaluator that can derive compiled code ``for free'' by specializing an interpreter with its bytecode. This transform operates on the interpreter body at a basic-block IR level and is applicable to almost unmodified existing interpreters in systems languages such as C or C++. We show the effectiveness of this new tool by applying it to the interpreter tier of an existing industrial JavaScript engine, SpiderMonkey, yielding $2.17\times$ speedups, and the PUC-Rio Lua interpreter, yielding $1.84\times$ speedups with only three hours' effort. Finally, we outline an approach to carry this work further, deriving more of the capabilities of a JIT backend from first principles while retaining semantics-preserving correctness. △ Less

Submitted 15 November, 2024; originally announced November 2024.

arXiv:1805.03502 [pdf, other]

RowClone: Accelerating Data Movement and Initialization Using DRAM

Authors: Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry

Abstract: In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRA… ▽ More In existing systems, to perform any bulk data movement operation (copy or initialization), the data has to first be read into the on-chip processor, all the way into the L1 cache, and the result of the operation must be written back to main memory. This is despite the fact that these operations do not involve any actual computation. RowClone exploits the organization and operation of commodity DRAM to perform these operations completely inside DRAM using two mechanisms. The first mechanism, Fast Parallel Mode, copies data between two rows inside the same DRAM subarray by issuing back-to-back activate commands to the source and the destination row. The second mechanism, Pipelined Serial Mode, transfers cache lines between two banks using the shared internal bus. RowClone significantly reduces the raw latency and energy consumption of bulk data copy and initialization. This reduction directly translates to improvement in performance and energy efficiency of systems running copy or initialization-intensive workloads △ Less

Submitted 7 May, 2018; originally announced May 2018.

Comments: arXiv admin note: text overlap with arXiv:1605.06483

arXiv:1603.00747 [pdf, other]

RowHammer: Reliability Analysis and Security Implications

Authors: Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, Onur Mutlu

Abstract: As process technology scales down to smaller dimensions, DRAM chips become more vulnerable to disturbance, a phenomenon in which different DRAM cells interfere with each other's operation. For the first time in academic literature, our ISCA paper exposes the existence of disturbance errors in commodity DRAM chips that are sold and used today. We show that repeatedly reading from the same address c… ▽ More As process technology scales down to smaller dimensions, DRAM chips become more vulnerable to disturbance, a phenomenon in which different DRAM cells interfere with each other's operation. For the first time in academic literature, our ISCA paper exposes the existence of disturbance errors in commodity DRAM chips that are sold and used today. We show that repeatedly reading from the same address could corrupt data in nearby addresses. More specifically: When a DRAM row is opened (i.e., activated) and closed (i.e., precharged) repeatedly (i.e., hammered), it can induce disturbance errors in adjacent DRAM rows. This failure mode is popularly called RowHammer. We tested 129 DRAM modules manufactured within the past six years (2008-2014) and found 110 of them to exhibit RowHammer disturbance errors, the earliest of which dates back to 2010. In particular, all modules from the past two years (2012-2013) were vulnerable, which implies that the errors are a recent phenomenon affecting more advanced generations of process technology. Importantly, disturbance errors pose an easily-exploitable security threat since they are a breach of memory protection, wherein accesses to one page (mapped to one row) modifies the data stored in another page (mapped to an adjacent row). △ Less

Submitted 18 February, 2016; originally announced March 2016.

Comments: This is the summary of the paper titled "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors" which appeared in ISCA in June 2014

arXiv:1602.06005 [pdf, other]

Achieving both High Energy Efficiency and High Performance in On-Chip Communication using Hierarchical Rings with Deflection Routing

Authors: Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, Kevin Kai-Wei Chang, Greg Nazario, Reetuparna Das, Gabriel H. Loh, Onur Mutlu

Abstract: Hierarchical ring networks, which hierarchically connect multiple levels of rings, have been proposed in the past to improve the scalability of ring interconnects, but past hierarchical ring designs sacrifice some of the key benefits of rings by introducing more complex in-ring buffering and buffered flow control. Our goal in this paper is to design a new hierarchical ring interconnect that can ma… ▽ More Hierarchical ring networks, which hierarchically connect multiple levels of rings, have been proposed in the past to improve the scalability of ring interconnects, but past hierarchical ring designs sacrifice some of the key benefits of rings by introducing more complex in-ring buffering and buffered flow control. Our goal in this paper is to design a new hierarchical ring interconnect that can maintain most of the simplicity of traditional ring designs (no in-ring buffering or buffered flow control) while achieving high scalability as more complex buffered hierarchical ring designs. Our design, called HiRD (Hierarchical Rings with Deflection), includes features that allow us to mostly maintain the simplicity of traditional simple ring topologies while providing higher energy efficiency and scalability. First, HiRD does not have any buffering or buffered flow control within individual rings, and requires only a small amount of buffering between the ring hierarchy levels. When inter-ring buffers are full, our design simply deflects flits so that they circle the ring and try again, which eliminates the need for in-ring buffering. Second, we introduce two simple mechanisms that provides an end-to-end delivery guarantee within the entire network without impacting the critical path or latency of the vast majority of network traffic. HiRD attains equal or better performance at better energy efficiency than multiple versions of both a previous hierarchical ring design and a traditional single ring design. We also analyze our design's characteristics and injection and delivery guarantees. We conclude that HiRD can be a compelling design point that allows higher energy efficiency and scalability while retaining the simplicity and appeal of conventional ring-based designs. △ Less

Submitted 18 February, 2016; originally announced February 2016.

ACM Class: C.1.2; B.4.3

Showing 1–4 of 4 results for author: Fallin, C