
Showing 1–34 of 34 results for author: Olukotun, K

Searching in archive cs.
  1. arXiv:2502.08141

    cs.LG cs.AR cs.CL cs.PF

    LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits

    Authors: Zikai Zhou, Qizheng Zhang, Hermann Kumbong, Kunle Olukotun

    Abstract: Fine-tuning large language models (LLMs) is increasingly costly as models scale to hundreds of billions of parameters, and even parameter-efficient fine-tuning (PEFT) methods like LoRA remain resource-intensive. We introduce LowRA, the first framework to enable LoRA fine-tuning below 2 bits per parameter with minimal performance loss. LowRA optimizes fine-grained quantization: mapping, threshold…

    Submitted 12 February, 2025; originally announced February 2025.

  2. arXiv:2502.02534

    cs.CL

    Adaptive Self-improvement LLM Agentic System for ML Library Development

    Authors: Genghan Zhang, Weixin Liang, Olivia Hsu, Kunle Olukotun

    Abstract: ML libraries, often written in architecture-specific programming languages (ASPLs) that target domain-specific architectures, are key to efficient ML systems. However, writing these high-performance ML libraries is challenging because it requires expert knowledge of ML algorithms and the ASPL. Large language models (LLMs), on the other hand, have shown general coding capabilities. However, challen…

    Submitted 4 February, 2025; originally announced February 2025.

  3. arXiv:2412.16432

    cs.AR

    DFModel: Design Space Optimization of Large-Scale Systems Exploiting Dataflow Mappings

    Authors: Sho Ko, Nathan Zhang, Olivia Hsu, Ardavan Pedram, Kunle Olukotun

    Abstract: We propose DFModel, a modeling framework for mapping dataflow computation graphs onto large-scale systems. Mapping a workload to a system requires optimizing dataflow mappings at various levels, including the inter-chip (between chips) level and the intra-chip (within a chip) level. DFModel is, to the best of our knowledge, the first framework to perform the optimization at multiple levels of the…

    Submitted 20 December, 2024; originally announced December 2024.

  4. SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

    Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain, et al. (5 additional authors not shown)

    Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in the compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Expert…

    Submitted 4 November, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)

    ACM Class: C.1.3; C.0

  5. arXiv:2404.16629

    cs.AR

    Implementing and Optimizing the Scaled Dot-Product Attention on Streaming Dataflow

    Authors: Gina Sohn, Nathan Zhang, Kunle Olukotun

    Abstract: Transformer models serve as the backbone of many state-of-the-art language models, and most use the scaled dot-product attention (SDPA) mechanism to capture relationships between tokens. However, the straightforward implementation of SDPA has quadratic compute and memory complexity with respect to the sequence length. On processor architectures such as GPUs and TPUs, there is a robust body of prior…

    Submitted 8 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 4 pages, 3 figures

  6. arXiv:2302.06124

    cs.AR

    Revet: A Language and Compiler for Dataflow Threads

    Authors: Alexander Rucker, Shiv Sundram, Coleman Smith, Matthew Vilim, Raghu Prabhakar, Fredrik Kjolstad, Kunle Olukotun

    Abstract: Spatial dataflow architectures such as reconfigurable dataflow accelerators (RDA) can provide much higher performance and efficiency than CPUs and GPUs. In particular, vectorized reconfigurable dataflow accelerators (vRDA) in recent literature represent a design point that enhances the efficiency of dataflow architectures with vectorization. Today, vRDAs can be exploited using either hardcoded ker…

    Submitted 30 January, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: To appear in HPCA 2024

  7. arXiv:2212.11142

    cs.PL cs.LG cs.PF

    BaCO: A Fast and Portable Bayesian Compiler Optimization Framework

    Authors: Erik Hellsten, Artur Souza, Johannes Lenfers, Rubens Lacouture, Olivia Hsu, Adel Ejjeh, Fredrik Kjolstad, Michel Steuwer, Kunle Olukotun, Luigi Nardi

    Abstract: We introduce the Bayesian Compiler Optimization framework (BaCO), a general purpose autotuner for modern compilers targeting CPUs, GPUs, and FPGAs. BaCO provides the flexibility needed to handle the requirements of modern autotuning tasks. Particularly, it deals with permutation, ordered, and continuous parameter types along with both known and unknown parameter constraints. To reason about these…

    Submitted 11 April, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

  8. arXiv:2211.03251

    cs.PL cs.AR

    Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture

    Authors: Olivia Hsu, Alexander Rucker, Tian Zhao, Kunle Olukotun, Fredrik Kjolstad

    Abstract: We introduce Stardust, a compiler that compiles sparse tensor algebra to reconfigurable dataflow architectures (RDAs). Stardust introduces new user-provided data representation and scheduling language constructs for mapping to resource-constrained accelerated architectures. Stardust uses the information provided by these constructs to determine on-chip memory placement and to lower to the Capstan…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: 15 pages, 13 figures, 6 tables

    ACM Class: D.3

  9. The Sparse Abstract Machine

    Authors: Olivia Hsu, Maxwell Strange, Ritvik Sharma, Jaeyeon Won, Kunle Olukotun, Joel Emer, Mark Horowitz, Fredrik Kjolstad

    Abstract: We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM dataflow graphs naturally separate tensor formats from algorithms and are expressi…

    Submitted 23 March, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 18 pages, 17 figures, 3 tables

    Journal ref: ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems Volume 3 (2023) 710-726

  10. arXiv:2206.05592

    cs.NI

    Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks

    Authors: Tushar Swamy, Annus Zulfiqar, Luigi Nardi, Muhammad Shahbaz, Kunle Olukotun

    Abstract: Support for Machine Learning (ML) applications in networks has significantly improved over the last decade. The availability of public datasets and programmable switching fabrics (including low-level languages to program them) presents a full stack to the programmer for deploying in-network ML. However, the diversity of tools involved, coupled with complex optimization tasks of ML model design and…

    Submitted 11 June, 2022; originally announced June 2022.

    Comments: 12 pages, 7 figures, 5 tables

  11. arXiv:2202.01261

    cs.AR cs.LG

    Efficient Memory Partitioning in Software Defined Hardware

    Authors: Matthew Feldman, Tian Zhao, Kunle Olukotun

    Abstract: As programmers turn to software-defined hardware (SDH) to maintain a high level of productivity while programming hardware to run complex algorithms, heavy-lifting must be done by the compiler to automatically partition on-chip arrays. In this paper, we introduce an automatic memory partitioning system that can quickly compute more efficient partitioning schemes than prior systems. Our system empl…

    Submitted 29 March, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

  12. Capstan: A Vector RDA for Sparsity

    Authors: Alexander Rucker, Matthew Vilim, Tian Zhao, Yaqi Zhang, Raghu Prabhakar, Kunle Olukotun

    Abstract: This paper proposes Capstan: a scalable, parallel-patterns-based, reconfigurable dataflow accelerator (RDA) for sparse and dense tensor applications. Instead of designing for one application, we start with common sparse data formats, each of which supports multiple applications. Using a declarative programming model, Capstan supports application-independent sparse iteration and memory primitives t…

    Submitted 22 September, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

  13. arXiv:2006.14608

    cs.LG stat.ML

    Bayesian Optimization with a Prior for the Optimum

    Authors: Artur Souza, Luigi Nardi, Leonardo B. Oliveira, Kunle Olukotun, Marius Lindauer, Frank Hutter

    Abstract: While Bayesian Optimization (BO) is a very popular method for optimizing expensive black-box functions, it fails to leverage the experience of domain experts. This causes BO to waste function evaluations on bad design choices (e.g., machine learning hyperparameters) that the expert already knows to work poorly. To address this issue, we introduce Bayesian Optimization with a Prior for the Optimum…

    Submitted 19 April, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

  14. arXiv:2002.08987

    cs.NI cs.LG cs.PF

    Taurus: A Data Plane Architecture for Per-Packet ML

    Authors: Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Ishan Gaur, Kunle Olukotun

    Abstract: Emerging applications -- cloud computing, the internet of things, and augmented/virtual reality -- demand responsive, secure, and scalable datacenter networks. These networks currently implement simple, per-packet, data-plane heuristics (e.g., ECMP and sketches) under a slow, millisecond-latency control plane that runs data-driven performance and security policies. However, to meet applications' s…

    Submitted 19 January, 2022; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: 16 pages

  15. arXiv:1909.13654

    cs.DC cs.LG cs.PF

    Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator

    Authors: Tian Zhao, Yaqi Zhang, Kunle Olukotun

    Abstract: Recurrent Neural Network (RNN) applications form a major class of AI-powered, low-latency data center workloads. Most execution models for RNN acceleration break computation graphs into BLAS kernels, which lead to significant inter-kernel data movement and resource underutilization. We show that by supporting more general loop constructs that capture design parameters in accelerators, it is possib…

    Submitted 25 September, 2019; originally announced September 2019.

    Journal ref: Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019. Copyright 2019 by the author(s)

  16. arXiv:1905.13376

    cs.DB cs.DC

    Efficient Multiway Hash Join on Reconfigurable Hardware

    Authors: Kunle Olukotun, Raghu Prabhakar, Rekha Singhal, Jeffrey D. Ullman, Yaqi Zhang

    Abstract: We propose algorithms for performing multiway joins using a new type of coarse-grain reconfigurable hardware accelerator, "Plasticine", that, compared with other accelerators, emphasizes high compute capability and high on-chip communication bandwidth. Joining three or more relations in a single step, i.e. multiway join, is efficient when the join of any two relations yields too large an…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: 20 pages

  17. arXiv:1905.10336

    cs.AR cs.DB cs.DC cs.LG

    Polystore++: Accelerated Polystore System for Heterogeneous Workloads

    Authors: Rekha Singhal, Nathan Zhang, Luigi Nardi, Muhammad Shahbaz, Kunle Olukotun

    Abstract: Modern real-time business analytics consist of heterogeneous workloads (e.g., database queries, graph processing, and machine learning). These analytic applications need programming environments that can capture all aspects of the constituent workloads (including data models they work on and movement of data across processing engines). Polystore systems suit such applications; however, these systems…

    Submitted 24 May, 2019; originally announced May 2019.

    Comments: 11 pages, Accepted in ICDCS 2019

    Journal ref: ICDCS 2019

  18. arXiv:1904.11834

    cs.LG cs.CV stat.ML

    DeepFreak: Learning Crystallography Diffraction Patterns with Automated Machine Learning

    Authors: Artur Souza, Leonardo B. Oliveira, Sabine Hollatz, Matt Feldman, Kunle Olukotun, James M. Holton, Aina E. Cohen, Luigi Nardi

    Abstract: Serial crystallography is the field of science that studies the structure and properties of crystals via diffraction patterns. In this paper, we introduce a new serial crystallography dataset consisting of real and synthetic images; the synthetic images are generated through the use of a simulator that is both scalable and accurate. The resulting dataset is called DiffraNet, and it is composed of 2…

    Submitted 3 May, 2019; v1 submitted 26 April, 2019; originally announced April 2019.

  19. arXiv:1904.03257

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  20. arXiv:1810.05236

    cs.LG math.OC stat.ML

    Practical Design Space Exploration

    Authors: Luigi Nardi, David Koeplinger, Kunle Olukotun

    Abstract: Multi-objective optimization is a crucial matter in computer systems design space exploration because real-world applications often rely on a trade-off between several objectives. Derivatives are usually not available or impractical to compute and the feasibility of an experiment cannot always be determined in advance. These problems are particularly difficult when the feasible region is relative…

    Submitted 24 July, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: 12 pages, MASCOTS 2019 conference (https://sites.google.com/view/mascots-2019)

  21. arXiv:1806.01427

    cs.LG stat.ML

    Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

    Authors: Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Re, Matei Zaharia

    Abstract: Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision), and can impact the final model's accuracy on unseen data. Due to a lack of…

    Submitted 1 December, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

  22. arXiv:1803.03383

    cs.LG stat.ML

    High-Accuracy Low-Precision Training

    Authors: Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

    Abstract: Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference, not training. Previous low-precision training algorithms suffered from a fundamental tradeoff: as the number of bits of precision is lowered, quantization noise is added to the model, w…

    Submitted 8 March, 2018; originally announced March 2018.

  23. arXiv:1708.07859

    cs.DB

    LevelHeaded: Making Worst-Case Optimal Joins Work in the Common Case

    Authors: Christopher R. Aberger, Andrew Lamb, Kunle Olukotun, Christopher Ré

    Abstract: Pipelines combining SQL-style business intelligence (BI) queries and linear algebra (LA) are becoming increasingly common in industry. As a result, there is a growing need to unify these workloads in a single framework. Unfortunately, existing solutions either sacrifice the inherent benefits of exclusively using a relational database (e.g., logical and physical independence) or incur orders of magn…

    Submitted 25 August, 2017; originally announced August 2017.

  24. arXiv:1705.07538

    cs.LG cs.DB stat.ML

    Infrastructure for Usable Machine Learning: The Stanford DAWN Project

    Authors: Peter Bailis, Kunle Olukotun, Christopher Re, Matei Zaharia

    Abstract: Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations. This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application developmen…

    Submitted 8 June, 2017; v1 submitted 21 May, 2017; originally announced May 2017.

  25. arXiv:1703.08219

    cs.DB cs.DC cs.PF cs.PL

    Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

    Authors: Grégory M. Essertel, Ruby Y. Tahboub, James M. Decker, Kevin J. Brown, Kunle Olukotun, Tiark Rompf

    Abstract: The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an increase in expressiveness and performance. But how good are these extensions at extracting high performance from modern hardware platforms? While Sp…

    Submitted 23 March, 2017; originally announced March 2017.

  26. arXiv:1602.07415

    cs.LG

    Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling

    Authors: Christopher De Sa, Kunle Olukotun, Christopher Ré

    Abstract: Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions. To speed up Gibbs sampling, there has recently been interest in parallelizing it by executing asynchronously. While empirical results suggest that many models can be efficiently sampled asynchronously, traditional Markov chain analysis does not apply to the asynchronous case, and thus asynch…

    Submitted 16 June, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

  27. arXiv:1602.03557

    cs.DB

    Old Techniques for New Join Algorithms: A Case Study in RDF Processing

    Authors: Christopher R. Aberger, Susan Tu, Kunle Olukotun, Christopher Ré

    Abstract: Recently there has been significant interest in designing specialized RDF engines, as traditional query processing mechanisms incur orders of magnitude performance gaps on many RDF workloads. At the same time, researchers have released new worst-case optimal join algorithms which can be asymptotically better than the join algorithms in traditional engines. In this paper we apply worst-case opti…

    Submitted 10 February, 2016; originally announced February 2016.

  28. arXiv:1511.06968

    cs.DC cs.PL

    Generating Configurable Hardware from Parallel Patterns

    Authors: Raghu Prabhakar, David Koeplinger, Kevin Brown, HyoukJoong Lee, Christopher De Sa, Christos Kozyrakis, Kunle Olukotun

    Abstract: In recent years the computing landscape has seen an increasing shift towards specialized accelerators. Field programmable gate arrays (FPGAs) are particularly promising as they offer significant performance and energy improvements compared to CPUs for a wide class of applications and are far more flexible than fixed-function ASICs. However, FPGAs are difficult to program. Traditional programmi…

    Submitted 22 November, 2015; originally announced November 2015.

  29. arXiv:1510.00756

    cs.LG

    Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width

    Authors: Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

    Abstract: Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results. Theoretical guarantees for its performance are weak: even for tree-structured graphs, the mixing time of Gibbs may be exponential in the number of variables. To help understand the behavior of Gibbs sampling, we introduce a new (hyper)graph property, called hierarchy width. We show th…

    Submitted 2 October, 2015; originally announced October 2015.

  30. arXiv:1506.06438

    cs.LG math.OC stat.ML

    Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms

    Authors: Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

    Abstract: Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and reduced precision. Our main result is a martingale-based analysis that enables us to capture the rich noise models that may arise from such techniques. Specifica…

    Submitted 2 October, 2015; v1 submitted 21 June, 2015; originally announced June 2015.

  31. arXiv:1503.02368

    cs.DB

    EmptyHeaded: A Relational Engine for Graph Processing

    Authors: Christopher R. Aberger, Susan Tu, Kunle Olukotun, Christopher Ré

    Abstract: There are two types of high-performance graph processing engines: low- and high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide optimized data structures and computation models but require users to write low-level imperative code, hence ensuring that efficiency is the burden of the user. In high-level engines, users write in query languages like Datalog (SociaLite) or SQL (Grai…

    Submitted 5 January, 2017; v1 submitted 9 March, 2015; originally announced March 2015.

  32. arXiv:1411.1134

    cs.LG math.OC stat.ML

    Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

    Authors: Christopher De Sa, Kunle Olukotun, Christopher Ré

    Abstract: Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation. In this paper, we exhibit a step size scheme for SGD on a low-rank least-squares problem, and we prove that, under broad sampling conditions, our method converges globally from a random starting point within…

    Submitted 10 February, 2015; v1 submitted 4 November, 2014; originally announced November 2014.

  33. arXiv:1206.6466

    cs.NE cs.MS cs.PL

    Utilizing Static Analysis and Code Generation to Accelerate Neural Networks

    Authors: Lawrence McAfee, Kunle Olukotun

    Abstract: As datasets continue to grow, neural network (NN) applications are becoming increasingly limited by both the amount of available computational power and the ease of developing high-performance applications. Researchers often must have expert systems knowledge to make their algorithms run efficiently. Although available computing power increases rapidly each year, algorithm efficiency is not able t…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  34. Building-Blocks for Performance Oriented DSLs

    Authors: Tiark Rompf, Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Hassan Chafi, Martin Odersky, Kunle Olukotun

    Abstract: Domain-specific languages raise the level of abstraction in software development. While it is evident that programmers can more easily reason about very high-level programs, the same holds for compilers only if the compiler has an accurate model of the application domain and the underlying target platform. Since mapping high-level, general-purpose languages to modern, heterogeneous hardware is bec…

    Submitted 4 September, 2011; originally announced September 2011.

    Comments: In Proceedings DSL 2011, arXiv:1109.0323

    Journal ref: EPTCS 66, 2011, pp. 93-117