Skip to main content

Showing 1–33 of 33 results for author: Aiken, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2505.18791  [pdf, ps, other

    math.NA cs.LO

    Automatic Verification of Floating-Point Accumulation Networks

    Authors: David K. Zhang, Alex Aiken

    Abstract: Floating-point accumulation networks (FPANs) are key building blocks used in many floating-point algorithms, including compensated summation and double-double arithmetic. FPANs are notoriously difficult to analyze, and algorithms using FPANs are often published without rigorous correctness proofs. In fact, on at least one occasion, a published error bound for a widely used FPAN was later found to… ▽ More

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Accepted at CAV 2025. Open-source implementation available at: https://github.com/dzhang314/FPANVerifier

  2. arXiv:2505.14615  [pdf, ps, other

    cs.AI cs.CL cs.LG cs.LO

    SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas

    Authors: Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken

    Abstract: We introduce SATBench, a benchmark for evaluating the logical reasoning capabilities of large language models (LLMs) through logical puzzles derived from Boolean satisfiability (SAT) problems. Unlike prior work that focuses on inference rule-based reasoning, which often involves deducing conclusions from a set of premises, our approach leverages the search-based nature of SAT problems, where the o… ▽ More

    Submitted 20 May, 2025; originally announced May 2025.

  3. arXiv:2505.11480  [pdf, ps, other

    cs.CL cs.AI cs.PF cs.PL cs.SE

    Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

    Authors: Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken

    Abstract: Large language models (LLMs) have demonstrated strong performance across a wide range of programming tasks, yet their potential for code optimization remains underexplored. This work investigates whether LLMs can optimize the performance of assembly code, where fine-grained control over execution enables improvements that are difficult to express in high-level languages. We present a reinforcement… ▽ More

    Submitted 16 May, 2025; originally announced May 2025.

  4. arXiv:2504.15659  [pdf, other

    cs.AR cs.AI cs.CL cs.LG cs.SE

    VeriCoder: Enhancing LLM-Based RTL Code Generation through Functional Correctness Validation

    Authors: Anjiang Wei, Huanmi Tan, Tarun Suresh, Daniel Mendoza, Thiago S. F. X. Teixeira, Ke Wang, Caroline Trippel, Alex Aiken

    Abstract: Recent advances in Large Language Models (LLMs) have sparked growing interest in applying them to Electronic Design Automation (EDA) tasks, particularly Register Transfer Level (RTL) code generation. While several RTL datasets have been introduced, most focus on syntactic validity rather than functional validation with tests, leading to training examples that compile but may not implement the inte… ▽ More

    Submitted 22 April, 2025; originally announced April 2025.

  5. arXiv:2504.07004  [pdf, other

    cs.PL

    Task-Based Tensor Computations on Modern GPUs

    Authors: Rohan Yadav, Michael Garland, Alex Aiken, Michael Bauer

    Abstract: Domain-specific, fixed-function units are becoming increasingly common in modern processors. As the computational demands of applications evolve, the capabilities and programming interfaces of these fixed-function units continue to change. NVIDIA's Hopper GPU architecture contains multiple fixed-function units per compute unit, including an asynchronous data movement unit (TMA) and an asynchronous… ▽ More

    Submitted 9 April, 2025; originally announced April 2025.

  6. arXiv:2503.23145  [pdf, other

    cs.PL cs.AI cs.CL cs.LG

    CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

    Authors: Anjiang Wei, Tarun Suresh, Jiannan Cao, Naveen Kannan, Yuheng Wu, Kai Yan, Thiago S. F. X. Teixeira, Ke Wang, Alex Aiken

    Abstract: Inductive program synthesis, or programming by example, requires synthesizing functions from input-output examples that generalize to unseen inputs. While large language model agents have shown promise in programming tasks guided by natural language, their ability to perform inductive program synthesis is underexplored. Existing evaluation protocols rely on static sets of examples and held-out tes… ▽ More

    Submitted 29 March, 2025; originally announced March 2025.

  7. arXiv:2503.18952  [pdf

    cs.CY

    Reclaiming the Future: American Information Technology Leadership in an Era of Global Competition

    Authors: Alex Aiken, David Jensen, Catherine Gill, William Gropp, Peter Harsha, Brian Mosley, Daniel Reed, William Regli

    Abstract: The United States risks losing its global leadership in information technology research due to declining basic research funding, challenges in attracting talent, and tensions between research security and openness.

    Submitted 3 March, 2025; originally announced March 2025.

  8. arXiv:2502.12466  [pdf, other

    cs.LG cs.AI cs.CL cs.PL cs.SE

    EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking

    Authors: Anjiang Wei, Jiannan Cao, Ran Li, Hongyu Chen, Yuhui Zhang, Ziheng Wang, Yuan Liu, Thiago S. F. X. Teixeira, Diyi Yang, Ke Wang, Alex Aiken

    Abstract: As large language models (LLMs) become integral to code-related tasks, a central question emerges: do LLMs truly understand program execution semantics? We introduce EquiBench, a new benchmark for evaluating LLMs through equivalence checking, i.e., determining whether two programs produce identical outputs for all possible inputs. Unlike prior code generation benchmarks, this task directly tests a… ▽ More

    Submitted 20 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  9. arXiv:2410.15625  [pdf, other

    cs.LG cs.AI cs.CL cs.DC

    Improving Parallel Program Performance with LLM Optimizers via Agent-System Interface

    Authors: Anjiang Wei, Allen Nie, Thiago S. F. X. Teixeira, Rohan Yadav, Wonchan Lee, Ke Wang, Alex Aiken

    Abstract: Modern scientific discovery increasingly relies on high-performance computing for complex modeling and simulation. A key challenge in improving parallel program performance is efficiently mapping tasks to processors and data to memory, a process dictated by intricate, low-level system code known as mappers. Developing high-performance mappers demands days of manual tuning, posing a significant bar… ▽ More

    Submitted 31 January, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

  10. arXiv:2408.05962  [pdf, other

    cs.DC

    HiCCL: A Hierarchical Collective Communication Library

    Authors: Mert Hidayetoglu, Simon Garcia de Gonzalo, Elliott Slaughter, Pinku Surana, Wen-mei Hwu, William Gropp, Alex Aiken

    Abstract: HiCCL (Hierarchical Collective Communication Library) addresses the growing complexity and diversity in high-performance network architectures. As GPU systems have envolved into networks of GPUs with different multilevel communication hierarchies, optimizing each collective function for a specific system has become a challenging task. Consequently, many collective libraries struggle to adapt to di… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  11. arXiv:2406.18111  [pdf, other

    cs.DC

    Automatic Tracing in Task-Based Runtime Systems

    Authors: Rohan Yadav, Michael Bauer, David Broman, Michael Garland, Alex Aiken, Fredrik Kjolstad

    Abstract: Implicitly parallel task-based runtime systems often perform dynamic analysis to discover dependencies in and extract parallelism from sequential programs. Dependence analysis becomes expensive as task granularity drops below a threshold. Tracing techniques have been developed where programmers annotate repeated program fragments (traces) issued by the application, and the runtime system memoizes… ▽ More

    Submitted 16 December, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  12. arXiv:2406.18109  [pdf, other

    cs.DC

    Composing Distributed Computations Through Task and Kernel Fusion

    Authors: Rohan Yadav, Shiv Sundram, Wonchan Lee, Michael Garland, Michael Bauer, Alex Aiken, Fredrik Kjolstad

    Abstract: We introduce Diffuse, a system that dynamically performs task and kernel fusion in distributed, task-based runtime systems. The key component of Diffuse is an intermediate representation of distributed computation that enables the necessary analyses for the fusion of distributed tasks to be performed in a scalable manner. We pair task fusion with a JIT compiler to fuse together the kernels within… ▽ More

    Submitted 16 December, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  13. arXiv:2403.08062  [pdf, other

    cs.DC cs.DB

    Efficient Fault Tolerance for Pipelined Query Engines via Write-ahead Lineage

    Authors: Ziheng Wang, Alex Aiken

    Abstract: Modern distributed pipelined query engines either do not support intra-query fault tolerance or employ high-overhead approaches such as persisting intermediate outputs or checkpointing state. In this work, we present write-ahead lineage, a novel fault recovery technique that combines Spark's lineage-based replay and write-ahead logging. Unlike Spark, where the lineage is determined before query ex… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: ICDE 2024 (copyright IEEE)

  14. arXiv:2304.14406  [pdf, other

    cs.CV cs.AI cs.GR cs.LG

    Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

    Authors: Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh

    Abstract: We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. W… ▽ More

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. Project page with code: https://sumith1896.github.io/affordance-insertion/

  15. arXiv:2301.13464  [pdf, other

    cs.LG

    Training with Mixed-Precision Floating-Point Assignments

    Authors: Wonyeol Lee, Rahul Sharma, Alex Aiken

    Abstract: When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss. Hence, it is important to use a precision assignment -- a mapping from all tensors (arising in training) to precision levels (high or low) -- that keeps most of th… ▽ More

    Submitted 23 June, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Published in TMLR

  16. arXiv:2301.13370  [pdf, other

    cs.LG stat.ML

    On the Correctness of Automatic Differentiation for Neural Networks with Machine-Representable Parameters

    Authors: Wonyeol Lee, Sejun Park, Alex Aiken

    Abstract: Recent work has shown that forward- and reverse- mode automatic differentiation (AD) over the reals is almost always correct in a mathematically precise sense. However, actual programs work with machine-representable numbers (e.g., floating-point numbers), not reals. In this paper, we study the correctness of AD when the parameter space of a neural network consists solely of machine-representable… ▽ More

    Submitted 6 June, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: To appear at ICML 2023

  17. arXiv:2207.13901  [pdf, other

    cs.DC cs.PL

    SpDISTAL: Compiling Distributed Sparse Tensor Computations

    Authors: Rohan Yadav, Alex Aiken, Fredrik Kjolstad

    Abstract: We introduce SpDISTAL, a compiler for sparse tensor algebra that targets distributed systems. SpDISTAL combines separate descriptions of tensor algebra expressions, sparse data structures, data distribution, and computation distribution. Thus, it enables distributed execution of sparse tensor algebra expressions with a wide variety of sparse data structures and data distributions. SpDISTAL is impl… ▽ More

    Submitted 28 July, 2022; originally announced July 2022.

  18. arXiv:2206.13502  [pdf, other

    cs.CV cs.AI cs.GR cs.LG stat.ML

    Programmatic Concept Learning for Human Motion Description and Synthesis

    Authors: Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu

    Abstract: We introduce Programmatic Motion Concepts, a hierarchical motion representation for human actions that captures both low-level motion and high-level description as motion concepts. This representation enables human motion description, interactive editing, and controlled synthesis of novel video sequences within a single framework. We present an architecture that learns this concept representation… ▽ More

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: CVPR 2022. Project page: https://sumith1896.github.io/motion-concepts/

  19. arXiv:2205.01848  [pdf, other

    cs.LG cs.DC

    Optimizing Mixture of Experts using Dynamic Recompilations

    Authors: Ferdinand Kossmann, Zhihao Jia, Alex Aiken

    Abstract: The Mixture of Experts architecture allows for outrageously large neural networks by scaling model parameter size independently from computational demand (FLOPs). However, current DNN frameworks cannot effectively support the dynamic data flow in Mixture of Experts, and implementations on top of these frameworks need to use workarounds that introduce significant overheads. To address the limitatio… ▽ More

    Submitted 2 August, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

    Comments: 13 pages, 15 figures

  20. arXiv:2204.09033  [pdf, other

    cs.PL quant-ph

    Quartz: Superoptimization of Quantum Circuits (Extended Version)

    Authors: Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar, Zhihao Jia

    Abstract: Existing quantum compilers optimize quantum circuits by applying circuit transformations designed by experts. This approach requires significant manual effort to design and implement circuit transformations for different quantum devices, which use different gate sets, and can miss optimizations that are hard to find manually. We propose Quartz, a quantum circuit superoptimizer that automatically g… ▽ More

    Submitted 2 May, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: 28 pages. Extended version of the paper presented in PLDI 2022. Typos corrected and artifact reference updated

  21. DISTAL: The Distributed Tensor Algebra Compiler

    Authors: Rohan Yadav, Alex Aiken, Fredrik Kjolstad

    Abstract: We introduce DISTAL, a compiler for dense tensor algebra that targets modern distributed and heterogeneous systems. DISTAL lets users independently describe how tensors and computation map onto target machines through separate format and scheduling languages. The combination of choices for data and computation distribution creates a large design space that includes many algorithms from both the pa… ▽ More

    Submitted 17 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

  22. arXiv:2112.05304  [pdf, ps, other

    cs.PL

    Inferring Invariants with Quantifier Alternations: Taming the Search Space Explosion

    Authors: Jason R. Koenig, Oded Padon, Sharon Shoham, Alex Aiken

    Abstract: We present a PDR/IC3 algorithm for finding inductive invariants with quantifier alternations. We tackle scalability issues that arise due to the large search space of quantified invariants by combining a breadth-first search strategy and a new syntactic form for quantifier-free bodies. The breadth-first strategy prevents inductive generalization from getting stuck in regions of the search space th… ▽ More

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: 16 pages, 2 figures, submitted to TACAS 2022

    ACM Class: D.2.4; F.3.1

  23. arXiv:2111.11387  [pdf, other

    quant-ph cs.PL

    Quanto: Optimizing Quantum Circuits with Automatic Generation of Circuit Identities

    Authors: Jessica Pointing, Oded Padon, Zhihao Jia, Henry Ma, Auguste Hirth, Jens Palsberg, Alex Aiken

    Abstract: Existing quantum compilers focus on mapping a logical quantum circuit to a quantum device and its native quantum gates. Only simple circuit identities are used to optimize the quantum circuit during the compilation process. This approach misses more complex circuit identities, which could be used to optimize the quantum circuit further. We propose Quanto, the first quantum optimizer that automatic… ▽ More

    Submitted 22 November, 2021; originally announced November 2021.

  24. arXiv:2104.11216  [pdf, other

    cs.CV cs.AI cs.LG stat.ML

    Hierarchical Motion Understanding via Motion Programs

    Authors: Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu

    Abstract: Current approaches to video analysis of human motion focus on raw pixels or keypoints as the basic units of reasoning. We posit that adding higher-level motion primitives, which can capture natural coarser units of motion such as backswing or follow-through, can be used to improve downstream analysis tasks. This higher level of abstraction can also capture key features, such as loops of repeated p… ▽ More

    Submitted 22 April, 2021; originally announced April 2021.

    Comments: CVPR 2021. First two authors contributed equally. Project page: https://sumith1896.github.io/motion2prog/

  25. Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance

    Authors: Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Wonchan Lee, Qinglei Cao, George Bosilca, Seema Mirchandaney, Sean Treichler, Patrick McCormick, Alex Aiken

    Abstract: We present Task Bench, a parameterized benchmark designed to explore the performance of parallel and distributed programming systems under a variety of application scenarios. Task Bench lowers the barrier to benchmarking multiple programming systems by making the implementation for a given system orthogonal to the benchmarks themselves: every benchmark constructed with Task Bench runs on every Tas… ▽ More

    Submitted 17 November, 2020; v1 submitted 15 August, 2019; originally announced August 2019.

    Comments: 14 pages, 13 figures, published in SC'20: Proceedings of the Conference for High Performance Computing, Networking, Storage and Analysis

  26. arXiv:1906.04908  [pdf, other

    cs.LG cs.CL cs.PL stat.ML

    SPoC: Search-based Pseudocode to Code

    Authors: Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy Liang

    Abstract: We consider the task of mapping pseudocode to long programs that are functionally correct. Given test cases as a mechanism to validate programs, we search over the space of possible translations of the pseudocode to find a program that passes the validation. However, without proper credit assignment to localize the sources of program failures, it is difficult to guide search toward more promising… ▽ More

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: Under submission to NeurIPS 2019

  27. arXiv:1906.03707  [pdf, other

    cs.LG cs.SI stat.ML

    Redundancy-Free Computation Graphs for Graph Neural Networks

    Authors: Zhihao Jia, Sina Lin, Rex Ying, Jiaxuan You, Jure Leskovec, Alex Aiken

    Abstract: Graph Neural Networks (GNNs) are based on repeated aggregations of information across nodes' neighbors in a graph. However, because common neighbors are shared between different nodes, this leads to repeated and inefficient computations. We propose Hierarchically Aggregated computation Graphs (HAGs), a new GNN graph representation that explicitly avoids redundancy by managing intermediate aggregat… ▽ More

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: 12 pages, 4 figures

  28. arXiv:1807.05358  [pdf, other

    cs.DC

    Beyond Data and Model Parallelism for Deep Neural Networks

    Authors: Zhihao Jia, Matei Zaharia, Alex Aiken

    Abstract: The computational requirements for training deep neural networks (DNNs) have grown to the point that it is now standard practice to parallelize training. Existing deep learning systems commonly use data or model parallelism, but unfortunately, these strategies often result in suboptimal parallelization performance. In this paper, we define a more comprehensive search space of parallelization str… ▽ More

    Submitted 14 July, 2018; originally announced July 2018.

    Report number: Proceedings of the 35th International Conference on Machine Learning (PMLR 80)

  29. arXiv:1802.04924  [pdf, other

    cs.LG cs.DC cs.NE

    Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks

    Authors: Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

    Abstract: The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale di… ▽ More

    Submitted 9 June, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

  30. arXiv:1711.03436  [pdf, other

    cs.PL

    Eventually Sound Points-To Analysis with Missing Code

    Authors: Osbert Bastani, Lazaro Clapp, Saswat Anand, Rahul Sharma, Alex Aiken

    Abstract: Static analyses make the increasingly tenuous assumption that all source code is available for analysis; for example, large libraries often call into native code that cannot be analyzed. We propose a points-to analysis that initially makes optimistic assumptions about missing code, and then inserts runtime checks that report counterexamples to these assumptions that occur during execution. Our app… ▽ More

    Submitted 9 November, 2017; originally announced November 2017.

  31. arXiv:1711.03239  [pdf, other

    cs.PL

    Active Learning of Points-To Specifications

    Authors: Osbert Bastani, Rahul Sharma, Alex Aiken, Percy Liang

    Abstract: When analyzing programs, large libraries pose significant challenges to static points-to analysis. A popular solution is to have a human analyst provide points-to specifications that summarize relevant behaviors of library code, which can substantially improve precision and handle missing code such as native code. We propose ATLAS, a tool that automatically infers points-to specifications. ATLAS s… ▽ More

    Submitted 21 May, 2018; v1 submitted 8 November, 2017; originally announced November 2017.

  32. arXiv:1608.01723  [pdf, other

    cs.PL

    Synthesizing Program Input Grammars

    Authors: Osbert Bastani, Rahul Sharma, Alex Aiken, Percy Liang

    Abstract: We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and blackbox access to the program. Our algorithm addresses shortcomings of existing grammar inference algorithms, which both severely overgeneralize and are prohibitively slow. Our implementation, GLADE, leverages the grammar synthesized by our algorithm to fu… ▽ More

    Submitted 16 June, 2017; v1 submitted 4 August, 2016; originally announced August 2016.

  33. arXiv:1211.0557  [pdf, other

    cs.PF cs.PL

    Stochastic Superoptimization

    Authors: Eric Schkufza, Rahul Sharma, Alex Aiken

    Abstract: We formulate the loop-free, binary superoptimization task as a stochastic search problem. The competing constraints of transformation correctness and performance improvement are encoded as terms in a cost function, and a Markov Chain Monte Carlo sampler is used to rapidly explore the space of all possible programs to find one that is an optimization of a given target program. Although our method s… ▽ More

    Submitted 2 November, 2012; originally announced November 2012.

    Comments: To appear in ASPLOS 2013