Showing 1–15 of 15 results for author: Wawrzynek, J

  1. arXiv:2502.21226  [pdf, ps, other]

    cs.AR cs.PF

    Recurrent CircuitSAT Sampling for Sequential Circuits

    Authors: Arash Ardakani, Kevin He, John Wawrzynek

    Abstract: In this work, we introduce a novel GPU-accelerated circuit satisfiability (CircuitSAT) sampling technique for sequential circuits. This work is motivated by the requirement in constrained random verification (CRV) to generate input stimuli to validate the functionality of digital hardware circuits. A major challenge in CRV is generating inputs for sequential circuits, along with the appropriate nu…

    Submitted 3 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

    Comments: 7 pages

  2. arXiv:2502.08673  [pdf, ps, other]

    cs.AI cs.LG

    High-Throughput SAT Sampling

    Authors: Arash Ardakani, Minwoo Kang, Kevin He, Qijing Huang, John Wawrzynek

    Abstract: In this work, we present a novel technique for GPU-accelerated Boolean satisfiability (SAT) sampling. Unlike conventional sampling algorithms that directly operate on conjunctive normal form (CNF), our method transforms the logical constraints of SAT problems by factoring their CNF representations into simplified multi-level, multi-output Boolean functions. It then leverages gradient-based optimiz…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: 7 pages

  3. DEMOTIC: A Differentiable Sampler for Multi-Level Digital Circuits

    Authors: Arash Ardakani, Minwoo Kang, Kevin He, Qijing Huang, Vighnesh Iyer, Suhong Moon, John Wawrzynek

    Abstract: Efficient sampling of satisfying formulas for circuit satisfiability (CircuitSAT), a well-known NP-complete problem, is essential in modern front-end applications for thorough testing and verification of digital circuits. Generating such samples is a hard computational problem due to the inherent complexity of digital circuits, size of the search space, and resource constraints involved in the pro…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: 7 pages

  4. arXiv:2407.12282  [pdf, ps, other]

    cs.LG cs.AI cs.AR

    Chip Placement with Diffusion Models

    Authors: Vint Lee, Minh Nguyen, Leena Elzeiny, Chun Deng, Pieter Abbeel, John Wawrzynek

    Abstract: Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2D chip. Because key performance metrics of the chip are determined by the placement, optimizing it is crucial. Existing learning-based methods typically fall short because of their reliance on reinforcement learning (RL), which is slow and struggle…

    Submitted 10 June, 2025; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Code available at https://github.com/vint-1/chipdiffusion

  5. arXiv:2105.01898  [pdf, other]

    cs.LG cs.AR

    CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

    Authors: Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, Yakun Sophia Shao

    Abstract: Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect. While DNN accelerators can take advantage of data reuse and achieve high peak throughput, they also expose a large number of runtime para…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: in Proceedings of the International Symposium on Computer Architecture (ISCA), 2021

  6. arXiv:2104.12766  [pdf, other]

    cs.CV

    HAO: Hardware-aware neural Architecture Optimization for Efficient Inference

    Authors: Zhen Dong, Yizhao Gao, Qijing Huang, John Wawrzynek, Hayden K. H. So, Kurt Keutzer

    Abstract: Automatic algorithm-hardware co-design for DNN has shown great success in improving the performance of DNNs on FPGAs. However, this process remains challenging due to the intractable search space of neural network architectures and hardware accelerator implementation. Differing from existing hardware-aware neural architecture search (NAS) algorithms that rely solely on the expensive learning-based…

    Submitted 26 April, 2021; originally announced April 2021.

    Journal ref: FCCM 2021

  7. arXiv:2006.08357  [pdf, other]

    cs.CV eess.IV

    CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

    Authors: Zhen Dong, Dequan Wang, Qijing Huang, Yizhao Gao, Yaohui Cai, Tian Li, Bichen Wu, Kurt Keutzer, John Wawrzynek

    Abstract: Deploying deep learning models on embedded systems has been challenging due to limited computing resources. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects, and…

    Submitted 25 January, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: GitHub repo: https://github.com/DequanWang/CoDeNet (arXiv:2002.08357 is the preliminary version of this paper)

    Journal ref: FPGA 2021

  8. arXiv:2005.13685  [pdf, other]

    cs.DC cs.AI cs.LG cs.PF cs.PL

    ProTuner: Tuning Programs with Monte Carlo Tree Search

    Authors: Ameer Haj-Ali, Hasan Genc, Qijing Huang, William Moses, John Wawrzynek, Krste Asanović, Ion Stoica

    Abstract: We explore applying the Monte Carlo Tree Search (MCTS) algorithm in a notoriously difficult task: tuning programs for high-performance deep learning and image processing. We build our framework on top of Halide and show that MCTS can outperform the state-of-the-art beam-search algorithm. Unlike beam search, which is guided by greedy intermediate performance comparisons between partial and less mea…

    Submitted 27 May, 2020; originally announced May 2020.

  9. arXiv:2003.00671  [pdf, other]

    cs.DC cs.LG cs.PL

    AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning

    Authors: Qijing Huang, Ameer Haj-Ali, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek

    Abstract: The performance of the code a compiler generates depends on the order in which it applies the optimization passes. Choosing a good order--often referred to as the phase-ordering problem--is an NP-hard problem. As a result, existing solutions rely on a variety of heuristics. In this paper, we evaluate a new technique to address the phase-ordering problem: deep reinforcement learning. To this end, w…

    Submitted 4 March, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: arXiv admin note: text overlap with arXiv:1901.04615

  10. arXiv:2002.08357  [pdf, other]

    eess.IV cs.CV

    Algorithm-hardware Co-design for Deformable Convolution

    Authors: Qijing Huang, Dequan Wang, Yizhao Gao, Yaohui Cai, Zhen Dong, Bichen Wu, Kurt Keutzer, John Wawrzynek

    Abstract: FPGAs provide a flexible and efficient platform to accelerate rapidly-changing algorithms for computer vision. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, including object detection and instance segmentation, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the s…

    Submitted 18 February, 2020; originally announced February 2020.

    Journal ref: NeurIPS EMC2 2019

  11. arXiv:1901.04615  [pdf, other]

    cs.PL cs.LG

    AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning

    Authors: Ameer Haj-Ali, Qijing Huang, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek

    Abstract: The performance of the code generated by a compiler depends on the order in which the optimization passes are applied. In high-level synthesis, the quality of the generated circuit relates directly to the code generated by the front-end compiler. Choosing a good order--often referred to as the phase-ordering problem--is an NP-hard problem. In this paper, we evaluate a new technique to address the…

    Submitted 3 April, 2019; v1 submitted 14 January, 2019; originally announced January 2019.

  12. Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

    Authors: Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, Kurt Keutzer

    Abstract: Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, the key application characteristics such as frames-per-second (FPS) are ignored in favor of simply counting GOPs, and results on accuracy, which is critical to application success, are often not even reported. In th…

    Submitted 10 May, 2020; v1 submitted 21 November, 2018; originally announced November 2018.

    Comments: Update to the latest results

  13. arXiv:1704.08802  [other]

    cs.AR

    Proceedings of the 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017)

    Authors: Hayden Kwok-Hay So, John Wawrzynek

    Abstract: The 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017) was held on 22 Feb, 2017 as a co-located workshop at the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2017). This year, the program committee selected 3 papers and 3 extended abstracts to be presented at the workshop, which are subsequently collected in this online volume.

    Submitted 5 March, 2019; v1 submitted 27 April, 2017; originally announced April 2017.

    Comments: 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017) website: see http://olaf.eecs.berkeley.edu

    ACM Class: C.0; C.1; B.5.2; B.6.3; B.7.2

  14. arXiv:1606.06451  [pdf]

    cs.AR

    High Level Synthesis with a Dataflow Architectural Template

    Authors: Shaoyi Cheng, John Wawrzynek

    Abstract: In this work, we present a new approach to high level synthesis (HLS), where high level functions are first mapped to an architectural template, before hardware synthesis is performed. As FPGA platforms are especially suitable for implementing streaming processing pipelines, we perform transformations on conventional high level programs where they are turned into multi-stage dataflow engines [1].…

    Submitted 21 June, 2016; originally announced June 2016.

    Comments: Presented at the 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016), arXiv:1605.08149

    Report number: OLAF/2016/03

  15. arXiv:1605.08149  [other]

    cs.AR

    Proceedings of the 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016)

    Authors: Hayden Kwok-Hay So, John Wawrzynek

    Abstract: The 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016) was held on 21 Mar, 2016 as a co-located workshop at the 24th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2016). This year, the program committee selected 6 papers and 3 extended abstracts to be presented at the workshop, which are subsequently collected in this online volume.

    Submitted 26 May, 2016; originally announced May 2016.

    Comments: 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016) website: see http://olaf.eecs.berkeley.edu

    ACM Class: C.0; C.1; B.5.2; B.6.3; B.7.2