
Showing 1–7 of 7 results for author: Cheung, P Y K

  1. Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference

    Authors: Erwei Wang, James J. Davis, Georgios-Ilias Stavrou, Peter Y. K. Cheung, George A. Constantinides, Mohamed S. Abdelfattah

    Abstract: FPGA-specific DNN architectures using the native LUTs as independently trainable inference operators have been shown to achieve favorable area-accuracy and energy-accuracy tradeoffs. The first work in this area, LUTNet, exhibited state-of-the-art performance for standard DNN benchmarks. In this paper, we propose the learned optimization of such LUT-based topologies, resulting in higher-efficiency…

    Submitted 2 January, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

    Comments: Accepted manuscript uploaded 04/12/21. DOA 22/11/21

  2. arXiv:2102.04270  [pdf, other]

    cs.LG cs.AR

    Enabling Binary Neural Network Training on the Edge

    Authors: Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie Lim, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides

    Abstract: The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training. Binary neural networks are known to be promising candidates for on-device inference due to their extreme compute and memory savings over higher-precision alternatives. However, their existing training methods require the co…

    Submitted 24 September, 2023; v1 submitted 8 February, 2021; originally announced February 2021.

  3. arXiv:1910.12625  [pdf, other]

    cs.LG cs.CV eess.SP stat.ML

    LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference

    Authors: Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides

    Abstract: Research has shown that deep neural networks contain significant redundancy, and thus that high classification accuracy can be achieved even when weights and activations are quantized down to binary values. Network binarization on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA's fundamental building block, the K-LUT, i…

    Submitted 2 March, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1904.00938. Accepted manuscript uploaded 02/03/20. DOA 01/03/20

  4. arXiv:1910.10075  [pdf, other]

    eess.SP cs.AR cs.LG

    Automatic Generation of Multi-precision Multi-arithmetic CNN Accelerators for FPGAs

    Authors: Yiren Zhao, Xitong Gao, Xuan Guo, Junyi Liu, Erwei Wang, Robert Mullins, Peter Y. K. Cheung, George Constantinides, Cheng-Zhong Xu

    Abstract: Modern deep Convolutional Neural Networks (CNNs) are computationally demanding, yet real applications often require high throughput and low latency. To help tackle these problems, we propose Tomato, a framework designed to automate the process of generating efficient CNN accelerators. The generated design is pipelined and each convolution layer uses different arithmetics at various precisions. Usi…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: To be published in International Conference on Field Programmable Technology 2019

  5. arXiv:1904.00938  [pdf, other]

    cs.LG stat.ML

    LUTNet: Rethinking Inference in FPGA Soft Logic

    Authors: Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides

    Abstract: Research has shown that deep neural networks contain significant redundancy, and that high classification accuracies can be achieved even when weights and activations are quantised down to binary values. Network binarisation on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA's fundamental building block, the K-LUT, is c…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: Accepted manuscript uploaded 01/04/19. DOA 03/03/19

  6. Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going

    Authors: Erwei Wang, James J. Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter Y. K. Cheung, George A. Constantinides

    Abstract: Deep neural networks have proven to be particularly effective in visual and audio recognition tasks. Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardware-oriented approximation have become a hot topic. Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms…

    Submitted 8 July, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

    Comments: Accepted manuscript uploaded 21/01/19. DOA 15/01/19

    Journal ref: ACM Comput. Surv. 52, 2, Article 40 (May 2019), 39 pages

  7. arXiv:1807.10577  [pdf, other]

    cs.CV

    Accuracy to Throughput Trade-offs for Reduced Precision Neural Networks on Reconfigurable Logic

    Authors: Jiang Su, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Gianluca Durelli, David B. Thomas, Philip Leong, Peter Y. K. Cheung

    Abstract: Modern CNN are typically based on floating point linear algebra based implementations. Recently, reduced precision NN have been gaining popularity as they require significantly less memory and computational resources compared to floating point. This is particularly important in power constrained compute environments. However, in many cases a reduction in precision comes at a small cost to the accu…

    Submitted 17 July, 2018; originally announced July 2018.

    Comments: Accepted by ARC 2018