
Showing 1–4 of 4 results for author: Lemieux, G

Searching in archive cs.
  1. arXiv:2009.10976  [pdf, other]

    cs.NE cs.AR cs.LG

    Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

    Authors: Dingqing Yang, Amin Ghasemazar, Xiaowei Ren, Maximilian Golub, Guy Lemieux, Mieszko Lis

    Abstract: The success of DNN pruning has led to the development of energy-efficient inference accelerators that support pruned models with sparse weight and activation tensors. Because the memory layouts and dataflows in these architectures are optimized for the access patterns during $\mathit{inference}$, however, they do not efficiently support the emerging sparse $\mathit{training}$ techniques. In this…

    Submitted 23 September, 2020; originally announced September 2020.

    Comments: Appears in the Proceedings of the 53$^\mathit{rd}$ IEEE/ACM International Symposium on Microarchitecture (MICRO 2020)

  2. arXiv:1903.06630  [pdf, other]

    cs.DC cs.CV cs.OH

    TinBiNN: Tiny Binarized Neural Network Overlay in about 5,000 4-LUTs and 5mW

    Authors: Guy G. F. Lemieux, Joe Edwards, Joel Vandergriendt, Aaron Severance, Ryan De Iaco, Abdullah Raouf, Hussein Osman, Tom Watzka, Satwant Singh

    Abstract: Reduced-precision arithmetic improves the size, cost, power and performance of neural networks in digital logic. In convolutional neural networks, the use of 1b weights can achieve state-of-the-art error rates while eliminating multiplication, reducing storage and improving power efficiency. The BinaryConnect binary-weighted system, for example, achieves 9.9% error using floating-point activations…

    Submitted 5 March, 2019; originally announced March 2019.

    Comments: Presented at 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017) arXiv:1704.08802

    Report number: OLAF/2017/06

  3. arXiv:1806.06949  [pdf, other]

    cs.LG stat.ML

    Full deep neural network training on a pruned weight budget

    Authors: Maximilian Golub, Guy Lemieux, Mieszko Lis

    Abstract: We introduce a DNN training technique that learns only a fraction of the full parameter set without incurring an accuracy penalty. To do this, our algorithm constrains the total number of weights updated during backpropagation to those with the highest total gradients. The remaining weights are not tracked, and their initial value is regenerated at every access to avoid storing them in memory. Thi…

    Submitted 23 November, 2019; v1 submitted 11 June, 2018; originally announced June 2018.

    Journal ref: Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019

  4. arXiv:1606.03717  [pdf]

    cs.AR

    Automated Space/Time Scaling of Streaming Task Graph

    Authors: Hossein Omidian, Guy G. F. Lemieux

    Abstract: In this paper, we describe a high-level synthesis (HLS) tool that automatically allows area/throughput trade-offs for implementing streaming task graphs (STG). Our tool targets a massively parallel processor array (MPPA) architecture, very similar to the Ambric MPPA chip architecture, which is to be implemented as an FPGA overlay. Similar to Ambric tools, our HLS tool accepts a STG as input writte…

    Submitted 12 June, 2016; originally announced June 2016.

    Comments: Presented at 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016) arXiv:1605.08149

    Report number: OLAF/2016/01