Showing 1–11 of 11 results for author: Vissers, K

  1. arXiv:2202.02310  [pdf, other]

    cs.LG cs.AR

    EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators

    Authors: Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos Kanellopoulos, Juan Gomez-Luna, Michaela Blott, Kees Vissers, Onur Mutlu

    Abstract: Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and inference of applications such as image segmentation and high-resolution image generation. Although these kernels have grown in popularity, they stress current compute systems due to their high memory intensity, exascale compute demands, and…

    Submitted 4 February, 2022; originally announced February 2022.

  2. arXiv:2011.05873  [pdf, ps, other]

    cs.LG cs.CV

    FAT: Training Neural Networks for Reliable Inference Under Hardware Faults

    Authors: Ussama Zahid, Giulio Gambardella, Nicholas J. Fraser, Michaela Blott, Kees Vissers

    Abstract: Deep neural networks (DNNs) are state-of-the-art algorithms for multiple applications, spanning from image classification to speech recognition. While providing excellent accuracy, they often have enormous compute and memory requirements. As a result of this, quantized neural networks (QNNs) are increasingly being adopted and deployed especially on embedded devices, thanks to their high accuracy,…

    Submitted 11 November, 2020; originally announced November 2020.

  3. arXiv:2006.01331  [pdf, other]

    cs.DC

    Vyasa: A High-Performance Vectorizing Compiler for Tensor Convolutions on the Xilinx AI Engine

    Authors: Prasanth Chatarasi, Stephen Neuendorffer, Samuel Bayliss, Kees Vissers, Vivek Sarkar

    Abstract: Xilinx's AI Engine is a recent industry example of energy-efficient vector processing that includes novel support for 2D SIMD datapaths and shuffle interconnection network. The current approach to programming the AI Engine relies on a C/C++ API for vector intrinsics. While an advance over assembly-level programming, it requires the programmer to specify a number of low-level operations based on de…

    Submitted 1 June, 2020; originally announced June 2020.

  4. arXiv:1912.07394  [pdf, ps, other]

    eess.SP cs.CV cs.LG

    Efficient Error-Tolerant Quantized Neural Network Accelerators

    Authors: Giulio Gambardella, Johannes Kappauf, Michaela Blott, Christoph Doehring, Martin Kumm, Peter Zipf, Kees Vissers

    Abstract: Neural Networks are currently one of the most widely deployed machine learning algorithms. In particular, Convolutional Neural Networks (CNNs), are gaining popularity and are evaluated for deployment in safety critical applications such as self driving vehicles. Modern CNNs feature enormous memory bandwidth and high computational needs, challenging existing hardware platforms to meet throughput, l…

    Submitted 16 December, 2019; originally announced December 2019.

    Comments: 6 pages, 5 figures

    Journal ref: 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)

  5. arXiv:1906.11879  [pdf, other]

    cs.CV eess.IV

    Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels

    Authors: Murad Qasaimeh, Kristof Denolf, Jack Lo, Kees Vissers, Joseph Zambreno, Phillip H. Jones

    Abstract: Developing high performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g. multi-core CPUs, GPUs, and FPGAs), and their associated vendor optimized vision libraries, it becomes a challenge for developers to navigate this fragmented solution space. To aid with determin…

    Submitted 31 May, 2019; originally announced June 2019.

    Comments: 8 pages, Design Automation Conference (DAC), The 15th IEEE International Conference on Embedded Software and Systems, 2019

  6. Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

    Authors: Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, Kurt Keutzer

    Abstract: Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, the key application characteristics such as frames-per-second (FPS) are ignored in favor of simply counting GOPs, and results on accuracy, which is critical to application success, are often not even reported. In th…

    Submitted 10 May, 2020; v1 submitted 21 November, 2018; originally announced November 2018.

    Comments: Update to the latest results

  7. arXiv:1701.03400  [pdf, other]

    cs.CV cs.LG

    Scaling Binarized Neural Networks on Reconfigurable Logic

    Authors: Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers

    Abstract: Binarized neural networks (BNNs) are gaining interest in the deep learning community due to their significantly lower computational and memory cost. They are particularly well suited to reconfigurable logic devices, which contain an abundance of fine-grained compute resources and can result in smaller, lower power implementations, or conversely in higher classification rates. Towards this end, the…

    Submitted 27 January, 2017; v1 submitted 12 January, 2017; originally announced January 2017.

    Comments: To appear in the PARMA-DITAM workshop at HiPEAC 2017, January 2017

  8. arXiv:1612.07119  [pdf, other]

    cs.CV cs.AR cs.LG

    FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

    Authors: Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers

    Abstract: Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optim…

    Submitted 1 December, 2016; originally announced December 2016.

    Comments: To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 2017

  9. arXiv:1508.06830  [pdf]

    cs.DC cs.PF

    Coarse-Grain Performance Estimator for Heterogeneous Parallel Computing Architectures like Zynq All-Programmable SoC

    Authors: Daniel Jiménez-González, Carlos Álvarez, Antonio Filgueras, Xavier Martorell, Jan Langer, Juanjo Noguera, Kees Vissers

    Abstract: Heterogeneous computing is emerging as a mandatory requirement for power-efficient system design. With this aim, modern heterogeneous platforms like Zynq All-Programmable SoC, that integrates ARM-based SMP and programmable logic, have been designed. However, those platforms introduce large design cycles consisting on hardware/software partitioning, decisions on granularity and number of hardware a…

    Submitted 27 August, 2015; originally announced August 2015.

    Comments: Presented at Second International Workshop on FPGAs for Software Programmers (FSP 2015) (arXiv:1508.06320)

    Report number: FSP/2015/07

  10. arXiv:1408.5387  [pdf]

    cs.OH

    High-Level Synthesis Case Study: Implementation of a Memcached Server

    Authors: Kimon Karras, Michaela Blott, Kees Vissers

    Abstract: High-Level Synthesis (HLS) aspires to raise the level of abstraction in hardware design without sacrificing hardware efficiency. It has so far been successfully employed in signal and video processing but has found only limited use in other areas. This paper utilizes a commercial HLS tool, namely Vivado(R) HLS, to implement the processing of a common data center application, the Key-Value Store (K…

    Submitted 21 August, 2014; originally announced August 2014.

    Comments: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)

    Report number: FSP/2014/15

  11. arXiv:0710.4716  [pdf]

    cs.AR

    Optimized Generation of Data-Path from C Codes for FPGAs

    Authors: Zhi Guo, Betul Buyukkurt, Walid Najjar, Kees Vissers

    Abstract: FPGAs, as computing devices, offer significant speedup over microprocessors. Furthermore, their configurability offers an advantage over traditional ASICs. However, they do not yet enjoy high-level language programmability, as microprocessors do. This has become the main obstacle for their wider acceptance by application designers. ROCCC is a compiler designed to generate circuits from C source…

    Submitted 25 October, 2007; originally announced October 2007.

    Comments: Submitted on behalf of EDAA (http://www.edaa.com/)

    Journal ref: In Design, Automation and Test in Europe - DATE'05, Munich, Germany (2005)