Showing 1–10 of 10 results for author: Preußer, T B

  1. arXiv:2311.12359  [pdf, other]

    cs.CV cs.AI cs.AR cs.LG cs.PF

    Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs

    Authors: Shivam Aggarwal, Hans Jakob Damsgaard, Alessandro Pappalardo, Giuseppe Franco, Thomas B. Preußer, Michaela Blott, Tulika Mitra

    Abstract: Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision in neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point formats (FP8) in the context of PTQ for model inference. However, floating-point formats smaller than 8 bits and their relative comparison in terms of accuracy-hardware c…

    Submitted 5 July, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted at FPL 2024 (International Conference on Field-Programmable Logic and Applications). Revised with updated results.

  2. Understanding and Fixing Complex Faults in Embedded Cyberphysical Systems

    Authors: Alexander Weiss, Smitha Gautham, Athira Varma Jayakumar, Carl Elks, D. Richard Kuhn, Raghu N. Kacker, Thomas B. Preusser

    Abstract: Understanding fault types can lead to novel approaches to debugging and runtime verification. Dealing with complex faults, particularly in the challenging area of embedded systems, calls for more powerful tools, which are now becoming available to engineers.

    Submitted 5 February, 2021; originally announced February 2021.

  3. arXiv:2010.05894  [pdf, other]

    cs.AR cs.AI cs.IR cs.LG

    MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

    Authors: Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso

    Abstract: Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The inference is also heavily constrained in terms of latency because producing a recommendation for a user must be done in about tens of milliseconds. In this pape…

    Submitted 19 February, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: Accepted by MLSys'21 (the 4th Conference on Machine Learning and Systems)

  4. arXiv:2005.13332  [pdf, other]

    cs.DC

    HyperLogLog Sketch Acceleration on FPGA

    Authors: Amit Kulkarni, Monica Chiosa, Thomas B. Preußer, Kaan Kara, David Sidler, Gustavo Alonso

    Abstract: Data sketches are a set of widely used approximated data summarizing techniques. Their fundamental property is sub-linear memory complexity on the input cardinality, an important aspect when processing streams or data sets with a vast base domain (URLs, IP addresses, user IDs, etc.). Among the many data sketches available, HyperLogLog has become the reference for cardinality counting (how many dis…

    Submitted 20 October, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

    Comments: This paper was accepted as a full paper to FPL 2020. The latest/full version of this paper is available at https://ieeexplore.ieee.org/document/9221525

  5. arXiv:2004.11080  [pdf, ps, other]

    cs.AR eess.SY

    Using DSP Slices as Content-Addressable Update Queues

    Authors: Thomas B. Preußer, Monica Chiosa, Alexander Weiss, Gustavo Alonso

    Abstract: Content-Addressable Memory (CAM) is a powerful abstraction for building memory caches, routing tables and hazard detection logic. Without a native CAM structure available on FPGA devices, their functionality must be emulated using the structural primitives at hand. Such an emulation causes significant overhead in the consumption of the underlying resources, typically general-purpose fabric and on-…

    Submitted 23 April, 2020; originally announced April 2020.

    Comments: Submitted to FPL 2020

  6. Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing

    Authors: Yaman Umuroglu, Davide Conficconi, Lahiru Rasnayake, Thomas B. Preusser, Magnus Sjalander

    Abstract: Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offerin…

    Submitted 11 June, 2019; v1 submitted 2 January, 2019; originally announced January 2019.

    Comments: Invited paper at ACM TRETS as extension of FPL'18 paper arXiv:1806.08862

  7. arXiv:1807.03123  [pdf, other]

    cs.CV

    Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic

    Authors: Michaela Blott, Thomas B. Preusser, Nicholas Fraser, Giulio Gambardella, Kenneth OBrien, Yaman Umuroglu, Miriam Leeser

    Abstract: Convolutional Neural Networks have dramatically improved in recent years, surpassing human accuracy on certain problems and performance exceeding that of traditional computer vision algorithms. While the compute pattern in itself is relatively simple, significant compute and memory challenges remain as CNNs may contain millions of floating-point parameters and require billions of floating-point op…

    Submitted 26 June, 2018; originally announced July 2018.

  8. Generic and Universal Parallel Matrix Summation with a Flexible Compression Goal for Xilinx FPGAs

    Authors: Thomas B. Preußer

    Abstract: Bit matrix compression is a highly relevant operation in computer arithmetic. Essentially being a multi-operand addition, it is the key operation behind fast multiplication and many higher-level operations such as multiply-accumulate, the computation of the dot product or the implementation of FIR filters. Compressor implementations have been constantly evolving for greater efficiency both in gene…

    Submitted 21 June, 2018; originally announced June 2018.

  9. Inference of Quantized Neural Networks on Heterogeneous All-Programmable Devices

    Authors: Thomas B. Preußer, Giulio Gambardella, Nicholas Fraser, Michaela Blott

    Abstract: Neural networks have become established as a generic and powerful means to approach challenging problems such as image classification, object detection or decision making. Their successful employment rests on an enormous demand for compute. The quantization of network parameters and the processed data has proven a valuable measure to reduce the challenges of network inference so effectively that the feasi…

    Submitted 21 June, 2018; originally announced June 2018.

  10. arXiv:1801.02075  [pdf, ps, other]

    cs.LO cs.SE

    QBM - Mapping User-Specified Functions to Programmable Logic through a QBF Satisfiability Problem

    Authors: Thomas B. Preußer

    Abstract: This is a brief overview of the background behind the test set formulas generated by the QBM tool. After establishing its application context, its formal approach to the generation of QBF formulas and the concrete test set formulas are described. Finally, some related work will be credited and the source to obtain the open-source tool will be identified.

    Submitted 6 January, 2018; originally announced January 2018.

    Comments: Instance in Prenex CNF Track of QBFEVAL'17 competition: http://www.qbflib.org/family_detail.php?idFamily=775