Showing 1–11 of 11 results for author: Ansaloni, G

Searching in archive cs.
  1. arXiv:2505.08421  [pdf, other]

    cs.AR

    e-GPU: An Open-Source and Configurable RISC-V Graphic Processing Unit for TinyAI Applications

    Authors: Simone Machetti, Pasquale Davide Schiavone, Lara Orlandic, Darong Huang, Deniz Kasap, Giovanni Ansaloni, David Atienza

    Abstract: Graphics processing units (GPUs) excel at parallel processing, but remain largely unexplored in ultra-low-power edge devices (TinyAI) due to their power and area limitations, as well as the lack of suitable programming frameworks. To address these challenges, this work introduces embedded GPU (e-GPU), an open-source and configurable RISC-V GPU platform designed for TinyAI devices. Its extensive co…

    Submitted 13 May, 2025; originally announced May 2025.

  2. A flexible framework for early power and timing comparison of time-multiplexed CGRA kernel executions

    Authors: Maxime Henri Aspros, Juan Sapriza, Giovanni Ansaloni, David Atienza

    Abstract: At the intersection between traditional CPU architectures and more specialized options such as FPGAs or ASICs lies the family of reconfigurable hardware architectures, termed Coarse-Grained Reconfigurable Arrays (CGRAs). CGRAs are composed of a 2-dimensional array of processing elements (PE), tightly integrated with each other, each capable of performing arithmetic and logic operations. The vast d…

    Submitted 14 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Comments: 22nd ACM International Conference on Computing Frontiers (CF Companion '25), May 28--30, 2025, Cagliari, Italy

    ACM Class: C.3; B.7.1; I.6.5

  3. Systolic Arrays and Structured Pruning Co-design for Efficient Transformers in Edge Systems

    Authors: Pedro Palacios, Rafael Medina, Jean-Luc Rouas, Giovanni Ansaloni, David Atienza

    Abstract: Efficient deployment of resource-intensive transformers on edge devices necessitates cross-stack optimization. We thus study the interrelation between structured pruning and systolic acceleration, matching the size of pruned blocks with the systolic array dimensions. In this setting, computations of pruned weight blocks can be skipped, reducing run-time and energy consumption, but potentially impa…

    Submitted 12 May, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

    Comments: 8 pages, GLSVLSI'25

    MSC Class: 68T50 ACM Class: C.3; B.5.1; I.2.7

  4. arXiv:2408.01988  [pdf, other]

    cs.LG cs.AI cs.AR

    MetaWearS: A Shortcut in Wearable Systems Lifecycle with Only a Few Shots

    Authors: Alireza Amirshahi, Maedeh H. Toosi, Siamak Mohammadi, Stefano Albini, Pasquale Davide Schiavone, Giovanni Ansaloni, Amir Aminifar, David Atienza

    Abstract: Wearable systems provide continuous health monitoring and can lead to early detection of potential health issues. However, the lifecycle of wearable systems faces several challenges. First, effective model training for new wearable devices requires substantial labeled data from various subjects collected directly by the wearable. Second, subsequent model updates require further extensive labeled d…

    Submitted 4 August, 2024; originally announced August 2024.

  5. arXiv:2402.12834  [pdf, other]

    cs.AR

    SAT-based Exact Modulo Scheduling Mapping for Resource-Constrained CGRAs

    Authors: Cristian Tirelli, Juan Sapriza, Rubén Rodríguez Álvarez, Lorenzo Ferretti, Benoît Denkinger, Giovanni Ansaloni, José Miranda Calero, David Atienza, Laura Pozzi

    Abstract: Coarse-Grain Reconfigurable Arrays (CGRAs) represent emerging low-power architectures designed to accelerate Compute-Intensive Loops (CILs). The effectiveness of CGRAs in providing acceleration relies on the quality of mapping: how efficiently the CIL is compiled onto the platform. State of the Art (SoA) compilation techniques utilize modulo scheduling to minimize the Iteration Interval (II) and u…

    Submitted 29 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  6. LionHeart: A Layer-based Mapping Framework for Heterogeneous Systems with Analog In-Memory Computing Tiles

    Authors: Corey Lammie, Yuxuan Wang, Flavio Ponzina, Joshua Klein, Hadjer Benmeziane, Marina Zapater, Irem Boybat, Abu Sebastian, Giovanni Ansaloni, David Atienza

    Abstract: When arranged in a crossbar configuration, resistive memory devices can be used to execute Matrix-Vector Multiplications (MVMs), the most dominant operation of many Machine Learning (ML) algorithms, in constant time complexity. Nonetheless, when performing computations in the analog domain, novel challenges are introduced in terms of arithmetic precision and stochasticity, due to non-ideal circuit…

    Submitted 24 March, 2025; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE Transactions on Emerging Topics in Computing

  7. arXiv:2312.13000  [pdf, other]

    cs.AR cs.AI

    Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures

    Authors: Alireza Amirshahi, Giovanni Ansaloni, David Atienza

    Abstract: The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and accelerators tailored for transformer models, supporting their computation hotspots with high efficiency. However, memory bandwidth can hinder improvements in hardware acc…

    Submitted 20 December, 2023; originally announced December 2023.

  8. arXiv:2212.09358  [pdf, other]

    cs.AR

    A Soft SIMD Based Energy Efficient Computing Microarchitecture

    Authors: Pengbo Yu, Alexandre Levisse, Mohit Gupta, Evenblij Timon, Giovanni Ansaloni, Francky Catthoor, David Atienza

    Abstract: The ever-increasing size and computational complexity of today's machine-learning algorithms pose an increasing strain on the underlying hardware. In this light, novel and dedicated architectural solutions are required to optimize energy efficiency by leveraging opportunities (such as intrinsic parallelism and robustness to quantization errors) exposed by algorithms. We herein address this challen…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: 6 pages, 10 figures

  9. arXiv:2209.06108  [pdf, other]

    cs.AR eess.IV

    Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference

    Authors: Marco Rios, Flavio Ponzina, Alexandre Levisse, Giovanni Ansaloni, David Atienza

    Abstract: By supporting the access of multiple memory words at the same time, Bit-line Computing (BC) architectures allow the parallel execution of bit-wise operations in-memory. At the array periphery, arithmetic operations are then derived with little additional overhead. Such a paradigm opens novel opportunities for Artificial Intelligence (AI) at the edge, thanks to the massive parallelism inherent in m…

    Submitted 12 September, 2022; originally announced September 2022.

  10. ALPINE: Analog In-Memory Acceleration with Tight Processor Integration for Deep Learning

    Authors: Joshua Klein, Irem Boybat, Yasir Qureshi, Martino Dazzi, Alexandre Levisse, Giovanni Ansaloni, Marina Zapater, Abu Sebastian, David Atienza

    Abstract: Analog in-memory computing (AIMC) cores offer significant performance and energy benefits for neural network inference with respect to digital logic (e.g., CPUs). AIMCs accelerate matrix-vector multiplications, which dominate these applications' run-time. However, AIMC-centric platforms lack the flexibility of general-purpose systems, as they often have hard-coded data flows and can only support…

    Submitted 13 December, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: Accepted by IEEE Transactions on Computers, December 2022

    ACM Class: C.4; I.6.0

  11. arXiv:2101.00587  [pdf, other]

    cs.AR

    DB4HLS: A Database of High-Level Synthesis Design Space Explorations

    Authors: Lorenzo Ferretti, Jihye Kwon, Giovanni Ansaloni, Giuseppe Di Guglielmo, Luca Carloni, Laura Pozzi

    Abstract: High-Level Synthesis (HLS) frameworks allow to easily specify a large number of variants of the same hardware design by only acting on optimization directives. Nonetheless, the hardware synthesis of implementations for all possible combinations of directive values is impractical even for simple designs. Addressing this shortcoming, many HLS Design Space Exploration (DSE) strategies have been propo…

    Submitted 3 January, 2021; originally announced January 2021.