-
STRELA: STReaming ELAstic CGRA Accelerator for Embedded Systems
Authors:
Daniel Vazquez,
Jose Miranda,
Alfonso Rodriguez,
Andres Otero,
Pascuale Davide Schiavone,
David Atienza
Abstract:
Reconfigurable computing offers a good balance between flexibility and energy efficiency. When combined with software-programmable devices such as CPUs, it is possible to obtain higher performance by spatially distributing the parallelizable sections of an application throughout the reconfigurable device while the CPU is in charge of control-intensive sections. This work introduces an elastic Coarse-Grained Reconfigurable Architecture (CGRA) integrated into an energy-efficient RISC-V-based SoC designed for the embedded domain. The microarchitecture of the CGRA supports conditionals and irregular loops, making it adaptable to domain-specific applications. Additionally, we propose specific mapping strategies that enable the efficient utilization of the CGRA for both simple applications, where the fabric is only reconfigured once (one-shot kernels), and more complex ones, where it is necessary to reconfigure the CGRA multiple times to complete them (multi-shot kernels). Large kernels also benefit from the independent memory nodes incorporated to streamline data accesses. The integration of the CGRA as an accelerator of the RISC-V processor enables a versatile and efficient framework, providing adaptability, processing capacity, and overall performance across various applications.
The design has been implemented in TSMC 65 nm, achieving a maximum frequency of 250 MHz. It achieves a peak performance of 1.22 GOPs computing one-shot kernels and 1.17 GOPs computing multi-shot kernels. The best energy efficiency is 72.68 MOPs/mW for one-shot kernels and 115.96 MOPs/mW for multi-shot kernels. The design integrates power and clock-gating techniques to tailor the architecture to the embedded domain while maintaining performance. The best speed-ups are 17.63x and 18.61x for one-shot and multi-shot kernels, respectively. The best energy savings in the SoC are 9.05x and 11.10x for one-shot and multi-shot kernels, respectively.
Submitted 18 April, 2024;
originally announced April 2024.
-
Dynamically Reconfigurable Variable-precision Sparse-Dense Matrix Acceleration in Tensorflow Lite
Authors:
Jose Nunez-Yanez,
Andres Otero,
Eduardo de la Torre
Abstract:
In this paper, we present a dynamically reconfigurable hardware accelerator called FADES (Fused Architecture for DEnse and Sparse matrices). The FADES design offers multiple configuration options that trade off parallelism and complexity using a dataflow model to create four stages that read, compute, scale and write results. FADES is mapped to the programmable logic (PL) and integrated with the TensorFlow Lite inference engine running on the processing system (PS) of a heterogeneous SoC device. The accelerator is used to compute the tensor operations, while the dynamically reconfigurable approach can be used to switch precision between int8 and float modes. This dynamic reconfiguration enables better performance by allowing more cores to be mapped to the resource-constrained device and lower power consumption compared with supporting both arithmetic precisions simultaneously. We compare the proposed hardware with a high-performance systolic architecture for dense matrices, obtaining 25% better performance in dense mode with half the DSP blocks in the same technology. In sparse mode, we show that the core can outperform dense mode even at low sparsity levels, and a single core achieves up to 20x acceleration over the software-optimized NEON RUY library.
Submitted 17 April, 2023;
originally announced April 2023.
-
A Framework for Fast Prototyping of Photo-realistic Environments with Multiple Pedestrians
Authors:
Sara Casao,
Andrés Otero,
Álvaro Serra-Gómez,
Ana C. Murillo,
Javier Alonso-Mora,
Eduardo Montijano
Abstract:
Robotic applications involving people often require advanced perception systems to better understand complex real-world scenarios. To address this challenge, photo-realistic and physics simulators are gaining popularity as a means of generating accurate data labeling and designing scenarios for evaluating generalization capabilities, e.g., lighting changes, camera movements or different weather conditions. We develop a photo-realistic framework built on Unreal Engine and AirSim to easily generate scenarios with pedestrians and mobile robots. The framework is capable of generating random and customized trajectories for each person and provides up to 50 ready-to-use people models along with an API for their metadata retrieval. We demonstrate the usefulness of the proposed framework with a use case of multi-target tracking, a popular problem in real pedestrian scenarios. The notable feature variability in the obtained perception data is presented and evaluated.
Submitted 14 April, 2023;
originally announced April 2023.
-
Stochastic embeddings of dynamical phenomena through variational autoencoders
Authors:
Constantino A. Garcia,
Paulo Felix,
Jesus M. Presedo,
Abraham Otero
Abstract:
System identification in scenarios where the observed number of variables is less than the degrees of freedom in the dynamics is an important challenge. In this work we tackle this problem by using a recognition network to increase the observed space dimensionality during the reconstruction of the phase space. The phase space is forced to have approximately Markovian dynamics described by a Stochastic Differential Equation (SDE), which is also to be discovered. To enable robust learning from stochastic data we use the Bayesian paradigm and place priors on the drift and diffusion terms. To handle the complexity of learning the posteriors, a set of mean field variational approximations to the true posteriors is introduced, enabling efficient statistical inference. Finally, a decoder network is used to obtain plausible reconstructions of the experimental data. The main advantage of this approach is that the resulting model is interpretable within the paradigm of statistical physics. Our validation shows that this approach not only recovers a state space that resembles the original one, but is also able to synthesize new time series capturing the main properties of the experimental data.
Submitted 13 October, 2020;
originally announced October 2020.
-
Extreme coverage in 5G Narrowband IoT: a LUT-based strategy to optimize shared channels
Authors:
Emmanuel Luján,
Juan A. Zuloaga Mellino,
Alejandro D. Otero,
Leonardo Rey Vega,
Cecilia G. Galarza,
Esteban E. Mocskos
Abstract:
One of the main challenges in IoT is providing communication support to an increasing number of connected devices. In recent years, narrowband radio technology has emerged to address this situation: Narrowband Internet of Things (NB-IoT), which is now part of 5G. Supporting massive connectivity becomes particularly demanding in extreme coverage scenarios such as underground or deep inside building sites. We propose a novel strategy for these situations focused on optimizing NB-IoT shared channels through the selection of link parameters: modulation and coding scheme, as well as the number of repetitions. These parameters are established by the base station (BS) for each block transmitted until reaching a target block error rate (BLER_t). A wrong selection of these parameters leads to radio resource waste and a decrease in the number of possible concurrent connections. Specifically, our strategy is based on a look-up table (LUT) scheme which is used for rapidly delivering the optimal link parameters given a target QoS. To validate our proposal, we compare it with alternative strategies using an open source NB-IoT uplink simulator. The experiments are based on transmitting blocks of 256 bits using an AWGN channel over the NPUSCH. Results show that, especially under extreme conditions, only a few options for link parameters are available, favoring robustness against measurement uncertainties. Our strategy minimizes resource usage in all scenarios of acknowledged mode and remarkably reduces losses in the unacknowledged mode, presenting also substantial gains in performance. We expect to influence future BS software design and implementation, favoring connection support under extreme environments.
Submitted 24 December, 2019; v1 submitted 7 August, 2019;
originally announced August 2019.