-
Wildcat: Educational RISC-V Microprocessors
Authors:
Martin Schoeberl
Abstract:
In computer architecture courses, we usually teach RISC processors using a five-stage pipeline, neglecting alternative organizations. This design choice, rooted in the 1980s technology, may not be optimal today, and it is certainly not the easiest pipeline for education. This paper examines more straightforward pipeline organizations for RISC processors that are suitable for educational purposes a…
▽ More
In computer architecture courses, we usually teach RISC processors using a five-stage pipeline, neglecting alternative organizations. This design choice, rooted in the 1980s technology, may not be optimal today, and it is certainly not the easiest pipeline for education. This paper examines more straightforward pipeline organizations for RISC processors that are suitable for educational purposes and for implementing embedded processors in FPGAs and ASICs. We analyze resource costs and maximum clock frequency of various designs implemented in an FPGA, using clock frequency as a performance proxy. Additionally, we validate these results with ASIC designs synthesized using the open-source SkyWater130 process.
Contradictory to common wisdom, a longer pipeline (up to 5 stages) does not necessarily always increase the maximum clock frequency. In two FPGA and one ASIC implementation, we discovered that a four- or five-stage pipeline leads to a slower clock frequency than a three-stage implementation. The reason is that the width of the forwarding multiplexer in the execution stage increases with longer pipelines, which is on the critical path. We also argue that a 3-stage pipeline organization is more adequate for teaching a pipeline organization of a microprocessor.
△ Less
Submitted 27 February, 2025;
originally announced February 2025.
-
Dynamic nsNet2: Efficient Deep Noise Suppression with Early Exiting
Authors:
Riccardo Miccini,
Alaa Zniber,
Clément Laroche,
Tobias Piechowiak,
Martin Schoeberl,
Luca Pezzarossa,
Ouassim Karrakchou,
Jens Sparsø,
Mounir Ghogho
Abstract:
Although deep learning has made strides in the field of deep noise suppression, leveraging deep architectures on resource-constrained devices still proved challenging. Therefore, we present an early-exiting model based on nsNet2 that provides several levels of accuracy and resource savings by halting computations at different stages. Moreover, we adapt the original architecture by splitting the in…
▽ More
Although deep learning has made strides in the field of deep noise suppression, leveraging deep architectures on resource-constrained devices still proved challenging. Therefore, we present an early-exiting model based on nsNet2 that provides several levels of accuracy and resource savings by halting computations at different stages. Moreover, we adapt the original architecture by splitting the information flow to take into account the injected dynamism. We show the trade-offs between performance and computational complexity based on established metrics.
△ Less
Submitted 31 August, 2023;
originally announced August 2023.
-
Energy Consumption and Performance of Heapsort in Hardware and Software
Authors:
Maja H. Kirkeby,
Thomas Krabben,
Mathias Larsen,
Maria B. Mikkelsen,
Tjark Petersen,
Mads Rosendahl,
Martin Schoeberl,
Martin Sundman
Abstract:
In this poster abstract we will report on a case study on implementing the Heapsort algorithm in hardware and software and comparing their time and energy consumption. Our experiment shows that the Hardware implementation is more energy efficient, but slower than the Software implementation due to a low clock frequency. It also indicate that the optimal degree of parallelization differs when optim…
▽ More
In this poster abstract we will report on a case study on implementing the Heapsort algorithm in hardware and software and comparing their time and energy consumption. Our experiment shows that the Hardware implementation is more energy efficient, but slower than the Software implementation due to a low clock frequency. It also indicate that the optimal degree of parallelization differs when optimizing for time compared to optimizing for time.
△ Less
Submitted 7 April, 2022;
originally announced April 2022.
-
Towards Comparing Performance of Algorithms in Hardware and Software
Authors:
Maja H. Kirkeby,
Martin Schoeberl
Abstract:
In this paper, we report on a preliminary investigation of the potential performance gain of programs implemented in field-programmable gate arrays (FPGAs) using a high-level language Chisel compared to ordinary high-level software implementations executed on general-purpose computers and small and cheap computers. FPGAs inherently support parallel evaluations, while sequential computers do not. F…
▽ More
In this paper, we report on a preliminary investigation of the potential performance gain of programs implemented in field-programmable gate arrays (FPGAs) using a high-level language Chisel compared to ordinary high-level software implementations executed on general-purpose computers and small and cheap computers. FPGAs inherently support parallel evaluations, while sequential computers do not. For this preliminary investigation, we have chosen a highly parallelizable program as a case study to show an upper bound of performance gain. The purpose is to demonstrate whether or not programming FPGAs has the potential for performance optimizations of ordinary programs. We have developed and evaluated Conway's Game of Life for an FPGA, a small and cheap computer Raspberry Pi 4, and a MacBook Pro Laptop. We have compared the performance of programs over different input sizes to decide the relative increase in runtime.
△ Less
Submitted 25 May, 2022; v1 submitted 7 April, 2022;
originally announced April 2022.
-
Open-Source Verification with Chisel and Scala
Authors:
Andrew Dobis,
Tjark Petersen,
Kasper Juul Hesse Rasmussen,
Enrico Tolotto,
Hans Jakob Damsgaard,
Simon Thye Andersen,
Richard Lin,
Martin Schoeberl
Abstract:
Performance increase with general-purpose processors has come to a halt. We can no longer depend on Moore's Law to increase computing performance. The only way to achieve higher performance or lower energy consumption is by building domain-specific hardware accelerators. To efficiently design and verify those domain-specific accelerators, we need agile hardware development. One of the main obstacl…
▽ More
Performance increase with general-purpose processors has come to a halt. We can no longer depend on Moore's Law to increase computing performance. The only way to achieve higher performance or lower energy consumption is by building domain-specific hardware accelerators. To efficiently design and verify those domain-specific accelerators, we need agile hardware development. One of the main obstacles when proposing such a modern method is the lack of modern tools to attack it. To be able to verify a design in such a time-constrained development method, one needs to have efficient tools both for design and verification.
This paper thus proposes ChiselVerify, an open-source tool for verifying circuits described in any Hardware Description Language. It builds on top of the Chisel hardware construction language and uses Scala to drive the verification using a testing strategy inspired by the Universal Verification Methodology (UVM) and adapted for designs described in Chisel. ChiselVerify is created based on three key ideas. First, our solution highly increases the productivity of the verification engineer, by allowing hardware testing to be done in a modern high-level programming environment. Second, the framework functions with any hardware description language thanks to the flexibility of Chisel blackboxes. Finally, the solution is well integrated into the existing Chisel universe, making it an extension of currently existing testing libraries.
We implement ChiselVerify in a way inspired by the functionalities found in SystemVerilog. This allows one to use functional coverage, constrained-random verification, bus functional models, transaction-level modeling and much more during the verification process of a design in a contemporary high-level programming ecosystem.
△ Less
Submitted 26 February, 2021;
originally announced February 2021.
-
Embedded-physics machine learning for coarse-graining and collective variable discovery without data
Authors:
Markus Schöberl,
Nicholas Zabaras,
Phaedon-Stelios Koutsourelakis
Abstract:
We present a novel learning framework that consistently embeds underlying physics while bypassing a significant drawback of most modern, data-driven coarse-grained approaches in the context of molecular dynamics (MD), i.e., the availability of big data. The generation of a sufficiently large training dataset poses a computationally demanding task, while complete coverage of the atomistic configura…
▽ More
We present a novel learning framework that consistently embeds underlying physics while bypassing a significant drawback of most modern, data-driven coarse-grained approaches in the context of molecular dynamics (MD), i.e., the availability of big data. The generation of a sufficiently large training dataset poses a computationally demanding task, while complete coverage of the atomistic configuration space is not guaranteed. As a result, the explorative capabilities of data-driven coarse-grained models are limited and may yield biased "predictive" tools. We propose a novel objective based on reverse Kullback-Leibler divergence that fully incorporates the available physics in the form of the atomistic force field. Rather than separating model learning from the data-generation procedure - the latter relies on simulating atomistic motions governed by force fields - we query the atomistic force field at sample configurations proposed by the predictive coarse-grained model. Thus, learning relies on the evaluation of the force field but does not require any MD simulation. The resulting generative coarse-grained model serves as an efficient surrogate model for predicting atomistic configurations and estimating relevant observables. Beyond obtaining a predictive coarse-grained model, we demonstrate that in the discovered lower-dimensional representation, the collective variables (CVs) are related to physicochemical properties, which are essential for gaining understanding of unexplored complex systems. We demonstrate the algorithmic advances in terms of predictive ability and the physical meaning of the revealed CVs for a bimodal potential energy function and the alanine dipeptide.
△ Less
Submitted 24 February, 2020;
originally announced February 2020.
-
Predictive Collective Variable Discovery with Deep Bayesian Models
Authors:
Markus Schöberl,
Nicholas Zabaras,
Phaedon-Stelios Koutsourelakis
Abstract:
Extending spatio-temporal scale limitations of models for complex atomistic systems considered in biochemistry and materials science necessitates the development of enhanced sampling methods. The potential acceleration in exploring the configurational space by enhanced sampling methods depends on the choice of collective variables (CVs). In this work, we formulate the discovery of CVs as a Bayesia…
▽ More
Extending spatio-temporal scale limitations of models for complex atomistic systems considered in biochemistry and materials science necessitates the development of enhanced sampling methods. The potential acceleration in exploring the configurational space by enhanced sampling methods depends on the choice of collective variables (CVs). In this work, we formulate the discovery of CVs as a Bayesian inference problem and consider the CVs as hidden generators of the full-atomistic trajectory. The ability to generate samples of the fine-scale atomistic configurations using limited training data allows us to compute estimates of observables as well as our probabilistic confidence on them. The methodology is based on emerging methodological advances in machine learning and variational inference. The discovered CVs are related to physicochemical properties which are essential for understanding mechanisms especially in unexplored complex systems. We provide a quantitative assessment of the CVs in terms of their predictive ability for alanine dipeptide (ALA-2) and ALA-15 peptide.
△ Less
Submitted 16 January, 2019; v1 submitted 18 September, 2018;
originally announced September 2018.