-
Computing with Printed and Flexible Electronics
Authors:
Mehdi B. Tahoori,
Emre Ozer,
Georgios Zervakis,
Konstantinos Balaskas,
Priyanjana Pal
Abstract:
Printed and flexible electronics (PFE) have emerged as the ubiquitous solution for application domains at the extreme edge, where the demands for low manufacturing and operational cost cannot be met by silicon-based computing. Built on mechanically flexible substrates, printed and flexible devices offer unparalleled advantages in terms of form factor, bio-compatibility and sustainability, making them ideal for emerging and uncharted applications, such as wearable healthcare products or fast-moving consumer goods. Their desirable attributes stem from specialized fabrication technologies, e.g., Pragmatic's FlexIC, where advancements like ultra-thin substrates and specialized printing methods expand their hardware efficiency, and enable penetration to previously unexplored application domains. In recent years, significant focus has been on machine learning (ML) circuits for resource-constrained on-sensor and near-sensor processing, both in the digital and analog domains, as they meet the requirements of target applications by PFE. Despite their advancements, challenges like reliability, device integration and efficient memory design are still prevalent in PFE, spawning several research efforts towards cross-layer optimization and co-design, whilst showing promise for advancing printed and flexible electronics to new domains.
Submitted 21 April, 2025;
originally announced May 2025.
-
FLARE: Fault Attack Leveraging Address Reconfiguration Exploits in Multi-Tenant FPGAs
Authors:
Jayeeta Chaudhuri,
Hassan Nassar,
Dennis R. E. Gnad,
Jorg Henkel,
Mehdi B. Tahoori,
Krishnendu Chakrabarty
Abstract:
Modern FPGAs are increasingly supporting multi-tenancy to enable dynamic reconfiguration of user modules. While multi-tenant FPGAs improve utilization and flexibility, this paradigm introduces critical security threats. In this paper, we present FLARE, a fault attack that exploits vulnerabilities in the partial reconfiguration process, specifically while a user bitstream is being uploaded to the FPGA by a reconfiguration manager. Unlike traditional fault attacks that operate during module runtime, FLARE injects faults in the bitstream during its reconfiguration, altering the configuration address and redirecting it to unintended partial reconfigurable regions (PRRs). This enables the overwriting of pre-configured co-tenant modules, disrupting their functionality. FLARE leverages power-wasters that activate briefly during the reconfiguration process, making the attack stealthy and more challenging to detect with existing countermeasures. Experimental results on a Xilinx Pynq FPGA demonstrate the effectiveness of FLARE in compromising multiple user bitstreams during the reconfiguration process.
Submitted 21 February, 2025;
originally announced February 2025.
-
Modeling and Simulating Emerging Memory Technologies: A Tutorial
Authors:
Yun-Chih Chen,
Tristan Seidl,
Nils Hölscher,
Christian Hakert,
Minh Duy Truong,
Jian-Jia Chen,
João Paulo C. de Lima,
Asif Ali Khan,
Jeronimo Castrillon,
Ali Nezhadi,
Lokesh Siddhu,
Hassan Nassar,
Mahta Mayahinia,
Mehdi Baradaran Tahoori,
Jörg Henkel,
Nils Wilbert,
Stefan Wildermann,
Jürgen Teich
Abstract:
Non-volatile Memory (NVM) technologies present a promising alternative to traditional volatile memories such as SRAM and DRAM. Due to the limited availability of real NVM devices, simulators play a crucial role in architectural exploration and hardware-software co-design. This tutorial presents a simulation toolchain through four detailed case studies, showcasing its applicability to various domains of system design, including hybrid main-memory and cache, compute-in-memory, and wear-leveling design. These case studies provide the reader with practical insights on customizing the toolchain for their specific research needs. The source code is open-sourced.
Submitted 10 March, 2025; v1 submitted 14 February, 2025;
originally announced February 2025.
-
Sequential Printed MLP Circuits for Super TinyML Multi-Sensory Applications
Authors:
Gurol Saglam,
Florentia Afentaki,
Georgios Zervakis,
Mehdi B. Tahoori
Abstract:
Super-TinyML aims to optimize machine learning models for deployment on ultra-low-power application domains such as wearable technologies and implants. Such domains also require conformality, flexibility, and non-toxicity, which traditional silicon-based systems cannot fulfill. Printed Electronics (PE) offers not only these characteristics, but also cost-effective and on-demand fabrication. However, Neural Networks (NNs) with hundreds of features -- often necessary for target applications -- have not been feasible in PE because of its restrictions, such as limited device count due to its large feature sizes. In contrast to the state of the art, which uses fully parallel architectures and is limited to smaller classifiers, in this work we implement a super-TinyML architecture for bespoke (application-specific) NNs that surpasses the previous limits of the state of the art and enables NNs with a large number of parameters. With the introduction of super-TinyML into PE technology, we address the area and power limitations through resource sharing with multi-cycle operation and neuron approximation. This enables, for the first time, the implementation of NNs with up to $35.9\times$ more features and $65.4\times$ more coefficients than the state-of-the-art solutions.
Submitted 9 December, 2024;
originally announced December 2024.
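The resource sharing with multi-cycle operation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single shared multiply-accumulate (MAC) unit that processes one input feature per cycle. The function name, the ReLU activation, and the cycle counter are hypothetical illustrations and are not taken from the paper.

```python
def sequential_neuron(inputs, weights, bias=0):
    """Evaluate one neuron with a single shared multiply-accumulate (MAC) unit.

    One (input, weight) pair is processed per cycle, so the hardware cost
    stays at one multiplier while latency grows with the number of features.
    """
    acc = bias
    cycles = 0
    for x, w in zip(inputs, weights):
        acc += x * w  # the one shared MAC unit, reused every cycle
        cycles += 1
    return max(acc, 0), cycles  # ReLU is an assumed activation


out, cycles = sequential_neuron([1, 2, 3], [2, -1, 4], bias=1)
```

The trade-off shown here is the usual one for sequential designs: area is traded for latency, since a fully parallel neuron would need one multiplier per feature.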
-
Reducing ADC Front-end Costs During Training of On-sensor Printed Multilayer Perceptrons
Authors:
Florentia Afentaki,
Paula Carolina Lozano Duarte,
Georgios Zervakis,
Mehdi B. Tahoori
Abstract:
Printed electronics technology offers a cost-effective and fully-customizable solution to computational needs beyond the capabilities of traditional silicon technologies, offering advantages such as on-demand manufacturing and conformal, low-cost hardware. However, the low-resolution fabrication of printed electronics, which results in large feature sizes, poses a challenge for integrating complex designs like those of machine learning (ML) classification systems. Current literature optimizes only the Multilayer Perceptron (MLP) circuit within the classification system, while the cost of analog-to-digital converters (ADCs) is overlooked. Printed applications frequently require on-sensor processing, yet while the digital classifier has been extensively optimized, the analog-to-digital interfacing, specifically the ADCs, dominates the total area and energy consumption. In this work, we target digital printed MLP classifiers and we propose the design of customized ADCs per MLP's input, which involves minimizing the distinct represented numbers for each input, thus simplifying the ADC's circuitry. Incorporating this ADC optimization in the MLP training enables eliminating ADC levels and the respective comparators, while still maintaining high classification accuracy. Our approach achieves 11.2x lower ADC area for less than 5% accuracy drop across varying MLPs.
Submitted 9 December, 2024; v1 submitted 13 November, 2024;
originally announced November 2024.
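The idea of eliminating ADC levels together with their comparators, as described in the abstract above, can be sketched as follows. The sketch assumes a flash-style ADC, in which a full n-bit converter uses 2^n - 1 comparators. The function names and the hard-coded subset of kept thresholds are hypothetical; in the paper the retained levels are determined during MLP training, which is not modeled here.

```python
def flash_adc_thresholds(bits, vref=1.0):
    """Thresholds of a full flash ADC: one comparator per threshold, 2**bits - 1 in total."""
    levels = 2 ** bits
    return [vref * k / levels for k in range(1, levels)]


def quantize(v, thresholds):
    """Thermometer-style conversion: count the comparators whose threshold v reaches."""
    return sum(1 for t in thresholds if v >= t)


full = flash_adc_thresholds(3)       # 7 comparators for a 3-bit flash ADC
kept = [full[1], full[3], full[5]]   # hypothetical pruned set: 0.25, 0.5, 0.75
code_full = quantize(0.6, full)
code_kept = quantize(0.6, kept)
```

Here the pruned converter keeps 3 of 7 comparators and therefore distinguishes only 4 input ranges instead of 8. Whether such a coarser input is acceptable depends on the classifier, which is why the abstract couples the pruning to training.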
-
Hacking the Fabric: Targeting Partial Reconfiguration for Fault Injection in FPGA Fabrics
Authors:
Jayeeta Chaudhuri,
Hassan Nassar,
Dennis R. E. Gnad,
Jorg Henkel,
Mehdi B. Tahoori,
Krishnendu Chakrabarty
Abstract:
FPGAs are now ubiquitous in cloud computing infrastructures and reconfigurable system-on-chip, particularly for AI acceleration. Major cloud service providers such as Amazon and Microsoft are increasingly incorporating FPGAs for specialized compute-intensive tasks within their data centers. The availability of FPGAs in cloud data centers has opened up new opportunities for users to improve application performance by implementing customizable hardware accelerators directly on the FPGA fabric. However, the virtualization and sharing of FPGA resources among multiple users open up new security risks and threats. We present a novel fault attack methodology capable of causing persistent fault injections in partial bitstreams during the process of FPGA reconfiguration. This attack leverages power-wasters and is timed to inject faults into bitstreams as they are being loaded onto the FPGA through the reconfiguration manager, without needing to remain active throughout the entire reconfiguration process. Our experiments, conducted on a Pynq FPGA setup, demonstrate the feasibility of this attack on various partial application bitstreams, such as a neural network accelerator unit and a signal processing accelerator unit.
Submitted 21 October, 2024;
originally announced October 2024.
-
Design and In-training Optimization of Binary Search ADC for Flexible Classifiers
Authors:
Paula Carolina Lozano Duarte,
Florentia Afentaki,
Georgios Zervakis,
Mehdi B. Tahoori
Abstract:
Flexible Electronics (FE) offer distinct advantages, including mechanical flexibility and low process temperatures, enabling extremely low-cost production. To address the demands of applications such as smart sensors and wearables, flexible devices must be small and operate at low supply voltages. Additionally, target applications often require classifiers to operate directly on analog sensory input, necessitating the use of Analog to Digital Converters (ADCs) to process the sensory data. However, ADCs present serious challenges, particularly in terms of high area and power consumption, especially when considering stringent area and energy budgets. In this work, we target common classifiers in this domain such as MLPs and SVMs and present a holistic approach to mitigate the elevated overhead of analog to digital interfacing in FE. First, we propose a novel design for Binary Search ADC that reduces area overhead 2X compared with the state-of-the-art Binary design and up to 5.4X compared with Flash ADC. Next, we present an in-training ADC optimization in which we keep the bare-minimum representations required and simplify ADCs by removing unnecessary components. Our in-training optimization further reduces on average the area in terms of transistor count of the required ADCs by 5X for less than 1% accuracy loss.
Submitted 9 December, 2024; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights
Authors:
Soyed Tuhin Ahmed,
Michael Hefenbrock,
Mehdi B. Tahoori
Abstract:
The applications of artificial intelligence (AI) are rapidly evolving, and they are also commonly used in safety-critical domains, such as autonomous driving and medical diagnosis, where functional safety is paramount. In AI-driven systems, uncertainty estimation allows the user to avoid overconfident predictions and achieve functional safety. Therefore, the robustness and reliability of model predictions can be improved. However, conventional uncertainty estimation methods, such as the deep ensemble method, impose high computation and, accordingly, hardware (latency and energy) overhead because they require the storage and processing of multiple models. Alternatively, Monte Carlo dropout (MC-dropout) methods, although having low memory overhead, necessitate numerous ($\sim 100$) forward passes, leading to high computational overhead and latency. Thus, these approaches are not suitable for battery-powered edge devices with limited computing and memory resources. In this paper, we propose the Tiny-Deep Ensemble approach, a low-cost approach for uncertainty estimation on edge devices. In our approach, only normalization layers are ensembled $M$ times, with all ensemble members sharing common weights and biases, leading to a significant decrease in storage requirements and latency. Moreover, our approach requires only one forward pass in a hardware architecture that allows batch processing for inference and uncertainty estimation. Furthermore, it has approximately the same memory overhead as a single model. Therefore, latency and memory overhead are reduced by a factor of up to $\sim M\times$. Nevertheless, our method does not compromise accuracy, with an increase in inference accuracy of up to $\sim 1\%$ and a reduction in RMSE of $17.17\%$ in various benchmark datasets, tasks, and state-of-the-art architectures.
Submitted 7 May, 2024;
originally announced May 2024.
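The core idea in the abstract above, ensembling only the normalization layers while all members share weights and biases, can be sketched for a single layer. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the ReLU-and-sum readout, and the three (gamma, beta) pairs are hypothetical, and the batched multi-layer hardware execution described in the paper is not modeled.

```python
import statistics


def shared_layer(x, weights, bias):
    """Linear layer whose weights and biases are common to all ensemble members."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def member_norm(h, gamma, beta, eps=1e-5):
    """Normalization with one member's own scale (gamma) and shift (beta)."""
    mean = statistics.fmean(h)
    std = (statistics.pvariance(h, mu=mean) + eps) ** 0.5
    return [gamma * (v - mean) / std + beta for v in h]


def predict(x, weights, bias, norm_params):
    """Return (mean, spread) of the members' scalar outputs."""
    h = shared_layer(x, weights, bias)  # evaluated once, reused by every member
    outputs = [sum(max(v, 0.0) for v in member_norm(h, g, b)) for g, b in norm_params]
    return statistics.fmean(outputs), statistics.pstdev(outputs)


weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.0]
members = [(1.0, 0.0), (2.0, 0.5), (0.5, -0.1)]  # M = 3 hypothetical (gamma, beta) pairs
mean, spread = predict([1.0, 2.0], weights, bias, members)
```

Only the small per-member (gamma, beta) pairs are replicated, so storage stays close to that of a single model, and the spread across member outputs serves as a simple uncertainty signal.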
-
Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs
Authors:
Florentia Afentaki,
Michael Hefenbrock,
Georgios Zervakis,
Mehdi B. Tahoori
Abstract:
Printed Electronics (PE) stands out as a promising technology for widespread computing due to its distinct attributes, such as low costs and flexible manufacturing. Unlike traditional silicon-based technologies, PE enables stretchable, conformal, and non-toxic hardware. However, PE is constrained by larger feature sizes, making it challenging to implement complex circuits such as machine learning (ML) classifiers. Approximate computing has been proven to reduce the hardware cost of ML circuits such as Multilayer Perceptrons (MLPs). In this paper, we maximize the benefits of approximate computing by integrating hardware approximation into the MLP training process. Due to the discrete nature of hardware approximation, we propose and implement a genetic-based, approximate, hardware-aware training approach specifically designed for printed MLPs. For a 5% accuracy loss, our MLPs achieve over 5x area and power reduction compared to the baseline while outperforming state-of-the-art approximate and stochastic printed MLPs.
Submitted 14 November, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Enhancing Reliability of Neural Networks at the Edge: Inverted Normalization with Stochastic Affine Transformations
Authors:
Soyed Tuhin Ahmed,
Kamal Danouchi,
Guillaume Prenat,
Lorena Anghel,
Mehdi B. Tahoori
Abstract:
Bayesian Neural Networks (BayNNs) naturally provide uncertainty in their predictions, making them a suitable choice in safety-critical applications. Additionally, their realization using memristor-based in-memory computing (IMC) architectures enables them for resource-constrained edge applications. In addition to predictive uncertainty, however, the ability to be inherently robust to noise in computation is also essential to ensure functional safety. In particular, memristor-based IMCs are susceptible to various sources of non-idealities such as manufacturing and runtime variations, drift, and failure, which can significantly reduce inference accuracy. In this paper, we propose a method to inherently enhance the robustness and inference accuracy of BayNNs deployed in IMC architectures. To achieve this, we introduce a novel normalization layer combined with stochastic affine transformations. Empirical results in various benchmark datasets show a graceful degradation in inference accuracy, with an improvement of up to $58.11\%$.
Submitted 22 January, 2024;
originally announced January 2024.
-
NeuSpin: Design of a Reliable Edge Neuromorphic System Based on Spintronics for Green AI
Authors:
Soyed Tuhin Ahmed,
Kamal Danouchi,
Guillaume Prenat,
Lorena Anghel,
Mehdi B. Tahoori
Abstract:
Internet of Things (IoT) and smart wearable devices for personalized healthcare will require storing and computing ever-increasing amounts of data. The key requirements for these devices are ultra-low-power, high-processing capabilities, autonomy at low cost, as well as reliability and accuracy to enable Green AI at the edge. Artificial Intelligence (AI) models, especially Bayesian Neural Networks (BayNNs), are resource-intensive and face challenges with traditional computing architectures due to the memory wall problem. Computing-in-Memory (CIM) with emerging resistive memories offers a solution by combining memory blocks and computing units for higher efficiency and lower power consumption. However, implementing BayNNs on CIM hardware, particularly with spintronic technologies, presents technical challenges due to variability and manufacturing defects. The NeuSPIN project aims to address these challenges through full-stack hardware and software co-design, developing novel algorithmic and circuit design approaches to enhance the performance, energy-efficiency and robustness of BayNNs on spintronic-based CIM platforms.
Submitted 11 January, 2024;
originally announced January 2024.
-
Testing Spintronics Implemented Monte Carlo Dropout-Based Bayesian Neural Networks
Authors:
Soyed Tuhin Ahmed,
Michael Hefenbrock,
Guillaume Prenat,
Lorena Anghel,
Mehdi B. Tahoori
Abstract:
Bayesian Neural Networks (BayNNs) can inherently estimate predictive uncertainty, facilitating informed decision-making. Dropout-based BayNNs are increasingly implemented in spintronics-based computation-in-memory architectures for resource-constrained yet high-performance safety-critical applications. Although uncertainty estimation is important, the reliability of Dropout generation and BayNN computation is equally important for target applications but is overlooked in existing works. However, testing BayNNs is significantly more challenging compared to conventional NNs, due to their stochastic nature. In this paper, we present for the first time the model of the non-idealities of the spintronics-based Dropout module and analyze their impact on uncertainty estimates and accuracy. Furthermore, we propose a testing framework based on repeatability ranking for Dropout-based BayNN with up to $100\%$ fault coverage while using only $0.2\%$ of training data as test vectors.
Submitted 9 January, 2024;
originally announced January 2024.
-
Concurrent Self-testing of Neural Networks Using Uncertainty Fingerprint
Authors:
Soyed Tuhin Ahmed,
Mehdi B. Tahoori
Abstract:
Neural networks (NNs) are increasingly used in always-on safety-critical applications deployed on hardware accelerators (NN-HAs) employing various memory technologies. Reliable continuous operation of NN is essential for safety-critical applications. During online operation, NNs are susceptible to single and multiple permanent and soft errors due to factors such as radiation, aging, and thermal effects. Explicit NN-HA testing methods cannot detect transient faults during inference, are unsuitable for always-on applications, and require extensive test vector generation and storage. Therefore, in this paper, we propose the \emph{uncertainty fingerprint} approach representing the online fault status of NN. Furthermore, we propose a dual head NN topology specifically designed to produce uncertainty fingerprints and the primary prediction of the NN in \emph{a single shot}. During the online operation, by matching the uncertainty fingerprint, we can concurrently self-test NNs with up to $100\%$ coverage with a low false positive rate while maintaining a similar performance of the primary task. Compared to existing works, memory overhead is reduced by up to $243.7$ MB, multiply and accumulate (MAC) operation is reduced by up to $10000\times$, and false-positive rates are reduced by up to $89\%$.
Submitted 2 January, 2024;
originally announced January 2024.
-
Bespoke Approximation of Multiplication-Accumulation and Activation Targeting Printed Multilayer Perceptrons
Authors:
Florentia Afentaki,
Gurol Saglam,
Argyris Kokkinis,
Kostas Siozios,
Georgios Zervakis,
Mehdi B. Tahoori
Abstract:
Printed Electronics (PE) feature distinct and remarkable characteristics that make them a prominent technology for achieving true ubiquitous computing. This is particularly relevant in application domains that require conformal and ultra-low cost solutions, which have experienced limited penetration of computing until now. Unlike silicon-based technologies, PE offer unparalleled features such as non-recurring engineering costs, ultra-low manufacturing cost, and on-demand fabrication of conformal, flexible, non-toxic, and stretchable hardware. However, PE face certain limitations due to their large feature sizes, that impede the realization of complex circuits, such as machine learning classifiers. In this work, we address these limitations by leveraging the principles of Approximate Computing and Bespoke (fully-customized) design. We propose an automated framework for designing ultra-low power Multilayer Perceptron (MLP) classifiers which employs, for the first time, a holistic approach to approximate all functions of the MLP's neurons: multiplication, accumulation, and activation. Through comprehensive evaluation across various MLPs of varying size, our framework demonstrates the ability to enable battery-powered operation of even the most intricate MLP architecture examined, significantly surpassing the current state of the art.
Submitted 14 November, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
On-sensor Printed Machine Learning Classification via Bespoke ADC and Decision Tree Co-Design
Authors:
Giorgos Armeniakos,
Paula L. Duarte,
Priyanjana Pal,
Georgios Zervakis,
Mehdi B. Tahoori,
Dimitrios Soudris
Abstract:
Printed electronics (PE) technology provides cost-effective hardware with unmet customization, due to their low non-recurring engineering and fabrication costs. PE exhibit features such as flexibility, stretchability, porosity, and conformality, which make them a prominent candidate for enabling ubiquitous computing. Still, the large feature sizes in PE limit the realization of complex printed circuits, such as machine learning classifiers, especially when processing sensor inputs is necessary, mainly due to the costly analog-to-digital converters (ADCs). To this end, we propose the design of fully customized ADCs and present, for the first time, a co-design framework for generating bespoke Decision Tree classifiers. Our comprehensive evaluation shows that our co-design enables self-powered operation of on-sensor printed classifiers in all benchmark cases.
Submitted 2 December, 2023;
originally announced December 2023.
-
Scale-Dropout: Estimating Uncertainty in Deep Neural Networks Using Stochastic Scale
Authors:
Soyed Tuhin Ahmed,
Kamal Danouchi,
Michael Hefenbrock,
Guillaume Prenat,
Lorena Anghel,
Mehdi B. Tahoori
Abstract:
Uncertainty estimation in Neural Networks (NNs) is vital in improving reliability and confidence in predictions, particularly in safety-critical applications. Bayesian Neural Networks (BayNNs) with Dropout as an approximation offer a systematic approach to quantifying uncertainty, but they inherently suffer from high hardware overhead in terms of power, memory, and computation. Thus, the applicability of BayNNs to edge devices with limited resources or to high-performance applications is challenging. Some of the inherent costs of BayNNs can be reduced by accelerating them in hardware on a Computation-In-Memory (CIM) architecture with spintronic memories and binarizing their parameters. However, numerous stochastic units are required to implement conventional dropout-based BayNN. In this paper, we propose the Scale Dropout, a novel regularization technique for Binary Neural Networks (BNNs), and Monte Carlo-Scale Dropout (MC-Scale Dropout)-based BayNNs for efficient uncertainty estimation. Our approach requires only one stochastic unit for the entire model, irrespective of the model size, leading to a highly scalable Bayesian NN. Furthermore, we introduce a novel Spintronic memory-based CIM architecture for the proposed BayNN that achieves more than $100\times$ energy savings compared to the state-of-the-art. We validated our method to show up to a $1\%$ improvement in predictive performance and superior uncertainty estimates compared to related works.
Submitted 11 January, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Spatial-SpinDrop: Spatial Dropout-based Binary Bayesian Neural Network with Spintronics Implementation
Authors:
Soyed Tuhin Ahmed,
Kamal Danouchi,
Michael Hefenbrock,
Guillaume Prenat,
Lorena Anghel,
Mehdi B. Tahoori
Abstract:
Recently, machine learning systems have gained prominence in real-time, critical decision-making domains, such as autonomous driving and industrial automation. Their implementations should avoid overconfident predictions through uncertainty estimation. Bayesian Neural Networks (BayNNs) are principled methods for estimating predictive uncertainty. However, their computational costs and power consumption hinder their widespread deployment in edge AI. Utilizing Dropout as an approximation of the posterior distribution, binarizing the parameters of BayNNs, and, further to that, implementing them in spintronics-based computation-in-memory (CiM) hardware arrays can be a viable solution. However, designing hardware Dropout modules for convolutional neural network (CNN) topologies is challenging and expensive, as they may require numerous Dropout modules and need to use spatial information to drop certain elements. In this paper, we introduce MC-SpatialDropout, a spatial dropout-based approximate BayNN with emerging spintronic devices. Our method utilizes the inherent stochasticity of spintronic devices for efficient implementation of the spatial dropout module compared to existing implementations. Furthermore, the number of dropout modules per network layer is reduced by a factor of $9\times$ and energy consumption by a factor of $94.11\times$, while still achieving comparable predictive performance and uncertainty estimates compared to related works.
Submitted 16 June, 2023;
originally announced June 2023.
-
One-Shot Online Testing of Deep Neural Networks Based on Distribution Shift Detection
Authors:
Soyed Tuhin Ahmed,
Mehdi B. Tahoori
Abstract:
Neural networks (NNs) are capable of learning complex patterns and relationships in data to make predictions with high accuracy, making them useful for various tasks. However, NNs are both computation-intensive and memory-intensive, which makes them challenging to deploy in edge applications. To accelerate the most common operation in NNs (matrix-vector multiplication), hardware accelerator architectures such as computation-in-memory (CiM) with non-volatile memristive crossbars are utilized. Although they offer benefits such as power efficiency, parallelism, and non-volatility, they suffer from various faults and variations, both during manufacturing and during lifetime operation. This can lead to faulty computations and, in turn, degradation of post-mapping inference accuracy, which is unacceptable for many applications, including safety-critical ones. Therefore, proper testing of NN hardware accelerators is required. In this paper, we propose a \emph{one-shot} testing approach that can test NNs accelerated on memristive crossbars with only one test vector, making it very suitable for online testing applications. Our approach can consistently achieve $100\%$ fault coverage across several large topologies with up to $201$ layers and challenging tasks like semantic segmentation. Compared to existing methods, the fault coverage is improved by up to $24\%$, the memory overhead is only $0.0123$ MB (a reduction of up to $19980\times$), and the number of test vectors is reduced by $10000\times$.
Submitted 16 May, 2023;
originally announced May 2023.
-
Model-to-Circuit Cross-Approximation For Printed Machine Learning Classifiers
Authors:
Giorgos Armeniakos,
Georgios Zervakis,
Dimitrios Soudris,
Mehdi B. Tahoori,
Jörg Henkel
Abstract:
Printed electronics (PE) promises on-demand fabrication, low non-recurring engineering costs, and sub-cent fabrication costs. It also allows for high customization that would be infeasible in silicon, and bespoke architectures prevail to improve the efficiency of emerging PE machine learning (ML) applications. Nevertheless, large feature sizes in PE prohibit the realization of complex ML models in PE, even with bespoke architectures. In this work, we present an automated, cross-layer approximation framework tailored to bespoke architectures that enables complex ML models, such as Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs), in PE. Our framework cooperatively applies a hardware-driven coefficient approximation of the ML model at the algorithmic level, netlist pruning at the logic level, and voltage over-scaling at the circuit level. Extensive experimental evaluation on 12 MLPs and 12 SVMs and more than 6000 approximate and exact designs demonstrates that our model-to-circuit cross-approximation delivers power- and area-optimal designs that, compared to the state-of-the-art exact designs, feature on average 51% and 66% area and power reduction, respectively, for less than 5% accuracy loss. Finally, we demonstrate that our framework enables 80% of the examined classifiers to be battery-powered with accuracy almost identical to that of the exact designs, thus paving the way towards smart, complex printed applications.
Submitted 14 March, 2023;
originally announced March 2023.
-
Co-Design of Approximate Multilayer Perceptron for Ultra-Resource Constrained Printed Circuits
Authors:
Giorgos Armeniakos,
Georgios Zervakis,
Dimitrios Soudris,
Mehdi B. Tahoori,
Jörg Henkel
Abstract:
Printed Electronics (PE) offers on-demand, extremely low-cost hardware due to its additive manufacturing process, enabling machine learning (ML) applications for domains that feature ultra-low cost, conformity, and non-toxicity requirements that silicon-based systems cannot deliver. Nevertheless, large feature sizes in PE prohibit the realization of complex printed ML circuits. In this work, we present, for the first time, an automated printed-aware software/hardware co-design framework that exploits approximate computing principles to enable ultra-resource constrained printed multilayer perceptrons (MLPs). Our evaluation demonstrates that, compared to the state-of-the-art baseline, our circuits feature on average 6x lower area and 5.7x lower power, with less than 1% accuracy loss.
Submitted 28 February, 2023;
originally announced February 2023.
-
Hardware-Aware Automated Neural Minimization for Printed Multilayer Perceptrons
Authors:
Argyris Kokkinis,
Georgios Zervakis,
Kostas Siozios,
Mehdi B. Tahoori,
Jörg Henkel
Abstract:
The demand of many application domains for flexibility, stretchability, and porosity typically cannot be met by silicon VLSI technologies. Printed Electronics (PE) has been introduced as a candidate solution that can satisfy those requirements and enable the integration of smart devices on consumer goods at ultra-low cost, while also enabling in situ and on-demand fabrication. However, the large feature sizes in PE constrain those efforts and prohibit the design of complex ML circuits due to area and power limitations. Yet classification is typically the core task in printed applications. In this work, we examine, for the first time, the impact of neural minimization techniques, in conjunction with bespoke circuit implementations, on the area efficiency of printed Multilayer Perceptron classifiers. Results show that up to 8x area reduction can be achieved for up to 5% accuracy loss.
Submitted 26 January, 2023;
originally announced January 2023.
-
Approximate Computing and the Efficient Machine Learning Expedition
Authors:
Jörg Henkel,
Hai Li,
Anand Raghunathan,
Mehdi B. Tahoori,
Swagath Venkataramani,
Xiaoxuan Yang,
Georgios Zervakis
Abstract:
Approximate computing (AxC) has long been accepted as a design alternative for efficient system implementation at the cost of relaxed accuracy requirements. Despite AxC research activities in various application domains, AxC thrived over the past decade when it was applied to Machine Learning (ML). The inherently approximate nature of ML models, together with the increased computational overheads of ML applications (which were effectively mitigated by corresponding approximations), led to a perfect match and a fruitful synergy. AxC for AI/ML has moved beyond academic prototypes. In this work, we shed light on the synergistic nature of AxC and ML and elucidate the impact of AxC in designing efficient ML systems. To that end, we present an overview and taxonomy of AxC for ML and use two descriptive application scenarios to demonstrate how AxC boosts the efficiency of ML systems.
Submitted 2 October, 2022;
originally announced October 2022.
-
Approximate Decision Trees For Machine Learning Classification on Tiny Printed Circuits
Authors:
Konstantinos Balaskas,
Georgios Zervakis,
Kostas Siozios,
Mehdi B. Tahoori,
Joerg Henkel
Abstract:
Although Printed Electronics (PE) cannot compete with silicon-based systems in conventional evaluation metrics, e.g., integration density, area and performance, PE offers attractive properties such as on-demand ultra-low-cost fabrication, flexibility and non-toxicity. As a result, it targets application domains that are out of reach for lithography-based silicon electronics and thus have not yet seen much proliferation of computing. However, despite the attractive characteristics of PE, its large feature sizes prohibit the realization of complex printed circuits, such as Machine Learning (ML) classifiers. In this work, we exploit the hardware-friendly nature of Decision Trees for machine learning classification and leverage the hardware efficiency of approximate design in order to generate approximate ML classifiers that are suitable for tiny, ultra-resource constrained, and battery-powered printed applications.
Submitted 15 March, 2022;
originally announced March 2022.
-
Cross-Layer Approximation For Printed Machine Learning Circuits
Authors:
Giorgos Armeniakos,
Georgios Zervakis,
Dimitrios Soudris,
Mehdi B. Tahoori,
Jörg Henkel
Abstract:
Printed electronics (PE) feature low non-recurring engineering costs and low per-unit-area fabrication costs, thus enabling extremely low-cost and on-demand hardware. Such low-cost fabrication allows for high customization that would be infeasible in silicon, and bespoke architectures prevail to improve the efficiency of emerging PE machine learning (ML) applications. However, even with bespoke architectures, the large feature sizes in PE constrain the complexity of the ML models that can be implemented. In this work, we bring together, for the first time, approximate computing and PE design with the aim of enabling complex ML models, such as Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs), in PE. To this end, we propose and implement a cross-layer approximation tailored to bespoke ML architectures. At the algorithmic level, we apply a hardware-driven coefficient approximation of the ML model, and at the circuit level, we apply netlist pruning through a full search exploration. In our extensive experimental evaluation, we consider 14 MLPs and SVMs and evaluate more than 4300 approximate and exact designs. Our results demonstrate that our cross-approximation delivers Pareto-optimal designs that, compared to the state-of-the-art exact designs, feature 47% and 44% average area and power reduction, respectively, and less than 1% accuracy loss.
Submitted 11 March, 2022;
originally announced March 2022.
-
Towards Cross-layer Reliability Analysis of Transient and Permanent Faults
Authors:
Hananeh Aliee,
Liang Chen,
Mojtaba Ebrahimi,
Michael Glaß,
Faramarz Khosravi,
Mehdi B. Tahoori
Abstract:
Due to the increasing complexity of Multi-Processor Systems on Chip (MPSoCs), system-level design methodologies have received considerable attention in recent years. However, the significant gap between system-level reliability analysis and the level where the actual faults occur necessitates a cross-layer approach in which sufficient data about the effects of faults at low levels are passed to the system level. So far, cross-layer reliability analysis techniques have focused on a specific type of fault, e.g., either permanent or transient faults. In this work, we aim to propose a cross-layer reliability analysis that considers different fault types concurrently and connects reliability analysis techniques at different levels of abstraction using adapters.
Submitted 12 May, 2014;
originally announced May 2014.
-
An Accurate SER Estimation Method Based on Propagation Probability
Authors:
Ghazanfar Asadi,
Mehdi B. Tahoori
Abstract:
In this paper, we present an accurate but very fast soft error rate (SER) estimation technique for digital circuits based on error propagation probability (EPP) computation. Experimental results, compared against the random simulation technique, show that our proposed method is on average within 6% of the random simulation method and four to five orders of magnitude faster.
Submitted 25 October, 2007;
originally announced October 2007.