Search | arXiv e-print repository

Compressing Deep Neural Networks Using Explainable AI

Authors: Kimia Soroush, Mohsen Raji, Behnam Ghavami

Abstract: Deep neural networks (DNNs) have demonstrated remarkable performance in many tasks, but this often comes at a high cost in computation and memory usage. Compression techniques, such as pruning and quantization, are applied to reduce the memory footprint of DNNs and make it possible to accommodate them on resource-constrained edge devices. Recently, explainable artificial intelligence (XAI) methods have been introduced with the purpose of understanding and explaining AI methods. XAI can be used to examine the inner functioning of DNNs, such as the importance of different neurons and features to the overall performance of DNNs. In this paper, a novel DNN compression approach using XAI is proposed to efficiently reduce the DNN model size with negligible accuracy loss. In the proposed approach, the importance scores of DNN parameters (i.e., weights) are computed using a gradient-based XAI technique called Layer-wise Relevance Propagation (LRP). Then, the scores are used to compress the DNN as follows: 1) the parameters with negative or zero importance scores are pruned and removed from the model, 2) mixed-precision quantization is applied so that weights with higher scores are quantized with more bits and weights with lower scores with fewer bits. The experimental results show that the proposed compression approach reduces the model size by 64% while the accuracy is improved by 42% compared to the state-of-the-art XAI-based compression method.

Submitted 4 July, 2025; originally announced July 2025.
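
The two compression steps listed in the abstract above (prune parameters with non-positive scores, then quantize the rest with a score-dependent bit width) can be illustrated with a minimal sketch. It is not the authors' implementation: the importance scores are taken as given rather than computed with LRP, and the function name `compress`, the median split between "higher" and "lower" scores, the 8-bit and 4-bit widths, and the symmetric uniform quantizer are all assumptions made for illustration.

```python
def compress(weights, scores, high_bits=8, low_bits=4):
    """Prune weights with non-positive scores, then quantize the rest.

    Hypothetical sketch: weights whose score reaches the median positive
    score get `high_bits`, the remaining kept weights get `low_bits`.
    Returns (dequantized weights, bits used per weight; 0 means pruned).
    """
    positive = sorted(s for s in scores if s > 0)
    if not positive:
        # Every parameter is pruned.
        return [0.0] * len(weights), [0] * len(weights)

    threshold = positive[len(positive) // 2]  # assumed high/low split
    # Scale for symmetric uniform quantization of the kept weights.
    max_abs = max(abs(w) for w, s in zip(weights, scores) if s > 0) or 1.0

    quantized, bits_used = [], []
    for w, s in zip(weights, scores):
        if s <= 0:
            # Step 1: negative or zero importance, so the weight is pruned.
            quantized.append(0.0)
            bits_used.append(0)
            continue
        # Step 2: more bits for higher scores, fewer for lower scores.
        bits = high_bits if s >= threshold else low_bits
        levels = 2 ** (bits - 1) - 1
        q = round(w / max_abs * levels)
        quantized.append(q * max_abs / levels)
        bits_used.append(bits)
    return quantized, bits_used


weights = [0.5, -1.0, 0.25, 0.8]
scores = [0.9, -0.2, 0.1, 0.0]
quantized, bits_used = compress(weights, scores)
print(bits_used)
```

In this toy input the second and fourth weights are pruned, the first keeps 8 bits, and the third is stored with 4 bits.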

arXiv:2412.09450 [pdf, other]

A Semi Black-Box Adversarial Bit-Flip Attack with Limited DNN Model Information

Authors: Behnam Ghavami, Mani Sadati, Mohammad Shahidzadeh, Lesley Shannon, Steve Wilton

Abstract: Despite the rising prevalence of deep neural networks (DNNs) in cyber-physical systems, their vulnerability to adversarial bit-flip attacks (BFAs) is a noteworthy concern. This paper proposes B3FA, a semi-black-box BFA-based parameter attack on DNNs, assuming the adversary has limited knowledge about the model. We consider that practical scenarios often feature a more restricted threat model for real-world systems, in contrast to typical BFA models, which presuppose that the adversary has full access to a network's inputs and parameters. The introduced bit-flip approach uses a magnitude-based ranking method and a statistical reconstruction technique to identify the vulnerable bits. We demonstrate the effectiveness of B3FA on several DNN models in a semi-black-box setting. For example, B3FA could drop the accuracy of a MobileNetV2 from 69.84% to 9% with only 20 bit-flips in a real-world setting.

Submitted 12 December, 2024; originally announced December 2024.
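
To see why a handful of bit-flips can be so damaging, it helps to look at what one flip does to a stored weight. The sketch below is not B3FA itself (it does no ranking or reconstruction); it only flips one bit of a weight that is assumed to be stored as an 8-bit two's-complement integer. The function name `flip_bit_int8` and the 8-bit storage format are illustrative assumptions, not details from the abstract.

```python
def flip_bit_int8(value, bit):
    """Flip one bit of a signed 8-bit (two's-complement) integer.

    Hypothetical helper: `bit` 0 is the least significant bit,
    `bit` 7 is the sign bit.
    """
    if not -128 <= value <= 127:
        raise ValueError("value must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    raw = (value & 0xFF) ^ (1 << bit)  # flip the bit in the unsigned byte
    return raw - 256 if raw >= 128 else raw  # reinterpret as signed


small_change = flip_bit_int8(3, 0)  # least significant bit
large_change = flip_bit_int8(3, 7)  # most significant (sign) bit
print(small_change, large_change)
```

Flipping the least significant bit moves the weight by one quantization step, while flipping the sign bit moves it across almost the entire representable range, which is why attacks of this kind concentrate on a few high-order bits.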

arXiv:2411.15442 [pdf, other]

Automatic High-quality Verilog Assertion Generation through Subtask-Focused Fine-Tuned LLMs and Iterative Prompting

Authors: Mohammad Shahidzadeh, Behnam Ghavami, Steve Wilton, Lesley Shannon

Abstract: Formal Property Verification (FPV), using SystemVerilog Assertions (SVA), is crucial for ensuring the completeness of a design with respect to its specification. However, writing SVA is a laborious task with a steep learning curve. In this work, we present a large language model (LLM)-based flow, named \ToolName, that automatically generates high-quality SVA from design specification documents. We introduce a novel sub-task-focused fine-tuning approach that effectively addresses functionally incorrect assertions produced by baseline LLMs, leading to a remarkable 7.3-fold increase in the number of functionally correct assertions. Recognizing the prevalence of syntax and semantic errors, we also developed an iterative refinement method that enhances the LLM's initial outputs by systematically re-prompting it to correct identified issues. This process is further strengthened by a custom compiler that generates meaningful error messages, guiding the LLM towards improved accuracy. The experiments demonstrate a 26% increase in the number of assertions free from syntax errors using this approach, showcasing its potential to streamline the FPV process.

Submitted 22 November, 2024; originally announced November 2024.

arXiv:2407.04964 [pdf, other]

ZOBNN: Zero-Overhead Dependable Design of Binary Neural Networks with Deliberately Quantized Parameters

Authors: Behnam Ghavami, Mohammad Shahidzadeh, Lesley Shannon, Steve Wilton

Abstract: Low-precision weights and activations in deep neural networks (DNNs) outperform their full-precision counterparts in terms of hardware efficiency. When implemented with low-precision operations, specifically in the extreme case where network parameters are binarized (i.e., BNNs), the two most frequently mentioned benefits of quantization are reduced memory consumption and a faster inference process. In this paper, we introduce a third advantage of very low-precision neural networks: improved fault tolerance. We investigate the impact of memory faults on state-of-the-art binary neural networks (BNNs) through comprehensive analysis. Despite the inclusion of floating-point parameters in BNN architectures to improve accuracy, our findings reveal that BNNs are highly sensitive to deviations in these parameters caused by memory faults. In light of this crucial finding, we propose a technique to improve BNN dependability by restricting the range of float parameters through a novel deliberately uniform quantization. The introduced quantization technique reduces the proportion of floating-point parameters used in the BNN without incurring any additional computational overhead during the inference stage. Extensive experimental fault simulation on the proposed BNN architecture (i.e., ZOBNN) reveals a remarkable 5x enhancement in robustness compared to a conventional floating-point DNN. Notably, this improvement is achieved without incurring any computational overhead. \ToolName excels in critical edge applications characterized by limited computational resources, prioritizing both dependability and real-time performance.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04943 [pdf]

doi 10.1109/CSICC58665.2023.10105310

Quantizing YOLOv7: A Comprehensive Study

Authors: Mohammadamin Baghbanbashi, Mohsen Raji, Behnam Ghavami

Abstract: YOLO is a deep neural network (DNN) model presented for robust real-time object detection following the one-stage inference approach. It outperforms other real-time object detectors in terms of speed and accuracy by a wide margin. Nevertheless, since YOLO is built upon a DNN backbone with numerous parameters, it causes excessive memory load, so deploying it on memory-constrained devices is a severe challenge in practice. To overcome this limitation, model compression techniques, such as quantizing parameters to lower-precision values, can be adopted. As the most recent version of YOLO, YOLOv7 achieves such state-of-the-art performance in speed and accuracy in the range of 5 FPS to 160 FPS that it surpasses all former versions of YOLO and other existing models in this regard. So far, the robustness of several quantization schemes has been evaluated on older versions of YOLO. These methods may not necessarily yield similar results for YOLOv7, as it uses a different architecture. In this paper, we conduct in-depth research on the effectiveness of a variety of quantization schemes on the pre-trained weights of the state-of-the-art YOLOv7 model. Experimental results demonstrate that using 4-bit quantization coupled with the combination of different granularities results in ~3.92x and ~3.86x memory-saving for uniform and non-uniform quantization, respectively, with only 2.5% and 1% accuracy loss compared to the full-precision baseline model.

Submitted 5 July, 2024; originally announced July 2024.

Comments: Presented at the "2023 28th International Computer Conference, Computer Society of Iran (CSICC)" and indexed in IEEE

ACM Class: I.2.10; I.4.0; I.5.1; E.4

arXiv:2404.02947 [pdf, other]

DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization

Authors: Behnam Ghavami, Amin Kamjoo, Lesley Shannon, Steve Wilton

Abstract: The imperative to deploy Deep Neural Network (DNN) models on resource-constrained edge devices, spurred by privacy concerns, has become increasingly apparent. To facilitate the transition from cloud to edge computing, this paper introduces a technique that effectively reduces the memory footprint of DNNs, accommodating the limitations of resource-constrained edge devices while preserving model accuracy. Our proposed technique, named Post-Training Intra-Layer Multi-Precision Quantization (PTILMPQ), employs a post-training quantization approach, eliminating the need for extensive training data. By estimating the importance of layers and channels within the network, the proposed method enables precise bit allocation throughout the quantization process. Experimental results demonstrate that PTILMPQ offers a promising solution for deploying DNNs on edge devices with restricted memory resources. For instance, in the case of ResNet50, it achieves an accuracy of 74.57% with a memory footprint of 9.5 MB, representing a 25.49% reduction compared to previous similar methods, with only a minor 1.08% decrease in accuracy.

Submitted 3 April, 2024; originally announced April 2024.

Comments: The 25th International Symposium on Quality Electronic Design (ISQED'24)

arXiv:2303.12269 [pdf, other]

A Cycle-Accurate Soft Error Vulnerability Analysis Framework for FPGA-based Designs

Authors: Eduardo Rhod, Behnam Ghavami, Zhenman Fang, Lesley Shannon

Abstract: Many aerospace and automotive applications use FPGAs in their designs due to their low power and reconfigurability requirements. Meanwhile, such applications also impose a high standard of system reliability, which makes early-stage reliability analysis for FPGA-based designs critical. In this paper, we present a framework that enables fast and accurate early-stage analysis of soft error vulnerability for small FPGA-based designs. Our framework first extracts the post-synthesis netlist from an FPGA design. Then it inserts the bit-flip configuration faults into the design netlist using our proposed interface software. After that, it seamlessly feeds the golden copy and fault copies of the netlist into the open source simulator Verilator for cycle-accurate simulation. Finally, it generates a histogram of vulnerability scores of the original design to guide the reliability analysis. Experimental results show that our framework runs up to 53x faster than the Xilinx Vivado fault simulation with cycle-level accuracy, when analyzing the injected bit-flip faults on the ITC'99 benchmarks.

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.10535 [pdf]

A Decision Making Approach for Chemotherapy Planning based on Evolutionary Processing

Authors: Mina Jafari, Behnam Ghavami, Vahid Sattari Naeini

Abstract: The problem of chemotherapy treatment optimization can be defined as minimizing the size of the tumor without endangering the patient's health; chemotherapy must therefore achieve several objectives simultaneously. For this reason, the optimization problem becomes a multi-objective problem. In this paper, a multi-objective meta-heuristic method is provided for cancer chemotherapy with the aim of balancing two objectives: the amount of toxicity and the number of cancerous cells. The proposed method uses mathematical models to measure the drug concentration, tumor growth and the amount of toxicity. This method uses a Multi-Objective Particle Swarm Optimization (MOPSO) algorithm to optimize the cancer chemotherapy plan using cell-cycle specific drugs. The proposed method can be a good model for personalized medicine, as it returns a set of solutions that balance the different objectives and makes it possible to choose the most appropriate therapeutic plan based on information about the status of the patient. Experimental results confirm that the proposed method is able to explore the search space efficiently in order to find a suitable treatment plan with minimal side effects. This objective is achieved through suitable design of chemotherapy drugs and control of the injection dose. Moreover, results show that the proposed method achieves better therapeutic performance compared to a more recent similar method [1].

Submitted 18 March, 2023; originally announced March 2023.

arXiv:2303.10508 [pdf, ps, other]

Unraveling the Integration of Deep Machine Learning in FPGA CAD Flow: A Concise Survey and Future Insights

Authors: Behnam Ghavami, Lesley Shannon

Abstract: This paper presents an overview of the integration of deep machine learning (DL) in FPGA CAD design flow, focusing on high-level and logic synthesis, placement, and routing. Our analysis identifies key research areas that require more attention in FPGA CAD design, including the development of open-source benchmarks optimized for end-to-end machine learning experiences and the potential of knowledge-sharing among researchers and industry practitioners to incorporate more intelligence in FPGA CAD decision-making steps. By providing insights into the integration of deep machine learning in FPGA CAD flow, this paper aims to inform future research directions in this exciting and rapidly evolving field.

Submitted 18 March, 2023; originally announced March 2023.

arXiv:2303.02495 [pdf, other]

scaleTRIM: Scalable TRuncation-Based Integer Approximate Multiplier with Linearization and Compensation

Authors: Ebrahim Farahmand, Ali Mahani, Behnam Ghavami, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Approximate computing (AC) has become a prominent solution to improve the performance, area, and power/energy efficiency of a digital design at the cost of output accuracy. We propose a novel scalable approximate multiplier that uses a lookup table-based compensation unit. To improve energy efficiency, input operands are truncated to a reduced bitwidth representation (e.g., h bits) based on their leading one positions. Then, a curve-fitting method is employed to map the product term to a linear function, and a piecewise constant error-correction term is used to reduce the approximation error. For computing the piecewise constant error-compensation term, we partition the function space into M segments and compute the compensation factor for each segment by averaging the errors in the segment. The multiplier supports various degrees of truncation and error compensation to exploit the accuracy-efficiency trade-off. The proposed approximate multiplier offers better error metrics, such as mean and standard deviation of absolute relative error (MARED and StdARED), compared to a state-of-the-art integer approximate multiplier. The proposed approximate multiplier improves the MARED and StdARED by about 38% and 32% when its energy consumption is about equal to that of the state-of-the-art approximate multiplier. Moreover, the performance of the proposed approximate multiplier is evaluated in image classification applications using a Deep Neural Network (DNN). The results indicate that the degradation of DNN accuracy is negligible, especially due to the compensation properties of our approximate multiplier.

Submitted 4 May, 2023; v1 submitted 4 March, 2023; originally announced March 2023.
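
The first step described in the scaleTRIM abstract, truncating each operand to a few bits located by its leading one, can be sketched as follows. This covers only the truncation; the linearization and the lookup-table compensation are deliberately left out, so it is not the proposed multiplier. The function names and the convention that `h` counts the leading one itself are assumptions made for illustration.

```python
def truncate_to_leading_bits(x, h):
    """Keep the h most significant bits of x, starting at its leading one.

    Hypothetical helper: all lower-order bits are zeroed. Values that
    already fit in h bits are returned unchanged.
    """
    if x < 0 or h < 1:
        raise ValueError("expects a non-negative operand and h >= 1")
    shift = max(x.bit_length() - h, 0)  # number of low bits to drop
    return (x >> shift) << shift


def truncated_multiply(a, b, h):
    """Multiply two unsigned integers after truncating each to h bits."""
    return truncate_to_leading_bits(a, h) * truncate_to_leading_bits(b, h)


a, b, h = 183, 200, 4
approx = truncated_multiply(a, b, h)
exact = a * b
relative_error = (exact - approx) / exact
print(approx, exact, round(relative_error, 4))
```

Because truncation only ever discards bits, the uncompensated product always underestimates the exact one; a systematic bias of this kind is what an error-compensation term is meant to reduce.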

arXiv:2112.13544 [pdf, other]

FitAct: Error Resilient Deep Neural Networks via Fine-Grained Post-Trainable Activation Functions

Authors: Behnam Ghavami, Mani Sadati, Zhenman Fang, Lesley Shannon

Abstract: Deep neural networks (DNNs) are increasingly being deployed in safety-critical systems such as personal healthcare devices and self-driving cars. In such DNN-based systems, error resilience is a top priority since faults in DNN inference could lead to mispredictions and safety hazards. For latency-critical DNN inference on resource-constrained edge devices, it is nontrivial to apply conventional redundancy-based fault tolerance techniques. In this paper, we propose FitAct, a low-cost approach to enhance the error resilience of DNNs by deploying fine-grained post-trainable activation functions. The main idea is to precisely bound the activation value of each individual neuron via neuron-wise bounded activation functions so that it could prevent fault propagation in the network. To avoid complex DNN model re-training, we propose to decouple the accuracy training and resilience training and develop a lightweight post-training phase to learn these activation functions with precise bound values. Experimental results on widely used DNN models such as AlexNet, VGG16, and ResNet50 demonstrate that FitAct outperforms state-of-the-art studies such as Clip-Act and Ranger in enhancing the DNN error resilience for a wide range of fault rates while adding manageable runtime and memory space overheads.

Submitted 27 December, 2021; originally announced December 2021.

Comments: Accepted in the Design, Automation and Test in Europe Conference (DATE 2022)
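
The core idea in the FitAct abstract, bounding each neuron's activation with its own limit so that a faulty value cannot propagate, can be illustrated with a minimal sketch. The hard clamp used here and the name `bounded_relu` are assumptions; the abstract does not specify the exact form of the bounded activation function, and the bounds are supplied by hand rather than learned in a post-training phase.

```python
def bounded_relu(activations, bounds):
    """Apply ReLU, then clamp each neuron's output to its own upper bound.

    Hypothetical sketch: `bounds[i]` is the per-neuron limit for
    `activations[i]`.
    """
    if len(activations) != len(bounds):
        raise ValueError("one bound is required per neuron")
    return [min(max(a, 0.0), b) for a, b in zip(activations, bounds)]


# The third value mimics an activation corrupted by a memory fault.
pre_activations = [-1.0, 0.5, 1e30]
per_neuron_bounds = [2.0, 2.0, 6.0]
outputs = bounded_relu(pre_activations, per_neuron_bounds)
print(outputs)
```

With a single layer-wide bound, a tight limit for one neuron would clip legitimate large activations in another; per-neuron bounds avoid that trade-off at the cost of storing one extra value per neuron.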

arXiv:2112.13162 [pdf, other]

Stealthy Attack on Algorithmic-Protected DNNs via Smart Bit Flipping

Authors: Behnam Ghavami, Seyd Movi, Zhenman Fang, Lesley Shannon

Abstract: Recently, deep neural networks (DNNs) have been deployed in safety-critical systems such as autonomous vehicles and medical devices. Shortly after that, the vulnerability of DNNs was revealed by stealthy adversarial examples, where crafted inputs (created by adding tiny perturbations to original inputs) can lead a DNN to produce misclassified outputs. To improve the robustness of DNNs, some algorithmic-based countermeasures against adversarial examples have been introduced thereafter. In this paper, we propose a new type of stealthy attack on protected DNNs to circumvent the algorithmic defenses: via smart bit flipping in DNN weights, we can preserve the classification accuracy for clean inputs but misclassify crafted inputs even with algorithmic countermeasures. To fool protected DNNs in a stealthy way, we introduce a novel method to efficiently find their most vulnerable weights and flip those bits in hardware. Experimental results show that we can successfully apply our stealthy attack against state-of-the-art algorithmic-protected DNNs.

Submitted 24 December, 2021; originally announced December 2021.

Comments: Accepted for the 23rd International Symposium on Quality Electronic Design (ISQED'22)

arXiv:2112.04136 [pdf, other]

SeaPlace: Process Variation Aware Placement for Reliable Combinational Circuits against SETs and METs

Authors: Kiarash Saremi, Hossein Pedram, Behnam Ghavami, Mohsen Raji, Zhenman Fang, Lesley Shannon

Abstract: Nanoscale combinational circuits now face significant reliability challenges, including soft errors and process variations. This paper presents novel process variation-aware placement strategies that include two algorithms to increase the reliability of combinational circuits against both Single Event Transients (SETs) and Multiple Event Transients (METs). The first proposed algorithm is a global placement method (called SeaPlace-G) that places the cells to harden the circuit against SETs by solving a quadratic formulation. Afterwards, a detailed placement algorithm (named SeaPlace-D) is proposed to increase the circuit reliability against METs by solving a linear programming optimization problem. Experimental results show that SeaPlace-G and SeaPlace-D achieve, on average, 41.78% and 32.04% soft error reliability improvement against SET and MET, respectively. Moreover, when SeaPlace-D is followed by SeaPlace-G, MET reduction can be improved by up to 53.3%.

Submitted 8 December, 2021; originally announced December 2021.

Comments: 14 pages

arXiv:2112.03477 [pdf, other]

BDFA: A Blind Data Adversarial Bit-flip Attack on Deep Neural Networks

Authors: Behnam Ghavami, Mani Sadati, Mohammad Shahidzadeh, Zhenman Fang, Lesley Shannon

Abstract: An adversarial bit-flip attack (BFA) on neural network weights can result in catastrophic accuracy degradation by flipping a very small number of bits. A major drawback of prior bit-flip attack techniques is their reliance on test data, access to which is frequently not possible for applications that contain sensitive or proprietary data. In this paper, we propose Blind Data Adversarial Bit-flip Attack (BDFA), a novel technique to enable BFA without any access to the training or testing data. This is achieved by optimizing for a synthetic dataset, which is engineered to match the statistics of batch normalization across different layers of the network and the targeted label. Experimental results show that BDFA could decrease the accuracy of ResNet50 significantly from 75.96% to 13.94% with only 4 bit flips.

Submitted 6 January, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

arXiv:2101.08754 [pdf]

An Efficient Communication Protocol for FPGA IP Protection

Authors: Farzane Khajuyi, Behnam Ghavami, Human Nikmehr

Abstract: We introduce a protection-based IP security scheme to protect soft and firm IP cores that are used on FPGA devices. The scheme is based on Finite State Machine (FSM) obfuscation and exploits a Physical Unclonable Function (PUF) to generate a unique identification (ID) for each FPGA, which helps enable pay-per-device licensing. We introduce a communication protocol to protect the rights of parties in this market. On standard benchmark circuits, the experimental results show that our scheme is secure, attack-resilient and can be implemented with low area, power and delay overheads.

Submitted 21 January, 2021; originally announced January 2021.

Showing 1–15 of 15 results for author: Ghavami, B