-
Quantum Graph Transformer for NLP Sentiment Classification
Authors:
Shamminuj Aktar,
Andreas Bärtschi,
Abdel-Hameed A. Badawy,
Stephan Eidenbenz
Abstract:
Quantum machine learning is a promising direction for building more efficient and expressive models, particularly in domains where understanding complex, structured data is critical. We present the Quantum Graph Transformer (QGT), a hybrid graph-based architecture that integrates a quantum self-attention mechanism into the message-passing framework for structured language modeling. The attention mechanism is implemented using parameterized quantum circuits (PQCs), which enable the model to capture rich contextual relationships while significantly reducing the number of trainable parameters compared to classical attention mechanisms. We evaluate QGT on five sentiment classification benchmarks. Experimental results show that QGT consistently achieves accuracy higher than or comparable to that of existing quantum natural language processing (QNLP) models, including both attention-based and non-attention-based approaches. When compared with an equivalent classical graph transformer, QGT yields an average accuracy improvement of 5.42% on real-world datasets and 4.76% on synthetic datasets. Additionally, QGT demonstrates improved sample efficiency, requiring nearly 50% fewer labeled samples to reach comparable performance on the Yelp dataset. These results highlight the potential of graph-based QNLP techniques for advancing efficient and scalable language understanding.
Submitted 9 June, 2025;
originally announced June 2025.
-
CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs
Authors:
Mohamed R. Elshamy,
Mehdi Elahi,
Ahmad Patooghy,
Abdel-Hameed A. Badawy
Abstract:
Efficient thermal and power management in modern multiprocessor systems-on-chip (MPSoCs) demands accurate power consumption estimation. One of the state-of-the-art approaches, Alternative Blind Power Identification (ABPI), theoretically eliminates the dependence on steady-state temperatures, addressing a major shortcoming of previous approaches. However, ABPI performance has remained unverified in actual hardware implementations. In this study, we conduct the first empirical validation of ABPI on commercial hardware using the NVIDIA Jetson Xavier AGX platform. Our findings reveal that, while ABPI provides computational efficiency and independence from steady-state temperature, it exhibits considerable accuracy deficiencies in real-world scenarios. To overcome these limitations, we introduce a novel approach that integrates Custom Physics-Informed Neural Networks (CPINNs) with the underlying thermal model of ABPI. Our approach employs a specialized loss function that harmonizes physical principles with data-driven learning, complemented by multi-objective genetic algorithm optimization to balance estimation accuracy and computational cost. In experimental validation, CPINN-ABPI achieves a reduction of 84.7% (CPU) and 73.9% (GPU) in the mean absolute error (MAE) relative to ABPI, with the weighted mean absolute percentage error (WMAPE) improving from 47%–81% to ~12%. The method maintains real-time performance with 195.3 μs of inference time, with similar 85%–99% accuracy gains across heterogeneous SoCs.
Submitted 28 May, 2025;
originally announced May 2025.
-
Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs
Authors:
Nazmus Sakib,
Tarun Prabhu,
Nandakishore Santhi,
John Shalf,
Abdel-Hameed A. Badawy
Abstract:
Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critical to effectively using these units. Understanding this capability is important for anyone writing compute-intensive, high-performance, and portable code. We tested the ability of several compilers to vectorize code on x86 and ARM. We used the TSVC2 suite, with modifications that made it more representative of real-world code. On x86, GCC reported 54% of the loops in the suite as having been vectorized, ICX reported 50%, and Clang, 46%. On ARM, GCC reported 56% of the loops as having been vectorized, ACFL reported 54%, and Clang, 47%. We found that the vectorized code did not always outperform the unvectorized code. In some cases, given two very similar vectorizable loops, a compiler would vectorize one but not the other. We also report cases where a compiler vectorized a loop on only one of the two platforms. Based on our experiments, we cannot definitively say if any one compiler is significantly better than the others at vectorizing code on any given platform.
Submitted 17 February, 2025;
originally announced February 2025.
-
HPC Application Parameter Autotuning on Edge Devices: A Bandit Learning Approach
Authors:
Abrar Hossain,
Abdel-Hameed A. Badawy,
Mohammad A. Islam,
Tapasya Patki,
Kishwar Ahmed
Abstract:
The growing necessity for enhanced processing capabilities in edge devices with limited resources has led us to develop effective methods for improving high-performance computing (HPC) applications. In this paper, we introduce LASP (Lightweight Autotuning of Scientific Application Parameters), a novel strategy designed to address the parameter search space challenge in edge devices. Our strategy employs a multi-armed bandit (MAB) technique focused on online exploration and exploitation. Notably, LASP takes a dynamic approach, adapting seamlessly to changing environments. We tested LASP with four HPC applications: Lulesh, Kripke, Clomp, and Hypre. Its lightweight nature makes it particularly well-suited for resource-constrained edge devices. By employing the MAB framework to efficiently navigate the search space, we achieved significant performance improvements while adhering to the stringent computational limits of edge devices. Our experimental results demonstrate the effectiveness of LASP in optimizing parameter search on edge devices.
Submitted 1 January, 2025;
originally announced January 2025.
-
TrojanWhisper: Evaluating Pre-trained LLMs to Detect and Localize Hardware Trojans
Authors:
Md Omar Faruque,
Peter Jamieson,
Ahmad Patooghy,
Abdel-Hameed A. Badawy
Abstract:
Existing Hardware Trojan (HT) detection methods face several critical limitations: logic testing struggles with scalability and coverage for large designs, side-channel analysis requires golden reference chips, and formal verification methods suffer from state-space explosion. The emergence of Large Language Models (LLMs) offers a promising new direction for HT detection by leveraging their natural language understanding and reasoning capabilities. For the first time, this paper explores the potential of general-purpose LLMs in detecting various HTs inserted in Register Transfer Level (RTL) designs, including SRAM, AES, and UART modules. We propose a novel tool for this goal that systematically assesses state-of-the-art LLMs (GPT-4o, Gemini 1.5 pro, and Llama 3.1) in detecting HTs without prior fine-tuning. To address potential training data bias, the tool implements perturbation techniques, i.e., variable name obfuscation and design restructuring, that make the cases more sophisticated for the LLMs used. Our experimental evaluation demonstrates perfect detection rates by GPT-4o and Gemini 1.5 pro in baseline scenarios (100%/100% precision/recall), with both models achieving better trigger line coverage (TLC: 0.82-0.98) than payload line coverage (PLC: 0.32-0.46). Under code perturbation, while Gemini 1.5 pro maintains perfect detection performance (100%/100%), GPT-4o (100%/85.7%) and Llama 3.1 (66.7%/85.7%) show some degradation in detection rates, and all models experience decreased accuracy in localizing both triggers and payloads. This paper validates the potential of LLM approaches for hardware security applications, highlighting areas for future improvement.
Submitted 10 December, 2024;
originally announced December 2024.
-
Unleashing GHOST: An LLM-Powered Framework for Automated Hardware Trojan Design
Authors:
Md Omar Faruque,
Peter Jamieson,
Ahmad Patooghy,
Abdel-Hameed A. Badawy
Abstract:
Traditionally, inserting realistic Hardware Trojans (HTs) into complex hardware systems has been a time-consuming and manual process, requiring comprehensive knowledge of the design and navigating intricate Hardware Description Language (HDL) codebases. Machine Learning (ML)-based approaches have attempted to automate this process but often face challenges such as the need for extensive training data, long learning times, and limited generalizability across diverse hardware design landscapes. This paper addresses these challenges by proposing GHOST (Generator for Hardware-Oriented Stealthy Trojans), an automated attack framework that leverages Large Language Models (LLMs) for rapid HT generation and insertion. Our study evaluates three state-of-the-art LLMs - GPT-4, Gemini-1.5-pro, and Llama-3-70B - across three hardware designs: SRAM, AES, and UART. According to our evaluations, GPT-4 demonstrates superior performance, with 88.88% of HT insertion attempts successfully generating functional and synthesizable HTs. This study also highlights the security risks posed by LLM-generated HTs, showing that 100% of GHOST-generated synthesizable HTs evaded detection by an ML-based HT detection tool. These results underscore the urgent need for advanced detection and prevention mechanisms in hardware security to address the emerging threat of LLM-generated HTs. The GHOST HT benchmarks are available at: https://github.com/HSTRG1/GHOSTbenchmarks.git
Submitted 3 December, 2024;
originally announced December 2024.
-
Static Reuse Profile Estimation for Array Applications
Authors:
Abdur Razzak,
Atanu Barai,
Nandakishore Santhi,
Abdel-Hameed A. Badawy
Abstract:
Reuse distance analysis is a widely recognized method for application characterization that illustrates cache locality. Although there are various techniques to calculate the reuse profile from dynamic memory traces, doing so is both time- and space-consuming due to the requirement to collect dynamic memory traces at runtime. In contrast, static reuse profile estimation is a promising, faster approach since the profile is calculated at compile time without running the program or collecting memory traces. This work presents a static analysis technique to estimate the reuse profile of loop-based programs. For an input program, we generate a basic block-level control flow graph and the execution count by analyzing the LLVM IR of the program. We present the memory accesses of the application kernel in a compact bracketed format and use a recursive algorithm to predict the reuse distance histogram. We deploy a separate predictor that unrolls the loop(s) for smaller bounds and generates a temporary reuse distance profile for those small cases. Using these smaller profiles, the reuse profile is extrapolated for the actual loop bound(s). We use this reuse profile to predict the cache hit rate. Results show that our model can predict cache hit rates with an average accuracy of 95% relative to the dynamic reuse profile methods.
Submitted 21 November, 2024;
originally announced November 2024.
-
Fine-Grained Clustering-Based Power Identification for Multicores
Authors:
Mohamed R. Elshamy,
Mehdi Elahi,
Ahmad Patooghy,
Abdel-Hameed A. Badawy
Abstract:
Fine-grained power estimation in multicore Systems on Chips (SoCs) is crucial for efficient thermal management. BPI (Blind Power Identification) is a recent approach that determines the power consumption of different cores and the thermal model of the chip using only thermal sensor measurements and total power consumption. BPI relies on steady-state thermal data along with a naive initialization in its Non-negative Matrix Factorization (NMF) process, which negatively impacts the power estimation accuracy of BPI. This paper proposes a two-fold approach to reduce these impacts on BPI. First, this paper introduces an innovative approach for NMF initialization, i.e., density-oriented spatial clustering to identify centroid data points of active cores as initial values. This enhances BPI accuracy by focusing on dense regions in the dataset and excluding outlier data points. Second, it proposes the utilization of steady-state temperature data points to enhance the power estimation accuracy by leveraging the physical relationship between temperature and power consumption. Our extensive simulations of real-world cases demonstrate that our approach enhances BPI accuracy in estimating the power per core with no performance cost. For instance, in a four-core processor, the proposed approach reduces the error rate by 76% compared to BPI and by 24% compared to the state of the art in the literature, namely, Blind Power Identification Steady State (BPISS). The results underline the potential of integrating advanced clustering techniques in thermal model identification, paving the way for more accurate and reliable thermal management in multicores and SoCs.
Submitted 28 October, 2024;
originally announced October 2024.
-
P-YOLOv8: Efficient and Accurate Real-Time Detection of Distracted Driving
Authors:
Mohamed R. Elshamy,
Heba M. Emara,
Mohamed R. Shoaib,
Abdel-Hameed A. Badawy
Abstract:
Distracted driving is a critical safety issue that leads to numerous fatalities and injuries worldwide. This study addresses the urgent need for efficient and real-time machine learning models to detect distracted driving behaviors. Leveraging the Pretrained YOLOv8 (P-YOLOv8) model, a real-time object detection system is introduced, optimized for both speed and accuracy. This approach addresses the computational constraints and latency limitations commonly associated with conventional detection models. The study demonstrates P-YOLOv8's versatility in both object detection and image classification tasks using the Distracted Driver Detection dataset from State Farm, which includes 22,424 images across ten behavior categories. Our research explores the application of P-YOLOv8 for image classification, evaluating its performance compared to deep learning models such as VGG16, VGG19, and ResNet. Some traditional models struggle with low accuracy, while others achieve high accuracy but come with high computational costs and slow detection speeds, making them unsuitable for real-time applications. P-YOLOv8 addresses these issues by achieving competitive accuracy with significant computational cost and efficiency advantages. In particular, P-YOLOv8 generates a lightweight model with a size of only 2.84 MB and a lower number of parameters, totaling 1,451,098, due to its innovative architecture. It achieves a high accuracy of 99.46 percent with this small model size, opening new directions for deployment on inexpensive and small embedded devices using Tiny Machine Learning (TinyML). The experimental results show robust performance, making P-YOLOv8 a cost-effective solution for real-time deployment. This study provides a detailed analysis of P-YOLOv8's architecture, training, and performance benchmarks, highlighting its potential for real-time use in detecting distracted driving.
Submitted 20 October, 2024;
originally announced October 2024.
-
Hiding in Plain Sight: Reframing Hardware Trojan Benchmarking as a Hide&Seek Modification
Authors:
Amin Sarihi,
Ahmad Patooghy,
Peter Jamieson,
Abdel-Hameed A. Badawy
Abstract:
This work focuses on advancing security research in the hardware design space by formally defining the realistic problem of Hardware Trojan (HT) detection. The goal is to model HT detection more closely to the real world, i.e., describing the problem as The Seeker's Dilemma where a detecting agent is unaware of whether circuits are infected by HTs or not. Using this theoretical problem formulation, we create a benchmark that consists of a mixture of HT-free and HT-infected restructured circuits while preserving their original functionalities. The restructured circuits are randomly infected by HTs, causing a situation where the defender is uncertain if a circuit is infected or not. We believe that our innovative benchmark and methodology of creating benchmarks will help the community judge the detection quality of different methods by comparing their success rates in circuit classification. We use our developed benchmark to evaluate three state-of-the-art HT detection tools to show baseline results for this approach. We use Principal Component Analysis to assess the strength of our benchmark, where we observe that some restructured HT-infected circuits are mapped closely to HT-free circuits, leading to significant label misclassification by detectors.
Submitted 20 October, 2024;
originally announced October 2024.
-
Cluster-BPI: Efficient Fine-Grain Blind Power Identification for Defending against Hardware Thermal Trojans in Multicore SoCs
Authors:
Mohamed R. Elshamy,
Mehdi Elahi,
Ahmad Patooghy,
Abdel-Hameed A. Badawy
Abstract:
Modern multicore System-on-Chips (SoCs) feature hardware monitoring mechanisms that measure total power consumption. However, these aggregate measurements are often insufficient for fine-grained thermal and power management. This paper presents an enhanced Clustering Blind Power Identification (ICBPI) approach, designed to improve the sensitivity and robustness of the traditional Blind Power Identification (BPI) method. BPI estimates the power consumption of individual cores and models the thermal behavior of an SoC using only thermal sensor data and total power measurements. The proposed ICBPI approach refines BPI's initialization process, particularly improving the non-negative matrix factorization (NNMF) step, which is critical to the accuracy of BPI. ICBPI introduces density-based spatial clustering of applications with noise (DBSCAN) to better align temperature and power consumption data, thereby providing more accurate power consumption estimates. We validate the ICBPI method through two key tasks. The first task evaluates power estimation accuracy across four different multicore architectures, including a heterogeneous processor. Results show that ICBPI significantly enhances accuracy, reducing error rates by 77.56% compared to the original BPI and by 68.44% compared to the state-of-the-art BPISS method. The second task focuses on improving the detection and localization of malicious thermal sensor attacks in heterogeneous processors. The results demonstrate that ICBPI enhances the security and robustness of multicore SoCs against such attacks.
Submitted 27 September, 2024;
originally announced September 2024.
-
TrojanForge: Generating Adversarial Hardware Trojan Examples Using Reinforcement Learning
Authors:
Amin Sarihi,
Peter Jamieson,
Ahmad Patooghy,
Abdel-Hameed A. Badawy
Abstract:
The Hardware Trojan (HT) problem can be thought of as a continuous game between attackers and defenders, each striving to outsmart the other by leveraging any available means for an advantage. Machine Learning (ML) has recently played a key role in advancing HT research. Various novel techniques, such as Reinforcement Learning (RL) and Graph Neural Networks (GNNs), have shown HT insertion and detection capabilities. HT insertion with ML techniques, specifically, has seen a spike in research activity due to the shortcomings of conventional HT benchmarks and the inherent human design bias that occurs when we create them. This work continues this innovation by presenting a tool called TrojanForge, capable of generating HT adversarial examples that defeat HT detectors, demonstrating the capabilities of GAN-like adversarial tools for automatic HT insertion. We introduce an RL environment where the RL insertion agent interacts with HT detectors in an insertion-detection loop, collecting rewards based on its success in bypassing HT detectors. Our results show that this process helps inserted HTs evade various HT detectors, achieving high attack success percentages. This tool provides insight into why HT insertion fails in some instances and how we can leverage this knowledge in defense.
Submitted 8 December, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Graph Neural Networks for Parameterized Quantum Circuits Expressibility Estimation
Authors:
Shamminuj Aktar,
Andreas Bärtschi,
Diane Oyen,
Stephan Eidenbenz,
Abdel-Hameed A. Badawy
Abstract:
Parameterized quantum circuits (PQCs) are fundamental to quantum machine learning (QML), quantum optimization, and variational quantum algorithms (VQAs). The expressibility of PQCs is a measure that determines their capability to harness the full potential of the quantum state space. It is thus a crucial guidepost to know when selecting a particular PQC ansatz. However, the existing technique for expressibility computation through statistical estimation requires a large number of samples, which poses significant challenges due to time and computational resource constraints. This paper introduces a novel approach for expressibility estimation of PQCs using Graph Neural Networks (GNNs). We demonstrate the predictive power of our GNN model with a dataset consisting of 25,000 samples from the noiseless IBM QASM Simulator and 12,000 samples from three distinct noisy quantum backends. The model accurately estimates expressibility, with root mean square errors (RMSE) of 0.05 and 0.06 for the noiseless and noisy backends, respectively. We compare our model's predictions with reference circuits [Sim and others, QuTe'2019] and IBM Qiskit's hardware-efficient ansatz sets to further evaluate our model's performance. Our experimental evaluation in noiseless and noisy scenarios reveals a close alignment with ground truth expressibility values, highlighting the model's efficacy. Moreover, our model exhibits promising extrapolation capabilities, predicting expressibility values with low RMSE for out-of-range qubit circuits when trained solely on circuit sets of up to 5 qubits. This work thus provides a reliable means of efficiently evaluating the expressibility of diverse PQCs on noiseless simulators and hardware.
Submitted 13 May, 2024;
originally announced May 2024.
-
The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection
Authors:
Amin Sarihi,
Ahmad Patooghy,
Abdel-Hameed A. Badawy,
Peter Jamieson
Abstract:
This work focuses on advancing security research in the hardware design space by formally defining the realistic problem of Hardware Trojan (HT) detection. The goal is to model HT detection more closely to the real world, i.e., describing the problem as "The Seeker's Dilemma" (an extension of Hide&Seek on a graph), where a detecting agent is unaware of whether circuits are infected by HTs or not. Using this theoretical problem formulation, we create a benchmark that consists of a mixture of HT-free and HT-infected restructured circuits while preserving their original functionalities. The restructured circuits are randomly infected by HTs, causing a situation where the defender is uncertain if a circuit is infected or not. We believe that our innovative dataset will help the community better judge the detection quality of different methods by comparing their success rates in circuit classification. We use our developed benchmark to evaluate three state-of-the-art HT detection tools to show baseline results for this approach. We use Principal Component Analysis to assess the strength of our benchmark, where we observe that some restructured HT-infected circuits are mapped closely to HT-free circuits, leading to significant label misclassification by detectors.
Submitted 27 February, 2024;
originally announced February 2024.
-
LLVM Static Analysis for Program Characterization and Memory Reuse Profile Estimation
Authors:
Atanu Barai,
Nandakishore Santhi,
Abdur Razzak,
Stephan Eidenbenz,
Abdel-Hameed A. Badawy
Abstract:
Profiling various application characteristics, including the number of different arithmetic operations performed, memory footprint, etc., dynamically is time- and space-consuming. On the other hand, static analysis methods, although fast, can be less accurate. This paper presents an LLVM-based probabilistic static analysis method that accurately predicts different program characteristics and estimates the reuse distance profile of a program by analyzing the LLVM IR file in constant time, regardless of program input size. We generate the basic-block-level control flow graph of the target application kernel and determine basic-block execution counts by solving the linear balance equation involving the adjacent basic blocks' transition probabilities. Finally, we represent the kernel memory accesses in a bracketed format and employ a recursive algorithm to calculate the reuse distance profile. The results show that our approach can predict application characteristics accurately compared to another LLVM-based dynamic code analysis tool, Byfl.
Submitted 20 November, 2023;
originally announced November 2023.
-
Trojan Playground: A Reinforcement Learning Framework for Hardware Trojan Insertion and Detection
Authors:
Amin Sarihi,
Ahmad Patooghy,
Peter Jamieson,
Abdel-Hameed A. Badawy
Abstract:
Current Hardware Trojan (HT) detection techniques are mostly developed based on a limited set of HT benchmarks. Existing HT benchmark circuits are generated with multiple shortcomings, i.e., i) they are heavily biased by the designers' mindset when created, and ii) they are created through a one-dimensional lens, mainly the signal activity of nets. We introduce the first automated Reinforcement Learning (RL) HT insertion and detection framework to address these shortcomings. In the HT insertion phase, an RL agent explores the circuits and finds locations best for keeping inserted HTs hidden. On the defense side, we introduce a multi-criteria RL-based HT detector that generates test vectors to discover the existence of HTs. Using the proposed framework, one can explore the HT insertion and detection design spaces to break the limitations of human mindset and benchmark issues, ultimately leading toward the next generation of innovative detectors. We demonstrate the efficacy of our framework on ISCAS-85 benchmarks, provide the attack and detection success rates, and define a methodology for comparing our techniques.
Submitted 20 March, 2024; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Multi-criteria Hardware Trojan Detection: A Reinforcement Learning Approach
Authors:
Amin Sarihi,
Peter Jamieson,
Ahmad Patooghy,
Abdel-Hameed A. Badawy
Abstract:
Hardware Trojans (HTs) are undesired design or manufacturing modifications that can severely alter the security and functionality of digital integrated circuits. HTs can be inserted according to various design criteria, e.g., net switching activity, observability, and controllability. However, to our knowledge, most HT detection methods are based on only a single criterion, i.e., net switching activity. This paper proposes a multi-criteria reinforcement learning (RL) HT detection tool that features a tunable reward function for different HT detection scenarios. The tool allows for exploring existing detection strategies and can adapt to new detection scenarios with minimal effort. We also propose a generic methodology for comparing HT detection methods fairly. Our preliminary results show an average of 84.2% successful HT detection on ISCAS-85 benchmark circuits.
Submitted 25 April, 2023;
originally announced April 2023.
-
Scalable Experimental Bounds for Entangled Quantum State Fidelities
Authors:
Shamminuj Aktar,
Andreas Bärtschi,
Abdel-Hameed A. Badawy,
Stephan Eidenbenz
Abstract:
Estimating the state preparation fidelity of highly entangled states on noisy intermediate-scale quantum (NISQ) devices is important for benchmarking and application considerations. Unfortunately, exact fidelity measurements quickly become prohibitively expensive, as they scale exponentially as $O(3^N)$ for $N$-qubit states when using full state tomography with measurements in all combinations of Pauli bases. However, Somma and others [PhysRevA.74.052302] established that the complexity could be drastically reduced when looking at fidelity lower bounds for states that exhibit symmetries, such as Dicke States and GHZ States. These bounds must still be tight enough for larger states to provide reasonable estimations on NISQ devices.
For the first time and more than 15 years after the theoretical introduction, we report meaningful lower bounds for the state preparation fidelity of all Dicke States up to $N=10$ and all GHZ states up to $N=20$ on Quantinuum H1 ion-trap systems using efficient implementations of recently proposed scalable circuits for these states. Our achieved lower bounds match or exceed previously reported exact fidelities on superconducting systems for much smaller states. Furthermore, we provide evidence that for large Dicke States $D^N_{N/2}$, we may resort to a GHZ-based approximate state preparation to achieve better fidelity. This work provides a path forward to benchmarking entanglement as NISQ devices improve in size and quality.
Submitted 27 January, 2025; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Hardware Trojan Insertion Using Reinforcement Learning
Authors:
Amin Sarihi,
Ahmad Patooghy,
Peter Jamieson,
Abdel-Hameed A. Badawy
Abstract:
This paper uses Reinforcement Learning (RL) to automate the Hardware Trojan (HT) insertion process and eliminate the inherent human biases that limit the development of robust HT detection methods. An RL agent explores the design space and finds circuit locations that are best for keeping inserted HTs hidden. To achieve this, a digital circuit is converted to an environment in which an RL agent inserts HTs such that the cumulative reward is maximized. Our toolset can insert combinational HTs into the ISCAS-85 benchmark suite with variations in HT size and triggering conditions. Experimental results show that the toolset achieves high input coverage rates (100% in two benchmark circuits), which confirms its effectiveness. In addition, the inserted HTs have a minimal footprint and a low activation probability.
Submitted 8 April, 2022;
originally announced April 2022.
-
A Divide-and-Conquer Approach to Dicke State Preparation
Authors:
Shamminuj Aktar,
Andreas Bärtschi,
Abdel-Hameed A. Badawy,
Stephan Eidenbenz
Abstract:
We present a divide-and-conquer approach to deterministically prepare Dicke states $\lvert D_k^n\rangle$ (i.e., equal-weight superpositions of all $n$-qubit states with Hamming weight $k$) on quantum computers. In an experimental evaluation for up to $n=6$ qubits on IBM Quantum Sydney and Montreal devices, we achieve significantly higher state fidelity compared to previous results [Mukherjee and others, TQE'2020], [Cruz and others, QuTe'2019]. The fidelity gains are achieved through several techniques: Our circuits first "divide" the Hamming weight between blocks of $n/2$ qubits, and then "conquer" those blocks with improved versions of Dicke state unitaries [Bärtschi and others, FCT'2019]. Due to the sparse connectivity of IBM's heavy-hex architectures, these circuits are implemented for linear nearest-neighbor topologies. Further gains in (estimating) the state fidelity are due to our use of measurement error mitigation and hardware progress.
Submitted 9 June, 2022; v1 submitted 23 December, 2021;
originally announced December 2021.
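The definition of a Dicke state given in the abstract above (an equal-weight superposition of all $n$-qubit basis states with Hamming weight $k$) can be made concrete with a short Python sketch. It only enumerates the target amplitudes classically and says nothing about the preparation circuits themselves; the function name `dicke_state` and the dictionary representation are illustrative assumptions.

```python
from itertools import combinations
from math import comb, sqrt


def dicke_state(n, k):
    """Return {bitstring: amplitude} for the Dicke state |D_k^n>.

    Every n-bit string with exactly k ones gets amplitude 1/sqrt(C(n, k));
    all other basis states have amplitude zero and are omitted.
    """
    amplitude = 1.0 / sqrt(comb(n, k))
    state = {}
    for ones in combinations(range(n), k):
        bits = ["0"] * n
        for i in ones:
            bits[i] = "1"
        state["".join(bits)] = amplitude
    return state


# Example: the 4-qubit Dicke state with Hamming weight 2.
d = dicke_state(4, 2)
norm = sum(a * a for a in d.values())
```

For $n=4$ and $k=2$ the state has $\binom{4}{2}=6$ nonzero amplitudes. The number of terms grows combinatorially with $n$, which is one reason fidelity estimation for such states becomes expensive.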
-
Modeling Shared Cache Performance of OpenMP Programs using Reuse Distance
Authors:
Atanu Barai,
Gopinath Chennupati,
Nandakishore Santhi,
Abdel-Hameed A. Badawy,
Stephan Eidenbenz
Abstract:
Performance modeling of parallel applications on multicore computers remains a challenge in computational co-design due to the complex design of multicore processors, including private and shared memory hierarchies. We present a Scalable Analytical Shared Memory Model to predict the performance of parallel applications that run on a multicore computer and share the same level of cache in the hierarchy. This model uses a computationally efficient, probabilistic method to predict the reuse distance profiles, where reuse distance is a hardware architecture-independent measure of the patterns of virtual memory accesses. It relies on a stochastic, static basic block-level analysis of reuse profiles measured from the memory traces of applications run sequentially on small instances rather than using a multi-threaded trace. The results indicate that the hit-rate predictions on the shared cache are accurate.
Submitted 29 July, 2019;
originally announced July 2019.
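To make the notion of a reuse distance profile in the abstract above concrete, the following Python sketch computes one from a sequential address trace and derives a hit rate from it. It uses the standard definition (the number of distinct addresses touched since the previous access to the same address) and the textbook relation for a fully associative LRU cache. It is not the paper's probabilistic shared-cache model, and the function names and the toy trace are hypothetical.

```python
from collections import Counter


def reuse_distance_profile(trace):
    """Return a Counter mapping reuse distance -> number of accesses.

    First-time accesses have infinite reuse distance, recorded under None.
    """
    stack = []  # LRU stack: most recently used address is at the end
    profile = Counter()
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Distinct addresses touched since the last access to addr.
            profile[len(stack) - 1 - pos] += 1
            stack.pop(pos)
        else:
            profile[None] += 1
        stack.append(addr)
    return profile


def lru_hit_rate(profile, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` blocks."""
    total = sum(profile.values())
    hits = sum(n for d, n in profile.items() if d is not None and d < capacity)
    return hits / total


# Hypothetical trace of block addresses.
trace = ["a", "b", "c", "a", "b", "b", "d", "a"]
profile = reuse_distance_profile(trace)
rate = lru_hit_rate(profile, capacity=3)
```

Because the profile is independent of the cache size, the same profile can be reused to estimate hit rates for several capacities without replaying the trace.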
-
Energy Efficient Tri-State CNFET Ternary Logic Gates
Authors:
Sepher Tabrizchi,
Fazel Sharifi,
Abdel-Hameed A. Badawy
Abstract:
Traditional silicon binary circuits continue to face challenges such as high leakage power dissipation and large interconnect area. Multiple-Valued Logic (MVL) and nano devices are two feasible solutions to overcome these problems. In this paper, a novel method is presented to design ternary logic circuits based on Carbon Nanotube Field Effect Transistors (CNFETs). The proposed designs use the unique properties of CNFETs, for example, the ability to adjust the Carbon Nanotube (CNT) diameter to obtain the desired threshold voltage, and the equal mobility of P-FET and N-FET transistors. Each of our designed logic circuits implements a logic function and its complement via a control signal. These circuits also have a high-impedance state, which saves power while the circuits are not in use. To show a more detailed application of our approach, we design a 2-digit adder-subtractor circuit. We simulate the proposed ternary circuits using HSPICE with a standard 32 nm CNFET technology. The simulation results indicate the correct operation of the designs under different process, voltage, and temperature (PVT) variations. Moreover, a power-efficient ternary logic ALU has been designed based on the proposed gates.
Submitted 20 June, 2018;
originally announced June 2018.
-
MorphoNoC: Exploring the Design Space of a Configurable Hybrid NoC using Nanophotonics
Authors:
Vikram K. Narayana,
Shuai Sun,
Abdel-Hameed A. Badawy,
Volker J. Sorger,
Tarek El-Ghazawi
Abstract:
As diminishing feature sizes drive down the energy for computations, the power budget for on-chip communication is steadily rising. Furthermore, the increasing number of cores is placing a huge performance burden on the network-on-chip (NoC) infrastructure. While NoCs are designed as regular architectures that allow scaling to hundreds of cores, the lack of a flexible topology gives rise to higher latencies, lower throughput, and increased energy costs. In this paper, we explore MorphoNoCs - scalable, configurable, hybrid NoCs obtained by extending regular electrical networks with configurable nanophotonic links. In order to design MorphoNoCs, we first carry out a detailed study of the design space for Multi-Write Multi-Read (MWMR) nanophotonic links. After identifying optimum design points, we discuss the router architecture for deploying them in hybrid electronic-photonic NoCs. We then explore the design space at the network level by varying the waveguide lengths and the number of hybrid routers. This allows us to evaluate energy-latency trade-offs. For our evaluations, we adopt traces from synthetic benchmarks as well as the NAS Parallel Benchmark suite. Our results indicate that MorphoNoCs can achieve latency improvements of up to 3.0x or energy improvements of up to 1.37x over the base electronic network.
Submitted 14 March, 2017; v1 submitted 12 December, 2016;
originally announced January 2017.
-
Evaluating Discussion Boards on BlackBoard as a Collaborative Learning Tool A Students Survey and Reflections
Authors:
AbdelHameed A. Badawy,
Michelle M. Hugue
Abstract:
In this paper, we investigate how students view their experience in a junior-level course that has a Blackboard course presence and in which the students use the discussion boards extensively. A survey was set up through Blackboard as a voluntary quiz, and the students who participated were given a freebie point. The results and the participation were very interesting, both in terms of the feedback we received via open comments from the students and the statistics we gathered from the answers to the questions. The students have shown understanding and willingness to participate in pedagogy-enhancing endeavors.
Submitted 3 October, 2012;
originally announced October 2012.
-
Students Perceptions of the Effectiveness of Discussion Boards What can we get from our students for a freebie point
Authors:
Abdel-Hameed A. Badawy
Abstract:
We investigate how students view their experience in a junior 300-level computer science course that uses Blackboard as the underlying course management system. The discussion boards in Blackboard are heavily used for programming project support and to foster cooperation among students in answering each other's questions and concerns. A survey was conducted through Blackboard as a voluntary quiz, and the students who participated were given a participation point for their effort. The results and the participation were very interesting. We obtained statistics from the answers to the questions. The students also gave us feedback in the form of comments on all but two of the questions. The students have shown understanding, maturity, and willingness to participate in pedagogy-enhancing endeavors on the premise that it might help their own education and that of others.
Submitted 3 October, 2012;
originally announced October 2012.