-
ChipletQuake: On-die Digital Impedance Sensing for Chiplet and Interposer Verification
Authors:
Saleh Khalaj Monfared,
Maryam Saadat Safa,
Shahin Tajik
Abstract:
The increasing complexity and cost of manufacturing monolithic chips have driven the semiconductor industry toward chiplet-based designs, where smaller and modular chiplets are integrated onto a single interposer. While chiplet architectures offer significant advantages, such as improved yields, design flexibility, and cost efficiency, they introduce new security challenges in the horizontal hardware manufacturing supply chain. These challenges include risks of hardware Trojans, cross-die side-channel and fault injection attacks, probing of chiplet interfaces, and intellectual property theft. To address these concerns, this paper presents ChipletQuake, a novel on-chiplet framework for verifying the physical security and integrity of adjacent chiplets during the post-silicon stage. By sensing the impedance of the power delivery network (PDN) of the system, ChipletQuake detects tamper events in the interposer and neighboring chiplets without requiring any direct signal interface or additional hardware components. Fully compatible with the digital resources of FPGA-based chiplets, this framework demonstrates the ability to identify the insertion of passive and subtle malicious circuits, providing an effective solution to enhance the security of chiplet-based systems. To validate our claims, we showcase how our framework detects hardware Trojans and interposer tampering.
Submitted 27 April, 2025;
originally announced April 2025.
-
Chypnosis: Undervolting-based Static Side-channel Attacks
Authors:
Kyle Mitard,
Saleh Khalaj Monfared,
Fatemeh Khojasteh Dana,
Robert Dumitru,
Yuval Yarom,
Shahin Tajik
Abstract:
Static side-channel analysis attacks, which rely on a stopped clock to extract sensitive information, pose a growing threat to embedded systems' security. To protect against such attacks, several proposed defenses aim to detect unexpected variations in the clock signal and clear sensitive states. In this work, we present Chypnosis, an undervolting attack technique that indirectly stops the target circuit clock, while retaining stored data. Crucially, Chypnosis also blocks the state clearing stage of prior defenses, allowing recovery of secret information even in their presence. However, basic undervolting is not sufficient in the presence of voltage sensors designed to handle fault injection via voltage tampering. To overcome such defenses, we observe that rapidly dropping the supply voltage can disable the response mechanism of voltage sensor systems. We implement Chypnosis on various FPGAs, demonstrating the successful bypass of their sensors, both in the form of soft and hard IPs. To highlight the real-world applicability of Chypnosis, we show that the alert handler of the OpenTitan root-of-trust, responsible for providing hardware responses to threats, can be bypassed. Furthermore, we demonstrate that by combining Chypnosis with static side-channel analysis techniques, namely laser logic state imaging (LLSI) and impedance analysis (IA), we can extract sensitive information from a side-channel protected cryptographic module used in OpenTitan, even in the presence of established clock and voltage sensors. Finally, we propose and implement an improvement to an established FPGA-compatible clock detection countermeasure, and we validate its resilience against Chypnosis.
Submitted 29 September, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.
-
Logical Maneuvers: Detecting and Mitigating Adversarial Hardware Faults in Space
Authors:
Fatemeh Khojasteh Dana,
Saleh Khalaj Monfared,
Shahin Tajik
Abstract:
Satellites are highly vulnerable to adversarial glitches or high-energy radiation in space, which could cause faults on the onboard computer. Various radiation- and fault-tolerant methods, such as error correction codes (ECC) and redundancy-based approaches, have been explored over the last decades to mitigate temporary soft errors on software and hardware. However, conventional ECC methods fail to deal with hard errors or permanent faults in the hardware components. This work introduces a detection- and response-based countermeasure to deal with partially damaged processor chips. It recovers the processor chip from permanent faults and enables continuous operation with available undamaged resources on the chip. We incorporate digitally-compatible delay-based sensors on the target processor's chip to reliably detect the incoming radiation or glitching attempts on the physical fabric of the chip, even before a fault occurs. Upon detecting a fault in one or more components of the processor's arithmetic logic unit (ALU), our countermeasure employs adaptive software recompilations to resynthesize and substitute the affected instructions with instructions of still functioning components to accomplish the task. Furthermore, if the fault is more widespread and prevents the correct operation of the entire processor, our approach deploys adaptive hardware partial reconfigurations to replace and reroute the failed components to undamaged locations of the chip. To validate our claims, we deploy a high-energy near-infrared (NIR) laser beam on a RISC-V processor implemented on a 28 nm FPGA to emulate radiation and even hard errors by partially damaging the FPGA fabric. We demonstrate that our sensor can confidently detect the radiation and trigger the processor testing and fault recovery mechanisms. Finally, we discuss the overhead imposed by our countermeasure.
Submitted 10 February, 2025; v1 submitted 23 January, 2025;
originally announced January 2025.
-
LaserEscape: Detecting and Mitigating Optical Probing Attacks
Authors:
Saleh Khalaj Monfared,
Kyle Mitard,
Andrew Cannon,
Domenic Forte,
Shahin Tajik
Abstract:
The security of integrated circuits (ICs) can be broken by sophisticated physical attacks relying on failure analysis methods. Optical probing is one of the most prominent examples of such attacks, which can be accomplished in a matter of days, even with limited knowledge of the IC under attack. Unfortunately, few countermeasures are proposed in the literature, and none has been fabricated and tested in practice. These countermeasures usually require changing the standard cell libraries and, thus, are incompatible with digital and programmable platforms, such as field programmable gate arrays (FPGAs). In this work, we shift our attention from preventing the attack to detecting and responding to it. We introduce LaserEscape, the first fully digital and FPGA-compatible countermeasure to detect and mitigate optical probing attacks. LaserEscape incorporates digital delay-based sensors to reliably detect the physical alteration on the fabric caused by laser beam irradiations in real time. Furthermore, as a response to the attack, LaserEscape deploys real-time hiding approaches using randomized hardware reconfigurability. It realizes 1) moving target defense (MTD) to physically move the sensitive circuitry under attack out of the probing field of focus to protect secret keys and 2) polymorphism to logically obfuscate the functionality of the targeted circuit to counter function extraction and reverse engineering attempts. We demonstrate the effectiveness and resiliency of our approach by performing optical probing attacks on protected and unprotected designs on a 28-nm FPGA. Our results show that optical probing attacks can be reliably detected and mitigated without interrupting the chip's operation.
Submitted 30 August, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
The Reversing Machine: Reconstructing Memory Assumptions
Authors:
Mohammad Sina Karvandi,
Soroush Meghdadizanjani,
Sima Arasteh,
Saleh Khalaj Monfared,
Mohammad K. Fallah,
Saeid Gorgin,
Jeong-A Lee,
Erik van der Kouwe
Abstract:
Existing anti-malware software and reverse engineering toolkits struggle with stealthy sub-OS rootkits due to limitations of run-time kernel-level monitoring. A malicious kernel-level driver can bypass OS-level anti-virus mechanisms easily. Although static analysis of such malware is possible, obfuscation and packing techniques complicate offline analysis. Moreover, current dynamic analyzers suffer from virtualization performance overhead and create detectable traces that allow modern malware to evade them.
To address these issues, we present The Reversing Machine (TRM), a new hypervisor-based memory introspection design for reverse engineering, reconstructing memory offsets, and fingerprinting evasive and obfuscated user-level and kernel-level malware. TRM proposes two novel techniques that enable efficient and transparent analysis of evasive malware: hooking a binary using suspended process creation for hypervisor-based memory introspection, and leveraging Mode-Based Execution Control (MBEC) to detect user/kernel mode transitions and memory access patterns. Unlike existing malware detection environments, TRM can extract full memory traces in user and kernel spaces and hook the entire target memory map to reconstruct arrays, structures within the operating system, and possible rootkits.
We perform TRM-assisted reverse engineering of kernel-level structures and show that it can speed up manual reverse engineering by 75% on average. We obfuscate known malware with the latest packing tools and successfully perform similarity detection. Furthermore, we demonstrate a real-world attack by deploying a modified rootkit onto a driver that bypasses state-of-the-art security auditing tools. We show that TRM can detect each threat and that, out of 24 state-of-the-art AV solutions, only TRM can detect the most advanced threats.
Submitted 30 April, 2024;
originally announced May 2024.
-
RandOhm: Mitigating Impedance Side-channel Attacks using Randomized Circuit Configurations
Authors:
Saleh Khalaj Monfared,
Domenic Forte,
Shahin Tajik
Abstract:
Physical side-channel attacks can compromise the security of integrated circuits. Most physical side-channel attacks (e.g., power or electromagnetic) exploit the dynamic behavior of a chip, typically manifesting as changes in current consumption or voltage fluctuations where algorithmic countermeasures, such as masking, can effectively mitigate them. However, as demonstrated recently, these mitigation techniques are not entirely effective against backscattered side-channel attacks such as impedance analysis. In the case of an impedance attack, an adversary exploits the data-dependent impedance variations of the chip power delivery network (PDN) to extract secret information. In this work, we introduce RandOhm, which exploits a moving target defense (MTD) strategy based on the partial reconfiguration (PR) feature of mainstream FPGAs and programmable SoCs to defend against impedance side-channel attacks. We demonstrate that the information leakage through the PDN impedance could be significantly reduced via runtime reconfiguration of the secret-sensitive parts of the circuitry. Hence, by constantly randomizing the placement and routing of the circuit, one can decorrelate the data-dependent computation from the impedance value. Moreover, in contrast to existing PR-based countermeasures, RandOhm deploys open-source bitstream manipulation tools on programmable SoCs to speed up the randomization and provide real-time protection. To validate our claims, we apply RandOhm to AES ciphers realized on 28-nm FPGAs. We analyze the resiliency of our approach by performing non-profiled and profiled impedance analysis attacks and investigate the overhead of our mitigation in terms of delay and performance.
Submitted 30 August, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
LeakyOhm: Secret Bits Extraction using Impedance Analysis
Authors:
Saleh Khalaj Monfared,
Tahoura Mosavirik,
Shahin Tajik
Abstract:
The threats of physical side-channel attacks and their countermeasures have been widely researched. Most physical side-channel attacks rely on the unavoidable influence of computation or storage on current consumption or voltage drop on a chip. Such data-dependent influence can be exploited by, for instance, power or electromagnetic analysis. In this work, we introduce a novel non-invasive physical side-channel attack, which exploits the data-dependent changes in the impedance of the chip. Our attack relies on the fact that the temporarily stored contents in registers alter the physical characteristics of the circuit, which results in changes in the die's impedance. To sense such impedance variations, we deploy a well-known RF/microwave method called scattering parameter analysis, in which we inject sine wave signals with high frequencies into the system's power distribution network (PDN) and measure the echo of the signals. We demonstrate that according to the content bits and physical location of a register, the reflected signal is modulated differently at various frequency points enabling the simultaneous and independent probing of individual registers. Such side-channel leakage challenges the $t$-probing security model assumption used in masking, which is a prominent side-channel countermeasure. To validate our claims, we mount non-profiled and profiled impedance analysis attacks on hardware implementations of unprotected and high-order masked AES. We show that in the case of the profiled attack, only a single trace is required to recover the secret key. Finally, we discuss how a specific class of hiding countermeasures might be effective against impedance leakage.
Submitted 23 October, 2023; v1 submitted 8 May, 2023;
originally announced October 2023.
-
HyperDbg: Reinventing Hardware-Assisted Debugging (Extended Version)
Authors:
Mohammad Sina Karvandi,
MohammadHossein Gholamrezaei,
Saleh Khalaj Monfared,
Soroush Meghdadizanjani,
Behrooz Abbassi,
Ali Amini,
Reza Mortazavi,
Saeid Gorgin,
Dara Rahmati,
Michael Schwarz
Abstract:
Software analysis, debugging, and reverse engineering have a crucial impact in today's software industry. Efficient and stealthy debuggers are especially relevant for malware analysis. However, existing debugging platforms fail to address a transparent, effective, and high-performance low-level debugger due to their detectable fingerprints, complexity, and implementation restrictions. In this paper, we present HyperDbg, a new hypervisor-assisted debugger for high-performance and stealthy debugging of user and kernel applications. To accomplish this, HyperDbg relies on state-of-the-art hardware features available in today's CPUs, such as VT-x and extended page tables. In contrast to other widely used existing debuggers, we design HyperDbg using a custom hypervisor, making it independent of OS functionality or API. We propose hardware-based instruction-level emulation and OS-level API hooking via extended page tables to increase the stealthiness. Our results of the dynamic analysis of 10,853 malware samples show that HyperDbg's stealthiness allows debugging on average 22% and 26% more samples than WinDbg and x64dbg, respectively. Moreover, in contrast to existing debuggers, HyperDbg is not detected by any of the 13 tested packers and protectors. We improve the performance over other debuggers by deploying a VMX-compatible script engine, eliminating unnecessary context switches. Our experiment on three concrete debugging scenarios shows that compared to WinDbg as the only kernel debugger, HyperDbg performs step-in, conditional breaks, and syscall recording, 2.98x, 1319x, and 2018x faster, respectively. We finally show real-world applications, such as a 0-day analysis, structure reconstruction for reverse engineering, software performance analysis, and code-coverage analysis.
Submitted 2 September, 2022; v1 submitted 29 May, 2022;
originally announced July 2022.
-
Unlucky Explorer: A Complete non-Overlapping Map Exploration
Authors:
Mohammad Sina Kiarostami,
Saleh Khalaj Monfared,
Mohammadreza Daneshvaramoli,
Ali Oliayi,
Negar Yousefian,
Dara Rahmati,
Saeid Gorgin
Abstract:
Nowadays, the field of Artificial Intelligence in Computer Games (AI in Games) is becoming more alluring, since computer games challenge many aspects of AI with a wide range of problems, particularly general problems. One such problem is Exploration, in which an unknown environment must be explored by one or several agents. In this work, we first introduce the Maze Dash puzzle as an exploration problem where the agent must find a Hamiltonian Path visiting all the cells. Then, we investigate suitable methods, focusing on Monte-Carlo Tree Search (MCTS) and SAT, to solve this puzzle quickly and accurately. An optimization is applied to the proposed MCTS algorithm to obtain a promising result. Also, since the prefabricated test cases of this puzzle are not large enough to assess the proposed method, we propose and employ a technique to generate solvable test cases to evaluate the approaches. Finally, the MCTS-based method is assessed on the auto-generated test cases and compared with our implemented SAT approach, which is considered a good rival. Our comparison indicates that the MCTS-based approach is a promising method that can cope with small and medium-sized test cases with a faster run-time than SAT. However, for certain reasons that we discuss, including the features of the problem, the tree search organization, and the approach of MCTS in the Simulation step, MCTS takes more time to execute in large scenarios. Consequently, we have found the bottleneck for the MCTS-based method in significant test cases that could be improved in two real-world problems.
Submitted 28 May, 2020;
originally announced May 2020.
-
A Way Around UMIP and Descriptor-Table Exiting via TSX-based Side-Channel
Authors:
Mohammad Sina Karvandi,
Saleh Khalaj Monfared,
Mohammad Sina Kiarostami,
Dara Rahmati,
Saeid Gorgin
Abstract:
Nowadays, in operating systems, numerous protection mechanisms prevent or limit user-mode applications from accessing the kernel's internal information. This is regularly carried out by software-based defenses such as Address Space Layout Randomization (ASLR) and Kernel ASLR (KASLR). They play pronounced roles when the security of sandboxed applications such as Web browsers is considered. Armed with arbitrary write access in the kernel memory, if these protections are bypassed, an adversary could find a suitable location to write to in order to get an elevation of privilege or code execution in ring 0. In this paper, we introduce a reliable method based on Transactional Synchronization Extensions (TSX) side-channel leakage to reveal the address of the Global Descriptor Table (GDT) and Interrupt Descriptor Table (IDT). We indicate that by detecting these addresses, one could execute instructions to sidestep Intel's User-Mode Instruction Prevention (UMIP) and the hypervisor-based mitigation and, consequently, neutralize them. The introduced method is successfully performed after the most recent patches for Meltdown and Spectre. Moreover, the implementation of the proposed approach on different platforms, including the latest releases of Microsoft Windows, Linux, and Mac OSX with the latest 9th generation of Intel processors, shows that the proposed mechanism is independent of the operating system implementation. We demonstrate that a combination of this method with the call-gate mechanism (available in modern processors) in a chain of events will eventually lead to a system compromise despite the limitations of a super-secure sandboxed environment in the presence of Windows' proprietary Virtualization Based Security (VBS). Finally, we suggest a software-based mitigation to avoid these issues with an acceptable overhead cost.
Submitted 22 April, 2021; v1 submitted 20 May, 2020;
originally announced May 2020.
-
Decentralized Cooperative Communication-less Multi-Agent Task Assignment with Monte-Carlo Tree Search
Authors:
Mohammadreza Daneshvaramoli,
Mohammad Sina Kiarostami,
Saleh Khalaj Monfared,
Helia Karisani,
Hamed Khashehchi,
Dara Rahmati,
Saeid Gorgin,
Amir Rahmati
Abstract:
Cooperative task assignment is an important subject in multi-agent systems with a wide range of applications. These systems are usually designed with massive communication among the agents to minimize the error in pursuit of the general goal of the entire system. In this work, we propose a novel approach for Decentralized Cooperative Communication-less Multi-Agent Task Assignment (DCCMATA) employing Monte-Carlo Tree Search (MCTS). Here, each agent can assign the optimal task by itself for itself. We design the system to automatically maximize the success rate, achieving the collective goal effectively. To put it another way, the agents optimally compute each following step, only by knowing the current location of other agents, with no additional communication overhead. In contrast with the previously proposed methods which rely on the task assignment procedure for similar problems, we describe a method in which the agents move towards the collective goal. This may lead to scenarios where some agents do not necessarily move towards the closest goal. However, the total efficiency (makespan) and effectiveness (success ratio) in these cases are significantly improved. To evaluate our approach, we have tested the algorithm with a wide range of parameters (agents, size, goal). Our implementation completely solves (Success Rate = 100%) a 20×20 grid with 20 goals by 20 agents in 7.9 s runtime for each agent. Also, the proposed algorithm runs with the complexity of O(N^2I^2 + IN^4), where I and N are the MCTS iterative index and grid size, respectively.
Submitted 23 February, 2020; v1 submitted 26 October, 2019;
originally announced October 2019.
-
Generating High Quality Random Numbers: A High Throughput Parallel Bitsliced Approach
Authors:
Saleh Khalaj Monfared,
Omid Hajihassani,
Soroush Meghdadi Zanjani,
Mohammadsina Kiarostami,
Dara Rahmati,
Saeid Gorgin
Abstract:
In this work, by employing a bitsliced data representation as building blocks of algorithms, we showcase the capability and scalability of our proposed method in a variety of PRNG methods in the category of block and stream ciphers. While demonstrating the suitability of stream ciphers for high-throughput PRNG, as an example, we implement and investigate a bitsliced MICKEY 2.0 PRNG by altering the paradigm of internal functions and data structure. The LFSR-based (Linear Feedback Shift Register) nature of the PRNG in our implementation perfectly suits the GPU's many-core structure due to its register-oriented architecture and allows the usage of the bit slicing technique to further improve the performance. In our SIMD vectorized fully parallel GPU implementation, each GPU thread is capable of generating a remarkable number of 32 pseudo-random bits in each LFSR clock cycle. We then compare our implementation with some of the most significant PRNGs that display a satisfactory performance in both throughput and randomness criteria. The proposed implementation successfully passes the NIST test for statistical randomness and bit-wise correlation criteria. To the best of the authors' knowledge, our method outperforms the current best implementations in the literature for computer-based PRNG and the optical solutions in terms of performance and performance per cost, while maintaining an acceptable measure of randomness. Our highest performance among all of the implemented CPRNGs with the proposed method is achieved by the MICKEY 2.0 algorithm, which shows a 1.9x improvement over NVIDIA's state-of-the-art proprietary high-performance PRNG, the cuRAND library, achieving 1.6 Tb/s of throughput on the affordable NVIDIA GTX 980 Ti.
Submitted 20 October, 2019; v1 submitted 10 September, 2019;
originally announced September 2019.
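The bitslicing idea in the last abstract above (32 pseudo-random bits per LFSR clock cycle from one thread) can be illustrated with a small sketch. This is not the authors' MICKEY 2.0 or GPU implementation: it assumes a textbook 16-bit Fibonacci LFSR with the polynomial x^16 + x^14 + x^13 + x^11 + 1, runs on the CPU in plain Python, and all function names (`lfsr_step_scalar`, `to_bitsliced`, `lfsr_step_bitsliced`) are hypothetical. Each 32-bit word holds the same state bit of 32 independent LFSR instances, so one XOR on words updates all 32 lanes at once.

```python
# Illustrative sketch only: a toy 16-bit LFSR, not MICKEY 2.0.
TAPS = (0, 2, 3, 5)   # tap bit indices for x^16 + x^14 + x^13 + x^11 + 1
WIDTH = 16            # state bits per LFSR instance
LANES = 32            # independent instances packed into one word


def lfsr_step_scalar(state):
    """Advance one LFSR instance; return (new_state, output_bit)."""
    out = state & 1
    fb = 0
    for t in TAPS:
        fb ^= (state >> t) & 1
    return (state >> 1) | (fb << (WIDTH - 1)), out


def to_bitsliced(states):
    """Transpose states so that bit `lane` of planes[i] is bit i of states[lane]."""
    planes = [0] * WIDTH
    for lane, s in enumerate(states):
        for i in range(WIDTH):
            planes[i] |= ((s >> i) & 1) << lane
    return planes


def lfsr_step_bitsliced(planes):
    """Advance all lanes at once; return (new_planes, word of LANES output bits)."""
    out = planes[0]                # bit 0 of every lane is that lane's output
    fb = 0
    for t in TAPS:
        fb ^= planes[t]            # one XOR computes the feedback for all lanes
    return planes[1:] + [fb], out  # the shift becomes a renaming of planes


# Hypothetical nonzero seeds, one per lane.
seeds = [((0xACE1 + 977 * j) & 0xFFFF) or 1 for j in range(LANES)]
planes = to_bitsliced(seeds)
planes, word = lfsr_step_bitsliced(planes)  # `word` carries 32 output bits
```

In this layout the shift costs no bit operations at all, since it only renames which word holds which state bit; that is the property that makes LFSR-style generators a natural fit for bitslicing.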