-
Device-Algorithm Co-Design of Ferroelectric Compute-in-Memory In-Situ Annealer for Combinatorial Optimization Problems
Authors:
Yu Qian,
Xianmin Huang,
Ranran Wang,
Zeyu Yang,
Min Zhou,
Thomas Kämpfe,
Cheng Zhuo,
Xunzhao Yin
Abstract:
Combinatorial optimization problems (COPs) are crucial in many applications but are computationally demanding. Traditional Ising annealers address COPs by directly converting them into Ising models (known as direct-E transformation) and solving them through iterative annealing. However, these approaches require vector-matrix-vector (VMV) multiplications with a complexity of $O(n^2)$ for Ising energy computation and complex exponential annealing factor calculations during the annealing process, thus significantly increasing hardware costs. In this work, we propose a ferroelectric compute-in-memory (CiM) in-situ annealer to overcome the aforementioned challenges. The proposed device-algorithm co-design framework consists of (i) a novel transformation method (the first, to our knowledge) that converts COPs into an incremental-E form, which reduces the complexity of VMV multiplication from $O(n^2)$ to $O(n)$ and approximates the exponential annealing factor with a much simpler fractional form; (ii) a double-gate ferroelectric FET (DG FeFET)-based CiM crossbar that efficiently computes the in-situ incremental-E form by leveraging the unique structure of DG FeFETs; (iii) a CiM annealer that approaches the solutions of COPs via iterative incremental-E computations within a tunable back-gate-based in-situ annealing flow. Evaluation results show that our proposed CiM annealer significantly reduces hardware overhead, reducing energy consumption by 1503/1716$\times$ and time cost by 8.08/8.15$\times$ in solving 3000-node Max-Cut problems compared to two state-of-the-art annealers. It also exhibits high solving efficiency, achieving an average success rate of 98\%, whereas other annealers show only 50\% given the same iteration counts.
Submitted 29 April, 2025;
originally announced April 2025.
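Illustrative sketch (not taken from the paper): the abstract does not spell out the incremental-E transformation, so the Python below shows only the standard Ising identity that makes an $O(n)$ update possible. It assumes the energy convention E = -sum_{i<j} J_ij s_i s_j with a symmetric coupling matrix; the function names and the small random instance are hypothetical.

```python
import random

def ising_energy(J, s):
    """Full Ising energy E = -sum_{i<j} J[i][j] * s[i] * s[j]; costs O(n^2)."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))

def flip_delta(J, s, k):
    """Energy change caused by flipping spin k; touches one row of J, so O(n)."""
    local_field = sum(J[k][j] * s[j] for j in range(len(s)) if j != k)
    return 2 * s[k] * local_field

# Small random symmetric instance (assumed data, integer couplings for exact checks).
random.seed(0)
n = 6
J = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        J[i][j] = J[j][i] = random.choice([-1, 0, 1])
s = [random.choice([-1, 1]) for _ in range(n)]
```

An annealer built this way would evaluate flip_delta for each proposed flip and recompute the full energy rarely, if at all.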
-
A Bio-inspired Asymmetric Double-Gate Ferroelectric FET for Emulating Astrocyte and Dendrite Dynamics in Neuromorphic Systems
Authors:
Zhouhang Jiang,
A N M Nafiul Islam,
Zhuangyu Han,
Zijian Zhao,
Franz Müller,
Jiahui Duan,
Halid Mulaosmanovic,
Stefan Dünkel,
Sven Beyer,
Sourav Dutta,
Vijaykrishnan Narayanan,
Thomas Kämpfe,
Suma George Cardwell,
Frances Chance,
Abhronil Sengupta,
Kai Ni
Abstract:
Neuromorphic systems seek to replicate the functionalities of biological neural networks to attain significant improvements in performance and efficiency of AI computing platforms. However, these systems have generally remained limited to emulation of simple neurons and synapses and have ignored higher-order functionalities enabled by other components of the brain like astrocytes and dendrites. In this work, drawing inspiration from biology, we introduce a compact Double-Gate Ferroelectric Field Effect Transistor (DG-FeFET) cell that can emulate the dynamics of both astrocytes and dendrites within neuromorphic architectures. We demonstrate that with a ferroelectric top gate for synaptic weight programming as in conventional synapses and a non-ferroelectric back gate, the DG-FeFET realizes a synapse with a dynamic gain modulation mechanism. This can be leveraged as an analog for a compact astrocyte-tripartite synapse, as well as enabling dendrite-like gain modulation operations. By employing a fully-depleted silicon-on-insulator (FDSOI) FeFET as our double-gate device, we validate the linear control of the synaptic weight via the back gate terminal (i.e., the gate underneath the buried oxide (BOX) layer) through comprehensive theoretical and experimental studies. We showcase the promise such a tripartite synaptic device holds for numerous important neuromorphic applications, including autonomous self-repair of faulty neuromorphic hardware mediated by astrocytic functionality. Coordinate transformations based on dragonfly prey-interception circuitry models are also demonstrated based on dendritic function emulation by the device. This work paves the way forward for developing truly "brain-like" neuromorphic hardware that goes beyond the current dogma focusing only on neurons and synapses.
Submitted 19 April, 2025;
originally announced April 2025.
-
An In-Situ Spatial-Temporal Sequence Detector for Neuromorphic Vision Sensor Empowered by High Density Vertical NAND Storage
Authors:
Zijian Zhao,
Varun Darshana Parekh,
Po-Kai Hsu,
Yixin Qin,
Yiming Song,
A N M Nafiul Islam,
Ningyuan Cao,
Siddharth Joshi,
Thomas Kämpfe,
Moonyoung Jung,
Kwangyou Seo,
Kwangsoo Kim,
Wanki Kim,
Daewon Ha,
Sourav Dutta,
Abhronil Sengupta,
Xiao Gong,
Shimeng Yu,
Vijaykrishnan Narayanan,
Kai Ni
Abstract:
Neuromorphic vision sensors require efficient real-time pattern recognition, yet conventional architectures struggle with energy and latency constraints. Here, we present a novel in-situ spatiotemporal sequence detector that leverages vertical NAND storage to achieve massively parallel pattern detection. By encoding each cell with two single-transistor-based multi-level cell (MLC) memory elements, such as ferroelectric field-effect transistors (FeFETs), and mapping a pixel's temporal sequence onto consecutive word lines (WLs), we enable direct temporal pattern detection within NAND strings. Each NAND string serves as a dedicated reference for a single pixel, while different blocks store patterns for distinct pixels, allowing large-scale spatial-temporal pattern recognition via simple direct bit-line (BL) sensing, a well-established operation in vertical NAND storage. We experimentally validate our approach at both the cell and array levels, demonstrating that the vertical NAND-based detector achieves more than six orders of magnitude improvement in energy efficiency and more than three orders of magnitude reduction in latency compared to conventional CPU-based methods. These findings establish vertical NAND storage as a scalable and energy-efficient solution for next-generation neuromorphic vision processing.
Submitted 30 March, 2025;
originally announced March 2025.
-
TAP-CAM: A Tunable Approximate Matching Engine based on Ferroelectric Content Addressable Memory
Authors:
Chenyu Ni,
Sijie Chen,
Che-Kai Liu,
Liu Liu,
Mohsen Imani,
Thomas Kampfe,
Kai Ni,
Michael Niemier,
Xiaobo Sharon Hu,
Cheng Zhuo,
Xunzhao Yin
Abstract:
Pattern search is crucial in numerous analytic applications for retrieving data entries akin to the query. Content Addressable Memories (CAMs), an in-memory computing fabric, directly compare input queries with stored entries through embedded comparison logic, facilitating fast parallel pattern search in memory. While conventional CAM designs offer exact match functionality, they are inadequate for meeting the approximate search needs of emerging data-intensive applications. Some recent CAM designs propose approximate matching functions, but they face limitations such as excessively large cell area or the inability to precisely control the degree of approximation. In this paper, we propose TAP-CAM, a novel ferroelectric field effect transistor (FeFET) based ternary CAM (TCAM) capable of both exact and tunable approximate matching. TAP-CAM employs a compact 2FeFET-2R cell structure as the entry storage unit, and similarities in Hamming distances between input queries and stored entries are measured using an evaluation transistor associated with the matchline of the CAM array. We discuss the operation of the proposed design and evaluate its robustness and performance at the array level. We conduct a case study of K-nearest neighbor (KNN) search to benchmark the proposed TAP-CAM at the application level. Results demonstrate that compared to a 16T CMOS CAM with exact match functionality, TAP-CAM achieves a 16.95x energy improvement, along with a 3.06% accuracy enhancement. Compared to a 2FeFET TCAM with approximate match functionality, TAP-CAM achieves a 6.78x energy improvement.
Submitted 9 February, 2025;
originally announced February 2025.
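Illustrative sketch (not from the paper): a software reference model of what a ternary CAM with tunable approximate matching computes, assuming entries are strings over '0', '1' and a don't-care symbol 'X', and that an entry matches when its Hamming distance to the query is at most a chosen threshold. The names and sample entries are hypothetical; the circuit-level mechanism (2FeFET-2R cell, evaluation transistor) is not modeled.

```python
def hamming_distance(query, entry):
    """Count mismatching positions; an 'X' in the stored entry matches anything."""
    return sum(1 for q, e in zip(query, entry) if e != "X" and q != e)

def approximate_match(query, entries, threshold):
    """Return indices of entries within `threshold` mismatches of the query.

    threshold = 0 reproduces exact (ternary) matching.
    """
    return [i for i, e in enumerate(entries) if hamming_distance(query, e) <= threshold]

# Assumed sample contents of a three-row TCAM.
entries = ["1010", "1X11", "0000"]
exact = approximate_match("1011", entries, threshold=0)
loose = approximate_match("1011", entries, threshold=1)
```

Raising the threshold widens the set of matching rows, which is the knob a KNN-style search would sweep.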
-
TReCiM: Lower Power and Temperature-Resilient Multibit 2FeFET-1T Compute-in-Memory Design
Authors:
Yifei Zhou,
Thomas Kämpfe,
Kai Ni,
Hussam Amrouch,
Cheng Zhuo,
Xunzhao Yin
Abstract:
Compute-in-memory (CiM) emerges as a promising solution to solve hardware challenges in artificial intelligence (AI) and the Internet of Things (IoT), particularly addressing the "memory wall" issue. By utilizing nonvolatile memory (NVM) devices in a crossbar structure, CiM efficiently accelerates multiply-accumulate (MAC) computations, the crucial operations in neural networks and other AI models. Among various NVM devices, the Ferroelectric FET (FeFET) is particularly appealing for ultra-low-power CiM arrays due to its CMOS compatibility, voltage-driven write/read mechanisms and high ION/IOFF ratio. Moreover, subthreshold-operated FeFETs, which operate at scaling voltages in the subthreshold region, can further minimize the power consumption of the CiM array. However, subthreshold FeFETs are susceptible to temperature drift, resulting in computation accuracy degradation. Existing solutions exhibit weak temperature resilience at larger array sizes and support only 1-bit operation. In this paper, we propose TReCiM, an ultra-low-power temperature-resilient multibit 2FeFET-1T CiM design that reliably performs MAC operations in the subthreshold-FeFET region with temperature ranging from 0 to 85 degrees Celsius at scale. We benchmark our design using the NeuroSim framework in the context of the VGG-8 neural network architecture running the CIFAR-10 dataset. Benchmarking results suggest that when considering temperature drift impact, our proposed TReCiM array achieves 91.31% accuracy, with 1.86% accuracy improvement compared to an existing 1-bit 2T-1FeFET CiM array. Furthermore, our proposed design achieves 48.03 TOPS/W energy efficiency at system level, comparable to existing designs with smaller technology feature sizes.
Submitted 1 January, 2025;
originally announced January 2025.
-
Energy Efficient Dual Designs of FeFET-Based Analog In-Memory Computing with Inherent Shift-Add Capability
Authors:
Zeyu Yang,
Qingrong Huang,
Yu Qian,
Kai Ni,
Thomas Kämpfe,
Xunzhao Yin
Abstract:
In-memory computing (IMC) architecture emerges as a promising paradigm, improving the energy efficiency of multiply-and-accumulate (MAC) operations within DNNs by integrating the parallel computations within the memory arrays. Various high-precision analog IMC array designs have been developed based on both SRAM and emerging non-volatile memories. These designs perform MAC operations of partial input and weight, with the corresponding partial products then fed into shift-add circuitry to produce the final MAC results. However, existing works often present an intricate shift-add process for weights. The traditional digital shift-add process is limited in throughput due to time-multiplexing of ADCs, and advancing the shift-add process to the analog domain necessitates customized circuit implementations, resulting in compromises in energy and area efficiency. Furthermore, the joint optimization of the partial MAC operations and the weight shift-add process is rarely explored. In this paper, we propose novel, energy-efficient dual designs of FeFET-based high-precision analog IMC featuring inherent shift-add capability. We introduce a FeFET-based IMC paradigm that performs partial MAC in each column, and inherently integrates the shift-add process for 4-bit weights by leveraging the FeFET's analog storage characteristics. This paradigm supports both 2's complement mode and non-2's complement mode MAC, thereby offering flexible support for 4-/8-bit weight data in 2's complement format. Building upon this paradigm, we propose novel FeFET-based dual designs, CurFe for the current mode and ChgFe for the charge mode, to accommodate the high-precision analog domain IMC architecture. Evaluation results at circuit and system levels indicate that the circuit/system-level energy efficiency of the proposed FeFET-based analog IMC is 1.56$\times$/1.37$\times$ higher when compared to SOTA analog IMC designs.
Submitted 25 October, 2024;
originally announced October 2024.
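Illustrative sketch (not from the paper): a digital reference model of the bit-sliced MAC with shift-add that the abstract says is folded into the FeFET array. It assumes 4-bit 2's complement weights, one partial MAC per weight bit, and a negative scale on the most significant (sign) bit. Function names and values are hypothetical, and the analog CurFe/ChgFe implementations are not modeled.

```python
def twos_complement_bits(w, nbits=4):
    """Bits of w, least significant first, in nbits-wide 2's complement."""
    assert -(1 << (nbits - 1)) <= w < (1 << (nbits - 1)), "weight out of range"
    u = w & ((1 << nbits) - 1)  # reinterpret as an unsigned nbits-wide pattern
    return [(u >> b) & 1 for b in range(nbits)]

def bit_sliced_mac(inputs, weights, nbits=4):
    """Compute sum(x * w) from per-bit partial MACs followed by shift-add."""
    bit_planes = [twos_complement_bits(w, nbits) for w in weights]
    total = 0
    for b in range(nbits):
        # Partial MAC: inputs multiplied by one bit plane of the weights.
        partial = sum(x * bits[b] for x, bits in zip(inputs, bit_planes))
        # Shift-add: the sign bit carries weight -2^(nbits-1), others +2^b.
        scale = -(1 << b) if b == nbits - 1 else (1 << b)
        total += scale * partial
    return total

# Assumed example operands.
inputs = [3, 1, 2]
weights = [-8, 7, -3]
result = bit_sliced_mac(inputs, weights)
```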
-
FeBiM: Efficient and Compact Bayesian Inference Engine Empowered with Ferroelectric In-Memory Computing
Authors:
Chao Li,
Zhicheng Xu,
Bo Wen,
Ruibin Mao,
Can Li,
Thomas Kämpfe,
Kai Ni,
Xunzhao Yin
Abstract:
In scenarios with limited training data or where explainability is crucial, conventional neural network-based machine learning models often face challenges. In contrast, Bayesian inference-based algorithms excel in providing interpretable predictions and reliable uncertainty estimation in these scenarios. While many state-of-the-art in-memory computing (IMC) architectures leverage emerging non-volatile memory (NVM) technologies to offer unparalleled computing capacity and energy efficiency for neural network workloads, their application in Bayesian inference is limited. This is because the core operations in Bayesian inference differ significantly from the multiplication-accumulation (MAC) operations common in neural networks, rendering them generally unsuitable for direct implementation in most existing IMC designs. In this paper, we propose FeBiM, an efficient and compact Bayesian inference engine powered by multi-bit ferroelectric field-effect transistor (FeFET)-based IMC. FeBiM effectively encodes the trained probabilities of a Bayesian inference model within a compact FeFET-based crossbar. It maps quantized logarithmic probabilities to discrete FeFET states. As a result, the accumulated outputs of the crossbar naturally represent the posterior probabilities, i.e., the Bayesian inference model's output given a set of observations. This approach enables efficient in-memory Bayesian inference without the need for additional calculation circuitry. As the first FeFET-based in-memory Bayesian inference engine, FeBiM achieves an impressive storage density of 26.32 Mb/mm$^{2}$ and a computing efficiency of 581.40 TOPS/W in a representative Bayesian classification task. These results demonstrate 10.7$\times$/43.4$\times$ improvement in compactness/efficiency compared to the state-of-the-art hardware implementation of Bayesian inference.
Submitted 25 October, 2024;
originally announced October 2024.
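Illustrative sketch (not from the paper): a naive-Bayes-style classifier in which quantized log probabilities are summed per class, mirroring the abstract's description of mapping quantized logarithmic probabilities to discrete states and accumulating them. The uniform quantizer, its floor of -6.0, the 8 levels, and the toy probability tables are all assumptions made for illustration.

```python
import math

def quantize_log_prob(p, levels=8, floor=-6.0):
    """Map log(p), clipped to [floor, 0], onto an integer state 0..levels-1."""
    logp = max(math.log(p), floor)
    step = -floor / (levels - 1)
    return round((logp - floor) / step)

def classify(observation, priors, likelihoods, levels=8):
    """Sum quantized log prior and log likelihoods per class; return (argmax, scores)."""
    scores = []
    for c, prior in enumerate(priors):
        score = quantize_log_prob(prior, levels)
        for feature, value in enumerate(observation):
            score += quantize_log_prob(likelihoods[c][feature][value], levels)
        scores.append(score)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy model: 2 classes, 2 binary features; likelihoods[c][feature][value].
priors = [0.5, 0.5]
likelihoods = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]
label_a, _ = classify([0, 0], priors, likelihoods)
label_b, _ = classify([1, 1], priors, likelihoods)
```

Because the scores are sums of shifted and scaled log probabilities, they rank classes like unnormalized log posteriors, up to quantization error.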
-
A Remedy to Compute-in-Memory with Dynamic Random Access Memory: 1FeFET-1C Technology for Neuro-Symbolic AI
Authors:
Xunzhao Yin,
Hamza Errahmouni Barkam,
Franz Müller,
Yuxiao Jiang,
Mohsen Imani,
Sukhrob Abdulazhanov,
Alptekin Vardar,
Nellie Laleni,
Zijian Zhao,
Jiahui Duan,
Zhiguo Shi,
Siddharth Joshi,
Michael Niemier,
Xiaobo Sharon Hu,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
Neuro-symbolic artificial intelligence (AI) excels at learning from noisy and generalized patterns, conducting logical inferences, and providing interpretable reasoning. Comprising a 'neuro' component for feature extraction and a 'symbolic' component for decision-making, neuro-symbolic AI has yet to fully benefit from efficient hardware accelerators. Additionally, current hardware struggles to accommodate applications requiring dynamic resource allocation between these two components. To address these challenges, and to mitigate the typical data-transfer bottleneck of classical von Neumann architectures, we propose a ferroelectric charge-domain compute-in-memory (CiM) array as the foundational processing element for neuro-symbolic AI. This array seamlessly handles both the critical multiply-accumulate (MAC) operations of the 'neuro' workload and the parallel associative search operations of the 'symbolic' workload. To enable this approach, we introduce an innovative 1FeFET-1C cell, combining a ferroelectric field-effect transistor (FeFET) with a capacitor. This design overcomes the destructive sensing limitations of DRAM in CiM applications while capitalizing on decades of DRAM expertise through a similar cell structure, achieves high immunity against FeFET variation (crucial for neuro-symbolic AI), and demonstrates superior energy efficiency. The functionalities of our design have been successfully validated through SPICE simulations and prototype fabrication and testing. Our hardware platform has been benchmarked in executing typical neuro-symbolic AI reasoning tasks, showing over 2x improvement in latency and 1000x improvement in energy efficiency compared to GPU-based implementations.
Submitted 20 October, 2024;
originally announced October 2024.
-
HyCiM: A Hybrid Computing-in-Memory QUBO Solver for General Combinatorial Optimization Problems with Inequality Constraints
Authors:
Yu Qian,
Zeyu Yang,
Kai Ni,
Alptekin Vardar,
Thomas Kämpfe,
Xunzhao Yin
Abstract:
Computationally challenging combinatorial optimization problems (COPs) play a fundamental role in various applications. To tackle COPs, many Ising machines and Quadratic Unconstrained Binary Optimization (QUBO) solvers have been proposed, which typically involve direct transformation of COPs into Ising models or equivalent QUBO forms (D-QUBO). However, when addressing COPs with inequality constraints, this D-QUBO approach introduces numerous extra auxiliary variables, resulting in a substantially larger search space, increased hardware costs, and reduced solving efficiency. In this work, we propose HyCiM, a novel hybrid computing-in-memory (CiM) based QUBO solver framework, designed to overcome the aforementioned challenges. The proposed framework consists of (i) an innovative transformation method (the first, to our knowledge) that converts COPs with inequality constraints into an inequality-QUBO form, thus eliminating the need for expensive auxiliary variables and associated calculations; (ii) an "inequality filter", a ferroelectric FET (FeFET)-based CiM circuit that accelerates the inequality evaluation and filters out infeasible input configurations; (iii) a FeFET-based CiM annealer that is capable of approaching global solutions of COPs via iterative QUBO computations within a simulated annealing process. The evaluation results show that HyCiM drastically narrows down the search space, eliminating $2^{100} \text{ to } 2^{2536}$ infeasible input configurations compared to the conventional D-QUBO approach.
Submitted 17 October, 2024;
originally announced October 2024.
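Illustrative sketch (not from the paper): the idea of checking an inequality constraint directly on the binary configuration, instead of encoding it with auxiliary slack variables, can be shown with a knapsack-style constraint sum(w_i * x_i) <= capacity. The constraint form, the names, and the tiny instance are assumptions; the FeFET-based filter circuit itself is not modeled.

```python
from itertools import product

def is_feasible(x, weights, capacity):
    """Inequality check on a binary configuration: sum(w_i * x_i) <= capacity."""
    return sum(w * xi for w, xi in zip(weights, x)) <= capacity

def feasible_configurations(weights, capacity):
    """Enumerate all binary configurations and keep only the feasible ones."""
    n = len(weights)
    return [x for x in product((0, 1), repeat=n) if is_feasible(x, weights, capacity)]

# Assumed toy instance: three items, capacity 5.
weights = [2, 3, 4]
capacity = 5
kept = feasible_configurations(weights, capacity)
```

Only the configurations in kept would be passed on to the QUBO energy evaluation; exhaustive enumeration is used here purely to make the filtering visible on a small instance.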
-
C-Nash: A Novel Ferroelectric Computing-in-Memory Architecture for Solving Mixed Strategy Nash Equilibrium
Authors:
Yu Qian,
Kai Ni,
Thomas Kämpfe,
Cheng Zhuo,
Xunzhao Yin
Abstract:
The concept of Nash equilibrium (NE), pivotal within game theory, has garnered widespread attention across numerous industries. Recent advancements introduced several quantum Nash solvers aimed at identifying pure strategy NE solutions (i.e., binary solutions) by integrating slack terms into the objective function, commonly referred to as slack-quadratic unconstrained binary optimization (S-QUBO). However, incorporation of slack terms into the quadratic optimization results in changes of the objective function, which may cause incorrect solutions. Furthermore, these quantum solvers only identify a limited subset of pure strategy NE solutions, and fail to address mixed strategy NE (i.e., decimal solutions), leaving many solutions undiscovered. In this work, we propose C-Nash, a novel ferroelectric computing-in-memory (CiM) architecture that can efficiently handle both pure and mixed strategy NE solutions. The proposed architecture consists of (i) a transformation method that converts quadratic optimization into a MAX-QUBO form without introducing additional slack variables, thereby avoiding objective function changes; (ii) a ferroelectric FET (FeFET) based bi-crossbar structure for storing payoff matrices and accelerating the core vector-matrix-vector (VMV) multiplications of QUBO form; (iii) A winner-takes-all (WTA) tree implementing the MAX form and a two-phase based simulated annealing (SA) logic for searching NE solutions. Evaluations show that C-Nash has up to 68.6% increase in the success rate for identifying NE solutions, finding all pure and mixed NE solutions rather than only a portion of pure NE solutions, compared to D-Wave based quantum approaches. Moreover, C-Nash boasts a reduction up to 157.9X/79.0X in time-to-solutions compared to D-Wave 2000 Q6 and D-Wave Advantage 4.1, respectively.
Submitted 7 August, 2024;
originally announced August 2024.
-
FeReX: A Reconfigurable Design of Multi-bit Ferroelectric Compute-in-Memory for Nearest Neighbor Search
Authors:
Zhicheng Xu,
Che-Kai Liu,
Chao Li,
Ruibin Mao,
Jianyi Yang,
Thomas Kämpfe,
Mohsen Imani,
Can Li,
Cheng Zhuo,
Xunzhao Yin
Abstract:
Rapid advancements in artificial intelligence have given rise to transformative models, profoundly impacting our lives. These models demand massive volumes of data to operate effectively, exacerbating the data-transfer bottleneck inherent in the conventional von Neumann architecture. Compute-in-memory (CIM), a novel computing paradigm, tackles these issues by seamlessly embedding in-memory search functions, thereby obviating the need for data transfers. However, existing non-volatile memory (NVM)-based accelerators are application specific. During the similarity-based associative search operation, they only support a single, specific distance metric, such as Hamming, Manhattan, or Euclidean distance in measuring the query against the stored data, calling for reconfigurable in-memory solutions adaptable to various applications. To overcome such a limitation, in this paper, we present FeReX, a reconfigurable associative memory (AM) that accommodates various distance metrics including Hamming, Manhattan, and Euclidean distances. Leveraging multi-bit ferroelectric field-effect transistors (FeFETs) as the proxy and a hardware-software co-design approach, we introduce a constraint satisfaction problem (CSP)-based method to automate AM search input voltage and stored voltage configurations for different distance-based search functions. Device-circuit co-simulations first validate the effectiveness of the proposed FeReX methodology for reconfigurable search distance functions. Then, we benchmark FeReX in the context of k-nearest neighbor (KNN) and hyperdimensional computing (HDC), which highlights the robustness of FeReX and demonstrates up to 250x speedup and 10^4x energy savings compared with a GPU.
Submitted 11 January, 2024;
originally announced January 2024.
-
Reconfigurable Frequency Multipliers Based on Complementary Ferroelectric Transistors
Authors:
Haotian Xu,
Jianyi Yang,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni,
Xunzhao Yin
Abstract:
Frequency multipliers, a class of essential electronic components, play a pivotal role in contemporary signal processing and communication systems. They serve as crucial building blocks for generating high-frequency signals by multiplying the frequency of an input signal. However, traditional frequency multipliers that rely on nonlinear devices often require energy- and area-consuming filtering and amplification circuits, and emerging designs based on an ambipolar ferroelectric transistor require costly non-trivial characteristic tuning or a complex technology process. In this paper, we show that a pair of standard ferroelectric field effect transistors (FeFETs) can be used to build compact frequency multipliers without the aforementioned technology issues. By leveraging the tunable parabolic shape of the 2FeFET structures' transfer characteristics, we propose four reconfigurable frequency multipliers, which can switch between signal transmission and frequency doubling. Furthermore, based on the 2FeFET structures, we propose four frequency multipliers that realize triple and quadruple frequency modes, elucidating a scalable methodology to generate more multiplication harmonics of the input frequency. Performance metrics such as maximum operating frequency and power are evaluated and compared with existing works. We also implement a practical frequency modulation scheme based on the proposed reconfigurable multipliers without additional devices. Our work provides a novel path toward scalable and reconfigurable frequency multiplier designs based on devices that have characteristics similar to FeFETs, and shows that FeFETs are a promising candidate for signal processing and communication systems in terms of maximum operating frequency and power.
Submitted 28 December, 2023;
originally announced December 2023.
-
A Ferroelectric Compute-in-Memory Annealer for Combinatorial Optimization Problems
Authors:
Xunzhao Yin,
Yu Qian,
Alptekin Vardar,
Marcel Gunther,
Franz Muller,
Nellie Laleni,
Zijian Zhao,
Zhouhang Jiang,
Zhiguo Shi,
Yiyu Shi,
Xiao Gong,
Cheng Zhuo,
Thomas Kampfe,
Kai Ni
Abstract:
Computationally hard combinatorial optimization problems (COPs) are ubiquitous in many applications, including logistical planning, resource allocation, chip design, drug explorations, and more. Due to their critical significance and the inability of conventional hardware in efficiently handling scaled COPs, there is a growing interest in developing computing hardware tailored specifically for COPs, including digital annealers, dynamical Ising machines, and quantum/photonic systems. However, significant hurdles still remain, such as the memory access issue, the system scalability and restricted applicability to certain types of COPs, and VLSI-incompatibility, respectively. Here, a ferroelectric field effect transistor (FeFET) based compute-in-memory (CiM) annealer is proposed. After converting COPs into quadratic unconstrained binary optimization (QUBO) formulations, a hardware-algorithm co-design is conducted, yielding an energy-efficient, versatile, and scalable hardware for COPs. To accelerate the core vector-matrix-vector (VMV) multiplication of QUBO formulations, a FeFET based CiM array is exploited, which can accelerate the intended operation in-situ due to its unique three-terminal structure. In particular, a lossless compression technique is proposed to prune typically sparse QUBO matrix to reduce hardware cost. Furthermore, a multi-epoch simulated annealing (MESA) algorithm is proposed to replace conventional simulated annealing for its faster convergence and better solution quality. The effectiveness of the proposed techniques is validated through the utilization of developed chip prototypes for successfully solving graph coloring problem, indicating great promise of FeFET CiM annealer in solving general COPs.
Submitted 24 September, 2023;
originally announced September 2023.
-
Embedding Security into Ferroelectric FET Array via In-Situ Memory Operation
Authors:
Yixin Xu,
Yi Xiao,
Zijian Zhao,
Franz Müller,
Alptekin Vardar,
Xiao Gong,
Sumitha George,
Thomas Kämpfe,
Vijaykrishnan Narayanan,
Kai Ni
Abstract:
Non-volatile memories (NVMs) have the potential to reshape next-generation memory systems because of their promising properties of near-zero leakage power consumption, high density and non-volatility. However, NVMs also face critical security threats that exploit the non-volatile property. Compared to volatile memory, the capability of retaining data even after power down makes NVM more vulnerable. Existing solutions to address the security issues of NVMs are mainly based on the Advanced Encryption Standard (AES), which incurs significant performance and power overhead. In this paper, we propose a lightweight memory encryption/decryption scheme that exploits in-situ memory operations with negligible overhead. To validate the feasibility of the encryption/decryption scheme, device-level and array-level experiments are performed using the ferroelectric field effect transistor (FeFET) as an example NVM without loss of generality. In addition, a comprehensive evaluation is performed on a 128x128 FeFET AND-type memory array in terms of area, latency, power and throughput. Compared with the AES-based scheme, our scheme shows around 22.6x/14.1x increase in encryption/decryption throughput with negligible power penalty. Furthermore, we evaluate the performance of our scheme against the AES-based scheme when deploying different neural network workloads. Our scheme reduces latency by 90% on average for the encryption and decryption processes.
Submitted 2 June, 2023;
originally announced June 2023.
-
A Homogeneous Processing Fabric for Matrix-Vector Multiplication and Associative Search Using Ferroelectric Time-Domain Compute-in-Memory
Authors:
Xunzhao Yin,
Qingrong Huang,
Franz Müller,
Shan Deng,
Alptekin Vardar,
Sourav De,
Zhouhang Jiang,
Mohsen Imani,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
In this work, we propose a ferroelectric FET (FeFET) time-domain compute-in-memory (TD-CiM) array as a homogeneous processing fabric for binary multiplication-accumulation (MAC) and content addressable memory (CAM). We demonstrate: i) that the XOR(XNOR)/AND logic function can be realized using a single cell composed of two FeFETs connected in series; ii) a two-phase computation in an inverter chain in which each stage features the XOR/AND cell to control the associated capacitor loading, so that the computation results of binary MAC and CAM are reflected in the delay of the chain output signal, illustrating full digital compatibility; iii) comprehensive theoretical and experimental validation of the proposed 2FeFET cell and inverter delay chains and their robustness against FeFET variation; iv) an application of the homogeneous processing fabric to hyperdimensional computing, showing dynamic and fine-grain resource allocation to accommodate tasks with varying demands on the binary MAC and CAM resources.
Submitted 24 September, 2022;
originally announced September 2022.
-
Ferroelectric FET-based strong physical unclonable function: a low-power, high-reliable and reconfigurable solution for Internet-of-Things security
Authors:
Xinrui Guo,
Xiaoyang Ma,
Franz Muller,
Kai Ni,
Thomas Kampfe,
Yongpan Liu,
Vijaykrishnan Narayanan,
Xueqing Li
Abstract:
Hardware security has been a key concern in modern information technologies. In particular, as the number of Internet-of-Things (IoT) devices grows rapidly, protecting device security with low-cost security primitives becomes essential, and the Physical Unclonable Function (PUF) is a widely used solution among them. In this paper, we propose the first FeFET-based strong PUF exploiting the cycle-to-cycle (C2C) variation of FeFETs as the entropy source. Based on experimental measurements, the proposed PUF shows satisfactory performance, including high uniformity, uniqueness, reconfigurability and reliability. To resist machine-learning attacks, an XOR structure is introduced, and simulations show that the proposed PUF resists existing attack models about as well as traditional arbiter PUFs. Furthermore, our design is shown to be power-efficient and highly robust to write voltage, temperature and device size, which makes it a competitive security solution for Internet-of-Things edge devices.
Submitted 31 August, 2022;
originally announced August 2022.
-
An Ultra-Compact Single FeFET Binary and Multi-Bit Associative Search Engine
Authors:
Xunzhao Yin,
Franz Müller,
Qingrong Huang,
Chao Li,
Mohsen Imani,
Zeyu Yang,
Jiahao Cai,
Maximilian Lederer,
Ricardo Olivo,
Nellie Laleni,
Shan Deng,
Zijian Zhao,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
Content addressable memory (CAM) is widely used in associative search tasks for its highly parallel pattern matching capability. To accommodate increasingly complex and data-intensive pattern matching tasks, it is critical to keep improving the CAM density to enhance performance and area efficiency. In this work, we demonstrate: i) a novel ultra-compact 1FeFET CAM design that enables parallel associative search and in-memory Hamming distance calculation; ii) a multi-bit CAM for exact search using the same CAM cell; iii) compact device designs that integrate the series resistor current limiter into the intrinsic FeFET structure to turn the 1FeFET1R into an effective 1FeFET cell; iv) a successful 2-step search operation and a sufficient sensing margin of the proposed binary and multi-bit 1FeFET1R CAM array with sizes of practical interest in both experiments and simulations, given the existing unoptimized FeFET device variation; v) 89.9x speedup and 66.5x energy efficiency improvement over the state-of-the-art alignment tools on GPU in accelerating genome pattern matching applications through the hyperdimensional computing paradigm.
Submitted 15 March, 2022;
originally announced March 2022.
-
Deep Random Forest with Ferroelectric Analog Content Addressable Memory
Authors:
Xunzhao Yin,
Franz Müller,
Ann Franchesca Laguna,
Chao Li,
Wenwen Ye,
Qingrong Huang,
Qinming Zhang,
Zhiguo Shi,
Maximilian Lederer,
Nellie Laleni,
Shan Deng,
Zijian Zhao,
Michael Niemier,
Xiaobo Sharon Hu,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
Deep random forest (DRF), which incorporates the core features of deep learning and random forest (RF), exhibits comparable classification accuracy, interpretability, and low memory and computational overhead when compared with deep neural networks (DNNs) in various information processing tasks for edge intelligence. However, the development of efficient hardware to accelerate DRF is lagging behind its DNN counterparts. The key for hardware acceleration of DRF lies in efficiently realizing the branch-split operation at decision nodes when traversing a decision tree. In this work, we propose to implement DRF through simple associative searches realized with ferroelectric analog content addressable memory (ACAM). Utilizing only two ferroelectric field effect transistors (FeFETs), the ultra-compact ACAM cell can perform a branch-split operation with an energy-efficient associative search by storing the decision boundaries as the analog polarization states in an FeFET. The DRF accelerator architecture and the corresponding mapping of the DRF model to the ACAM arrays are presented. The functionality, characteristics, and scalability of the FeFET ACAM based DRF and its robustness against FeFET device non-idealities are validated both in experiments and simulations. Evaluation results show that the FeFET ACAM DRF accelerator exhibits 10^6x/16x and 10^6x/2.5x improvements in terms of energy and latency when compared with other deep random forest hardware implementations on the state-of-the-art CPU/ReRAM, respectively.
Submitted 6 October, 2021;
originally announced October 2021.
-
Alleviation of Temperature Variation Induced Accuracy Degradation in Ferroelectric FinFET Based Neural Network
Authors:
Sourav De,
Hoang-Hiep Le,
Md. Aftab Baig,
Yao-Jen Lee,
Darsen D. Lu,
Thomas Kämpfe
Abstract:
This paper reports the impact of temperature variation on the inference accuracy of pre-trained all-ferroelectric FinFET deep neural networks, along with plausible design techniques to abate this impact. We adopted a pre-trained artificial neural network (N.N.) with 96.4% inference accuracy on the MNIST dataset as the baseline. A compact model captured the conductance drift that temperature change induces in a programmed cell over a wide range of gate biases. We observed a significant inference accuracy degradation in the analog neural network at 233 K for an N.N. trained at 300 K. Finally, we deployed binary neural networks with "read voltage" optimization to ensure immunity of the N.N. to accuracy degradation under temperature variation, maintaining an inference accuracy of 96%. Keywords: Ferroelectric memories
Submitted 15 August, 2022; v1 submitted 3 March, 2021;
originally announced March 2021.
-
In-Memory Nearest Neighbor Search with FeFET Multi-Bit Content-Addressable Memories
Authors:
Arman Kazemi,
Mohammad Mehdi Sharifi,
Ann Franchesca Laguna,
Franz Müller,
Ramin Rajaei,
Ricardo Olivo,
Thomas Kämpfe,
Michael Niemier,
X. Sharon Hu
Abstract:
Nearest neighbor (NN) search is an essential operation in many applications, such as one/few-shot learning and image classification. As such, fast and low-energy hardware support for accurate NN search is highly desirable. Ternary content-addressable memories (TCAMs) have been proposed to accelerate NN search for few-shot learning tasks by implementing $L_\infty$ and Hamming distance metrics, but they cannot achieve software-comparable accuracies. This paper proposes a novel distance function that can be natively evaluated with multi-bit content-addressable memories (MCAMs) based on ferroelectric FETs (FeFETs) to perform a single-step, in-memory NN search. Moreover, this approach achieves accuracies comparable to floating-point precision implementations in software for NN classification and one/few-shot learning tasks. As an example, the proposed method achieves a 98.34% accuracy for a 5-way, 5-shot classification task for the Omniglot dataset (only 0.8% lower than software-based implementations) with a 3-bit MCAM. This represents a 13% accuracy improvement over state-of-the-art TCAM-based implementations at iso-energy and iso-delay. The presented distance function is resilient to the effects of FeFET device-to-device variations. Furthermore, this work experimentally demonstrates a 2-bit implementation of FeFET MCAM using AND arrays from GLOBALFOUNDRIES to further validate proof of concept.
Submitted 13 November, 2020;
originally announced November 2020.