-
Gain Cell-Based Analog Content Addressable Memory for Dynamic Associative tasks in AI
Authors:
Paul-Philipp Manea,
Nathan Leroux,
Emre Neftci,
John Paul Strachan
Abstract:
Analog Content Addressable Memories (aCAMs) have proven useful for associative in-memory computing applications like Decision Trees, Finite State Machines, and Hyper-dimensional Computing. While non-volatile implementations using FeFETs and ReRAM devices offer speed, power, and area advantages, they suffer from slow write speeds and limited write cycles, making them less suitable for computations involving fully dynamic data patterns. To address these limitations, in this work, we propose a capacitor gain cell-based aCAM designed for dynamic processing, where frequent memory updates are required. Our system compares analog input voltages to boundaries stored in capacitors, enabling efficient dynamic tasks. We demonstrate the application of aCAM within transformer attention mechanisms by replacing the softmax-scaled dot-product similarity with aCAM similarity, achieving competitive results. Circuit simulations on a TSMC 28 nm node show promising performance in terms of energy efficiency, precision, and latency, making it well-suited for fast, dynamic AI applications.
Submitted 13 October, 2024;
originally announced October 2024.
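As a rough illustration of the matching operation this abstract describes (inputs compared against stored boundaries), the following is a minimal software sketch of an idealized aCAM search. It is not the authors' circuit or their attention similarity: the names `row_matches`, `acam_search`, and `write_row` are hypothetical, and each cell is assumed to store a simple (lower, upper) voltage window with a hard match decision.

```python
from typing import List, Tuple

# Assumption: one cell stores a (lower, upper) boundary pair, in volts.
Cell = Tuple[float, float]


def row_matches(row: List[Cell], inputs: List[float]) -> bool:
    """A row matches only if every input lies inside its cell's window."""
    return all(lo <= v <= hi for (lo, hi), v in zip(row, inputs))


def acam_search(rows: List[List[Cell]], inputs: List[float]) -> List[bool]:
    """Compare one input vector against all stored rows (idealized, noise-free)."""
    return [row_matches(row, inputs) for row in rows]


def write_row(rows: List[List[Cell]], index: int, new_row: List[Cell]) -> None:
    """Overwrite the boundaries of one row, standing in for a dynamic memory update."""
    rows[index] = list(new_row)


rows = [
    [(0.0, 0.5), (0.2, 0.8)],
    [(0.4, 1.0), (0.0, 0.3)],
]
first_result = acam_search(rows, [0.3, 0.5])

# Update the second row and search again with the same input.
write_row(rows, 1, [(0.2, 0.4), (0.4, 0.6)])
second_result = acam_search(rows, [0.3, 0.5])
```

In this sketch the update is a plain list assignment; the abstract's point is that a capacitor-based cell makes the corresponding hardware write fast and repeatable, which the code does not model.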
-
Analog In-Memory Computing Attention Mechanism for Fast and Energy-Efficient Large Language Models
Authors:
Nathan Leroux,
Paul-Philipp Manea,
Chirag Sudarshan,
Jan Finkbeiner,
Sebastian Siegel,
John Paul Strachan,
Emre Neftci
Abstract:
Transformer networks, driven by self-attention, are central to Large Language Models. In generative Transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, GPU-stored projections must be loaded into SRAM for each new generation step, causing latency and energy bottlenecks.
We present a custom self-attention in-memory computing architecture based on emerging charge-based memories called gain cells, which can be efficiently written to store new tokens during sequence generation and enable the parallel analog dot-product computation required for self-attention. However, the analog gain cell circuits introduce non-idealities and constraints that prevent the direct mapping of pre-trained models. To circumvent this problem, we design an initialization algorithm that achieves text processing performance comparable to GPT-2 without training from scratch. Our architecture reduces attention latency and energy consumption by up to two and five orders of magnitude, respectively, compared to GPUs, marking a significant step toward ultra-fast, low-power generative Transformers.
Submitted 25 November, 2024; v1 submitted 28 September, 2024;
originally announced September 2024.
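For readers unfamiliar with the caching this abstract refers to, the following is a minimal sketch of one generation step of conventional softmax attention with a key/value cache, written in plain Python. It shows only the standard software formulation, not the gain-cell architecture or the authors' initialization algorithm; `KVCache` and `attend` are hypothetical names chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


class KVCache:
    """Stores the key and value projections of every token generated so far."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def append(self, key: Vector, value: Vector) -> None:
        # One write per new token; earlier projections are never recomputed.
        self.keys.append(list(key))
        self.values.append(list(value))


def attend(cache: KVCache, query: Vector) -> Vector:
    """Scaled dot-product attention of one query over all cached tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in cache.keys]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(cache.values[0])
    return [sum(w * v[j] for w, v in zip(weights, cache.values)) for j in range(dim)]


cache = KVCache()
cache.append([1.0, 0.0], [2.0, 4.0])
out_one_token = attend(cache, [1.0, 0.0])

cache.append([1.0, 0.0], [4.0, 8.0])
out_two_tokens = attend(cache, [1.0, 0.0])
```

Every call to `attend` reads all cached keys and values, which is the memory traffic the abstract identifies as the latency and energy bottleneck on GPUs.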
-
Analog Feedback-Controlled Memristor programming Circuit for analog Content Addressable Memory
Authors:
Jiaao Yu,
Paul-Philipp Manea,
Sara Ameli,
Mohammad Hizzani,
Amro Eldebiky,
John Paul Strachan
Abstract:
Recent breakthroughs in associative memories suggest that silicon memories are coming closer to human memories, especially memristive Content Addressable Memories (CAMs), which are capable of reading and writing analog values. However, the Program-Verify algorithm, the state-of-the-art memristor programming algorithm, requires frequent switching between verifying and programming the memristor conductance, which causes drawbacks such as high dynamic power and long programming time. Here, we propose an analog feedback-controlled memristor programming circuit that makes use of a novel look-up table-based (LUT-based) programming algorithm. With the proposed algorithm, the programming and the verification of a memristor can be performed in a single-direction sequential process. We also integrate a single proposed programming circuit with eight analog CAM (aCAM) cells to build an aCAM array. We present SPICE simulations on a TSMC 28 nm process. The theoretical analysis shows that (1) a memristor conductance within an aCAM cell can be converted to an output boundary voltage in aCAM search operations and (2) an output boundary voltage in aCAM search operations can be converted to a programming data line voltage in aCAM programming operations. The simulation results of the proposed programming circuit confirm the theoretical analysis and thus verify the feasibility of programming memristors without frequently switching between verifying and programming the conductance. In addition, the simulation results of the proposed aCAM array show that the proposed programming circuit can be integrated into a large array architecture.
Submitted 21 April, 2023;
originally announced April 2023.