-
Real-time raw signal genomic analysis using fully integrated memristor hardware
Authors:
Peiyi He,
Shengbo Wang,
Ruibin Mao,
Sebastian Siegel,
Giacomo Pedretti,
Jim Ignowski,
John Paul Strachan,
Ruibang Luo,
Can Li
Abstract:
Advances in third-generation sequencing have enabled portable and real-time genomic sequencing, but real-time data processing remains a bottleneck, hampering on-site genomic analysis due to prohibitive time and energy costs. These technologies generate a massive amount of noisy analog signals that traditionally require basecalling and digital mapping, both demanding frequent and costly data movement on von Neumann hardware. To overcome these challenges, we present a memristor-based hardware-software co-design that processes raw sequencer signals directly in analog memory, effectively combining the separated basecalling and read mapping steps. Here we demonstrate, for the first time, end-to-end memristor-based genomic analysis in a fully integrated memristor chip. By exploiting intrinsic device noise for locality-sensitive hashing and implementing parallel approximate searches in content-addressable memory, we experimentally showcase on-site applications including infectious disease detection and metagenomic classification. Our experimentally-validated analysis confirms the effectiveness of this approach on real-world tasks, achieving a state-of-the-art 97.15% F1 score in virus raw signal mapping, with 51x speed up and 477x energy saving compared to implementation on a state-of-the-art ASIC. These results demonstrate that memristor-based in-memory computing provides a viable solution for integration with portable sequencers, enabling truly real-time on-site genomic analysis for applications ranging from pathogen surveillance to microbial community profiling.
Submitted 22 April, 2025;
originally announced April 2025.
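
The abstract above pairs locality-sensitive hashing with approximate search in content-addressable memory. A minimal software sketch of that general idea follows, assuming sign-of-random-projection hashing and a sequential Hamming-distance scan. The names `make_hasher`, `hamming`, and `nearest` are hypothetical, and the random numbers come from a software PRNG rather than from device noise, so this is an illustration and not the authors' implementation.

```python
import random


def make_hasher(dim, n_bits, seed=0):
    """Return a function that maps a length-`dim` vector to an n_bits sign hash."""
    rng = random.Random(seed)
    # Assumption: Gaussian random hyperplanes from a software PRNG.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hash_vector(vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(
            1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
            for plane in planes
        )

    return hash_vector


def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(query_hash, stored_hashes):
    """Index of the stored hash closest to the query in Hamming distance.

    A sequential stand-in for a search that a CAM would run in parallel.
    """
    return min(
        range(len(stored_hashes)),
        key=lambda i: hamming(query_hash, stored_hashes[i]),
    )
```

Because each bit depends only on the sign of a projection, scaling an input by a positive constant leaves its hash unchanged, which is one reason this family of hashes tolerates amplitude variation.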
-
Accelerating Hybrid XOR$-$CNF SAT Problems Natively with In-Memory Computing
Authors:
Haesol Im,
Fabian Böhm,
Giacomo Pedretti,
Noriyuki Kushida,
Moslem Noori,
Elisabetta Valiante,
Xiangyi Zhang,
Chan-Woo Yang,
Tinish Bhattacharya,
Xia Sheng,
Jim Ignowski,
Arne Heittmann,
John Paul Strachan,
Masoud Mohseni,
Ray Beausoleil,
Thomas Van Vaerenbergh,
Ignacio Rozada
Abstract:
The Boolean satisfiability (SAT) problem is a computationally challenging decision problem central to many industrial applications. For SAT problems in cryptanalysis, circuit design, and telecommunication, solutions can often be found more efficiently by representing them with a combination of exclusive OR (XOR) and conjunctive normal form (CNF) clauses. We propose a hardware accelerator architecture that natively embeds and solves such hybrid CNF$-$XOR problems using in-memory computing hardware. To achieve this, we introduce an algorithm and demonstrate, both experimentally and through simulations, how it can be efficiently implemented with memristor crossbar arrays. Compared to the conventional approaches that translate CNF$-$XOR problems to pure CNF problems, our simulations show that the accelerator improves computation speed, energy efficiency, and chip area utilization by $\sim$10$\times$ for a set of hard cryptographic benchmarking problems. Moreover, the accelerator achieves a $\sim$10$\times$ speedup and a $\sim$1000$\times$ gain in energy efficiency over state-of-the-art SAT solvers running on CPUs.
Submitted 8 April, 2025;
originally announced April 2025.
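
To make the hybrid formulation in the abstract above concrete, here is a small sketch that evaluates a mixed instance of CNF and XOR clauses directly, and shows how a single XOR clause expands when translated to pure CNF without auxiliary variables. The data layout (signed integers for literals, `(variables, parity)` pairs for XOR clauses) and all function names are assumptions made for illustration; the brute-force search is only a stand-in for a real solver and is unrelated to the accelerator's algorithm.

```python
from itertools import product


def lit_true(lit, assign):
    """Truth value of a signed literal under `assign` (dict: var -> bool)."""
    value = assign[abs(lit)]
    return value if lit > 0 else not value


def violated(cnf, xors, assign):
    """Count violated clauses in a hybrid instance."""
    bad_cnf = sum(
        not any(lit_true(lit, assign) for lit in clause) for clause in cnf
    )
    bad_xor = sum(
        (sum(assign[v] for v in vars_) % 2) != parity for vars_, parity in xors
    )
    return bad_cnf + bad_xor


def xor_to_cnf(vars_, parity):
    """Expand one XOR clause into CNF without auxiliary variables.

    Each assignment with the wrong parity is forbidden by one clause,
    giving 2**(k-1) clauses for k variables.
    """
    clauses = []
    for bits in product([False, True], repeat=len(vars_)):
        if sum(bits) % 2 != parity:
            clauses.append([-v if b else v for v, b in zip(vars_, bits)])
    return clauses


def brute_force(n_vars, cnf, xors):
    """Exhaustive search; only sensible for tiny illustrative instances."""
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if violated(cnf, xors, assign) == 0:
            return assign
    return None
```

The exponential growth in `xor_to_cnf` is the usual motivation for handling XOR clauses natively; practical translations add auxiliary variables to split long XORs, which trades clause count for extra variables.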
-
Hardware-Compatible Single-Shot Feasible-Space Heuristics for Solving the Quadratic Assignment Problem
Authors:
Haesol Im,
Chan-Woo Yang,
Moslem Noori,
Dmitrii Dobrynin,
Elisabetta Valiante,
Giacomo Pedretti,
Arne Heittmann,
Thomas Van Vaerenbergh,
Masoud Mohseni,
John Paul Strachan,
Dmitri Strukov,
Ray Beausoleil,
Ignacio Rozada
Abstract:
Research into the development of special-purpose computing architectures designed to solve quadratic unconstrained binary optimization (QUBO) problems has flourished in recent years. It has been demonstrated in the literature that such special-purpose solvers can outperform traditional CMOS architectures by orders of magnitude with respect to timing metrics on synthetic problems. However, they face challenges with constrained problems such as the quadratic assignment problem (QAP), where mapping to binary formulations such as QUBO introduces overhead and limits parallelism. In-memory computing (IMC) devices, such as memristor-based analog Ising machines, offer significant speedups and efficiency gains over traditional CPU-based solvers, particularly for solving combinatorial optimization problems. In this work, we present a novel local search heuristic designed for IMC hardware to tackle the QAP. Our approach enables massive parallelism that allows for computing of full neighbourhoods simultaneously to make update decisions. We ensure binary solutions remain feasible by selecting local moves that lead to neighbouring feasible solutions, leveraging feasible-space search heuristics and the underlying structure of a given problem. Our approach is compatible with both digital computers and analog hardware. We demonstrate its effectiveness in CPU implementations by comparing it with state-of-the-art heuristics for solving the QAP.
Submitted 12 March, 2025;
originally announced March 2025.
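
The abstract above describes evaluating a full neighbourhood and moving only between feasible solutions. The sketch below shows that pattern with a textbook pairwise-swap, best-improvement local search on a permutation; it is an assumption-laden illustration and not the authors' heuristic. The function names and the matrix layout (`flow[i][j]` between facilities, `dist[a][b]` between locations) are hypothetical.

```python
from itertools import combinations


def qap_cost(flow, dist, perm):
    """QAP objective: facility i is placed at location perm[i]."""
    n = len(perm)
    return sum(
        flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n)
    )


def best_swap(flow, dist, perm):
    """Evaluate every pairwise swap and return the best (cost, perm) found.

    Swapping two entries of a permutation yields another permutation, so
    every candidate is feasible by construction. The loop is sequential
    here; the candidates are independent and could be scored in parallel.
    """
    best_cost, best_perm = qap_cost(flow, dist, perm), list(perm)
    for i, j in combinations(range(len(perm)), 2):
        cand = list(perm)
        cand[i], cand[j] = cand[j], cand[i]
        cost = qap_cost(flow, dist, cand)
        if cost < best_cost:
            best_cost, best_perm = cost, cand
    return best_cost, best_perm


def local_search(flow, dist, perm):
    """Repeat best-improvement swaps until no swap lowers the cost."""
    perm = list(perm)
    cost = qap_cost(flow, dist, perm)
    while True:
        new_cost, new_perm = best_swap(flow, dist, perm)
        if new_cost >= cost:
            return cost, perm
        cost, perm = new_cost, new_perm
```

Staying in permutation space avoids the penalty terms that a QUBO encoding needs to enforce the one-facility-per-location constraints.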
-
Solving Boolean satisfiability problems with resistive content addressable memories
Authors:
Giacomo Pedretti,
Fabian Böhm,
Tinish Bhattacharya,
Arne Heittmann,
Xiangyi Zhang,
Mohammad Hizzani,
George Hutchinson,
Dongseok Kwon,
John Moon,
Elisabetta Valiante,
Ignacio Rozada,
Catherine E. Graves,
Jim Ignowski,
Masoud Mohseni,
John Paul Strachan,
Dmitri Strukov,
Ray Beausoleil,
Thomas Van Vaerenbergh
Abstract:
Solving optimization problems is a highly demanding workload requiring high-performance computing systems. Optimization solvers are usually difficult to parallelize in conventional digital architectures, particularly when stochastic decisions are involved. Recently, analog computing architectures for accelerating stochastic optimization solvers have been presented, but they were limited to academic problems in quadratic polynomial format. Here we present KLIMA, a k-Local In-Memory Accelerator with resistive Content Addressable Memories (CAMs) and Dot-Product Engines (DPEs) to accelerate the solution of high-order industry-relevant optimization problems, in particular Boolean Satisfiability. By co-designing the optimization heuristics and circuit architecture we improve the speed and energy to solution up to 182x compared to the digital state of the art.
Submitted 13 January, 2025;
originally announced January 2025.
-
Gain Cell-Based Analog Content Addressable Memory for Dynamic Associative tasks in AI
Authors:
Paul-Philipp Manea,
Nathan Leroux,
Emre Neftci,
John Paul Strachan
Abstract:
Analog Content Addressable Memories (aCAMs) have proven useful for associative in-memory computing applications like Decision Trees, Finite State Machines, and Hyper-dimensional Computing. While non-volatile implementations using FeFETs and ReRAM devices offer speed, power, and area advantages, they suffer from slow write speeds and limited write cycles, making them less suitable for computations involving fully dynamic data patterns. To address these limitations, in this work, we propose a capacitor gain cell-based aCAM designed for dynamic processing, where frequent memory updates are required. Our system compares analog input voltages to boundaries stored in capacitors, enabling efficient dynamic tasks. We demonstrate the application of aCAM within transformer attention mechanisms by replacing the softmax-scaled dot-product similarity with aCAM similarity, achieving competitive results. Circuit simulations on a TSMC 28 nm node show promising performance in terms of energy efficiency, precision, and latency, making it well-suited for fast, dynamic AI applications.
Submitted 13 October, 2024;
originally announced October 2024.
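
The abstract above says the aCAM compares analog inputs to stored boundaries. A behavioural sketch of that matching rule follows, assuming each cell stores an inclusive `(low, high)` interval and a row matches when every input falls inside its cell's interval. The function name `acam_match` and the use of infinite bounds for a "don't care" cell are illustrative assumptions; the sketch does not model the circuit or the attention-similarity variant described in the paper.

```python
import math


def acam_match(rows, query):
    """Return one boolean per row: True if every input lies in its cell's range.

    rows  : list of rows, each a list of (low, high) bounds, one per input
    query : list of analog input values, same length as each row
    """
    return [
        all(low <= x <= high for (low, high), x in zip(row, query))
        for row in rows
    ]


# Hypothetical example: two rows over two inputs; the first row ignores
# the second input by storing an unbounded range.
EXAMPLE_ROWS = [
    [(0.0, 0.5), (-math.inf, math.inf)],
    [(0.5, 1.0), (0.2, 0.8)],
]
```

With this rule, a root-to-leaf path of a decision tree maps naturally onto one row, since each path is a conjunction of threshold conditions on the inputs.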
-
Analog In-Memory Computing Attention Mechanism for Fast and Energy-Efficient Large Language Models
Authors:
Nathan Leroux,
Paul-Philipp Manea,
Chirag Sudarshan,
Jan Finkbeiner,
Sebastian Siegel,
John Paul Strachan,
Emre Neftci
Abstract:
Transformer networks, driven by self-attention, are central to Large Language Models. In generative Transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, GPU-stored projections must be loaded into SRAM for each new generation step, causing latency and energy bottlenecks.
We present a custom self-attention in-memory computing architecture based on emerging charge-based memories called gain cells, which can be efficiently written to store new tokens during sequence generation and enable parallel analog dot-product computation required for self-attention. However, the analog gain cell circuits introduce non-idealities and constraints preventing the direct mapping of pre-trained models. To circumvent this problem, we design an initialization algorithm achieving text processing performance comparable to GPT-2 without training from scratch. Our architecture respectively reduces attention latency and energy consumption by up to two and five orders of magnitude compared to GPUs, marking a significant step toward ultra-fast, low-power generative Transformers.
Submitted 25 November, 2024; v1 submitted 28 September, 2024;
originally announced September 2024.
-
Roadmap to Neuromorphic Computing with Emerging Technologies
Authors:
Adnan Mehonic,
Daniele Ielmini,
Kaushik Roy,
Onur Mutlu,
Shahar Kvatinsky,
Teresa Serrano-Gotarredona,
Bernabe Linares-Barranco,
Sabina Spiga,
Sergey Savelev,
Alexander G Balanov,
Nitin Chawla,
Giuseppe Desoli,
Gerardo Malavena,
Christian Monzio Compagnoni,
Zhongrui Wang,
J Joshua Yang,
Ghazi Sarwat Syed,
Abu Sebastian,
Thomas Mikolajick,
Beatriz Noheda,
Stefan Slesazeck,
Bernard Dieny,
Tuo-Hung Hou,
Akhil Varri
, et al. (28 additional authors not shown)
Abstract:
The roadmap is organized into several thematic sections, outlining current computing challenges, discussing the neuromorphic computing approach, analyzing mature and currently utilized technologies, providing an overview of emerging technologies, addressing material challenges, exploring novel computing concepts, and finally examining the maturity level of emerging technologies while determining the next essential steps for their advancement.
Submitted 5 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Integration of Physics-Derived Memristor Models with Machine Learning Frameworks
Authors:
Zhenming Yu,
Stephan Menzel,
John Paul Strachan,
Emre Neftci
Abstract:
Simulation frameworks such as MemTorch, DNN+NeuroSim, and aihwkit are commonly used to facilitate the end-to-end co-design of memristive machine learning (ML) accelerators. These simulators can take device nonidealities into account and are integrated with modern ML frameworks. However, memristors in these simulators are modeled with either lookup tables or simple analytic models with basic nonlinearities. These simple models are unable to capture certain performance-critical aspects of device nonidealities. For example, they ignore the physical cause of switching, which induces errors in switching timings and thus incorrect estimations of conductance states. This work aims at bringing physical dynamics into consideration to model nonidealities while being compatible with GPU accelerators. We focus on Valence Change Memory (VCM) cells, where the switching nonlinearity and SET/RESET asymmetry relate tightly with the thermal resistance, ion mobility, Schottky barrier height, parasitic resistance, and other effects. The resulting dynamics require solving an ODE that captures changes in oxygen vacancies. We modified a physics-derived SPICE-level VCM model, integrated it with the aihwkit simulator and tested the performance with the MNIST dataset. Results show that noise that disrupts the SET/RESET matching affects network performance the most. This work serves as a tool for evaluating how physical dynamics in memristive devices affect neural network accuracy and can be used to guide the development of future integrated devices.
Submitted 11 March, 2024;
originally announced March 2024.
-
The Ouroboros of Memristors: Neural Networks Facilitating Memristor Programming
Authors:
Zhenming Yu,
Ming-Jay Yang,
Jan Finkbeiner,
Sebastian Siegel,
John Paul Strachan,
Emre Neftci
Abstract:
Memristive devices hold promise to improve the scale and efficiency of machine learning and neuromorphic hardware, thanks to their compact size, low power consumption, and the ability to perform matrix multiplications in constant time. However, on-chip training with memristor arrays still faces challenges, including device-to-device and cycle-to-cycle variations, switching non-linearity, and especially SET and RESET asymmetry. To combat device non-linearity and asymmetry, we propose to program memristors by harnessing neural networks that map desired conductance updates to the required pulse times. With our method, approximately 95% of devices can be programmed within a relative percentage difference of ±50% from the target conductance after just one attempt. Our approach substantially reduces memristor programming delays compared to traditional write-and-verify methods, presenting an advantageous solution for on-chip training scenarios. Furthermore, our proposed neural network can be accelerated by memristor arrays upon deployment, providing assistance while reducing hardware overhead compared with previous works.
This work contributes significantly to the practical application of memristors, particularly in reducing delays in memristor programming. It also envisions the future development of memristor-based machine learning accelerators.
Submitted 11 March, 2024;
originally announced March 2024.
-
Computing High-Degree Polynomial Gradients in Memory
Authors:
T. Bhattacharya,
G. H. Hutchinson,
G. Pedretti,
X. Sheng,
J. Ignowski,
T. Van Vaerenbergh,
R. Beausoleil,
J. P. Strachan,
D. B. Strukov
Abstract:
Specialized function gradient computing hardware could greatly improve the performance of state-of-the-art optimization algorithms, e.g., based on gradient descent or conjugate gradient methods that are at the core of control, machine learning, and operations research applications. Prior work on such hardware, performed in the context of the Ising Machines and related concepts, is limited to quadratic polynomials and not scalable to commonly used higher-order functions. Here, we propose a novel approach for massively parallel gradient calculations of high-degree polynomials, which is conducive to efficient mixed-signal in-memory computing circuit implementations and whose area complexity scales linearly with the number of variables and terms in the function and, most importantly, independent of its degree. Two flavors of such an approach are proposed. The first is limited to binary-variable polynomials typical in combinatorial optimization problems, while the second type is broader at the cost of a more complex periphery. To validate the former approach, we experimentally demonstrated solving a small-scale third-order Boolean satisfiability problem based on integrated metal-oxide memristor crossbar circuits, one of the most prospective in-memory computing device technologies, with a competitive heuristics algorithm. Simulation results for larger-scale, more practical problems show orders of magnitude improvements in the area, and related advantages in speed and energy efficiency compared to the state-of-the-art. We discuss how our work could enable even higher-performance systems after co-designing algorithms to exploit massively parallel gradient computation.
Submitted 29 January, 2024;
originally announced January 2024.
-
MINT: A wrapper to make multi-modal and multi-image AI models interactive
Authors:
Jan Freyberg,
Abhijit Guha Roy,
Terry Spitz,
Beverly Freeman,
Mike Schaekermann,
Patricia Strachan,
Eva Schnider,
Renee Wong,
Dale R Webster,
Alan Karthikesalingam,
Yun Liu,
Krishnamurthy Dvijotham,
Umesh Telang
Abstract:
During the diagnostic process, doctors incorporate multimodal information including imaging and the medical history - and similarly medical AI development has increasingly become multimodal. In this paper we tackle a more subtle challenge: doctors take a targeted medical history to obtain only the most pertinent pieces of information; how do we enable AI to do the same? We develop a wrapper method named MINT (Make your model INTeractive) that automatically determines what pieces of information are most valuable at each step, and asks for only the most useful information. We demonstrate the efficacy of MINT wrapping a skin disease prediction model, where multiple images and a set of optional answers to $25$ standard metadata questions (i.e., structured medical history) are used by a multi-modal deep network to provide a differential diagnosis. We show that MINT can identify whether metadata inputs are needed and if so, which question to ask next. We also demonstrate that when collecting multiple images, MINT can identify if an additional image would be beneficial, and if so, which type of image to capture. We showed that MINT reduces the number of metadata and image inputs needed by 82% and 36.2% respectively, while maintaining predictive performance. Using real-world AI dermatology system data, we show that needing fewer inputs can retain users that may otherwise fail to complete the system submission and drop off without a diagnosis. Qualitative examples show MINT can closely mimic the step-by-step decision-making process of a clinical workflow and how this is different for straightforward cases versus more difficult, ambiguous cases. Finally, we demonstrate how MINT is robust to different underlying multi-model classifiers and can be easily adapted to user requirements without significant model re-training.
Submitted 22 January, 2024;
originally announced January 2024.
-
Memristor-based hardware and algorithms for higher-order Hopfield optimization solver outperforming quadratic Ising machines
Authors:
Mohammad Hizzani,
Arne Heittmann,
George Hutchinson,
Dmitrii Dobrynin,
Thomas Van Vaerenbergh,
Tinish Bhattacharya,
Adrien Renaudineau,
Dmitri Strukov,
John Paul Strachan
Abstract:
Ising solvers offer a promising physics-based approach to tackle the challenging class of combinatorial optimization problems. However, typical solvers operate in a quadratic energy space, having only pair-wise coupling elements which already dominate area and energy. We show that such quadratization can cause severe problems: increased dimensionality, a rugged search landscape, and misalignment with the original objective function. Here, we design and quantify a higher-order Hopfield optimization solver, with 28nm CMOS technology and memristive couplings for lower area and energy computations. We combine algorithmic and circuit analysis to show quantitative advantages over quadratic Ising Machines (IMs), yielding 48x and 72x reduction in time-to-solution (TTS) and energy-to-solution (ETS) respectively for Boolean satisfiability problems of 150 variables, with favorable scaling.
Submitted 2 November, 2023;
originally announced November 2023.
-
ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
Authors:
Shawn Xu,
Lin Yang,
Christopher Kelly,
Marcin Sieniek,
Timo Kohlberger,
Martin Ma,
Wei-Hung Weng,
Atilla Kiraly,
Sahar Kazemzadeh,
Zakkai Melamed,
Jungyeon Park,
Patricia Strachan,
Yun Liu,
Chuck Lau,
Preeti Singh,
Christina Chen,
Mozziyar Etemadi,
Sreenivasa Raju Kalidindi,
Yossi Matias,
Katherine Chou,
Greg S. Corrado,
Shravya Shetty,
Daniel Tse,
Shruthi Prabhakara,
Daniel Golden
, et al. (3 additional authors not shown)
Abstract:
In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
Submitted 7 September, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Conformal prediction under ambiguous ground truth
Authors:
David Stutz,
Abhijit Guha Roy,
Tatiana Matejovicova,
Patricia Strachan,
Ali Taylan Cemgil,
Arnaud Doucet
Abstract:
Conformal Prediction (CP) allows one to perform rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X))\geq 1-α$ for a user-chosen $α\in [0,1]$ by relying on calibration data $(X_1,Y_1),...,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many real-world scenarios, the labels $Y_1,...,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$. For such ``voted'' labels, CP guarantees are thus w.r.t. $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ rather than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth labels, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is irrelevant. However, when experts do not agree because of ambiguous labels, approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We develop Monte Carlo CP procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,...,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{vote}$ under-covers expert annotations: calibrated for $72\%$ coverage, it falls short by $10\%$ on average; our Monte Carlo CP closes this gap both empirically and theoretically.
Submitted 24 October, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
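
As background for the abstract above, the following sketch shows standard split conformal prediction with a single label per calibration example, which is the baseline setting the paper starts from. It is not the Monte Carlo procedure the authors propose. The conformity score (one minus the predicted probability of the true label) and the function names are assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold for target coverage 1 - alpha.

    cal_probs  : list of per-class probability lists, one per calibration example
    cal_labels : list of integer class labels
    Uses the ceil((n + 1) * (1 - alpha))-th smallest score; if that rank
    exceeds n, the threshold is infinite and every label is included.
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return scores[k - 1] if k <= n else math.inf


def prediction_set(probs, threshold):
    """All labels whose score 1 - p(label) does not exceed the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= threshold]
```

The coverage guarantee holds with respect to whatever distribution generated the calibration labels, which is why the choice between voted labels and an aggregated label distribution matters in the paper's setting.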
-
Evaluating AI systems under uncertain ground truth: a case study in dermatology
Authors:
David Stutz,
Ali Taylan Cemgil,
Abhijit Guha Roy,
Tatiana Matejovicova,
Melih Barsbey,
Patricia Strachan,
Mike Schaekermann,
Jan Freyberg,
Rajeev Rikhye,
Beverly Freeman,
Javier Perez Matos,
Umesh Telang,
Dale R. Webster,
Yuan Liu,
Greg S. Corrado,
Yossi Matias,
Pushmeet Kohli,
Yun Liu,
Arnaud Doucet,
Alan Karthikesalingam
Abstract:
For safety, medical AI systems undergo thorough evaluations before deployment, validating their predictions against a ground truth which is assumed to be fixed and certain. However, this ground truth is often curated in the form of differential diagnoses. While a single differential diagnosis reflects the uncertainty in one expert assessment, multiple experts introduce another layer of uncertainty through disagreement. Both forms of uncertainty are ignored in standard evaluation which aggregates these differential diagnoses to a single label. In this paper, we show that ignoring uncertainty leads to overly optimistic estimates of model performance, therefore underestimating risk associated with particular diagnostic decisions. To this end, we propose a statistical aggregation approach, where we infer a distribution on probabilities of underlying medical condition candidates themselves, based on observed annotations. This formulation naturally accounts for the potential disagreements between different experts, as well as uncertainty stemming from individual differential diagnoses, capturing the entire ground truth uncertainty. Our approach boils down to generating multiple samples of medical condition probabilities, then evaluating and averaging performance metrics based on these sampled probabilities. In skin condition classification, we find that a large portion of the dataset exhibits significant ground truth uncertainty and standard evaluation severely over-estimates performance without providing uncertainty estimates. In contrast, our framework provides uncertainty estimates on common metrics of interest such as top-k accuracy and average overlap, showing that performance can change multiple percentage points. We conclude that, while assuming a crisp ground truth can be acceptable for many AI applications, a more nuanced evaluation protocol should be utilized in medical diagnosis.
Submitted 13 April, 2025; v1 submitted 5 July, 2023;
originally announced July 2023.
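The sample-then-average evaluation described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Dirichlet posterior over condition probabilities, the `prior` pseudo-count, and all function names are hypothetical stand-ins, not the statistical model or code used in the paper.

```python
import random
import statistics

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def top_k_hit(scores, label, k):
    """True if `label` is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]

def uncertain_top_k_accuracy(votes, model_scores, k=1, n_samples=500, prior=1.0, seed=0):
    """Return (mean, stdev) of top-k accuracy over sampled ground truths.

    votes[i][c]        : number of annotators listing condition c for case i
    model_scores[i][c] : model score for condition c on case i
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_samples):
        hits = 0
        for case_votes, scores in zip(votes, model_scores):
            # Assumed posterior over condition probabilities for this case.
            probs = sample_dirichlet([v + prior for v in case_votes], rng)
            label = max(range(len(probs)), key=lambda c: probs[c])
            hits += top_k_hit(scores, label, k)
        accuracies.append(hits / len(votes))
    return statistics.mean(accuracies), statistics.stdev(accuracies)

if __name__ == "__main__":
    votes = [[5, 0, 0], [2, 2, 1], [0, 1, 4]]   # three cases, three candidate conditions
    scores = [[0.8, 0.1, 0.1], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
    print(uncertain_top_k_accuracy(votes, scores, k=1))
```

The returned spread is what a single aggregated label hides: cases with split votes, such as the second one here, make the metric vary from sample to sample.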
-
Analog Feedback-Controlled Memristor programming Circuit for analog Content Addressable Memory
Authors:
Jiaao Yu,
Paul-Philipp Manea,
Sara Ameli,
Mohammad Hizzani,
Amro Eldebiky,
John Paul Strachan
Abstract:
Recent breakthroughs in associative memories suggest that silicon memories are coming closer to human memories, especially memristive Content Addressable Memories (CAMs), which are capable of reading and writing analog values. However, the Program-Verify algorithm, the state-of-the-art memristor programming algorithm, requires frequent switching between verifying and programming memristor conductance, which brings drawbacks such as high dynamic power and long programming time. Here, we propose an analog feedback-controlled memristor programming circuit that makes use of a novel look-up table-based (LUT-based) programming algorithm. With the proposed algorithm, the programming and the verification of a memristor can be performed in a single-direction sequential process. In addition, we integrate a single proposed programming circuit with eight analog CAM (aCAM) cells to build an aCAM array. We present SPICE simulations on a TSMC 28 nm process. The theoretical analysis shows that (1) a memristor conductance within an aCAM cell can be converted to an output boundary voltage in aCAM searching operations and (2) an output boundary voltage in aCAM searching operations can be converted to a programming data line voltage in aCAM programming operations. The simulation results of the proposed programming circuit confirm the theoretical analysis and thus verify the feasibility of programming memristors without frequently switching between verifying and programming the conductance. The simulation results of the proposed aCAM array also show that the proposed programming circuit can be integrated into a large array architecture.
Submitted 21 April, 2023;
originally announced April 2023.
-
High-Speed and Energy-Efficient Non-Volatile Silicon Photonic Memory Based on Heterogeneously Integrated Memresonator
Authors:
Bassem Tossoun,
Di Liang,
Stanley Cheung,
Zhuoran Fang,
Xia Sheng,
John Paul Strachan,
Raymond G. Beausoleil
Abstract:
Recently, interest in programmable photonic integrated circuits has grown as a potential hardware framework for deep neural networks, quantum computing, and field-programmable gate arrays (FPGAs). However, these circuits are constrained by the limited tuning speed and large power consumption of the phase shifters used. In this paper, we introduce for the first time memresonators, or memristors heterogeneously integrated with silicon photonic microring resonators, as phase shifters with non-volatile memory. These devices are capable of retention times of 12 hours, switching voltages lower than 5 V, and an endurance of 1,000 switching cycles. These memresonators have also been switched using voltage pulses as short as 300 ps with a record low switching energy of 0.15 pJ. Furthermore, these memresonators are fabricated on a heterogeneous III-V/Si platform capable of integrating a rich family of active, passive, and non-linear optoelectronic devices, such as lasers and detectors, directly on-chip to enable in-memory photonic computing and further advance the scalability of integrated photonic processor circuits.
Submitted 25 May, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
Experimentally realized memristive memory augmented neural network
Authors:
Ruibin Mao,
Bo Wen,
Yahui Zhao,
Arman Kazemi,
Ann Franchesca Laguna,
Michael Neimier,
X. Sharon Hu,
Xia Sheng,
Catherine E. Graves,
John Paul Strachan,
Can Li
Abstract:
Lifelong on-device learning is a key challenge for machine intelligence, and it requires learning from few, often single, samples. Memory augmented neural networks have been proposed to achieve this goal, but the memory module has to be stored in off-chip memory due to its size, which has heavily limited practical use. Previous works on emerging memory-based implementations have had difficulty scaling up because different modules with various structures are difficult to integrate on the same chip, and the small sense margin of the content addressable memory used for the memory module heavily limits the degree of mismatch that can be calculated. In this work, we implement the entire memory augmented neural network architecture in a fully integrated memristive crossbar platform and achieve an accuracy that closely matches standard software on digital hardware for the Omniglot dataset. The successful demonstration is supported by implementing new functions in crossbars in addition to widely reported matrix multiplications. For example, the locality-sensitive hashing operation is implemented in crossbar arrays by exploiting the intrinsic stochasticity of memristor devices. In addition, the content-addressable memory module is realized in crossbars and also supports computing the degree of mismatch. Simulations based on experimentally validated models show that such an implementation can be efficiently scaled up for one-shot learning on the Mini-ImageNet dataset. The successful demonstration paves the way for practical on-device lifelong learning and opens possibilities for novel attention-based algorithms not possible in conventional hardware.
Submitted 15 April, 2022;
originally announced April 2022.
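As a rough software analogue of the two crossbar functions named in the abstract above, the sketch below pairs random-projection locality-sensitive hashing with a Hamming-distance lookup. It rests on assumptions: Gaussian random weights stand in for the stochastic memristor conductances, the lookup loops over entries where a CAM would compare all rows in parallel, and every name is hypothetical rather than taken from the paper.

```python
import random

def make_hash_planes(n_bits, dim, rng):
    """Random projection matrix; a stand-in for stochastic memristor conductances."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(planes, vector):
    """The sign of each random projection gives one bit of the hash."""
    return [1 if sum(w * x for w, x in zip(row, vector)) >= 0 else 0 for row in planes]

def hamming(a, b):
    """Number of mismatching bits between two signatures."""
    return sum(x != y for x, y in zip(a, b))

def cam_lookup(stored, query_sig):
    """Return the key whose stored signature has the fewest mismatching bits."""
    return min(stored, key=lambda key: hamming(stored[key], query_sig))

if __name__ == "__main__":
    rng = random.Random(0)
    planes = make_hash_planes(n_bits=32, dim=4, rng=rng)
    memory = {
        "class_a": lsh_signature(planes, [1.0, 0.2, 0.0, 0.1]),
        "class_b": lsh_signature(planes, [-0.9, 0.1, 0.8, -0.3]),
    }
    query = lsh_signature(planes, [0.9, 0.3, 0.1, 0.0])
    print(cam_lookup(memory, query))
```

Because only the sign of each projection is kept, nearby vectors tend to share most bits, so a small Hamming distance serves as a proxy for similarity.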
-
Prospects for Analog Circuits in Deep Networks
Authors:
Shih-Chii Liu,
John Paul Strachan,
Arindam Basu
Abstract:
Operations typically used in machine learning algorithms (e.g. adds and softmax) can be implemented by compact analog circuits. Analog Application-Specific Integrated Circuit (ASIC) designs that implement these algorithms using techniques such as charge sharing circuits and subthreshold transistors achieve very high power efficiencies. With the recent advances in deep learning algorithms, focus has shifted to hardware digital accelerator designs that implement the prevalent matrix-vector multiplication operations. Power in these designs is usually dominated by the memory access power of off-chip DRAM needed for storing the network weights and activations. Emerging dense non-volatile memory technologies can help to provide on-chip memory, and analog circuits can be well suited to implement the needed matrix-vector multiplication operations coupled with in-memory computing approaches. This paper presents a brief review of analog designs that implement various machine learning algorithms. It then presents an outlook for the use of analog circuits in low-power deep network accelerators suitable for edge or tiny machine learning applications.
Submitted 23 June, 2021;
originally announced June 2021.
-
2022 Roadmap on Neuromorphic Computing and Engineering
Authors:
Dennis V. Christensen,
Regina Dittmann,
Bernabé Linares-Barranco,
Abu Sebastian,
Manuel Le Gallo,
Andrea Redaelli,
Stefan Slesazeck,
Thomas Mikolajick,
Sabina Spiga,
Stephan Menzel,
Ilia Valov,
Gianluca Milano,
Carlo Ricciardi,
Shi-Jun Liang,
Feng Miao,
Mario Lanza,
Tyler J. Quill,
Scott T. Keene,
Alberto Salleo,
Julie Grollier,
Danijela Marković,
Alice Mizrahi,
Peng Yao,
J. Joshua Yang,
Giacomo Indiveri
, et al. (34 additional authors not shown)
Abstract:
Modern computation based on the von Neumann architecture is today a mature cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation computer technology is expected to solve problems at the exascale with $10^{18}$ calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann type architectures, they will consume between 20 and 30 megawatts of power and will not have intrinsic physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to be used for the storage and processing of large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving the control from data centers to edge devices.
The aim of this Roadmap is to present a snapshot of the present state of neuromorphic technology and provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The Roadmap is a collection of perspectives where leading researchers in the neuromorphic community provide their own view about the current state and the future challenges. We hope that this Roadmap will be a useful resource to readers outside this field, for those who are just entering the field, and for those who are well established in the neuromorphic community.
https://doi.org/10.1088/2634-4386/ac4a83
Submitted 13 January, 2022; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Tree-based machine learning performed in-memory with memristive analog CAM
Authors:
Giacomo Pedretti,
Catherine E. Graves,
Can Li,
Sergey Serebryakov,
Xia Sheng,
Martin Foltin,
Ruibin Mao,
John Paul Strachan
Abstract:
Tree-based machine learning techniques, such as Decision Trees and Random Forests, are top performers in several domains as they do well with limited training datasets and offer improved interpretability compared to Deep Neural Networks (DNN). However, while easier to train, they are difficult to optimize for fast inference without accuracy loss in von Neumann architectures due to non-uniform memory access patterns. Recently, we proposed a novel analog, or multi-bit, content addressable memory (CAM) for fast look-up table operations. Here, we propose a design utilizing this as a computational primitive for rapid tree-based inference. Large random forest models are mapped to arrays of analog CAMs coupled to traditional analog random access memory (RAM), and the unique features of the analog CAM enable compression and high performance. An optimized architecture is compared with previously proposed tree-based model accelerators, showing improvements in energy to decision by orders of magnitude for common image classification tasks. The results demonstrate the potential for non-volatile analog CAM hardware in accelerating large tree-based machine learning models.
Submitted 17 March, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
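One way to picture mapping a tree onto an analog CAM, as the abstract above describes at a high level, is to flatten each root-to-leaf path into a row of per-feature ranges. The sketch below is a minimal illustration under assumptions: the nested-dict tree format, the half-open ranges, and the function names are hypothetical, and the Python loop stands in for what hardware would evaluate across all rows in parallel.

```python
import math

def tree_to_cam_rows(node, n_features, bounds=None):
    """Flatten a decision tree into one (ranges, label) row per root-to-leaf path.

    A node is either {"leaf": label} or
    {"feature": i, "threshold": t, "left": ..., "right": ...},
    where the left branch is taken when x[i] < t.
    """
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if "leaf" in node:
        return [(list(bounds), node["leaf"])]
    i, t = node["feature"], node["threshold"]
    lo, hi = bounds[i]
    left = list(bounds)
    left[i] = (lo, min(hi, t))       # path condition x[i] < t
    right = list(bounds)
    right[i] = (max(lo, t), hi)      # path condition x[i] >= t
    return (tree_to_cam_rows(node["left"], n_features, left)
            + tree_to_cam_rows(node["right"], n_features, right))

def cam_infer(rows, x):
    """Return the label of the row whose every range contains the input."""
    for ranges, label in rows:
        if all(lo <= v < hi for (lo, hi), v in zip(ranges, x)):
            return label
    return None

if __name__ == "__main__":
    tree = {"feature": 0, "threshold": 0.5,
            "left": {"feature": 1, "threshold": 2.0,
                     "left": {"leaf": "A"}, "right": {"leaf": "B"}},
            "right": {"leaf": "C"}}
    rows = tree_to_cam_rows(tree, n_features=2)
    print(len(rows), cam_infer(rows, [0.2, 3.0]))
```

Features that a path never tests keep the full range, which acts as a wildcard cell; this is one reason the row representation can stay compact.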
-
PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM
Authors:
Aayush Ankit,
Izzat El Hajj,
Sai Rahul Chalamalasetti,
Sapan Agarwal,
Matthew Marinella,
Martin Foltin,
John Paul Strachan,
Dejan Milojicic,
Wen-mei Hwu,
Kaushik Roy
Abstract:
The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency due to the use of serial reads and writes for performing the weight gradient and update step. A few works have demonstrated the possibility of performing outer products in crossbars, which can be used to realize the weight gradient and update step without the use of serial reads and writes. However, these works have been limited to low precision operations which are not sufficient for typical training workloads. Moreover, they have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which is substantially different from bit-slicing for matrix-vector multiplication only. We incorporate this technique into a crossbar architecture with three variants catered to different training algorithms. To evaluate our design on different types of layers in neural networks (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support. Our evaluation shows that PANTHER achieves up to $8.02\times$, $54.21\times$, and $103\times$ energy reductions as well as $7.16\times$, $4.02\times$, and $16\times$ execution time reductions compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.
Submitted 24 December, 2019;
originally announced December 2019.
-
Thermodynamic Computing
Authors:
Tom Conte,
Erik DeBenedictis,
Natesh Ganesh,
Todd Hylton,
John Paul Strachan,
R. Stanley Williams,
Alexander Alemi,
Lee Altenberg,
Gavin Crooks,
James Crutchfield,
Lidia del Rio,
Josh Deutsch,
Michael DeWeese,
Khari Douglas,
Massimiliano Esposito,
Michael Frank,
Robert Fry,
Peter Harsha,
Mark Hill,
Christopher Kello,
Jeff Krichmar,
Suhas Kumar,
Shih-Chii Liu,
Seth Lloyd,
Matteo Marsili
, et al. (14 additional authors not shown)
Abstract:
The hardware and software foundations laid in the first half of the 20th Century enabled the computing technologies that have transformed the world, but these foundations are now under siege. The current computing paradigm, which is the foundation of much of the current standards of living that we now enjoy, faces fundamental limitations that are evident from several perspectives. In terms of hardware, devices have become so small that we are struggling to eliminate the effects of thermodynamic fluctuations, which are unavoidable at the nanometer scale. In terms of software, our ability to imagine and program effective computational abstractions and implementations is clearly challenged in complex domains. In terms of systems, currently five percent of the power generated in the US is used to run computing systems - this astonishing figure is neither ecologically sustainable nor economically scalable. Economically, the cost of building next-generation semiconductor fabrication plants has soared past $10 billion. All of these difficulties - device scaling, software complexity, adaptability, energy consumption, and fabrication economics - indicate that the current computing paradigm has matured and that continued improvements along this path will be limited. If technological progress is to continue and corresponding social and economic benefits are to continue to accrue, computing must become much more capable, energy efficient, and affordable. We propose that progress in computing can continue under a united, physically grounded, computational paradigm centered on thermodynamics. Herein we propose a research agenda to extend these thermodynamic foundations into complex, non-equilibrium, self-organizing systems and apply them holistically to future computing systems that will harness nature's innate computational capacity. We call this type of computing "Thermodynamic Computing" or TC.
Submitted 14 November, 2019; v1 submitted 5 November, 2019;
originally announced November 2019.
-
Analog content addressable memories with memristors
Authors:
Can Li,
Catherine E. Graves,
Xia Sheng,
Darrin Miller,
Martin Foltin,
Giacomo Pedretti,
John Paul Strachan
Abstract:
A content-addressable-memory compares an input search word against all rows of stored words in an array in a highly parallel manner. While supplying a very powerful functionality for many applications in pattern matching and search, it suffers from large area, cost and power consumption, limiting its use. Past improvements have been realized by using memristors to replace the static-random-access-memory cell in conventional designs, but employ similar schemes based only on binary or ternary states for storage and search.
We propose a new analog content-addressable-memory concept and circuit to overcome these limitations by utilizing the analog conductance tunability of memristors. Our analog content-addressable-memory stores data within the programmable conductance and can take as input either analog or digital search values. Experimental demonstrations, scaled simulations and analysis show that our analog content-addressable-memory can reduce area and power consumption, which enables the acceleration of existing applications and also opens up new computing application areas.
Submitted 7 April, 2020; v1 submitted 18 July, 2019;
originally announced July 2019.
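A behavioral view of the analog search described in the abstract above might look like the following sketch. It assumes, for illustration only, that each cell stores an acceptance range on a normalized 0 to 1 scale and that a row matches when every cell's range contains the corresponding query value; the function name and the range encoding are hypothetical and say nothing about the actual circuit.

```python
def acam_search(rows, query):
    """Return indices of rows where every cell's stored range contains the query value."""
    return [idx for idx, row in enumerate(rows)
            if all(lo <= q <= hi for (lo, hi), q in zip(row, query))]

if __name__ == "__main__":
    WILDCARD = (0.0, 1.0)   # full range: the cell matches any input
    rows = [
        [(0.0, 0.3), (0.5, 1.0)],
        [(0.2, 0.6), WILDCARD],
        [(0.7, 1.0), (0.0, 0.2)],
    ]
    print(acam_search(rows, [0.25, 0.8]))
```

In this picture a binary or ternary CAM is the special case where each range covers only the low half, the high half, or the whole scale.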
-
Harnessing Intrinsic Noise in Memristor Hopfield Neural Networks for Combinatorial Optimization
Authors:
Fuxi Cai,
Suhas Kumar,
Thomas Van Vaerenbergh,
Rui Liu,
Can Li,
Shimeng Yu,
Qiangfei Xia,
J. Joshua Yang,
Raymond Beausoleil,
Wei Lu,
John Paul Strachan
Abstract:
We describe a hybrid analog-digital computing approach to solve important combinatorial optimization problems that leverages memristors (two-terminal nonvolatile memories). While previous memristor accelerators have had to minimize analog noise effects, we show that our optimization solver harnesses such noise as a computing resource. Here we describe a memristor-Hopfield Neural Network (mem-HNN) with massively parallel operations performed in a dense crossbar array. We provide experimental demonstrations solving NP-hard max-cut problems directly in analog crossbar arrays, and supplement this with experimentally-grounded simulations to explore scalability with problem size, providing the success probabilities, time and energy to solution, and interactions with intrinsic analog noise. Compared to fully digital approaches, and present-day quantum and optical accelerators, we forecast the mem-HNN to have over four orders of magnitude higher solution throughput per power consumption. This suggests substantially improved performance and scalability compared to current quantum annealing approaches, while operating at room temperature and taking advantage of existing CMOS technology augmented with emerging analog non-volatile memristors.
Submitted 3 April, 2019; v1 submitted 26 March, 2019;
originally announced March 2019.
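The idea of using noise as a resource in a Hopfield network, as outlined in the abstract above, can be illustrated with a small software simulation. The sketch makes several assumptions that are not from the paper: asynchronous single-spin updates, Gaussian noise with a linear decay schedule, and hypothetical function names. Max-cut is encoded by giving the network couplings equal to the negated edge weights.

```python
import random

def cut_value(weights, spins):
    """Total weight of edges whose endpoints fall on opposite sides of the cut."""
    n = len(spins)
    return sum(weights[i][j] for i in range(n) for j in range(i + 1, n)
               if spins[i] != spins[j])

def noisy_hopfield_maxcut(weights, steps=2000, noise0=1.0, seed=0):
    """Asynchronous Hopfield updates with decaying Gaussian noise; returns the best cut seen."""
    rng = random.Random(seed)
    n = len(weights)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    best_spins, best_cut = list(spins), cut_value(weights, spins)
    for step in range(steps):
        noise = noise0 * (1.0 - step / steps)   # assumed linear annealing schedule
        i = rng.randrange(n)
        # Couplings are -w_ij, so a node is pushed away from its neighbours' side.
        field = -sum(weights[i][j] * spins[j] for j in range(n) if j != i)
        spins[i] = 1 if field + rng.gauss(0.0, noise) >= 0 else -1
        current = cut_value(weights, spins)
        if current > best_cut:
            best_spins, best_cut = list(spins), current
    return best_spins, best_cut

if __name__ == "__main__":
    # 4-node ring with unit edge weights.
    ring = [[0, 1, 0, 1],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [1, 0, 1, 0]]
    print(noisy_hopfield_maxcut(ring))
```

The noise term lets the state escape configurations where deterministic updates would stall, and shrinking it over time lets the network settle.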
-
PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference
Authors:
Aayush Ankit,
Izzat El Hajj,
Sai Rahul Chalamalasetti,
Geoffrey Ndu,
Martin Foltin,
R. Stanley Williams,
Paolo Faraboschi,
Wen-mei Hwu,
John Paul Strachan,
Kaushik Roy,
Dejan S Milojicic
Abstract:
Memristor crossbars are circuits capable of performing analog matrix-vector multiplications, overcoming the fundamental energy efficiency limitations of digital logic. They have been shown to be effective in special-purpose accelerators for a limited set of neural network applications.
We present the Programmable Ultra-efficient Memristor-based Accelerator (PUMA) which enhances memristor crossbars with general purpose execution units to enable the acceleration of a wide variety of Machine Learning (ML) inference workloads. PUMA's microarchitecture techniques exposed through a specialized Instruction Set Architecture (ISA) retain the efficiency of in-memory computing and analog circuitry, without compromising programmability.
We also present the PUMA compiler which translates high-level code to PUMA ISA. The compiler partitions the computational graph and optimizes instruction scheduling and register allocation to generate code for large and complex workloads to run on thousands of spatial cores.
We have developed a detailed architecture simulator that incorporates the functionality, timing, and power models of PUMA's components to evaluate performance and energy consumption. A PUMA accelerator running at 1 GHz can reach area and power efficiency of $577~GOPS/s/mm^2$ and $837~GOPS/s/W$, respectively. Our evaluation of diverse ML applications from image recognition, machine translation, and language modelling (5M-800M synapses) shows that PUMA achieves up to $2,446\times$ energy and $66\times$ latency improvement for inference compared to state-of-the-art GPUs. Compared to an application-specific memristor-based accelerator, PUMA incurs small energy overheads at similar inference latency and added programmability.
Submitted 29 January, 2019; v1 submitted 29 January, 2019;
originally announced January 2019.
-
Long short-term memory networks in memristor crossbars
Authors:
Can Li,
Zhongrui Wang,
Mingyi Rao,
Daniel Belkin,
Wenhao Song,
Hao Jiang,
Peng Yan,
Yunning Li,
Peng Lin,
Miao Hu,
Ning Ge,
John Paul Strachan,
Mark Barnell,
Qing Wu,
R. Stanley Williams,
J. Joshua Yang,
Qiangfei Xia
Abstract:
Recent breakthroughs in recurrent deep neural networks with long short-term memory (LSTM) units have led to major advances in artificial intelligence. State-of-the-art LSTM models with significantly increased complexity and a large number of parameters, however, have a bottleneck in computing power resulting from limited memory capacity and data communication bandwidth. Here we demonstrate experimentally that LSTM can be implemented with a memristor crossbar, which has a small circuit footprint to store a large number of parameters and in-memory computing capability that circumvents the 'von Neumann bottleneck'. We illustrate the capability of our system by solving real-world problems in regression and classification, which shows that memristor LSTM is a promising low-power and low-latency hardware platform for edge inference.
Submitted 30 May, 2018;
originally announced May 2018.
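To see why an LSTM suits a crossbar, it helps to note that each time step reduces to a few matrix-vector products followed by element-wise nonlinearities. The sketch below writes out one standard LSTM step in plain Python; the `matvec` function marks the part a crossbar would compute in the analog domain. The weight layout and names are assumptions for illustration and do not describe the experimental system in the abstract above.

```python
import math

def matvec(matrix, vector):
    """Matrix-vector product; the operation a crossbar would perform in one analog step."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(weights, biases, x, h_prev, c_prev):
    """One LSTM time step.

    weights[gate] has shape (hidden, len(x) + hidden) for gate in "ifog":
    input gate, forget gate, output gate, candidate cell state.
    """
    xh = list(x) + list(h_prev)
    pre = {gate: [p + b for p, b in zip(matvec(weights[gate], xh), biases[gate])]
           for gate in "ifog"}
    i_gate = [sigmoid(z) for z in pre["i"]]
    f_gate = [sigmoid(z) for z in pre["f"]]
    o_gate = [sigmoid(z) for z in pre["o"]]
    cand = [math.tanh(z) for z in pre["g"]]
    c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c_prev, i_gate, cand)]
    h = [o * math.tanh(cj) for o, cj in zip(o_gate, c)]
    return h, c

if __name__ == "__main__":
    W = {gate: [[0.1, -0.2, 0.3]] for gate in "ifog"}   # hidden size 1, input size 2
    b = {gate: [0.0] for gate in "ifog"}
    print(lstm_step(W, b, x=[1.0, 0.5], h_prev=[0.0], c_prev=[0.0]))
```

Stacking the four gate matrices into one array would let a single crossbar read produce all pre-activations at once, which is the usual motivation for this mapping.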