Search | arXiv e-print repository

Heating Up Quasi-Monte Carlo Graph Random Features: A Diffusion Kernel Perspective

Abstract: We build upon a recently introduced class of quasi-graph random features (q-GRFs), which have demonstrated the ability to yield lower variance estimators of the 2-regularized Laplacian kernel (Choromanski 2023). Our research investigates whether similar results can be achieved with alternative kernel functions, specifically the Diffusion (or Heat), Matérn, and Inverse Cosine kernels. We find that… ▽ More We build upon a recently introduced class of quasi-graph random features (q-GRFs), which have demonstrated the ability to yield lower variance estimators of the 2-regularized Laplacian kernel (Choromanski 2023). Our research investigates whether similar results can be achieved with alternative kernel functions, specifically the Diffusion (or Heat), Matérn, and Inverse Cosine kernels. We find that the Diffusion kernel performs most similarly to the 2-regularized Laplacian, and we further explore graph types that benefit from the previously established antithetic termination procedure. Specifically, we explore Erdős-Rényi and Barabási-Albert random graph models, Binary Trees, and Ladder graphs, with the goal of identifying combinations of specific kernel and graph type that benefit from antithetic termination. We assert that q-GRFs achieve lower variance estimators of the Diffusion (or Heat) kernel on Ladder graphs. However, the number of rungs on the Ladder graphs impacts the algorithm's performance; further theoretical results supporting our experimentation are forthcoming. This work builds upon some of the earliest Quasi-Monte Carlo methods for kernels defined on combinatorial objects, paving the way for kernel-based learning algorithms and future real-world applications in various domains. △ Less

Submitted 10 October, 2024; originally announced October 2024.

Comments: 18 pages, 16 figures

arXiv:2409.19071 [pdf, other]

Analog fast Fourier transforms for scalable and efficient signal processing

Authors: T. Patrick Xiao, Ben Feinberg, David K. Richardson, Matthew Cannon, Harsha Medu, Vineet Agrawal, Matthew J. Marinella, Sapan Agarwal, Christopher H. Bennett

Abstract: Edge devices are being deployed at increasing volumes to sense and act on information from the physical world. The discrete Fourier transform (DFT) is often necessary to make this sensed data suitable for further processing $\unicode{x2013}$ such as by artificial intelligence (AI) algorithms $\unicode{x2013}$ and for transmission over communication networks. Analog in-memory computing has been sho… ▽ More Edge devices are being deployed at increasing volumes to sense and act on information from the physical world. The discrete Fourier transform (DFT) is often necessary to make this sensed data suitable for further processing $\unicode{x2013}$ such as by artificial intelligence (AI) algorithms $\unicode{x2013}$ and for transmission over communication networks. Analog in-memory computing has been shown to be a fast and energy-efficient solution for processing edge AI workloads, but not for Fourier transforms. This is because of the existence of the fast Fourier transform (FFT) algorithm, which enormously reduces the complexity of the DFT but has so far belonged only to digital processors. Here, we show that the FFT can be mapped to analog in-memory computing systems, enabling them to efficiently scale to arbitrarily large Fourier transforms without requiring large sizes or large numbers of non-volatile memory arrays. We experimentally demonstrate analog FFTs on 1D audio and 2D image signals, using a large-scale charge-trapping memory array with precisely tunable, low-conductance analog states. The scalability of both the new analog FFT approach and the charge-trapping memory device is leveraged to compute a 65,536-point analog DFT, a scale that is otherwise inaccessible by analog systems and which is $>$1000$\times$ larger than any previous analog DFT demonstration. The analog FFT also provides more numerically precise DFTs with greater tolerance to device and circuit non-idealities than a direct matrix-vector multiplication approach. We show that the extension of the FFT algorithm to analog in-memory processors leads to design considerations that differ markedly from digital implementations, and that analog Fourier transforms have a substantial power efficiency advantage at all size scales over FFTs implemented on state-of-the-art digital hardware. △ Less

Submitted 27 September, 2024; originally announced September 2024.

arXiv:2403.06938 [pdf, other]

TCAM-SSD: A Framework for Search-Based Computing in Solid-State Drives

Authors: Ryan Wong, Nikita Kim, Kevin Higgs, Sapan Agarwal, Engin Ipek, Saugata Ghose, Ben Feinberg

Abstract: As the amount of data produced in society continues to grow at an exponential rate, modern applications are incurring significant performance and energy penalties due to high data movement between the CPU and memory/storage. While processing in main memory can alleviate these penalties, it is becoming increasingly difficult to keep large datasets entirely in main memory. This has led to a recent p… ▽ More As the amount of data produced in society continues to grow at an exponential rate, modern applications are incurring significant performance and energy penalties due to high data movement between the CPU and memory/storage. While processing in main memory can alleviate these penalties, it is becoming increasingly difficult to keep large datasets entirely in main memory. This has led to a recent push for in-storage computation, where processing is performed inside the storage device. We propose TCAM-SSD, a new framework for search-based computation inside the NAND flash memory arrays of a conventional solid-state drive (SSD), which requires lightweight modifications to only the array periphery and firmware. TCAM-SSD introduces a search manager and link table, which can logically partition the NAND flash memory's contents into search-enabled regions and standard storage regions. Together, these light firmware changes enable TCAM-SSD to seamlessly handle block I/O operations, in addition to new search operations, thereby reducing end-to-end execution time and total data movement. We provide an NVMe-compatible interface that provides programmers with the ability to dynamically allocate data on and make use of TCAM-SSD, allowing the system to be leveraged by a wide variety of applications. We evaluate three example use cases of TCAM-SSD to demonstrate its benefits. For transactional databases, TCAM-SSD can mitigate the performance penalties for applications with large datasets, achieving a 60.9% speedup over a conventional system that retrieves data from the SSD and computes using the CPU. For database analytics, TCAM-SSD provides an average speedup of 17.7x over a conventional system for a collection of analytical queries. For graph analytics, we combine TCAM-SSD's associative search with a sparse data structure, speeding up graph computing for larger-than-memory datasets by 14.5%. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2109.01262 [pdf, other]

doi 10.1109/MCAS.2022.3214409

On the Accuracy of Analog Neural Network Inference Accelerators

Authors: T. Patrick Xiao, Ben Feinberg, Christopher H. Bennett, Venkatraman Prabhakar, Prashant Saxena, Vineet Agrawal, Sapan Agarwal, Matthew J. Marinella

Abstract: Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference. A promising category of accelerators utilizes nonvolatile memory arrays to both store weights and perform $\textit{in situ}$ analog computation inside the array. While prior work has explored the design space of analog accelerators to optimize performance and energy ef… ▽ More Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference. A promising category of accelerators utilizes nonvolatile memory arrays to both store weights and perform $\textit{in situ}$ analog computation inside the array. While prior work has explored the design space of analog accelerators to optimize performance and energy efficiency, there is seldom a rigorous evaluation of the accuracy of these accelerators. This work shows how architectural design decisions, particularly in mapping neural network parameters to analog memory cells, influence inference accuracy. When evaluated using ResNet50 on ImageNet, the resilience of the system to analog non-idealities - cell programming errors, analog-to-digital converter resolution, and array parasitic resistances - all improve when analog quantities in the hardware are made proportional to the weights in the network. Moreover, contrary to the assumptions of prior work, nearly equivalent resilience to cell imprecision can be achieved by fully storing weights as analog quantities, rather than spreading weight bits across multiple devices, often referred to as bit slicing. By exploiting proportionality, analog system designers have the freedom to match the precision of the hardware to the needs of the algorithm, rather than attempting to guarantee the same level of precision in the intermediate results as an equivalent digital accelerator. This ultimately results in an analog accelerator that is more accurate, more robust to analog errors, and more energy-efficient. △ Less

Submitted 3 February, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

Comments: Changes in v3: modified definition of state-independent error (factor of 2) for fairer comparison to state-proportional. Added more results on INT4 network

Journal ref: IEEE Circuits and Systems Magazine, vol. 22, no. 4, pp. 26-48, 2022

arXiv:2004.00802 [pdf]

Device-aware inference operations in SONOS nonvolatile memory arrays

Authors: Christopher H. Bennett, T. Patrick Xiao, Ryan Dellana, Vineet Agrawal, Ben Feinberg, Venkatraman Prabhakar, Krishnaswamy Ramkumar, Long Hinh, Swatilekha Saha, Vijay Raghavan, Ramesh Chettuvetty, Sapan Agarwal, Matthew J. Marinella

Abstract: Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, fashion-MNIST, and CIFAR-10 tasks, o… ▽ More Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show strong performance can be realized with ADCs of 5-8 bits precision. △ Less

Submitted 2 April, 2020; originally announced April 2020.

Comments: To be presented at IEEE International Physics Reliability Symposium (IRPS) 2020

arXiv:2003.10396 [pdf, other]

Evaluating complexity and resilience trade-offs in emerging memory inference machines

Authors: Christopher H. Bennett, Ryan Dellana, T. Patrick Xiao, Ben Feinberg, Sapan Agarwal, Suma Cardwell, Matthew J. Marinella, William Severa, Brad Aimone

Abstract: Neuromorphic-style inference only works well if limited hardware resources are maximized properly, e.g. accuracy continues to scale with parameters and complexity in the face of potential disturbance. In this work, we use realistic crossbar simulations to highlight that compact implementations of deep neural networks are unexpectedly susceptible to collapse from multiple system disturbances. Our w… ▽ More Neuromorphic-style inference only works well if limited hardware resources are maximized properly, e.g. accuracy continues to scale with parameters and complexity in the face of potential disturbance. In this work, we use realistic crossbar simulations to highlight that compact implementations of deep neural networks are unexpectedly susceptible to collapse from multiple system disturbances. Our work proposes a middle path towards high performance and strong resilience utilizing the Mosaics framework, and specifically by re-using synaptic connections in a recurrent neural network implementation that possesses a natural form of noise-immunity. △ Less

Submitted 25 February, 2020; originally announced March 2020.

arXiv:1205.2671 [pdf, other]

Fundamental Physics at the Intensity Frontier

Authors: J. L. Hewett, H. Weerts, R. Brock, J. N. Butler, B. C. K. Casey, J. Collar, A. de Gouvea, R. Essig, Y. Grossman, W. Haxton, J. A. Jaros, C. K. Jung, Z. T. Lu, K. Pitts, Z. Ligeti, J. R. Patterson, M. Ramsey-Musolf, J. L. Ritchie, A. Roodman, K. Scholberg, C. E. M. Wagner, G. P. Zeller, S. Aefsky, A. Afanasev, K. Agashe , et al. (443 additional authors not shown)

Abstract: The Proceedings of the 2011 workshop on Fundamental Physics at the Intensity Frontier. Science opportunities at the intensity frontier are identified and described in the areas of heavy quarks, charged leptons, neutrinos, proton decay, new light weakly-coupled particles, and nucleons, nuclei, and atoms. The Proceedings of the 2011 workshop on Fundamental Physics at the Intensity Frontier. Science opportunities at the intensity frontier are identified and described in the areas of heavy quarks, charged leptons, neutrinos, proton decay, new light weakly-coupled particles, and nucleons, nuclei, and atoms. △ Less

Submitted 11 May, 2012; originally announced May 2012.

Comments: 229 pages

Report number: ANL-HEP-TR-12-25, SLAC-R-991

Showing 1–7 of 7 results for author: Feinberg, B