
Showing 1–40 of 40 results for author: Oliveira, G F

  1. MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem

    Authors: Melina Soysal, Konstantina Koliogeorgi, Can Firtina, Nika Mansouri Ghiasi, Rakesh Nadig, Haiyu Mao, Geraldo F. Oliveira, Yu Liang, Klea Zambaku, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Raw signal genome analysis (RSGA) has emerged as a promising approach to enable real-time genome analysis by directly analyzing raw electrical signals. However, rapid advancements in sequencing technologies make it increasingly difficult for software-based RSGA to match the throughput of raw signal generation. This paper demonstrates that while hardware acceleration techniques can significantly ac…

    Submitted 3 July, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  2. arXiv:2504.01948  [pdf, other]

    cs.AR cs.DB cs.DC

    PIMDAL: Mitigating the Memory Bottleneck in Data Analytics using a Real Processing-in-Memory System

    Authors: Manos Frouzakis, Juan Gómez-Luna, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Database Management Systems (DBMSs) are crucial for efficient data management and analytics, and are used in several different application domains. Due to the increasing volume of data a DBMS deals with, current processor-centric architectures (e.g., CPUs, GPUs) suffer from data movement bottlenecks when executing key DBMS operations (e.g., selection, aggregation, ordering, and join). This happens…

    Submitted 2 April, 2025; originally announced April 2025.

  3. arXiv:2502.13075  [pdf, other]

    cs.AR cs.CR

    Variable Read Disturbance: An Experimental Analysis of Temporal Variation in DRAM Read Disturbance

    Authors: Ataberk Olgun, F. Nisa Bostanci, Ismail Emir Yuksel, Oguzhan Canpolat, Haocong Luo, Geraldo F. Oliveira, A. Giray Yaglikci, Minesh Patel, Onur Mutlu

    Abstract: Modern DRAM chips are subject to read disturbance errors. State-of-the-art read disturbance mitigations rely on accurate and exhaustive characterization of the read disturbance threshold (RDT) (e.g., the number of aggressor row activations needed to induce the first RowHammer or RowPress bitflip) of every DRAM row (of which there are millions or billions in a modern system) to prevent read disturb…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: Extended version of our publication at the 31st IEEE International Symposium on High-Performance Computer Architecture (HPCA-31), 2025

  4. arXiv:2502.12650  [pdf, other]

    cs.CR cs.AR

    Chronus: Understanding and Securing the Cutting-Edge Industry Solutions to DRAM Read Disturbance

    Authors: Oğuzhan Canpolat, A. Giray Yağlıkçı, Geraldo F. Oliveira, Ataberk Olgun, Nisa Bostancı, İsmail Emir Yüksel, Haocong Luo, Oğuz Ergin, Onur Mutlu

    Abstract: We 1) present the first rigorous security, performance, energy, and cost analyses of the state-of-the-art on-DRAM-die read disturbance mitigation method, Per Row Activation Counting (PRAC) and 2) propose Chronus, a new mechanism that addresses PRAC's two major weaknesses. Our analysis shows that PRAC's system performance overhead on benign applications is non-negligible for modern DRAM chips and p…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: To appear in HPCA'25. arXiv admin note: text overlap with arXiv:2406.19094

  5. arXiv:2501.17466  [pdf, ps, other]

    cs.AR cs.DC

    Proteus: Enabling High-Performance Processing-Using-DRAM with Dynamic Bit-Precision, Adaptive Data Representation, and Flexible Arithmetic

    Authors: Geraldo F. Oliveira, Mayank Kabra, Yuxin Guo, Kangqi Chen, A. Giray Yağlıkçı, Melina Soysal, Mohammad Sadrosadati, Joaquin Olivares Bueno, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

    Abstract: Processing-using-DRAM (PUD) is a paradigm where the analog operational properties of DRAM are used to perform bulk logic operations. While PUD promises high throughput at low energy and area cost, we uncover three limitations of existing PUD approaches that lead to significant inefficiencies: (i) static data representation, i.e., two's complement with fixed bit-precision, leading to unnecessary co…

    Submitted 12 June, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

  6. arXiv:2412.19275  [pdf, other]

    cs.AR cs.DC

    Memory-Centric Computing: Recent Advances in Processing-in-DRAM

    Authors: Onur Mutlu, Ataberk Olgun, Geraldo F. Oliveira, Ismail Emir Yuksel

    Abstract: Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement, by 1) fundamentally avoiding data movement, 2) reducing data access latency & energy, and 3) exploiting large parallelism of memory arrays. Many recent studies show…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: This paper is an extended version of an IEDM 2024 Invited Paper in the AI Memory focus session

  7. arXiv:2406.19094  [pdf, other]

    cs.CR cs.AR

    Understanding the Security Benefits and Overheads of Emerging Industry Solutions to DRAM Read Disturbance

    Authors: Oğuzhan Canpolat, A. Giray Yağlıkçı, Geraldo F. Oliveira, Ataberk Olgun, Oğuz Ergin, Onur Mutlu

    Abstract: We present the first rigorous security, performance, energy, and cost analyses of the state-of-the-art on-DRAM-die read disturbance mitigation method, Per Row Activation Counting (PRAC), described in JEDEC DDR5 specification's April 2024 update. Unlike prior state-of-the-art that advises the memory controller to periodically issue refresh management (RFM) commands, which provides the DRAM chip wit…

    Submitted 8 August, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in DRAMSec 2024

  8. arXiv:2405.06081  [pdf, other]

    cs.AR cs.DC

    Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis

    Authors: Ismail Emir Yuksel, Yahya Can Tugrul, F. Nisa Bostanci, Geraldo F. Oliveira, A. Giray Yaglikci, Ataberk Olgun, Melina Soysal, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Onur Mutlu

    Abstract: We experimentally analyze the computational capability of commercial off-the-shelf (COTS) DRAM chips and the robustness of these capabilities under various timing delays between DRAM commands, data patterns, temperature, and voltage levels. We extensively characterize 120 COTS DDR4 chips from two major manufacturers. We highlight four key results of our study. First, COTS DRAM chips are capable of…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: To appear in DSN 2024

  9. arXiv:2403.04539  [pdf, other]

    cs.AR

    PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures

    Authors: Geraldo F. Oliveira, Emanuele G. Esposito, Juan Gómez-Luna, Onur Mutlu

    Abstract: Processing-using-DRAM (PUD) architectures impose a restrictive data layout and alignment for their operands, where source and destination operands (i) must reside in the same DRAM subarray (i.e., a group of DRAM rows sharing the same row buffer and row decoder) and (ii) are aligned to the boundaries of a DRAM row. However, standard memory allocation routines (i.e., malloc, posix_memalign, and huge…

    Submitted 7 March, 2024; originally announced March 2024.

  10. arXiv:2402.19080  [pdf, other]

    cs.AR cs.DC

    MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing

    Authors: Geraldo F. Oliveira, Ataberk Olgun, Abdullah Giray Yağlıkçı, F. Nisa Bostancı, Juan Gómez-Luna, Saugata Ghose, Onur Mutlu

    Abstract: Processing-using-DRAM (PUD) is a processing-in-memory (PIM) approach that uses a DRAM array's massive internal parallelism to execute very-wide data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limit the effectiveness and applicability of PUD in three ways. First, since applications have varying degrees of SIMD paralleli…

    Submitted 3 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Extended version of HPCA 2024 paper. arXiv admin note: text overlap with arXiv:2109.05881 by other authors

  11. arXiv:2402.18736  [pdf, other]

    cs.AR cs.DC

    Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis

    Authors: Ismail Emir Yuksel, Yahya Can Tugrul, Ataberk Olgun, F. Nisa Bostanci, A. Giray Yaglikci, Geraldo F. Oliveira, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Processing-using-DRAM (PuD) is an emerging paradigm that leverages the analog operational properties of DRAM circuitry to enable massively parallel in-DRAM computation. PuD has the potential to reduce or eliminate costly data movement between processing elements and main memory. Prior works experimentally demonstrate three-input MAJ (MAJ3) and two-input AND and OR operations in commercial off-the-…

    Submitted 21 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: A shorter version of this work is to appear at the 30th IEEE International Symposium on High-Performance Computer Architecture (HPCA-30), 2024

  12. arXiv:2402.18652  [pdf, other]

    cs.CR cs.AR

    Spatial Variation-Aware Read Disturbance Defenses: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions

    Authors: Abdullah Giray Yağlıkçı, Yahya Can Tuğrul, Geraldo F. Oliveira, İsmail Emir Yüksel, Ataberk Olgun, Haocong Luo, Onur Mutlu

    Abstract: Read disturbance in modern DRAM chips is a widespread phenomenon and is reliably used for breaking memory isolation, a fundamental building block for building robust systems. RowHammer and RowPress are two examples of read disturbance in DRAM where repeatedly accessing (hammering) or keeping active (pressing) a memory location induces bitflips in other memory locations. Unfortunately, shrinking te…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: A shorter version of this work is to appear at the 30th IEEE International Symposium on High-Performance Computer Architecture (HPCA-30), 2024

  13. arXiv:2312.02880  [pdf, other]

    cs.AR cs.DC

    PULSAR: Simultaneous Many-Row Activation for Reliable and High-Performance Computing in Off-the-Shelf DRAM Chips

    Authors: Ismail Emir Yuksel, Yahya Can Tugrul, F. Nisa Bostanci, Abdullah Giray Yaglikci, Ataberk Olgun, Geraldo F. Oliveira, Melina Soysal, Haocong Luo, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Data movement between the processor and the main memory is a first-order obstacle against improving performance and energy efficiency in modern systems. To address this obstacle, Processing-using-Memory (PuM) is a promising approach where bulk-bitwise operations are performed leveraging intrinsic analog properties within the DRAM array and massive parallelism across DRAM columns. Unfortunately, 1)…

    Submitted 18 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  14. arXiv:2310.10168  [pdf, other]

    cs.AR

    DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures

    Authors: Geraldo F. Oliveira, Alain Kohli, David Novo, Ataberk Olgun, A. Giray Yaglikci, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

    Abstract: The growing volume of data in modern applications has led to significant computational costs in conventional processor-centric systems. Processing-in-memory (PIM) architectures alleviate these costs by moving computation closer to memory, reducing data movement overheads. UPMEM is the first commercially available PIM system, featuring thousands of in-order processors (DPUs) integrated within DRAM…

    Submitted 22 April, 2025; v1 submitted 16 October, 2023; originally announced October 2023.

  15. arXiv:2310.09977  [pdf, other]

    cs.CR cs.AR

    ABACuS: All-Bank Activation Counters for Scalable and Low Overhead RowHammer Mitigation

    Authors: Ataberk Olgun, Yahya Can Tugrul, Nisa Bostanci, Ismail Emir Yuksel, Haocong Luo, Steve Rhyner, Abdullah Giray Yaglikci, Geraldo F. Oliveira, Onur Mutlu

    Abstract: We introduce ABACuS, a new low-cost hardware-counter-based RowHammer mitigation technique that performance-, energy-, and area-efficiently scales with worsening RowHammer vulnerability. We observe that both benign workloads and RowHammer attacks tend to access DRAM rows with the same row address in multiple DRAM banks at around the same time. Based on this observation, ABACuS's key idea is to use…

    Submitted 2 May, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: To appear in USENIX Security '24

  16. arXiv:2304.01951  [pdf, other]

    cs.MS cs.AR cs.DC cs.LG

    TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems

    Authors: Maurus Item, Juan Gómez-Luna, Yuxin Guo, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have the inherent disadvantage that their hardware is more constrained than in conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures s…

    Submitted 5 September, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Our open-source software is available at https://github.com/CMU-SAFARI/transpimlib

  17. arXiv:2212.06292  [pdf, other]

    cs.AR cs.DC

    ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems

    Authors: Nika Mansouri Ghiasi, Nandita Vijaykumar, Geraldo F. Oliveira, Lois Orosa, Ivan Fernandez, Mohammad Sadrosadati, Konstantinos Kanellopoulos, Nastaran Hajinazar, Juan Gómez Luna, Onur Mutlu

    Abstract: Partitioning applications between NDP and host CPU cores causes inter-segment data movement overhead, which is caused by moving data generated from one segment (e.g., instructions, functions) and used in consecutive segments. Prior works take two approaches to this problem. The first class of works maps segments to NDP or host cores based on the properties of each segment, neglecting the inter-seg…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: To appear in IEEE TETC

  18. arXiv:2210.08508  [pdf, ps, other]

    cs.AR cs.DC

    RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Geraldo F. Oliveira, Konstantinos Kanellopoulos, Rachata Ausavarungnirun, Juan Gómez Luna, João Ferreira, Jeremie S. Kim, Christina Giannoula, Nandita Vijaykumar, Jisung Park, Onur Mutlu

    Abstract: Recent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip, allowing for fine-grained connections between layers and significantly alleviating main memory bottlenecks. We show for a variety of workloads, on a state-of-the-art M3D-based system, that the performance and energy bottlenecks shift from main memory to the processor…

    Submitted 8 June, 2025; v1 submitted 16 October, 2022; originally announced October 2022.

  19. arXiv:2209.08938  [pdf, other]

    cs.AR cs.DC cs.LG

    Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

    Authors: Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu

    Abstract: Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, where different PIM approaches l…

    Submitted 27 March, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: This is an extended and updated version of a paper published in IEEE Micro, pp. 1-14, 29 Aug. 2022. arXiv admin note: text overlap with arXiv:2109.14320

  20. arXiv:2209.05566  [pdf, other]

    cs.AR cs.DC

    Flash-Cosmos: In-Flash Bulk Bitwise Operations Using Inherent Computation Capability of NAND Flash Memory

    Authors: Jisung Park, Roknoddin Azizi, Geraldo F. Oliveira, Mohammad Sadrosadati, Rakesh Nadig, David Novo, Juan Gómez-Luna, Myungsuk Kim, Onur Mutlu

    Abstract: Bulk bitwise operations, i.e., bitwise operations on large bit vectors, are prevalent in a wide range of important application domains, including databases, graph processing, genome analysis, cryptography, and hyper-dimensional computing. In conventional systems, the performance and energy efficiency of bulk bitwise operations are bottlenecked by data movement between the compute units and the mem…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

  21. arXiv:2207.13795  [pdf, other]

    cs.AR

    Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture

    Authors: Ataberk Olgun, F. Nisa Bostanci, Geraldo F. Oliveira, Yahya Can Tugrul, Rahul Bera, A. Giray Yaglikci, Hasan Hassan, Oguz Ergin, Onur Mutlu

    Abstract: We propose Sectored DRAM, a new, low-overhead DRAM substrate that reduces wasted energy by enabling fine-grained DRAM data transfers and DRAM row activation. Sectored DRAM leverages two key ideas to enable fine-grained data transfers and row activation at low chip area cost. First, a cache block transfer between main memory and the memory controller happens in a fixed number of clock cycles where…

    Submitted 9 June, 2024; v1 submitted 27 July, 2022; originally announced July 2022.

    Comments: Extended version of paper that is to appear in ACM Transactions on Architecture and Code Optimization (ACM TACO)

  22. arXiv:2207.07886  [pdf, other]

    cs.AR cs.AI cs.DC cs.LG

    An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

    Authors: Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu

    Abstract: Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e.,…

    Submitted 5 September, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

    Comments: Our open-source software is available at https://github.com/CMU-SAFARI/pim-ml

  23. arXiv:2206.06022  [pdf, other]

    cs.AR cs.LG

    Machine Learning Training on a Real Processing-in-Memory System

    Authors: Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu

    Abstract: Training machine learning algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., comp…

    Submitted 3 August, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: This extended abstract appears as an invited paper at the 2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

  24. arXiv:2205.14664  [pdf, other]

    cs.AR cs.AI cs.DB cs.DC cs.LG

    Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases

    Authors: Geraldo F. Oliveira, Amirali Boroumand, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

    Abstract: Today's computing systems require moving data back-and-forth between computing resources (e.g., CPUs, GPUs, accelerators) and off-chip main memory so that computation can take place on the data. Unfortunately, this data movement is a major bottleneck for system performance and energy consumption. One promising execution paradigm that alleviates the data movement bottleneck in modern and emerging a…

    Submitted 29 May, 2022; originally announced May 2022.

  25. arXiv:2205.14647  [pdf, other]

    cs.AR cs.DC cs.PF

    Methodologies, Workloads, and Tools for Processing-in-Memory: Enabling the Adoption of Data-Centric Architectures

    Authors: Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Onur Mutlu

    Abstract: The increasing prevalence and growing size of data in modern applications have led to high costs for computation in traditional processor-centric computing systems. Moving large volumes of data between memory devices (e.g., DRAM) and computing elements (e.g., CPUs, GPUs) across bandwidth-limited memory channels can consume more than 60% of the total energy in modern systems. To mitigate these cost…

    Submitted 31 May, 2022; v1 submitted 29 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2012.11890

  26. arXiv:2204.11275  [pdf, other]

    cs.AR cs.DB

    Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Cooperation

    Authors: Amirali Boroumand, Saugata Ghose, Geraldo F. Oliveira, Onur Mutlu

    Abstract: A growth in data volume, combined with increasing demand for real-time analysis (using the most recent data), has resulted in the emergence of database systems that concurrently support transactions and data analytics. These hybrid transactional and analytical processing (HTAP) database systems can support real-time data analysis without the high costs of synchronizing across separate single-purpo…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: Accepted to ICDE 2022. arXiv admin note: substantial text overlap with arXiv:2103.00798

  27. arXiv:2112.14216  [pdf, other]

    cs.AR

    Casper: Accelerating Stencil Computation using Near-cache Processing

    Authors: Alain Denzler, Rahul Bera, Nastaran Hajinazar, Gagandeep Singh, Geraldo F. Oliveira, Juan Gómez-Luna, Onur Mutlu

    Abstract: Stencil computation is one of the most used kernels in a wide variety of scientific applications, ranging from large-scale weather prediction to solving partial differential equations. Stencil computations are characterized by three unique properties: (1) low arithmetic intensity, (2) limited temporal data reuse, and (3) regular and predictable data access pattern. As a result, stencil computation…

    Submitted 5 September, 2023; v1 submitted 28 December, 2021; originally announced December 2021.

    ACM Class: C.3

  28. arXiv:2111.02325  [pdf, other]

    cs.AR cs.PF

    Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study

    Authors: Geraldo F. Oliveira, Saugata Ghose, Juan Gómez-Luna, Amirali Boroumand, Alexis Savery, Sonny Rao, Salman Qazi, Gwendal Grignou, Rahul Thakur, Eric Shiu, Onur Mutlu

    Abstract: The number and diversity of consumer devices are growing rapidly, alongside their target applications' memory consumption. Unfortunately, DRAM scalability is becoming a limiting factor to the available memory capacity in consumer devices. As a potential solution, manufacturers have introduced emerging non-volatile memories (NVMs) into the market, which can be used to increase the memory capacity o…

    Submitted 19 September, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: This paper has been accepted by IEEE Access

  29. arXiv:2110.01709  [pdf, other]

    cs.AR cs.DC cs.PF

    Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

    Authors: Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

    Abstract: Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads…

    Submitted 3 April, 2023; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: Invited paper to appear at Workshop on Computing with Unconventional Technologies (CUT) 2021 https://sites.google.com/umn.edu/cut-2021/home. arXiv admin note: substantial text overlap with arXiv:2105.03814

  30. arXiv:2109.14320  [pdf, other]

    cs.AR cs.LG

    Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks

    Authors: Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

    Abstract: Emerging edge computing platforms often contain machine learning (ML) accelerators that can accelerate inference for a wide range of neural network (NN) models. These models are designed to fit within the limited area and energy constraints of the edge computing platforms, each targeting various applications (e.g., face detection, speech recognition, translation, image captioning, video analytics)…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: This work appears at the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT 2021). arXiv admin note: substantial text overlap with arXiv:2103.00768

  31. arXiv:2109.12697  [pdf, other]

    cs.AR

    HARP: Practically and Effectively Identifying Uncorrectable Errors in Memory Chips That Use On-Die Error-Correcting Codes

    Authors: Minesh Patel, Geraldo F. Oliveira, Onur Mutlu

    Abstract: State-of-the-art techniques for addressing scaling-related main memory errors identify and repair bits that are at risk of error from within the memory controller. Unfortunately, modern main memory chips internally use on-die error correcting codes (on-die ECC) that obfuscate the memory controller's view of errors, complicating the process of identifying at-risk bits (i.e., error profiling). To un…

    Submitted 18 December, 2021; v1 submitted 26 September, 2021; originally announced September 2021.

    Comments: This work is to appear at the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO 2021)

  32. arXiv:2105.12839  [pdf, other]

    cs.AR cs.DC

    SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM

    Authors: Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, João Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gómez Luna, Onur Mutlu

    Abstract: Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that (1) enables the efficient implementation of complex ope…

    Submitted 30 June, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

    Comments: This is an extended version of the paper that appeared at ASPLOS 2021

  33. arXiv:2105.03814  [pdf, other]

    cs.AR cs.DC cs.PF

    Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

    Authors: Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu

    Abstract: Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bo…

    Submitted 4 May, 2022; v1 submitted 8 May, 2021; originally announced May 2021.

    Comments: Our open source software is available at https://github.com/CMU-SAFARI/prim-benchmarks

  34. arXiv:2105.03725  [pdf, other]

    cs.AR cs.DC cs.PF

    DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

    Authors: Geraldo F. Oliveira, Juan Gómez-Luna, Lois Orosa, Saugata Ghose, Nandita Vijaykumar, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu

    Abstract: Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data…

    Submitted 6 April, 2023; v1 submitted 8 May, 2021; originally announced May 2021.

    Comments: Our open source software is available at https://github.com/CMU-SAFARI/DAMOV

  35. pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables

    Authors: João Dinis Ferreira, Gabriel Falcao, Juan Gómez-Luna, Mohammed Alser, Lois Orosa, Mohammad Sadrosadati, Jeremie S. Kim, Geraldo F. Oliveira, Taha Shahroodi, Anant Nori, Onur Mutlu

    Abstract: Data movement between the main memory and the processor is a key contributor to execution time and energy consumption in memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One category of PiM is Processing-using-Memory (PuM), in which computation takes place inside the memory array by exploiting intrinsic analog properties of the memory…

    Submitted 23 January, 2025; v1 submitted 15 April, 2021; originally announced April 2021.

    ACM Class: B.3.1; C.1.3

    Journal ref: IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022, 900-919

  36. arXiv:2103.00798  [pdf, other]

    cs.AR cs.DB

    Polynesia: Enabling Effective Hybrid Transactional/Analytical Databases with Specialized Hardware/Software Co-Design

    Authors: Amirali Boroumand, Saugata Ghose, Geraldo F. Oliveira, Onur Mutlu

    Abstract: An exponential growth in data volume, combined with increasing demand for real-time analysis (i.e., using the most recent data), has resulted in the emergence of database systems that concurrently support transactions and data analytics. These hybrid transactional and analytical processing (HTAP) database systems can support real-time data analysis without the high costs of synchronizing across se…

    Submitted 1 March, 2021; originally announced March 2021.

  37. arXiv:2103.00768  [pdf, other]

    cs.AR cs.LG

    Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models

    Authors: Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

    Abstract: As the need for edge computing grows, many modern consumer devices now contain edge machine learning (ML) accelerators that can compute a wide range of neural network (NN) models while still fitting within tight resource constraints. We analyze a commercial Edge TPU using 24 Google edge NN models (including CNNs, LSTMs, transducers, and RCNNs), and find that the accelerator suffers from three shor…

    Submitted 1 March, 2021; originally announced March 2021.

  38. arXiv:2012.11890  [pdf, ps, other]

    cs.AR cs.DC cs.ET

    SIMDRAM: A Framework for Bit-Serial SIMD Processing Using DRAM

    Authors: Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, João Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

    Abstract: Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable the full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that enables massively-parallel computation of a wide ra…

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: Extended abstract of the full paper to appear in ASPLOS 2021

  39. arXiv:2012.03112  [pdf, other]

    cs.AR cs.DC

    A Modern Primer on Processing in Memory

    Authors: Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun, Mohammad Sadrosadati, Geraldo F. Oliveira

    Abstract: This paper discusses recent research that aims to enable computation close to data, an approach we broadly call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside memory chips or modules, in the logic layer of 3D-stacked memory, in the memory controllers, in storage devices or chips), so that data movement between the computation units a…

    Submitted 6 February, 2025; v1 submitted 5 December, 2020; originally announced December 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1903.03988

  40. arXiv:2005.09748  [pdf, other]

    cs.AR

    The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework

    Authors: Nastaran Hajinazar, Pratyush Patel, Minesh Patel, Konstantinos Kanellopoulos, Saugata Ghose, Rachata Ausavarungnirun, Geraldo Francisco de Oliveira Jr., Jonathan Appavoo, Vivek Seshadri, Onur Mutlu

    Abstract: Computers continue to diversify with respect to system designs, emerging memory technologies, and application memory demands. Unfortunately, continually adapting the conventional virtual memory framework to each possible system configuration is challenging, and often results in performance loss or requires non-trivial workarounds. To address these challenges, we propose a new virtual memory framew…

    Submitted 19 May, 2020; originally announced May 2020.