Showing 1–19 of 19 results for author: Keckler, S W

  1. arXiv:2502.18403  [pdf, other]

    cs.AR cs.DC

    Kitsune: Enabling Dataflow Execution on GPUs

    Authors: Michael Davies, Neal Crago, Karthikeyan Sankaralingam, Stephen W. Keckler

    Abstract: State of art DL models are growing in size and complexity, with many modern models also increasing in heterogeneity of behavior. GPUs are still the dominant platform for DL applications, relying on a bulk-synchronous execution model which has many drawbacks and is ill-suited for the graph structure of DL applications. Many industry and academic works attempt to overcome these by employing vertical…

    Submitted 25 February, 2025; originally announced February 2025.

  2. arXiv:2502.17780  [pdf]

    cs.DC eess.SY

    GPUArmor: A Hardware-Software Co-design for Efficient and Scalable Memory Safety on GPUs

    Authors: Mohamed Tarek Ibn Ziad, Sana Damani, Mark Stephenson, Stephen W. Keckler, Aamer Jaleel

    Abstract: Memory safety errors continue to pose a significant threat to current computing systems, and graphics processing units (GPUs) are no exception. A prominent class of memory safety algorithms is allocation-based solutions. The key idea is to maintain each allocation's metadata (base address and size) in a disjoint table and retrieve it at runtime to verify memory accesses. While several previous sol…

    Submitted 25 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: arXiv version of submission

  3. arXiv:2406.13868  [pdf, other]

    cs.LG cs.AI

    SDQ: Sparse Decomposed Quantization for LLM Inference

    Authors: Geonhwa Jeong, Po-An Tsai, Stephen W. Keckler, Tushar Krishna

    Abstract: Recently, large language models (LLMs) have shown surprising performance in task-specific workloads as well as general tasks with the given prompts. However, to achieve unprecedented performance, recent LLMs use billions to trillions of parameters, which hinder the wide adaptation of those models due to their extremely large compute and memory requirements. To resolve the issue, various model comp…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Preprint

  4. arXiv:2404.16256  [pdf, other]

    cs.CR cs.AR

    Probabilistic Tracker Management Policies for Low-Cost and Scalable Rowhammer Mitigation

    Authors: Aamer Jaleel, Stephen W. Keckler, Gururaj Saileshwar

    Abstract: This paper focuses on mitigating DRAM Rowhammer attacks. In recent years, solutions like TRR have been deployed in DDR4 DRAM to track aggressor rows and then issue a mitigative action by refreshing neighboring victim rows. Unfortunately, such in-DRAM solutions are resource-constrained (only able to provision few tens of counters to track aggressor rows) and are prone to thrashing based attacks, th…

    Submitted 24 April, 2024; originally announced April 2024.

  5. arXiv:2403.07953  [pdf, other]

    cs.LG cs.AI cs.AR

    Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators

    Authors: Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

    Abstract: Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support, but it provides limited flexibility and requires extra model fine-tuning. Moreover, any sparse model fine-tuned for certain structured sparse HW cannot be acceler…

    Submitted 24 May, 2025; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: This paper is accepted to MLSys 2025

  6. arXiv:2310.07854  [pdf, other]

    cs.RO

    VaPr: Variable-Precision Tensors to Accelerate Robot Motion Planning

    Authors: Yu-Shun Hsiao, Siva Kumar Sastry Hari, Balakumar Sundaralingam, Jason Yik, Thierry Tambe, Charbel Sakr, Stephen W. Keckler, Vijay Janapa Reddi

    Abstract: High-dimensional motion generation requires numerical precision for smooth, collision-free solutions. Typically, double-precision or single-precision floating-point (FP) formats are utilized. Using these for big tensors imposes a strain on the memory bandwidth provided by the devices and alters the memory footprint, hence limiting their applicability to low-power edge devices needed for mobile rob…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 7 pages, 5 figures, 8 tables, to be published in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  7. arXiv:2212.02687  [pdf, other]

    cs.CV cs.AR

    Vision Transformer Computation and Resilience for Dynamic Inference

    Authors: Kavya Sreedhar, Jason Clemons, Rangharajan Venkatesan, Stephen W. Keckler, Mark Horowitz

    Abstract: State-of-the-art deep learning models for computer vision tasks are based on the transformer architecture and often deployed in real-time applications. In this scenario, the resources available for every inference can vary, so it is useful to be able to dynamically adapt execution to trade accuracy for efficiency. To create dynamic models, we leverage the resilience of vision transformers to pruni…

    Submitted 15 April, 2024; v1 submitted 5 December, 2022; originally announced December 2022.

    Journal ref: 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

  8. arXiv:2205.03347  [pdf, other]

    cs.AI cs.RO

    Zhuyi: Perception Processing Rate Estimation for Safety in Autonomous Vehicles

    Authors: Yu-Shun Hsiao, Siva Kumar Sastry Hari, Michał Filipiuk, Timothy Tsai, Michael B. Sullivan, Vijay Janapa Reddi, Vasu Singh, Stephen W. Keckler

    Abstract: The processing requirement of autonomous vehicles (AVs) for high-accuracy perception in complex scenarios can exceed the resources offered by the in-vehicle computer, degrading safety and comfort. This paper proposes a sensor frame processing rate (FPR) estimation model, Zhuyi, that quantifies the minimum safe FPR continuously in a driving scenario. Zhuyi can be employed post-deployment as an onli…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: 2022 Design Automation Conference (DAC), July 10-14, 2022, San Francisco

  9. arXiv:2104.02188  [pdf, other]

    cs.AR cs.DC cs.LG

    GPU Domain Specialization via Composable On-Package Architecture

    Authors: Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, David Nellans, Stephen W. Keckler

    Abstract: As GPUs scale their low precision matrix math throughput to boost deep learning (DL) performance, they upset the balance between math throughput and memory system capabilities. We demonstrate that converged GPU design trying to address diverging architectural requirements between FP32 (or larger) based HPC and FP16 (or smaller) based DL workloads results in sub-optimal configuration for either of…

    Submitted 5 April, 2021; originally announced April 2021.

  10. arXiv:2103.07403  [pdf, other]

    cs.RO cs.AI eess.SY

    Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles

    Authors: Zahra Ghodsi, Siva Kumar Sastry Hari, Iuri Frosio, Timothy Tsai, Alejandro Troccoli, Stephen W. Keckler, Siddharth Garg, Anima Anandkumar

    Abstract: Extracting interesting scenarios from real-world data as well as generating failure cases is important for the development and testing of autonomous systems. We propose efficient mechanisms to both characterize and generate testing scenarios using a state-of-the-art driving simulator. For any scenario, our method generates a set of possible driving paths and identifies all the possible safe drivin…

    Submitted 12 March, 2021; originally announced March 2021.

  11. arXiv:2006.04984  [pdf, other]

    cs.DC cs.LG

    Making Convolutions Resilient via Algorithm-Based Error Detection Techniques

    Authors: Siva Kumar Sastry Hari, Michael B. Sullivan, Timothy Tsai, Stephen W. Keckler

    Abstract: The ability of Convolutional Neural Networks (CNNs) to accurately process real-time telemetry has boosted their use in safety-critical and high-performance computing systems. As such systems require high levels of resilience to errors, CNNs must execute correctly in the presence of hardware faults. Full duplication provides the needed assurance but incurs a prohibitive 100% overhead. Algorithmic t…

    Submitted 8 June, 2020; originally announced June 2020.

  12. arXiv:2005.01445  [pdf, other]

    cs.DC cs.AR

    Estimating Silent Data Corruption Rates Using a Two-Level Model

    Authors: Siva Kumar Sastry Hari, Paolo Rech, Timothy Tsai, Mark Stephenson, Arslan Zulfiqar, Michael Sullivan, Philip Shirvani, Paul Racunas, Joel Emer, Stephen W. Keckler

    Abstract: High-performance and safety-critical system architects must accurately evaluate the application-level silent data corruption (SDC) rates of processors to soft errors. Such an evaluation requires error propagation all the way from particle strikes on low-level state up to the program output. Existing approaches that rely on low-level simulations with fault injection cannot evaluate full application…

    Submitted 27 April, 2020; originally announced May 2020.

  13. arXiv:2002.09786  [pdf, other]

    cs.LG cs.CV stat.ML

    HarDNN: Feature Map Vulnerability Evaluation in CNNs

    Authors: Abdulrahman Mahmoud, Siva Kumar Sastry Hari, Christopher W. Fletcher, Sarita V. Adve, Charbel Sakr, Naresh Shanbhag, Pavlo Molchanov, Michael B. Sullivan, Timothy Tsai, Stephen W. Keckler

    Abstract: As Convolutional Neural Networks (CNNs) are increasingly being employed in safety-critical applications, it is important that they behave reliably in the face of hardware errors. Transient hardware errors may percolate undesirable state during execution, resulting in software-manifested errors which can adversely affect high-level decision making. This paper presents HarDNN, a software-directed ap…

    Submitted 25 February, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

    Comments: 14 pages, 5 figures, a short version accepted for publication in First Workshop on Secure and Resilient Autonomy (SARA) co-located with MLSys2020

  14. arXiv:1907.01051  [pdf, other]

    cs.LG cs.SE stat.ML

    ML-based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection

    Authors: Saurabh Jha, Subho S. Banerjee, Timothy Tsai, Siva K. S. Hari, Michael B. Sullivan, Zbigniew T. Kalbarczyk, Stephen W. Keckler, Ravishankar K. Iyer

    Abstract: The safety and resilience of fully autonomous vehicles (AVs) are of significant concern, as exemplified by several headline-making accidents. While AV development today involves verification, validation, and testing, end-to-end assessment of AV systems under accidental faults in realistic driving scenarios has been largely unexplored. This paper presents DriveFI, a machine learning-based fault inj…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Accepted at 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks

  15. arXiv:1907.01024  [pdf, other]

    cs.SE

    Kayotee: A Fault Injection-based System to Assess the Safety and Reliability of Autonomous Vehicles to Faults and Errors

    Authors: Saurabh Jha, Timothy Tsai, Siva Hari, Michael Sullivan, Zbigniew Kalbarczyk, Stephen W. Keckler, Ravishankar K. Iyer

    Abstract: Fully autonomous vehicles (AVs), i.e., AVs with autonomy level 5, are expected to dominate road transportation in the near-future and contribute trillions of dollars to the global economy. The general public, government organizations, and manufacturers all have significant concern regarding resiliency and safety standards of the autonomous driving system (ADS) of AVs. In this work, we proposed an…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Presented at Automotive Reliability and Testing (ART) 2018 colocated with International Testing Conference

  16. arXiv:1806.00512  [pdf, other]

    cs.LG cs.CL stat.ML

    Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training

    Authors: Maohua Zhu, Jason Clemons, Jeff Pool, Minsoo Rhu, Stephen W. Keckler, Yuan Xie

    Abstract: Exploiting sparsity enables hardware systems to run neural networks faster and more energy-efficiently. However, most prior sparsity-centric optimization techniques only accelerate the forward pass of neural networks and usually require an even longer training process with iterative pruning and retraining. We observe that artificially inducing sparsity in the gradients of the gates in an LSTM cell…

    Submitted 1 June, 2018; originally announced June 2018.

  17. arXiv:1708.04485  [pdf, other]

    cs.NE cs.AR cs.LG

    SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

    Authors: Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally

    Abstract: Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning. High performance and extreme energy efficiency are critical for deployments of CNNs in a wide range of situations, especially mobile platforms such as autonomous vehicles, cameras, and electronic personal assistants. This paper introduces the Sparse CNN (SCNN) accelerator architecture, which improve…

    Submitted 23 May, 2017; originally announced August 2017.

  18. arXiv:1705.01626  [pdf, other]

    cs.LG cs.AR

    Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

    Authors: Minsoo Rhu, Mike O'Connor, Niladrish Chatterjee, Jeff Pool, Stephen W. Keckler

    Abstract: Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory. Prior work tries to address this restriction by virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be utilized for memory allocations. Despite its merits, virtualizing memory can incur significant perfor…

    Submitted 3 May, 2017; originally announced May 2017.

  19. arXiv:1602.08124  [pdf, other]

    cs.DC cs.LG cs.NE

    vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design

    Authors: Minsoo Rhu, Natalia Gimelshein, Jason Clemons, Arslan Zulfiqar, Stephen W. Keckler

    Abstract: The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher's flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize the processing across multiple GPUs. We prop…

    Submitted 28 July, 2016; v1 submitted 25 February, 2016; originally announced February 2016.

    Comments: Published as a conference paper at the 49th IEEE/ACM International Symposium on Microarchitecture (MICRO-49), 2016