Showing 1–22 of 22 results for author: Esmaeilzadeh, H

  1. arXiv:2506.01374  [pdf, ps, other]

    cs.LG cs.AI cs.PL

    Compiler Optimization via LLM Reasoning for Efficient Model Serving

    Authors: Sujun Tang, Christopher Priebe, Rohan Mahapatra, Lianhui Qin, Hadi Esmaeilzadeh

    Abstract: While model serving has unlocked unprecedented capabilities, the high cost of serving large-scale models continues to be a significant barrier to widespread accessibility and rapid innovation. Compiler optimizations have long driven substantial performance improvements, but existing compilers struggle with neural workloads due to the exponentially large and highly interdependent space of possible…

    Submitted 2 June, 2025; originally announced June 2025.

  2. arXiv:2310.17912  [pdf]

    cs.DC

    Restoring the Broken Covenant Between Compilers and Deep Learning Accelerators

    Authors: Sean Kinzer, Soroush Ghodrati, Rohan Mahapatra, Byung Hoon Ahn, Edwin Mascarenhas, Xiaolong Li, Janarbek Matai, Liang Zhang, Hadi Esmaeilzadeh

    Abstract: Deep learning accelerators address the computational demands of Deep Neural Networks (DNNs), departing from the traditional Von Neumann execution model. They leverage specialized hardware to align with the application domain's structure. Compilers for these accelerators face distinct challenges compared to those for general-purpose processors. These challenges include exposing and managing more mi…

    Submitted 27 October, 2023; originally announced October 2023.

  3. arXiv:2308.12120  [pdf, other]

    cs.LG cs.AR

    An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators

    Authors: Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew B. Kahng, Joon Kyung Kim, Sean Kinzer, Sayak Kundu, Rohan Mahapatra, Susmita Dey Manasi, Sachin Sapatnekar, Zhiang Wang, Ziqing Zeng

    Abstract: Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for hardware-accelerated deep neural network (DNN) and non-DNN ML algorithms. It adopts a unified approach that combines backend power, performance, and area (PPA) analysis wi…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: This is an extended version of our work titled "Physically Accurate Learning-based Performance Prediction of Hardware-accelerated ML Algorithms" published in MLCAD 2022

  4. arXiv:2306.16767  [pdf, other]

    cs.AR cs.LG

    Performance Analysis of DNN Inference/Training with Convolution and non-Convolution Operations

    Authors: Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew B. Kahng, Sean Kinzer, Susmita Dey Manasi, Sachin S. Sapatnekar, Zhiang Wang

    Abstract: Today's performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, these frameworks largely focus on convolution layers only. Second, these frameworks are generally targeted towards inference, and lack support fo…

    Submitted 29 June, 2023; originally announced June 2023.

    Journal ref: ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 30, Issue 1, Article No.: 3, Pages 1 - 34, Oct. 2024

  5. arXiv:2303.03483  [pdf]

    cs.AR

    In-Storage Domain-Specific Acceleration for Serverless Computing

    Authors: Rohan Mahapatra, Soroush Ghodrati, Byung Hoon Ahn, Sean Kinzer, Shu-ting Wang, Hanyang Xu, Lavanya Karthikeyan, Hardik Sharma, Amir Yazdanbakhsh, Mohammad Alian, Hadi Esmaeilzadeh

    Abstract: While (1) serverless computing is emerging as a popular form of cloud execution, datacenters are going through major changes: (2) storage disaggregation at the system infrastructure level and (3) integration of domain-specific accelerators at the hardware level. Each of these three trends individually provides significant benefits; however, when combined the benefits diminish. Specifically, the pa…

    Submitted 23 March, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

  6. arXiv:2204.03227  [pdf, other]

    cs.CL cs.AR cs.LG

    Accelerating Attention through Gradient-Based Learned Runtime Pruning

    Authors: Zheng Li, Soroush Ghodrati, Amir Yazdanbakhsh, Hadi Esmaeilzadeh, Mingu Kang

    Abstract: Self-attention is a key enabler of state-of-the-art accuracy for various transformer-based Natural Language Processing models. This attention mechanism calculates a correlation score for each word with respect to the other words in a sentence. Commonly, only a small subset of words highly correlates with the word under attention, which is only determined at runtime. As such, a significant amount of co…

    Submitted 14 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: First three authors contributed equally; published at ISCA 2022

  7. arXiv:2105.07879  [pdf]

    cs.AI cs.CL cs.CY

    Conscious AI

    Authors: Hadi Esmaeilzadeh, Reza Vaezi

    Abstract: Recent advances in artificial intelligence (AI) have achieved human-scale speed and accuracy for classification tasks. In turn, these capabilities have made AI a viable replacement for many human activities that at their core involve classification, such as basic mechanical and analytical tasks in low-level service jobs. Current systems do not need to be conscious to recognize patterns and classif…

    Submitted 20 May, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

  8. arXiv:2004.12254  [pdf, other]

    cs.LG cs.CR stat.ML

    Privacy in Deep Learning: A Survey

    Authors: Fatemehsadat Mireshghallah, Mohammadkazem Taram, Praneeth Vepakomma, Abhishek Singh, Ramesh Raskar, Hadi Esmaeilzadeh

    Abstract: The ever-growing advances of deep learning in many areas including vision, recommendation systems, natural language processing, etc., have led to the adoption of Deep Neural Networks (DNNs) in production systems. The availability of large datasets and high computational power are the main contributors to these advances. The datasets are usually crowdsourced and may contain sensitive information. T…

    Submitted 6 November, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

  9. arXiv:2004.05333  [pdf, other]

    cs.LG cs.PF

    Bit-Parallel Vector Composability for Neural Acceleration

    Authors: Soroush Ghodrati, Hardik Sharma, Cliff Young, Nam Sung Kim, Hadi Esmaeilzadeh

    Abstract: Conventional neural accelerators rely on isolated self-sufficient functional units that perform an atomic operation while communicating the results through an operand delivery-aggregation logic. Each single unit processes all the bits of its operands atomically and produces all the bits of the results in isolation. This paper explores a different design style, where each unit is only responsible…

    Submitted 11 April, 2020; originally announced April 2020.

  10. arXiv:2003.12154  [pdf, other]

    cs.LG cs.CR cs.IT stat.ML

    Not All Features Are Equal: Discovering Essential Features for Preserving Prediction Privacy

    Authors: Fatemehsadat Mireshghallah, Mohammadkazem Taram, Ali Jalali, Ahmed Taha Elthakeb, Dean Tullsen, Hadi Esmaeilzadeh

    Abstract: When receiving machine learning services from the cloud, the provider does not need to receive all features; in fact, only a subset of the features are necessary for the target prediction task. Discerning this subset is the key problem of this work. We formulate this problem as a gradient-based perturbation maximization method that discovers this subset in the input feature space with respect to t…

    Submitted 20 February, 2021; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: This paper is presented at the 2021 Web conference (WWW 2021)

  11. arXiv:2003.02369  [pdf, other]

    cs.DC cs.LG stat.ML

    Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices

    Authors: Byung Hoon Ahn, Jinwon Lee, Jamie Menjay Lin, Hsin-Pai Cheng, Jilei Hou, Hadi Esmaeilzadeh

    Abstract: Recent advances demonstrate that irregularly wired neural networks from Neural Architecture Search (NAS) and Random Wiring can not only automate the design of deep neural networks but also emit models that outperform previous manual designs. These designs are especially effective while designing neural architectures under hard resource constraints (memory, MACs, ...) which highlights the import…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: Published as a conference paper at MLSys 2020 (Oral Presentation)

  12. arXiv:2003.00146  [pdf, other]

    cs.LG stat.ML

    WaveQ: Gradient-Based Deep Quantization of Neural Networks through Sinusoidal Adaptive Regularization

    Authors: Ahmed T. Elthakeb, Prannoy Pilligundla, Fatemehsadat Mireshghallah, Tarek Elgindi, Charles-Alban Deledalle, Hadi Esmaeilzadeh

    Abstract: As deep neural networks make their way into different domains, their compute efficiency is becoming a first-order constraint. Deep quantization, which reduces the bitwidth of the operations (below 8 bits), offers a unique opportunity as it can reduce both the storage and compute requirements of the network super-linearly. However, if not employed with diligence, this can lead to significant accur…

    Submitted 24 April, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

    Comments: Preliminary work. Under review

  13. arXiv:2001.08743  [pdf, other]

    cs.LG stat.ML

    Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

    Authors: Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

    Abstract: Achieving faster execution with shorter compilation time can foster further diversity and innovation in neural networks. However, the current paradigm of executing neural networks either relies on hand-optimized libraries, traditional compilation heuristics, or very recently genetic algorithms and other stochastic methods. These methods suffer from frequent costly hardware measurements rendering t…

    Submitted 23 January, 2020; originally announced January 2020.

    Comments: Published as a conference paper at ICLR 2020. arXiv admin note: text overlap with arXiv:1905.12799

  14. arXiv:1906.11915  [pdf, other]

    cs.AR

    Mixed-Signal Charge-Domain Acceleration of Deep Neural networks through Interleaved Bit-Partitioned Arithmetic

    Authors: Soroush Ghodrati, Hardik Sharma, Sean Kinzer, Amir Yazdanbakhsh, Kambiz Samadi, Nam Sung Kim, Doug Burger, Hadi Esmaeilzadeh

    Abstract: Low-power potential of mixed-signal design makes it an alluring option to accelerate Deep Neural Networks (DNNs). However, mixed-signal circuitry suffers from limited range for information encoding, susceptibility to noise, and Analog to Digital (A/D) conversion overheads. This paper aims to address these challenges by offering and leveraging the insight that a vector dot-product (the basic operat…

    Submitted 12 July, 2019; v1 submitted 27 June, 2019; originally announced June 2019.

  15. arXiv:1906.06033  [pdf, other]

    cs.LG stat.ML

    Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks

    Authors: Ahmed T. Elthakeb, Prannoy Pilligundla, Alex Cloninger, Hadi Esmaeilzadeh

    Abstract: The deep layers of modern neural networks extract a rather rich set of features as an input propagates through the network. This paper sets out to harvest these rich intermediate representations for quantization with minimal accuracy loss while significantly reducing the memory footprint and compute intensity of the DNN. This paper utilizes knowledge distillation through teacher-student paradigm (…

    Submitted 2 March, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

  16. arXiv:1905.12799  [pdf, other]

    cs.LG stat.ML

    Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation

    Authors: Byung Hoon Ahn, Prannoy Pilligundla, Hadi Esmaeilzadeh

    Abstract: Achieving faster execution with shorter compilation time can enable further diversity and innovation in neural networks. However, the current paradigm of executing neural networks either relies on hand-optimized libraries, traditional compilation heuristics, or very recently, simulated annealing and genetic algorithms. Our work takes a unique approach by formulating compiler optimizations for neur…

    Submitted 29 May, 2019; originally announced May 2019.

  17. arXiv:1905.11814  [pdf, other]

    cs.CR cs.LG stat.ML

    Shredder: Learning Noise Distributions to Protect Inference Privacy

    Authors: Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ramrakhyani, Dean Tullsen, Hadi Esmaeilzadeh

    Abstract: A wide variety of deep neural applications increasingly rely on the cloud to perform their compute-heavy inference. This common practice requires sending private and privileged data over the network to remote servers, exposing it to the service provider and potentially compromising its privacy. Even if the provider is trusted, the data can still be vulnerable over communication channels or via sid…

    Submitted 27 October, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: Presented in ASPLOS 2020

  18. arXiv:1905.01416  [pdf, other]

    cs.LG stat.ML

    SinReQ: Generalized Sinusoidal Regularization for Low-Bitwidth Deep Quantized Training

    Authors: Ahmed T. Elthakeb, Prannoy Pilligundla, Hadi Esmaeilzadeh

    Abstract: Deep quantization of neural networks (below eight bits) offers significant promise in reducing their compute and storage cost. Albeit alluring, without special techniques for training and optimization, deep quantization results in significant accuracy loss. To further mitigate this loss, we propose a novel sinusoidal regularization, called SinReQ, for deep quantized training. SinReQ adds a period…

    Submitted 1 December, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

  19. arXiv:1811.01704  [pdf, other]

    cs.LG stat.ML

    ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks

    Authors: Ahmed T. Elthakeb, Prannoy Pilligundla, FatemehSadat Mireshghallah, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

    Abstract: Deep Neural Networks (DNNs) typically require a massive amount of computation resources in inference tasks for computer vision applications. Quantization can significantly reduce DNN computation and storage by decreasing the bitwidth of network encodings. Recent research affirms that carefully selecting the quantization levels for each layer can preserve the accuracy while pushing the bitwidth below…

    Submitted 16 April, 2020; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: Presented as a spotlight paper at NeurIPS Workshop on ML for Systems 2018

  20. arXiv:1806.01107  [pdf, other]

    cs.DC cs.AR cs.LG cs.NE

    GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks

    Authors: Amir Yazdanbakhsh, Hajar Falahati, Philip J. Wolfe, Kambiz Samadi, Nam Sung Kim, Hadi Esmaeilzadeh

    Abstract: Generative Adversarial Networks (GANs) are one of the most recent deep learning models that generate synthetic data from limited genuine datasets. GANs are on the frontier as further extension of deep learning into many domains (e.g., medicine, robotics, content synthesis) requires massive sets of labeled data that is generally either unavailable or prohibitively costly to collect. Although GANs a…

    Submitted 10 May, 2018; originally announced June 2018.

    Comments: Proceedings of the 45th International Symposium on Computer Architecture (ISCA), 2018

  21. arXiv:1801.06027  [pdf, other]

    cs.DB cs.AR cs.LG

    In-RDBMS Hardware Acceleration of Advanced Analytics

    Authors: Divya Mahajan, Joon Kyung Kim, Jacob Sacks, Adel Ardalan, Arun Kumar, Hadi Esmaeilzadeh

    Abstract: The data revolution is fueled by advances in machine learning, databases, and hardware design. Programmable accelerators are making their way into each of these areas independently. As such, there is a void of solutions that enables hardware acceleration at the intersection of these disjoint fields. This paper sets out to be the initial step towards a unifying solution for in-Database Acceleration…

    Submitted 18 September, 2018; v1 submitted 8 January, 2018; originally announced January 2018.

    Journal ref: Divya Mahajan, Joon Kyung Kim, Jacob Sacks, Adel Ardalan, Arun Kumar, and Hadi Esmaeilzadeh. In-RDBMS Hardware Acceleration of Advanced Analytics. PVLDB, 11(11): 1317-1331, 2018

  22. arXiv:1712.01507  [pdf, other]

    cs.NE cs.AR

    Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

    Authors: Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Joon Kyung Kim, Vikas Chandra, Hadi Esmaeilzadeh

    Abstract: Fully realizing the potential of acceleration for Deep Neural Networks (DNNs) requires understanding and leveraging algorithmic properties. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent accuracy loss, the bitwidth varies significantly across DNNs and it may even be adjusted f…

    Submitted 30 May, 2018; v1 submitted 5 December, 2017; originally announced December 2017.