Showing 1–42 of 42 results for author: Annavaram, M

  1. arXiv:2506.15588  [pdf, ps, other]

    cs.LG

    Memory-Efficient Differentially Private Training with Gradient Random Projection

    Authors: Alex Mulrooney, Devansh Gupta, James Flemings, Huanyu Zhang, Murali Annavaram, Meisam Razaviyayn, Xinwei Zhang

    Abstract: Differential privacy (DP) protects sensitive data during neural network training, but standard methods like DP-Adam suffer from high memory overhead due to per-sample gradient clipping, limiting scalability. We introduce DP-GRAPE (Gradient RAndom ProjEction), a DP training method that significantly reduces memory usage while maintaining utility on par with first-order DP approaches. Rather than di…

    Submitted 18 June, 2025; originally announced June 2025.

    ACM Class: I.2.7; I.2.10

  2. arXiv:2506.12035  [pdf, ps, other]

    cs.LG cs.AI

    MARché: Fast Masked Autoregressive Image Generation with Cache-Aware Attention

    Authors: Chaoyi Jiang, Sungwoo Kim, Lei Gao, Hossein Entezari Zarch, Won Woo Ro, Murali Annavaram

    Abstract: Masked autoregressive (MAR) models unify the strengths of masked and autoregressive generation by predicting tokens in a fixed order using bidirectional attention for image generation. While effective, MAR models suffer from significant computational overhead, as they recompute attention and feed-forward representations for all tokens at every decoding step, despite most tokens remaining semantica…

    Submitted 22 May, 2025; originally announced June 2025.

  3. arXiv:2504.05598  [pdf, other]

    cs.CL cs.LG

    DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding

    Authors: Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram

    Abstract: Speculative Decoding (SD) is a widely used approach to accelerate the inference of large language models (LLMs) without reducing generation quality. It operates by first using a compact model to draft multiple tokens efficiently, followed by parallel verification using the target LLM. This approach leads to faster inference compared to auto-regressive decoding. While there are multiple approaches…

    Submitted 7 April, 2025; originally announced April 2025.

  4. arXiv:2501.19287  [pdf, other]

    cs.LG

    Differentially Private In-context Learning via Sampling Few-shot Mixed with Zero-shot Outputs

    Authors: James Flemings, Haosheng Gan, Hongyi Li, Meisam Razaviyayn, Murali Annavaram

    Abstract: In-context learning (ICL) has shown promising improvement in downstream task adaptation of LLMs by augmenting prompts with relevant input-output examples (demonstrations). However, the ICL demonstrations can contain privacy-sensitive information, which can be leaked and/or regurgitated by the LLM output. Differential Privacy (DP), a widely adopted privacy safeguard, has emerged to mitigate this pr…

    Submitted 31 January, 2025; originally announced January 2025.

  5. arXiv:2501.11771  [pdf, other]

    cs.CR cs.DC

    Characterization of GPU TEE Overheads in Distributed Data Parallel ML Training

    Authors: Jonghyun Lee, Yongqin Wang, Rachit Rajat, Murali Annavaram

    Abstract: Confidential computing (CC) or trusted execution enclaves (TEEs) is now the most common approach to enable secure computing in the cloud. The recent introduction of GPU TEEs by NVIDIA enables machine learning (ML) models to be trained without leaking model weights or data to the cloud provider. However, the potential performance implications of using GPU TEEs for ML training are not well character…

    Submitted 27 March, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

  6. arXiv:2411.17089  [pdf, ps, other]

    cs.LG cs.DC cs.PF

    KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation

    Authors: Chaoyi Jiang, Lei Gao, Hossein Entezari Zarch, Murali Annavaram

    Abstract: Inference for Large Language Models (LLMs) is computationally demanding. To reduce the cost of auto-regressive decoding, Key-Value (KV) cache is used to store intermediate activations, which significantly lowers the computational overhead for token generation. However, the memory required for the KV cache grows rapidly, often exceeding the capacity of GPU memory. A cost-effective alternative is to…

    Submitted 4 June, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: ACL Findings 2025

  7. arXiv:2410.15240  [pdf, other]

    cs.CR cs.AR

    Fastrack: Fast IO for Secure ML using GPU TEEs

    Authors: Yongqin Wang, Rachit Rajat, Jonghyun Lee, Tingting Tang, Murali Annavaram

    Abstract: As cloud-based ML expands, ensuring data security during training and inference is critical. GPU-based Trusted Execution Environments (TEEs) offer secure, high-performance solutions, with CPU TEEs managing data movement and GPU TEEs handling authentication and computation. However, CPU-to-GPU communication overheads significantly hinder performance, as data must be encrypted, authenticated, decryp…

    Submitted 19 October, 2024; originally announced October 2024.

  8. arXiv:2410.03026  [pdf, ps, other]

    cs.CL cs.LG

    Estimating Privacy Leakage of Augmented Contextual Knowledge in Language Models

    Authors: James Flemings, Bo Jiang, Wanrong Zhang, Zafar Takhirov, Murali Annavaram

    Abstract: Language models (LMs) rely on their parametric knowledge augmented with relevant contextual knowledge for certain tasks, such as question answering. However, the contextual knowledge can contain private information that may be leaked when answering queries, and estimating this privacy leakage is not well understood. A straightforward approach of directly comparing an LM's output to the contexts ca…

    Submitted 30 May, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

  9. arXiv:2410.02016  [pdf, other]

    cs.LG cs.CR

    Adaptively Private Next-Token Prediction of Large Language Models

    Authors: James Flemings, Meisam Razaviyayn, Murali Annavaram

    Abstract: As Large Language Models (LLMs) proliferate, developing privacy safeguards for these models is crucial. One popular safeguard involves training LLMs in a differentially private manner. However, such solutions are shown to be computationally expensive and detrimental to the utility of these models. Since LLMs are deployed on the cloud and thus only accessible via an API, a Machine Learning as a Ser…

    Submitted 2 October, 2024; originally announced October 2024.

  10. arXiv:2409.15520  [pdf, other]

    cs.LG cs.DC

    Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines

    Authors: Lei Gao, Amir Ziashahabi, Yue Niu, Salman Avestimehr, Murali Annavaram

    Abstract: Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers. The next frontier is LLM personalization, where a foundation model can be fine-tuned with user/task-specific data. Given the sensitive nature of such private data, it is desirable to fine-tune these models on edge devices to improve user trust. However, fine-tuning on resource-constrained edge devices pre…

    Submitted 6 November, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: Accepted at NeurIPS 2024 ENLSP-IV workshop

  11. arXiv:2407.08108  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    CADC: Encoding User-Item Interactions for Compressing Recommendation Model Training Data

    Authors: Hossein Entezari Zarch, Abdulla Alshabanah, Chaoyi Jiang, Murali Annavaram

    Abstract: Deep learning recommendation models (DLRMs) are at the heart of the current e-commerce industry. However, the amount of training data used to train these large models is growing exponentially, leading to substantial training hurdles. The training dataset contains two primary types of information: content-based information (features of users and items) and collaborative information (interactions be…

    Submitted 23 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  12. arXiv:2403.15638  [pdf, other]

    cs.CR cs.CL cs.LG

    Differentially Private Next-Token Prediction of Large Language Models

    Authors: James Flemings, Meisam Razaviyayn, Murali Annavaram

    Abstract: Ensuring the privacy of Large Language Models (LLMs) is becoming increasingly important. The most widely adopted technique to accomplish this is DP-SGD, which trains a model to guarantee Differential Privacy (DP). However, DP-SGD overestimates an adversary's capabilities in having white box access to the model and, as a result, causes longer training times and larger memory usage than SGD. On the…

    Submitted 26 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  13. arXiv:2403.10995  [pdf, other]

    cs.LG cs.AI cs.CR cs.SI

    Edge Private Graph Neural Networks with Singular Value Perturbation

    Authors: Tingting Tang, Yue Niu, Salman Avestimehr, Murali Annavaram

    Abstract: Graph neural networks (GNNs) play a key role in learning representations from graph-structured data and are demonstrated to be useful in many applications. However, the GNN training pipeline has been shown to be vulnerable to node feature leakage and edge extraction attacks. This paper investigates a scenario where an attacker aims to recover private edge information from a trained GNN model. Prev…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted at Privacy Enhancing Technologies Symposium (PETS) 2024

  14. arXiv:2403.08994  [pdf, other]

    cs.CL

    Ethos: Rectifying Language Models in Orthogonal Parameter Space

    Authors: Lei Gao, Yue Niu, Tingting Tang, Salman Avestimehr, Murali Annavaram

    Abstract: Language models (LMs) have greatly propelled the research on natural language processing. However, LMs also raise concerns regarding the generation of biased or toxic content and the potential disclosure of private information from the training dataset. In this work, we present a new efficient approach, Ethos, that rectifies LMs to mitigate toxicity and bias in outputs and avoid privacy leakage. E…

    Submitted 1 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  15. arXiv:2403.00932  [pdf, other]

    cs.LG cs.CL cs.CR

    Differentially Private Knowledge Distillation via Synthetic Text Generation

    Authors: James Flemings, Murali Annavaram

    Abstract: Large Language models (LLMs) are achieving state-of-the-art performance in many different downstream tasks. However, the increasing urgency of data privacy puts pressure on practitioners to train LLMs with Differential Privacy (DP) on private data. Concurrently, the exponential growth in parameter size of LLMs necessitates model compression before deployment of LLMs on resource-constrained devices…

    Submitted 4 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  16. arXiv:2311.04406  [pdf, other]

    cs.CR

    CompactTag: Minimizing Computation Overheads in Actively-Secure MPC for Deep Neural Networks

    Authors: Yongqin Wang, Pratik Sarkar, Nishat Koti, Arpita Patra, Murali Annavaram

    Abstract: Secure Multiparty Computation (MPC) protocols enable secure evaluation of a circuit by several parties, even in the presence of an adversary who maliciously corrupts all but one of the parties. These MPC protocols are constructed using the well-known secret-sharing-based paradigm (SPDZ and SPDZ2k), where the protocols ensure security against a malicious adversary by computing Message Authenticatio…

    Submitted 7 November, 2023; originally announced November 2023.

  17. arXiv:2212.06264  [pdf, other]

    cs.CE cs.CR cs.DC cs.LG

    Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems

    Authors: Hanieh Hashemi, Wenjie Xiong, Liu Ke, Kiwan Maeng, Murali Annavaram, G. Edward Suh, Hsien-Hsin S. Lee

    Abstract: Online personalized recommendation services are generally hosted in the cloud where users query the cloud-based model to receive recommended input such as merchandise of interest or news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model…

    Submitted 12 December, 2022; originally announced December 2022.

  18. arXiv:2209.13643  [pdf, other]

    cs.CR cs.LG

    MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference

    Authors: Yongqin Wang, Rachit Rajat, Murali Annavaram

    Abstract: Multi-party computing (MPC) has been gaining popularity as a secure computing model over the past few years. However, prior works have demonstrated that MPC protocols still pay substantial performance penalties compared to plaintext, particularly when applied to ML algorithms. The overhead is due to added computation and communication costs. Prior studies, as well as our own analysis, found that m…

    Submitted 27 August, 2024; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: To appear in ASPLOS'25

  19. arXiv:2207.00083  [pdf, other]

    cs.CR cs.AR cs.LG

    DarKnight: An Accelerated Framework for Privacy and Integrity Preserving Deep Learning Using Trusted Hardware

    Authors: Hanieh Hashemi, Yongqin Wang, Murali Annavaram

    Abstract: Privacy and security-related concerns are growing as machine learning reaches diverse application domains. The data holders want to train or infer with private data while exploiting accelerators, such as GPUs, that are hosted in the cloud. Cloud systems are vulnerable to attackers that compromise the privacy of data and integrity of computations. Tackling such a challenge requires unifying theoret…

    Submitted 30 June, 2022; originally announced July 2022.

    Comments: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. arXiv admin note: text overlap with arXiv:2105.00334

  20. arXiv:2206.03776  [pdf, other]

    cs.CR

    High-Throughput Secure Multiparty Computation with an Honest Majority in Various Network Settings

    Authors: Christopher Harth-Kitzerow, Ajith Suresh, Yongqin Wang, Hossein Yalame, Georg Carle, Murali Annavaram

    Abstract: In this work, we present novel protocols over rings for semi-honest secure three-party computation (3PC) and malicious four-party computation (4PC) with one corruption. While most existing works focus on improving total communication complexity, challenges such as network heterogeneity and computational complexity, which impact MPC performance in practice, remain underexplored. Our protocols addre…

    Submitted 21 May, 2025; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: This is the public version of the paper to be published at the 25th Privacy Enhancing Technologies Symposium (PETS 2025)

  21. arXiv:2112.13416  [pdf, other]

    cs.CR cs.LG cs.MM

    Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings

    Authors: Tiantian Feng, Hanieh Hashemi, Rajat Hebbar, Murali Annavaram, Shrikanth S. Narayanan

    Abstract: Speech emotion recognition (SER) processes speech signals to detect and characterize expressed perceived emotions. Many SER application systems often acquire and transmit speech data collected at the client-side to remote cloud platforms for inference and decision making. However, speech data carry rich information not only about emotions conveyed in vocal expressions, but also other sensitive dem…

    Submitted 22 December, 2022; v1 submitted 26 December, 2021; originally announced December 2021.

  22. arXiv:2107.12958  [pdf, other]

    cs.DC cs.CR cs.IT cs.LG

    Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning

    Authors: Tingting Tang, Ramy E. Ali, Hanieh Hashemi, Tynan Gangwani, Salman Avestimehr, Murali Annavaram

    Abstract: Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Some prior works proposed coded computing strategies to jointly address all three challenges. They require either a large number of workers, a significant communication cost or a significant computational complexity to tolerate Byzantine workers. Much of the overhead in prior schemes comes from…

    Submitted 22 March, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

  23. arXiv:2107.08094  [pdf, other]

    cs.CR

    LAORAM: A Look Ahead ORAM Architecture for Training Large Embedding Tables

    Authors: Rachit Rajat, Yongqin Wang, Murali Annavaram

    Abstract: Data confidentiality is becoming a significant concern, especially in the cloud computing era. Memory access patterns have been demonstrated to leak critical information such as security keys and a program's spatial and temporal information. This information leak poses an even more significant privacy challenge in machine learning models with embedding tables. Embedding tables are routinely used t…

    Submitted 29 June, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

  24. arXiv:2106.02743  [pdf, other]

    cs.LG

    SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural Networks

    Authors: Chaoyang He, Emir Ceyani, Keshav Balasubramanian, Murali Annavaram, Salman Avestimehr

    Abstract: Graph Neural Networks (GNNs) are the first choice methods for graph machine learning problems thanks to their ability to learn state-of-the-art level representations from graph-structured data. However, centralizing a massive amount of real-world graph data for GNN training is prohibitive due to user-side privacy concerns, regulation restrictions, and commercial competition. Federated Learning is…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: Three co-1st authors have equal contribution (alphabetical order)

  25. arXiv:2105.02295  [pdf, other]

    cs.CR cs.AR cs.LG

    Byzantine-Robust and Privacy-Preserving Framework for FedML

    Authors: Hanieh Hashemi, Yongqin Wang, Chuan Guo, Murali Annavaram

    Abstract: Federated learning has emerged as a popular paradigm for collaboratively training a model from data distributed among a set of clients. This learning setting presents, among others, two unique challenges: how to protect privacy of the clients' data during training, and how to ensure integrity of the trained model. We propose a two-pronged solution that aims to address both challenges under a singl…

    Submitted 5 May, 2021; originally announced May 2021.

    Journal ref: Security and Safety in Machine Learning Systems Workshop in ICLR 2021

  26. arXiv:2105.00334  [pdf, other]

    cs.CR cs.AR cs.LG

    Privacy and Integrity Preserving Training Using Trusted Hardware

    Authors: Hanieh Hashemi, Yongqin Wang, Murali Annavaram

    Abstract: Privacy and security-related concerns are growing as machine learning reaches diverse application domains. The data holders want to train with private data while exploiting accelerators, such as GPUs, that are hosted in the cloud. However, Cloud systems are vulnerable to attackers that compromise the privacy of data and integrity of computations. This work presents DarKnight, a framework for large…

    Submitted 1 May, 2021; originally announced May 2021.

    Journal ref: Distributed and Private Machine Learning ICLR 2021 Workshop

  27. arXiv:2104.07145  [pdf, other]

    cs.LG cs.AI cs.DC

    FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks

    Authors: Chaoyang He, Keshav Balasubramanian, Emir Ceyani, Carl Yang, Han Xie, Lichao Sun, Lifang He, Liangwei Yang, Philip S. Yu, Yu Rong, Peilin Zhao, Junzhou Huang, Murali Annavaram, Salman Avestimehr

    Abstract: Graph Neural Network (GNN) research is rapidly growing thanks to the capacity of GNNs in learning distributed representations from graph-structured data. However, centralizing a massive amount of real-world graph data for GNN training is prohibitive due to privacy concerns, regulation restrictions, and commercial competitions. Federated learning (FL), a trending distributed learning paradigm, prov…

    Submitted 7 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: Our shorter versions are accepted to ICLR 2021 Workshop on Distributed and Private Machine Learning (DPML) and MLSys 2021 GNNSys Workshop on Graph Neural Networks and Systems. The full version is under review

  28. arXiv:2012.04930  [pdf, ps, other]

    cs.LG cs.DC

    Distributed Training of Graph Convolutional Networks using Subgraph Approximation

    Authors: Alexandra Angerd, Keshav Balasubramanian, Murali Annavaram

    Abstract: Modern machine learning techniques are successfully being adapted to data modeled as graphs. However, many real-world graphs are typically very large and do not fit in memory, often making the problem of training machine learning models on them intractable. Distributed training has been successfully employed to alleviate memory problems and speed up training in machine learning domains in which th…

    Submitted 9 December, 2020; originally announced December 2020.

  29. arXiv:2010.08679  [pdf, other]

    cs.IR cs.LG

    Check-N-Run: A Checkpointing System for Training Deep Learning Recommendation Models

    Authors: Assaf Eisenman, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere, Raghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, Murali Annavaram

    Abstract: Checkpoints play an important role in training long running machine learning (ML) models. Checkpoints take a snapshot of an ML model and store it in a non-volatile memory so that they can be used to recover from failures to ensure rapid training progress. In addition, they are used for online training to improve inference prediction accuracy with continuous learning. Given the large and ever incre…

    Submitted 4 May, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

  30. arXiv:2010.07541  [pdf, other]

    cs.DC

    Secure and Fault Tolerant Decentralized Learning

    Authors: Saurav Prakash, Hanieh Hashemi, Yongqin Wang, Murali Annavaram, Salman Avestimehr

    Abstract: Federated learning (FL) is a promising paradigm for training a global model over data distributed across multiple data owners without centralizing clients' raw data. However, sharing of local model updates can also reveal information of clients' local datasets. Trusted execution environments (TEEs) within the FL server have been recently deployed by companies like Meta for secure aggregation. Howe…

    Submitted 13 September, 2022; v1 submitted 15 October, 2020; originally announced October 2020.

  31. arXiv:2007.14513  [pdf, other]

    cs.LG cs.CV

    Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge

    Authors: Chaoyang He, Murali Annavaram, Salman Avestimehr

    Abstract: Scaling up the convolutional neural network (CNN) size (e.g., width, depth, etc.) is known to effectively improve model accuracy. However, the large model size impedes training on resource-constrained edge devices. For instance, federated learning (FL) may place undue burden on the compute capability of edge nodes, even though there is a strong practical need for FL due to its privacy and confiden…

    Submitted 5 November, 2020; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: This paper is accepted to NeurIPS 2020. We propose FedGKT, attempting to address one of the core problems of federated learning: training deep neural networks in resource-constrained edge devices

  32. arXiv:2007.13518  [pdf, other]

    cs.LG stat.ML

    FedML: A Research Library and Benchmark for Federated Machine Learning

    Authors: Chaoyang He, Songze Li, Jinhyun So, Xiao Zeng, Mi Zhang, Hongyi Wang, Xiaoyang Wang, Praneeth Vepakomma, Abhishek Singh, Hang Qiu, Xinghua Zhu, Jianzong Wang, Li Shen, Peilin Zhao, Yan Kang, Yang Liu, Ramesh Raskar, Qiang Yang, Murali Annavaram, Salman Avestimehr

    Abstract: Federated learning (FL) is a rapidly growing research field in machine learning. However, existing FL libraries cannot adequately support diverse algorithmic development; inconsistent dataset and model usage make fair algorithm comparison challenging. In this work, we introduce FedML, an open research library and benchmark to facilitate FL algorithm development and fair performance comparison. Fed…

    Submitted 8 November, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: This is FedML white paper V3. Homepage: https://fedml.ai; GitHub: https://github.com/FedML-AI/FedML; In V3, More advanced algorithms and IoT device training are supported, please check here: https://github.com/FedML-AI/FedML/blob/master/fedml_iot/

  33. arXiv:2006.01300  [pdf, other]

    cs.CR

    DarKnight: A Data Privacy Scheme for Training and Inference of Deep Neural Networks

    Authors: Hanieh Hashemi, Yongqin Wang, Murali Annavaram

    Abstract: Protecting the privacy of input data is of growing importance as machine learning methods reach new application domains. In this paper, we provide a unified training and inference framework for large DNNs while protecting input privacy and computation integrity. Our approach called DarKnight uses a novel data blinding strategy using matrix masking to create input obfuscation within a trusted execu…

    Submitted 15 October, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

  34. arXiv:2004.08546  [pdf, other]

    cs.LG cs.CV cs.DC cs.MA stat.ML

    Towards Non-I.I.D. and Invisible Data with FedNAS: Federated Deep Learning via Neural Architecture Search

    Authors: Chaoyang He, Murali Annavaram, Salman Avestimehr

    Abstract: Federated Learning (FL) has been proved to be an effective learning framework when data cannot be centralized due to privacy, communication costs, and regulatory restrictions. When training deep learning models under an FL setting, people employ the predefined model architecture discovered in the centralized environment. However, this predefined architecture may not be the optimal choice because i…

    Submitted 3 January, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

    Comments: Accepted to CVPR 2020 workshop on neural architecture search and beyond for representation learning. Code is released at https://fedml.ai

  35. arXiv:1912.10643  [pdf, other]

    cs.DC

    Jupiter: A Networked Computing Architecture

    Authors: Pradipta Ghosh, Quynh Nguyen, Pranav K Sakulkar, Aleksandra Knezevic, Jason A. Tran, Jiatong Wang, Zhifeng Lin, Bhaskar Krishnamachari, Murali Annavaram, Salman Avestimehr

    Abstract: In the era of Internet of Things, there is an increasing demand for networked computing to support the requirements of the time-constrained, compute-intensive distributed applications such as multi-camera video processing and data fusion for security. We present Jupiter, an open source networked computing system that inputs a Directed Acyclic Graph (DAG)-based computational task graph to efficient…

    Submitted 23 December, 2019; originally announced December 2019.

  36. arXiv:1912.03485  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Privacy-Preserving Inference in Machine Learning Services Using Trusted Execution Environments

    Authors: Krishna Giri Narra, Zhifeng Lin, Yongqin Wang, Keshav Balasubramaniam, Murali Annavaram

    Abstract: This work presents Origami, which provides privacy-preserving inference for large deep neural network (DNN) models through a combination of enclave execution, cryptographic blinding, interspersed with accelerator-based computation. Origami partitions the ML model into multiple partitions. The first partition receives the encrypted user input within an SGX enclave. The enclave decrypts the input an…

    Submitted 7 December, 2019; originally announced December 2019.

    Comments: 13 pages, Under submission

  37. arXiv:1910.10283  [pdf, other]

    cs.DC cs.IT cs.LG

    Train Where the Data is: A Case for Bandwidth Efficient Coded Training

    Authors: Zhifeng Lin, Krishna Giri Narra, Mingchao Yu, Salman Avestimehr, Murali Annavaram

    Abstract: Training a machine learning model is both compute and data-intensive. Most of the model training is performed on high performance compute nodes and the training data is stored near these nodes for faster training. But there is a growing interest in enabling training near the data. For instance, mobile devices are rich sources of training data. It may not be feasible to consolidate the data from mo…

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: 10 pages, Under submission

  38. arXiv:1906.03999  [pdf, other]

    cs.DC cs.LG stat.ML

    Collage Inference: Achieving low tail latency during distributed image classification using coded redundancy models

    Authors: Krishna Narra, Zhifeng Lin, Ganesh Ananthanarayanan, Salman Avestimehr, Murali Annavaram

    Abstract: Reducing the latency variance in machine learning inference is a key requirement in many applications. Variance is harder to control in a cloud deployment in the presence of stragglers. In spite of this challenge, inference is increasingly being done in the cloud, due to the advent of affordable machine learning as a service (MLaaS) platforms. Existing approaches to reduce variance rely on replica…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: 4 pages, CodML workshop at International Conference on Machine Learning (ICML 2019). arXiv admin note: text overlap with arXiv:1904.12222

  39. arXiv:1905.04264  [pdf, other]

    cs.DC

    PartitionedVC: Partitioned External Memory Graph Analytics Framework for SSDs

    Authors: Kiran Kumar Matam, Hanieh Hashemi, Murali Annavaram

    Abstract: Graph analytics are at the heart of a broad range of applications such as drug discovery, page ranking, and recommendation systems. When graph size exceeds memory size, out-of-core graph processing is needed. For the widely used external memory graph processing systems, accessing storage becomes the bottleneck. We make the observation that nearly all graph algorithms have a dynamically varying num…

    Submitted 11 February, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

    Comments: 13 pages

  40. arXiv:1904.12222  [pdf, other]

    cs.CV cs.DC cs.IT cs.LG stat.ML

    Collage Inference: Using Coded Redundancy for Low Variance Distributed Image Classification

    Authors: Krishna Giri Narra, Zhifeng Lin, Ganesh Ananthanarayanan, Salman Avestimehr, Murali Annavaram

    Abstract: MLaaS (ML-as-a-Service) offerings by cloud computing platforms are becoming increasingly popular. Hosting pre-trained machine learning models in the cloud enables elastic scalability as the demand grows. But providing low latency and reducing the latency variance is a key requirement. Variance is harder to control in a cloud deployment due to uncertainties in resource allocations across many virtu…

    Submitted 10 September, 2019; v1 submitted 27 April, 2019; originally announced April 2019.

    Comments: 10 pages, Under submission

  41. arXiv:1904.07098  [pdf, other]

    cs.DC cs.IT

    Slack Squeeze Coded Computing for Adaptive Straggler Mitigation

    Authors: Krishna Giri Narra, Zhifeng Lin, Mehrdad Kiamari, Salman Avestimehr, Murali Annavaram

    Abstract: While performing distributed computations in today's cloud-based platforms, execution speed variations among compute nodes can significantly reduce the performance and create bottlenecks like stragglers. Coded computation techniques leverage coding theory to inject computational redundancy and mitigate stragglers in distributed computations. In this paper, we propose a dynamic workload distributio…

    Submitted 31 August, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

    Comments: 13 pages, SC 2019

  42. arXiv:1811.03617  [pdf, other]

    cs.LG cs.DC stat.ML

    GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training

    Authors: Mingchao Yu, Zhifeng Lin, Krishna Narra, Songze Li, Youjie Li, Nam Sung Kim, Alexander Schwing, Murali Annavaram, Salman Avestimehr

    Abstract: Data parallelism can boost the training speed of convolutional neural networks (CNN), but could suffer from significant communication costs caused by gradient aggregation. To alleviate this problem, several scalar quantization techniques have been developed to compress the gradients. But these techniques could perform poorly when used together with decentralized aggregation protocols like ring all…

    Submitted 31 December, 2018; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: Accepted at NeurIPS 2018