Showing 1–5 of 5 results for author: Kosaian, J

  1. arXiv:2212.07936  [pdf, other]

    cs.LG cs.PF

    A Study on the Intersection of GPU Utilization and CNN Inference

    Authors: Jack Kosaian, Amar Phanishayee

    Abstract: There has been significant progress in developing neural network architectures that both achieve high predictive performance and that also achieve high application-level inference throughput (e.g., frames per second). Another metric of increasing importance is GPU utilization during inference: the measurement of how well a deployed neural network uses the computational capabilities of the GPU on w…

    Submitted 15 December, 2022; originally announced December 2022.

  2. Arithmetic-Intensity-Guided Fault Tolerance for Neural Network Inference on GPUs

    Authors: Jack Kosaian, K. V. Rashmi

    Abstract: Neural networks (NNs) are increasingly employed in safety-critical domains and in environments prone to unreliability (e.g., soft errors), such as on spacecraft. Therefore, it is critical to impart fault tolerance to NN inference. Algorithm-based fault tolerance (ABFT) is emerging as an efficient approach for fault tolerance in NNs. We propose an adaptive approach to ABFT for NN inference that e…

    Submitted 7 December, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: Appeared in The International Conference for High Performance Computing, Networking, Storage and Analysis (SC '21), November 14–19, 2021, St. Louis, MO, USA

  3. arXiv:2104.01981  [pdf, other]

    cs.LG cs.DC

    ECRM: Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding

    Authors: Kaige Liu, Jack Kosaian, K. V. Rashmi

    Abstract: Deep-learning-based recommendation models (DLRMs) are widely deployed to serve personalized content to users. DLRMs are large in size due to their use of large embedding tables, and are trained by distributing the model across the memory of tens or hundreds of servers. Server failures are common in such large distributed systems and must be mitigated to enable training to progress. Checkpointing i…

    Submitted 5 April, 2021; originally announced April 2021.

  4. arXiv:1905.00863  [pdf, other]

    cs.DC cs.IT cs.LG

    Parity Models: A General Framework for Coding-Based Resilience in ML Inference

    Authors: Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

    Abstract: Machine learning models are becoming the primary workhorses for many applications. Production services deploy models through prediction serving systems that take in queries and return predictions by performing inference on machine learning models. In order to scale to high query rates, prediction serving systems are run on many machines in cluster settings, and thus are prone to slowdowns and fail…

    Submitted 16 September, 2019; v1 submitted 2 May, 2019; originally announced May 2019.

    Comments: This paper is superseded by the ACM SOSP 2019 paper "Parity Models: Erasure-Coded Resilience for Prediction Serving Systems"

  5. arXiv:1806.01259  [pdf, other]

    cs.LG cs.IT stat.ML

    Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation

    Authors: Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

    Abstract: Machine learning algorithms are typically run on large scale, distributed compute infrastructure that routinely face a number of unavailabilities such as failures and temporary slowdowns. Adding redundant computations using coding-theoretic tools called "codes" is an emerging technique to alleviate the adverse effects of such unavailabilities. A code consists of an encoding function that proactive…

    Submitted 4 June, 2018; originally announced June 2018.