Showing 1–4 of 4 results for author: Alnaasan, N

  1. arXiv:2409.02423

    cs.DC cs.AI

    Accelerating Large Language Model Training with Hybrid GPU-based Compression

    Authors: Lang Xu, Quentin Anthony, Qinghua Zhou, Nawras Alnaasan, Radha R. Gulhane, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: Data Parallelism (DP), Tensor Parallelism (TP), and Pipeline Parallelism (PP) are the three strategies widely adopted to enable fast and efficient Large Language Model (LLM) training. However, these approaches rely on data-intensive communication routines to collect, aggregate, and re-distribute gradients, activations, and other important model information, which pose significant overhead. Co-desi…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2305.13484

    cs.DC cs.AI cs.CL cs.CV cs.LG

    Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference

    Authors: Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: Autoregressive models, despite their commendable performance in a myriad of generative tasks, face challenges stemming from their inherently sequential structure. Inference on these models, by design, harnesses a temporal dependency, where the current token's probability distribution is conditioned on preceding tokens. This inherent characteristic severely impedes computational efficiency during i…

    Submitted 2 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: In Proceedings of the 30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

  3. arXiv:2303.05016

    cs.PF eess.SP

    Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version

    Authors: Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: Quantization is a popular technique used in Deep Neural Networks (DNN) inference to reduce the size of models and improve the overall numerical performance by exploiting native hardware. This paper attempts to conduct an elaborate performance characterization of the benefits of using quantization techniques -- mainly FP16/INT8 variants with static and dynamic schemes -- using the MLPerf Edge Infer…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: Extended version of a short paper accepted at ICFEC 2023

  4. arXiv:2110.10659

    cs.DC cs.AI cs.LG

    OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

    Authors: Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides easy-to-use programming interface while allowing library developers to enhance performance of their applications by harnessing the computing power offered by High Performance Computing (HPC) platforms. Effici…

    Submitted 24 August, 2022; v1 submitted 20 October, 2021; originally announced October 2021.