Showing 1–8 of 8 results for author: Lie, S

Searching in archive cs.

  1. arXiv:2505.10526  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models

    Authors: Mugilan Ganesan, Shane Segal, Ankur Aggarwal, Nish Sinnadurai, Sean Lie, Vithursan Thangarasa

    Abstract: Speculative decoding significantly accelerates language model inference by enabling a lightweight draft model to propose multiple tokens that a larger target model verifies simultaneously. However, applying this technique to vision-language models (VLMs) presents two fundamental challenges: small language models that could serve as efficient drafters lack the architectural components to process vi…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Main paper: 11 pp., 4 figs., 3 tabs.; Supplementary: 2 pp

  2. arXiv:2504.08838  [pdf, other]

    cs.CL cs.AI

    SD$^2$: Self-Distilled Sparse Drafters

    Authors: Mike Lasby, Nish Sinnadurai, Valavan Manohararajah, Sean Lie, Vithursan Thangarasa

    Abstract: Speculative decoding is a powerful technique for reducing the latency of Large Language Models (LLMs), offering a fault-tolerant framework that enables the use of highly compressed draft models. In this work, we introduce Self-Distilled Sparse Drafters (SD$^2$), a novel methodology that leverages self-data distillation and fine-grained weight sparsity to produce highly efficient and well-aligned d…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 21 pages

    ACM Class: I.2.0; I.2.7

  3. arXiv:2410.09982  [pdf, other]

    cs.LG cs.CL

    Self-Data Distillation for Recovering Quality in Pruned Large Language Models

    Authors: Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie

    Abstract: Large language models have driven significant progress in natural language processing, but their deployment requires substantial compute and memory resources. As models scale, compression techniques become essential for balancing model quality with computational efficiency. Structured pruning, which removes less critical components of the model, is a promising strategy for reducing complexity. How…

    Submitted 10 May, 2025; v1 submitted 13 October, 2024; originally announced October 2024.

    Comments: Accepted to MLSys 2025. Main paper: 14 pp., 4 figs., 6 tabs.; Supplementary: 5 pp

  4. arXiv:2405.03594  [pdf, other]

    cs.CL cs.AI

    Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

    Authors: Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, Eldar Kurtic, Kevin Leong, Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks. We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs that achieve full accuracy recovery for fine-tuning tasks at up to 70% sparsity. We achieve this for the LLaMA-2 7B model by combining the SparseGPT one-shot pruning me…

    Submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2403.00952  [pdf, other]

    cs.CL cs.LG

    MediSwift: Efficient Sparse Pre-trained Biomedical Language Models

    Authors: Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie

    Abstract: Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e.g., biomedicine). Although domain-specific pre-training enhances efficiency and leads to smaller models, the computational costs of training these LLMs remain high, posing…

    Submitted 7 August, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 14 pages, 2 Figures, 5 Tables (Main Paper) + 3 pages (Supplementary Material). Published at ACL 2024

  6. arXiv:2401.14589  [pdf]

    cs.CL cs.AI

    Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias

    Authors: Yu He Ke, Rui Yang, Sui An Lie, Taylor Xin Yi Lim, Hairil Rizal Abdullah, Daniel Shu Wei Ting, Nan Liu

    Abstract: Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field. Objective: This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decisi…

    Submitted 12 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 21 pages, 3 figures

  7. arXiv:2303.11525  [pdf, other]

    cs.LG cs.CL cs.CV

    Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency

    Authors: Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie

    Abstract: Recent research has focused on weight sparsity in deep neural network training to reduce FLOPs, aiming for improved efficiency (test accuracy w.r.t training FLOPs). However, sparse weight training often compromises accuracy, requiring extended training schedules to attain the accuracy of dense models. In contrast, our approach, Sparse Iso-FLOP Transformations (Sparse-IFT), uses sparsity to improve…

    Submitted 17 July, 2024; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 14 pages, 5 figures, 6 Tables (Main Paper) + 8 pages (Supplementary Material). Published at ICML 2024

  8. arXiv:2303.10464  [pdf, other]

    cs.LG cs.CL

    SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models

    Authors: Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, Dennis DeCoste, Sean Lie, Shreyas Saxena

    Abstract: The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with cross-domain knowledge (e.g., Pile, MassiveText, etc.) and then fine-tuned on task-specific data (e.g., natural language generation, text summarization, etc.). Sca…

    Submitted 29 July, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted to Uncertainty in Artificial Intelligence (UAI) 2023 Conference; 13 pages, 4 figures (Main Paper) + 5 pages (Supplementary Material)