Showing 1–5 of 5 results for author: Frumkin, N

Searching in archive cs.
  1. arXiv:2503.22879  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.PF

    Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models

    Authors: Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu

    Abstract: State Space Models (SSMs) are emerging as a compelling alternative to Transformers because of their consistent memory usage and high performance. Despite this, scaling up SSMs on cloud services or limited-resource devices is challenging due to their storage requirements and computational power. To overcome this, quantizing SSMs with low bit-width data formats can reduce model size and benefit from…

    Submitted 10 June, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  2. arXiv:2410.13229  [pdf, other]

    cs.LG cs.AI

    Quamba: A Post-Training Quantization Recipe for Selective State Space Models

    Authors: Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Diana Marculescu

    Abstract: State Space Models (SSMs) have emerged as an appealing alternative to Transformers for large language models, achieving state-of-the-art accuracy with constant memory complexity which allows for holding longer context lengths than attention-based networks. The superior computational efficiency of SSMs in long sequence modeling positions them favorably over Transformers in many scenarios. However,…

    Submitted 7 December, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

  3. arXiv:2308.10814  [pdf, other]

    cs.CV

    Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers

    Authors: Natalia Frumkin, Dibakar Gope, Diana Marculescu

    Abstract: Quantization scale and bit-width are the most important parameters when considering how to quantize a neural network. Prior work focuses on optimizing quantization scales in a global manner through gradient methods (gradient descent & Hessian analysis). Yet, when applying perturbations to quantization scales, we observe a very jagged, highly non-smooth test loss landscape. In fact, small perturba…

    Submitted 26 September, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2211.09643

  4. arXiv:2212.03246  [pdf, other]

    cs.LG cs.AI

    MobileTL: On-device Transfer Learning with Inverted Residual Blocks

    Authors: Hung-Yueh Chiang, Natalia Frumkin, Feng Liang, Diana Marculescu

    Abstract: Transfer learning on edge is challenging due to on-device limited resources. Existing work addresses this issue by training a subset of parameters or adding model patches. Developed with inference in mind, Inverted Residual Blocks (IRBs) split a convolutional layer into depthwise and pointwise convolutions, leading to more stacking layers, e.g., convolution, normalization, and activation layers. T…

    Submitted 8 April, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

  5. arXiv:2211.09643  [pdf, other]

    cs.CV cs.AI

    CPT-V: A Contrastive Approach to Post-Training Quantization of Vision Transformers

    Authors: Natalia Frumkin, Dibakar Gope, Diana Marculescu

    Abstract: When considering post-training quantization, prior work has typically focused on developing a mixed precision scheme or learning the best way to partition a network for quantization. In our work, CPT-V, we look at a general way to improve the accuracy of networks that have already been quantized, simply by perturbing the quantization scales. Borrowing the idea of contrastive loss from self-supervi…

    Submitted 6 January, 2023; v1 submitted 17 November, 2022; originally announced November 2022.