Showing 1–5 of 5 results for author: Kwon, S M

Searching in archive stat.
  1. arXiv:2505.14808  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective

    Authors: Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu

    Abstract: This work aims to demystify the out-of-distribution (OOD) capabilities of in-context learning (ICL) by studying linear regression tasks parameterized with low-rank covariance matrices. With such a parameterization, we can model distribution shifts as a varying angle between the subspace of the training and testing covariance matrices. We prove that a single-layer linear attention model incurs a te…

    Submitted 20 May, 2025; originally announced May 2025.

  2. arXiv:2503.19859  [pdf, other]

    cs.LG eess.SP math.OC stat.CO stat.ML

    An Overview of Low-Rank Structures in the Training and Adaptation of Large Models

    Authors: Laura Balzano, Tianjiao Ding, Benjamin D. Haeffele, Soo Min Kwon, Qing Qu, Peng Wang, Zhangyang Wang, Can Yaras

    Abstract: The rise of deep learning has revolutionized data processing and prediction in signal processing and machine learning, yet the substantial computational demands of training and deploying modern large-scale deep models present significant challenges, including high computational costs and energy consumption. Recent research has uncovered a widespread phenomenon in deep networks: the emergence of lo…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Authors are listed alphabetically; 27 pages, 10 figures

  3. arXiv:2502.20531  [pdf, other]

    stat.ML cs.LG

    Learning Dynamics of Deep Linear Networks Beyond the Edge of Stability

    Authors: Avrajit Ghosh, Soo Min Kwon, Rongrong Wang, Saiprasad Ravishankar, Qing Qu

    Abstract: Deep neural networks trained using gradient descent with a fixed learning rate $η$ often operate in the regime of "edge of stability" (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold $2/η$. In this work, we present a fine-grained analysis of the learning dynamics of (deep) linear networks (DLNs) within the deep matrix factorization loss beyond EOS. For…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: Published in ICLR 2025

  4. arXiv:2410.21262  [pdf, other]

    cs.LG cs.AI stat.ML

    BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference

    Authors: Changwoo Lee, Soo Min Kwon, Qing Qu, Hun-Seok Kim

    Abstract: Large-scale foundation models have demonstrated exceptional performance in language and vision tasks. However, the numerous dense matrix-vector operations involved in these large networks pose significant computational challenges during inference. To address these challenges, we introduce the Block-Level Adaptive STructured (BLAST) matrix, designed to learn and leverage efficient structures preval…

    Submitted 29 October, 2024; v1 submitted 28 October, 2024; originally announced October 2024.

  5. arXiv:2311.05061  [pdf, other]

    cs.LG stat.ML

    Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

    Authors: Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu

    Abstract: Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive resources to train. In this work, we present a novel approach for compressing overparameterized models, developed through studying their learning dynamics. We obs…

    Submitted 11 March, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: International Conference on Artificial Intelligence and Statistics (AISTATS 2024)