Showing 1–6 of 6 results for author: Balabanov, O

Searching in archive cs.
  1. arXiv:2506.00772  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning

    Authors: Zihang Liu, Tianyu Pang, Oleg Balabanov, Chaoqun Yang, Tianjin Huang, Lu Yin, Yaoqing Yang, Shiwei Liu

    Abstract: Recent studies have shown that supervised fine-tuning of LLMs on a small number of high-quality datasets can yield strong reasoning capabilities. However, full fine-tuning (Full FT), while powerful, is computationally expensive and susceptible to overfitting and catastrophic forgetting, particularly when data is limited. Sparse fine-tuning, which previously achieved notable success by updating onl…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: ICML 2025

  2. arXiv:2502.15376  [pdf, other]

    cs.LG cond-mat.mes-hall

    Learning Chern Numbers of Topological Insulators with Gauge Equivariant Neural Networks

    Authors: Longde Huang, Oleksandr Balabanov, Hampus Linander, Mats Granath, Daniel Persson, Jan E. Gerken

    Abstract: Equivariant network architectures are a well-established tool for predicting invariant or equivariant quantities. However, almost all learning problems considered in this context feature a global symmetry, i.e. each point of the underlying space is transformed with the same group element, as opposed to a local "gauge" symmetry, where each point is transformed with a different group element, expo…

    Submitted 21 February, 2025; originally announced February 2025.

  3. arXiv:2402.12264  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Uncertainty quantification in fine-tuned LLMs using LoRA ensembles

    Authors: Oleksandr Balabanov, Hampus Linander

    Abstract: Fine-tuning large language models can improve task specific performance, although a general understanding of what the fine-tuned model has learned, forgotten and how to trust its predictions is still missing. We derive principled uncertainty quantification for fine-tuned LLMs with posterior approximations using computationally efficient low-rank adaptation ensembles. We analyze three common multip…

    Submitted 20 May, 2025; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted for ICLR2025 Workshop "Quantify Uncertainty and Hallucination in Foundation Models: The Next Frontier in Reliable AI"

  4. Bayesian posterior approximation with stochastic ensembles

    Authors: Oleksandr Balabanov, Bernhard Mehlig, Hampus Linander

    Abstract: We introduce ensembles of stochastic neural networks to approximate the Bayesian posterior, combining stochastic methods such as dropout with deep ensembles. The stochastic ensembles are formulated as families of distributions and trained to approximate the Bayesian posterior with variational inference. We implement stochastic ensembles based on Monte Carlo dropout, DropConnect and a novel non-par…

    Submitted 3 January, 2024; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: 19 pages, CVPR 2023

    Journal ref: CVPR (2023) 13701-13711

  5. arXiv:2211.14605  [pdf, other]

    cs.LG cs.CV stat.ML

    Looking at the posterior: accuracy and uncertainty of neural-network predictions

    Authors: H. Linander, O. Balabanov, H. Yang, B. Mehlig

    Abstract: Bayesian inference can quantify uncertainty in the predictions of neural networks using posterior distributions for model parameters and network output. By looking at these posterior distributions, one can separate the origin of uncertainty into aleatoric and epistemic contributions. One goal of uncertainty quantification is to inform on prediction accuracy. Here we show that prediction accuracy d…

    Submitted 22 November, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: 26 pages, 10 figures, 5 tables

    Journal ref: Machine Learning: Science and Technology 4 (2023) 045032

  6. arXiv:2210.11295  [pdf, ps, other]

    math.NA cs.DS

    Block subsampled randomized Hadamard transform for low-rank approximation on distributed architectures

    Authors: Oleg Balabanov, Matthias Beaupere, Laura Grigori, Victor Lederer

    Abstract: This article introduces a novel structured random matrix composed blockwise from subsampled randomized Hadamard transforms (SRHTs). The block SRHT is expected to outperform well-known dimension reduction maps, including SRHT and Gaussian matrices, on distributed architectures with not too many cores compared to the dimension. We prove that a block SRHT with enough rows is an oblivious subspace emb…

    Submitted 20 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of the International Conference on Machine Learning, pp. 1564-1576. PMLR, 2023