Showing 1–21 of 21 results for author: Thekumparampil, K

  1. arXiv:2506.05454  [pdf, ps, other]

    cs.LG cs.AI math.OC stat.ML

    Zeroth-Order Optimization Finds Flat Minima

    Authors: Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Michael Muehlebach, Niao He

    Abstract: Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known on the implicit regularization that provides a fine-grained characterization on wh…

    Submitted 5 June, 2025; originally announced June 2025.

  2. arXiv:2505.14826  [pdf, other]

    cs.LG cs.CL stat.ML

    FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain

    Authors: Rohan Deb, Kiran Thekumparampil, Kousha Kalantari, Gaurush Hiranandani, Shoham Sabach, Branislav Kveton

    Abstract: Supervised fine-tuning (SFT) is a standard approach to adapting large language models (LLMs) to new domains. In this work, we improve the statistical efficiency of SFT by selecting an informative subset of training examples. Specifically, for a fixed budget of training examples, which determines the computational cost of fine-tuning, we determine the most informative ones. The key idea in our meth…

    Submitted 20 May, 2025; originally announced May 2025.

  3. arXiv:2412.19396  [pdf, other]

    cs.LG cs.AI cs.IT math.OC stat.ML

    Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe

    Authors: Kiran Koshy Thekumparampil, Gaurush Hiranandani, Kousha Kalantari, Shoham Sabach, Branislav Kveton

    Abstract: We study learning of human preferences from a limited comparison feedback. This task is ubiquitous in machine learning. Its applications such as reinforcement learning from human feedback, have been transformational. We formulate this problem as learning a Plackett-Luce model over a universe of $N$ choices from $K$-way comparison feedback, where typically $K \ll N$. Our solution is the D-optimal d…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Submitted to AISTATS 2025 on October 10, 2024

  4. arXiv:2403.05054  [pdf, other]

    math.OC cs.LG

    A Sinkhorn-type Algorithm for Constrained Optimal Transport

    Authors: Xun Tang, Holakou Rahmanian, Michael Shavlovsky, Kiran Koshy Thekumparampil, Tesi Xiao, Lexing Ying

    Abstract: Entropic optimal transport (OT) and the Sinkhorn algorithm have made it practical for machine learning practitioners to perform the fundamental task of calculating transport distance between statistical distributions. In this work, we focus on a general class of OT problems under a combination of equality and inequality constraints. We derive the corresponding entropy regularization formulation an…

    Submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2401.12253  [pdf, other]

    math.OC cs.LG stat.ML

    Accelerating Sinkhorn Algorithm with Sparse Newton Iterations

    Authors: Xun Tang, Michael Shavlovsky, Holakou Rahmanian, Elisa Tardini, Kiran Koshy Thekumparampil, Tesi Xiao, Lexing Ying

    Abstract: Computing the optimal transport distance between statistical distributions is a fundamental task in machine learning. One remarkable recent advancement is entropic regularization and the Sinkhorn algorithm, which utilizes only matrix scaling and guarantees an approximated solution with near-linear runtime. Despite the success of the Sinkhorn algorithm, its runtime may still be slow due to the pote…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: In ICLR 2024

  6. arXiv:2310.09639  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    DPZero: Private Fine-Tuning of Language Models without Backpropagation

    Authors: Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He

    Abstract: The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy. First, as the size of LLMs continues to grow, the memory demands of gradient-based training methods via backpropagation become prohibitively high. Second, given the tendency of LLMs to memorize training data, it is important to protect potentially sensitive…

    Submitted 6 June, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  7. arXiv:2308.00177  [pdf, other]

    cs.LG cs.AI

    Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

    Authors: Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky, Giulia Fanti, Yesh Dattatreya, Sujay Sanghavi

    Abstract: On tabular data, a significant body of literature has shown that current deep learning (DL) models perform at best similarly to Gradient Boosted Decision Trees (GBDTs), while significantly underperforming them on outlier data. However, these works often study idealized problem settings which may fail to capture complexities of real-world scenarios. We identify a natural tabular data setting where…

    Submitted 25 June, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: ICML-MFPL 2023 Workshop Oral, SPIGM@ICML2024

  8. arXiv:2206.00363  [pdf, ps, other]

    cs.LG cs.CR math.OC stat.ML

    Bring Your Own Algorithm for Optimal Differentially Private Stochastic Minimax Optimization

    Authors: Liang Zhang, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He

    Abstract: We study differentially private (DP) algorithms for smooth stochastic minimax optimization, with stochastic minimization as a byproduct. The holy grail of these settings is to guarantee the optimal trade-off between the privacy and the excess population loss, using an algorithm with a linear time-complexity in the number of training samples. We provide a general framework for solving differentiall…

    Submitted 19 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  9. arXiv:2201.07427  [pdf, other]

    math.OC cs.LG stat.ML

    Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax Optimization

    Authors: Kiran Koshy Thekumparampil, Niao He, Sewoong Oh

    Abstract: We study the bilinearly coupled minimax problem: $\min_{x} \max_{y} f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions and admit first-order gradient oracles. Surprisingly, no known first-order algorithms have hitherto achieved the lower complexity bound of…

    Submitted 19 January, 2022; originally announced January 2022.

    Comments: Submitted for review on Oct 15, 2021. Accepted to AISTATS 2022 on Jan 18, 2022

  10. arXiv:2108.06869  [pdf, other]

    cs.LG cs.DC math.OC

    FedChain: Chained Algorithms for Near-Optimal Communication Cost in Federated Learning

    Authors: Charlie Hou, Kiran K. Thekumparampil, Giulia Fanti, Sewoong Oh

    Abstract: Federated learning (FL) aims to minimize the communication complexity of training a model over heterogeneous data distributed across many clients. A common approach is local methods, where clients take multiple optimization steps over local data before communicating with the server (e.g., FedAvg). Local methods can exploit similarity between clients' data. However, in existing analyses, this comes…

    Submitted 16 April, 2023; v1 submitted 15 August, 2021; originally announced August 2021.

    Comments: abstract typo correction

  11. arXiv:2105.08306  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Sample Efficient Linear Meta-Learning by Alternating Minimization

    Authors: Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

    Abstract: Meta-learning synthesizes and leverages the knowledge from a given set of tasks to rapidly learn new tasks using very little data. Meta-learning of linear regression tasks, where the regressors lie in a low-dimensional subspace, is an extensively-studied fundamental problem in this domain. However, existing results either guarantee highly suboptimal estimation errors, or require $Ω(d)$ samples per…

    Submitted 18 May, 2021; originally announced May 2021.

  12. arXiv:2102.06333  [pdf, other]

    cs.LG cs.DC math.OC

    Efficient Algorithms for Federated Saddle Point Optimization

    Authors: Charlie Hou, Kiran K. Thekumparampil, Giulia Fanti, Sewoong Oh

    Abstract: We consider strongly convex-concave minimax problems in the federated setting, where the communication constraint is the main bottleneck. When clients are arbitrarily heterogeneous, a simple Minibatch Mirror-prox achieves the best performance. As the clients become more homogeneous, using multiple local gradient updates at the clients significantly improves upon Minibatch Mirror-prox by communicat…

    Submitted 11 February, 2021; originally announced February 2021.

  13. arXiv:2010.01848  [pdf, other]

    math.OC cs.LG stat.ML

    Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method

    Authors: Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

    Abstract: We consider the classical setting of optimizing a nonsmooth Lipschitz continuous convex function over a convex constraint set, when having access to a (stochastic) first-order oracle (FO) for the function and a projection oracle (PO) for the constraint set. It is well known that to achieve $ε$-suboptimality in high-dimensions, $Θ(ε^{-2})$ FO calls are necessary. This is achieved by the projected s…

    Submitted 5 October, 2020; originally announced October 2020.

  14. arXiv:1907.01543  [pdf, other]

    math.OC cs.LG stat.ML

    Efficient Algorithms for Smooth Minimax Optimization

    Authors: Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

    Abstract: This paper studies first order methods for solving smooth minimax optimization problems $\min_x \max_y g(x,y)$ where $g(\cdot,\cdot)$ is smooth and $g(x,\cdot)$ is concave for each $x$. In terms of $g(\cdot,y)$, we consider two settings -- strongly convex and nonconvex -- and improve upon the best known rates in both. For strongly-convex $g(\cdot, y),\ \forall y$, we propose a new algorithm combin…

    Submitted 2 July, 2019; originally announced July 2019.

  15. arXiv:1906.06034  [pdf, other]

    cs.LG stat.ML

    InfoGAN-CR and ModelCentrality: Self-supervised Model Training and Selection for Disentangling GANs

    Authors: Zinan Lin, Kiran Koshy Thekumparampil, Giulia Fanti, Sewoong Oh

    Abstract: Disentangled generative models map a latent code vector to a target space, while enforcing that a subset of the learned latent codes are interpretable and associated with distinct properties of the target distribution. Recent advances have been dominated by Variational AutoEncoder (VAE)-based methods, while training disentangled generative adversarial networks (GANs) remains challenging. In this w…

    Submitted 7 August, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

    Comments: Published in ICML 2020. 45 pages, 52 figures, a new unsupervised model selection scheme (ModelCentrality) is introduced in this version

  16. arXiv:1906.03579  [pdf, other]

    stat.ML cs.LG

    Robust conditional GANs under missing or uncertain labels

    Authors: Kiran Koshy Thekumparampil, Sewoong Oh, Ashish Khetan

    Abstract: Matching the performance of conditional Generative Adversarial Networks with little supervision is an important task, especially in venturing into new domains. We design a new training algorithm, which is robust to missing or ambiguous labels. The main idea is to intentionally corrupt the labels of generated examples to match the statistics of the real data, and have a discriminator process the re…

    Submitted 9 June, 2019; originally announced June 2019.

  17. arXiv:1811.03205  [pdf, other]

    stat.ML cs.AI cs.LG

    Robustness of Conditional GANs to Noisy Labels

    Authors: Kiran Koshy Thekumparampil, Ashish Khetan, Zinan Lin, Sewoong Oh

    Abstract: We study the problem of learning conditional generators from noisy labeled samples, where the labels are corrupted by random noise. A standard training of conditional GANs will not only produce samples with wrong labels, but also generate poor quality samples. We consider two scenarios, depending on whether the noise model is known or not. When the distribution of the noise is known, we introduce…

    Submitted 7 November, 2018; originally announced November 2018.

  18. arXiv:1803.03735  [pdf, other]

    stat.ML cs.AI cs.LG

    Attention-based Graph Neural Network for Semi-supervised Learning

    Authors: Kiran K. Thekumparampil, Chong Wang, Sewoong Oh, Li-Jia Li

    Abstract: Recently popularized graph neural networks achieve the state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches. These architectures alternate between a propagation layer that aggregates the hidden states of the local neighborhood and a fully-connected layer. Perhaps surprisingly, we show that a…

    Submitted 9 March, 2018; originally announced March 2018.

  19. arXiv:1704.07228  [pdf, other]

    stat.ML cs.LG

    Learning from Comparisons and Choices

    Authors: Sahand Negahban, Sewoong Oh, Kiran K. Thekumparampil, Jiaming Xu

    Abstract: When tracking user-specific online activities, each user's preference is revealed in the form of choices and comparisons. For example, a user's purchase history is a record of her choices, i.e. which item was chosen among a subset of offerings. A user's preferences can be observed either explicitly as in movie ratings or implicitly as in viewing times of news articles. Given such individualized or…

    Submitted 30 December, 2018; v1 submitted 24 April, 2017; originally announced April 2017.

    Comments: 77 pages, 12 figures; added new experiments and references. arXiv admin note: substantial text overlap with arXiv:1506.07947

  20. arXiv:1506.07947  [pdf, other]

    cs.LG cs.IT stat.ML

    Collaboratively Learning Preferences from Ordinal Data

    Authors: Sewoong Oh, Kiran K. Thekumparampil, Jiaming Xu

    Abstract: In applications such as recommendation systems and revenue management, it is important to predict preferences on items that have not been seen by a user or predict outcomes of comparisons among those that have never been compared. A popular discrete choice model of multinomial logit model captures the structure of the hidden preferences with a low-rank matrix. In order to predict the preferences,…

    Submitted 25 June, 2015; originally announced June 2015.

    Comments: 38 pages, 2 figures

  21. arXiv:1402.4892  [pdf, other]

    cs.NI cs.DS cs.IT

    Sub-Modularity of Waterfilling with Applications to Online Basestation Allocation

    Authors: Kiran Koshy Thekumparampil, Andrew Thangaraj, Rahul Vaze

    Abstract: We show that the popular water-filling algorithm for maximizing the mutual information in parallel Gaussian channels is sub-modular. The sub-modularity of water-filling algorithm is then used to derive online basestation allocation algorithms, where mobile users are assigned to one of many possible basestations immediately and irrevocably upon arrival without knowing the future user information. T…

    Submitted 20 February, 2014; originally announced February 2014.

    Comments: 5 pages, 2 figures; submitted to the International Symposium on Information Theory 2014