
Showing 1–23 of 23 results for author: Gatmiry, K

Searching in archive cs.
  1. arXiv:2505.04994 [pdf, other]

    cs.CL cs.AI

    Rethinking Invariance in In-context Learning

    Authors: Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yisen Wang

    Abstract: In-Context Learning (ICL) has emerged as a pivotal capability of auto-regressive large language models, yet it is hindered by a notable sensitivity to the ordering of context examples regardless of their mutual independence. To address this issue, recent studies have introduced several variant algorithms of ICL that achieve permutation invariance. However, many of these do not exhibit comparable p…

    Submitted 8 May, 2025; originally announced May 2025.

  2. arXiv:2410.21698 [pdf, other]

    cs.LG math.ST stat.ML

    On the Role of Depth and Looping for In-Context Learning with Task Diversity

    Authors: Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar

    Abstract: The intriguing in-context learning (ICL) abilities of deep Transformer models have lately garnered significant attention. By studying in-context linear regression on unimodal Gaussian data, recent empirical and theoretical works have argued that ICL emerges from Transformers' abilities to simulate learning algorithms like gradient descent. However, these works fail to capture the remarkable abilit…

    Submitted 28 October, 2024; originally announced October 2024.

  3. arXiv:2410.17336 [pdf, other]

    cs.LG cs.DS cs.GT math.ST stat.ML

    Computing Optimal Regularizers for Online Linear Optimization

    Authors: Khashayar Gatmiry, Jon Schneider, Stefanie Jegelka

    Abstract: Follow-the-Regularized-Leader (FTRL) algorithms are a popular class of learning algorithms for online linear optimization (OLO) that guarantee sub-linear regret, but the choice of regularizer can significantly impact dimension-dependent factors in the regret bound. We present an algorithm that takes as input convex and symmetric action sets and loss sets for a specific OLO instance, and outputs a…

    Submitted 22 October, 2024; originally announced October 2024.

  4. arXiv:2410.16401 [pdf, other]

    cs.LG math.ST stat.ML

    Simplicity Bias via Global Convergence of Sharpness Minimization

    Authors: Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka

    Abstract: The remarkable generalization ability of neural networks is usually attributed to the implicit bias of SGD, which often yields models with lower complexity using simpler (e.g. linear) and low-rank features. Recent works have provided empirical and theoretical evidence for the bias of particular variants of SGD (such as label noise SGD) toward flatter regions of the loss landscape. Despite the folk…

    Submitted 21 October, 2024; originally announced October 2024.

  5. arXiv:2410.08292 [pdf, other]

    cs.LG cs.AI stat.ML

    Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

    Authors: Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar

    Abstract: The remarkable capability of Transformers to do reasoning and few-shot learning, without any fine-tuning, is widely conjectured to stem from their ability to implicitly simulate multi-step algorithms, such as gradient descent, with their weights in a single forward pass. Recently, there has been progress in understanding this complex phenomenon from an expressivity point of view, by demonstr…

    Submitted 10 October, 2024; originally announced October 2024.

  6. arXiv:2409.13074 [pdf, other]

    cs.LG cs.CV stat.ML

    What does guidance do? A fine-grained analysis in a simple setting

    Authors: Muthu Chidambaram, Khashayar Gatmiry, Sitan Chen, Holden Lee, Jianfeng Lu

    Abstract: The use of guidance in diffusion models was originally motivated by the premise that the guidance-modified score is that of the data distribution tilted by a conditional likelihood raised to some power. In this work we clarify this misconception by rigorously proving that guidance fails to sample from the intended tilted distribution. Our main result is to give a fine-grained characterization of…

    Submitted 19 September, 2024; originally announced September 2024.

  7. arXiv:2407.00571 [pdf, ps, other]

    cs.LG

    Adversarial Online Learning with Temporal Feedback Graphs

    Authors: Khashayar Gatmiry, Jon Schneider

    Abstract: We study a variant of prediction with expert advice where the learner's action at round $t$ is only allowed to depend on losses on a specific subset of the rounds (where the structure of which rounds' losses are visible at time $t$ is provided by a directed "feedback graph" known to the learner). We present a novel learning algorithm for this setting based on a strategy of partitioning the losses…

    Submitted 29 June, 2024; originally announced July 2024.

  8. arXiv:2404.18869 [pdf, ps, other]

    cs.LG cs.DS math.PR math.ST stat.ML

    Learning Mixtures of Gaussians Using Diffusion Models

    Authors: Khashayar Gatmiry, Jonathan Kelner, Holden Lee

    Abstract: We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly\,log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity, under a minimum weight assumption. Our results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$…

    Submitted 4 March, 2025; v1 submitted 29 April, 2024; originally announced April 2024.

  9. arXiv:2308.11518 [pdf, ps, other]

    cs.LG stat.ML

    EM for Mixture of Linear Regression with Clustered Data

    Authors: Amirhossein Reisizadeh, Khashayar Gatmiry, Asuman Ozdaglar

    Abstract: Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms. In many settings, however, heterogeneous data may be generated in clusters with shared structures, as is the case in several applications such as federa…

    Submitted 22 August, 2023; originally announced August 2023.

  10. arXiv:2306.13853 [pdf, other]

    cs.LG

    A Unified Approach to Controlling Implicit Regularization via Mirror Descent

    Authors: Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan

    Abstract: Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how optimization algorithms impact generalization through their "preferred" solutions, a phenomenon commonly referred to as implicit regularization. In particular, it h…

    Submitted 11 January, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2205.12808

  11. arXiv:2306.13239 [pdf, other]

    cs.LG

    The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

    Authors: Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka

    Abstract: Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions. More explicit forms of flatness regularization also empirically improve the generalization performance. However, it remains unclear wh…

    Submitted 22 June, 2023; originally announced June 2023.

  12. arXiv:2306.11121 [pdf, ps, other]

    math.OC cs.LG

    Projection-Free Online Convex Optimization via Efficient Newton Iterations

    Authors: Khashayar Gatmiry, Zakaria Mhammedi

    Abstract: This paper presents new projection-free algorithms for Online Convex Optimization (OCO) over a convex domain $\mathcal{K} \subset \mathbb{R}^d$. Classical OCO algorithms (such as Online Gradient Descent) typically need to perform Euclidean projections onto the convex set $\mathcal{K}$ to ensure feasibility of their iterates. Alternative algorithms, such as those based on the Frank-Wolfe method, swap poten…

    Submitted 19 June, 2023; originally announced June 2023.

  13. arXiv:2304.04724 [pdf, ps, other]

    stat.CO cs.CC stat.ML

    When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm?

    Authors: Yuansi Chen, Khashayar Gatmiry

    Abstract: We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry. We bound the gradient complexity to reach $ε$ error in total variation distance from a warm start by $\tilde O(d^{1/4}\text{polylog}(1/ε))$ and demonstra…

    Submitted 8 June, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: 43 pages

  14. arXiv:2304.04095 [pdf, ps, other]

    stat.ML cs.CC cs.LG stat.CO

    A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm under Smoothness and Isoperimetry

    Authors: Yuansi Chen, Khashayar Gatmiry

    Abstract: We study the mixing time of Metropolis-Adjusted Langevin algorithm (MALA) for sampling a target density on $\mathbb{R}^d$. We assume that the target density satisfies $ψ_μ$-isoperimetry and that the operator norm and trace of its Hessian are bounded by $L$ and $Υ$ respectively. Our main result establishes that, from a warm start, to achieve $ε$-total variation distance to the target density, MALA…

    Submitted 8 June, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: 17 pages

  15. arXiv:2303.00480 [pdf, other]

    cs.DS cs.LO math.FA math.NA stat.ML

    Sampling with Barriers: Faster Mixing via Lewis Weights

    Authors: Khashayar Gatmiry, Jonathan Kelner, Santosh S. Vempala

    Abstract: We analyze Riemannian Hamiltonian Monte Carlo (RHMC) for sampling a polytope defined by $m$ inequalities in $\mathbb{R}^n$ endowed with the metric defined by the Hessian of a convex barrier function. The advantage of RHMC over Euclidean methods such as the ball walk, hit-and-run and the Dikin walk is in its ability to take longer steps. However, in all previous work, the mixing rate has a linear dependenc…

    Submitted 19 April, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  16. arXiv:2212.13669 [pdf, ps, other]

    cs.LG math.OC

    Near-Optimal Algorithms for Group Distributionally Robust Optimization and Beyond

    Authors: Tasuku Soma, Khashayar Gatmiry, Sharut Gupta, Stefanie Jegelka

    Abstract: Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also pro…

    Submitted 31 January, 2025; v1 submitted 27 December, 2022; originally announced December 2022.

    Comments: 4 tables, 2 figures

  17. arXiv:2211.08586 [pdf, ps, other]

    cs.DS cs.GT cs.LG

    Bandit Algorithms for Prophet Inequality and Pandora's Box

    Authors: Khashayar Gatmiry, Thomas Kesselheim, Sahil Singla, Yifan Wang

    Abstract: The Prophet Inequality and Pandora's Box problems are fundamental stochastic problems with applications in Mechanism Design, Online Algorithms, Stochastic Optimization, Optimal Stopping, and Operations Research. A usual assumption in these works is that the probability distributions of the $n$ underlying random variables are given as input to the algorithm. Since in practice these distributions nee…

    Submitted 6 December, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  18. arXiv:2211.01357 [pdf, ps, other]

    math.OC cs.LG

    Quasi-Newton Steps for Efficient Online Exp-Concave Optimization

    Authors: Zakaria Mhammedi, Khashayar Gatmiry

    Abstract: The aim of this paper is to design computationally efficient and optimal algorithms for the online and stochastic exp-concave optimization settings. Typical algorithms for these settings, such as the Online Newton Step (ONS), can guarantee an $O(d\ln T)$ bound on their regret after $T$ rounds, where $d$ is the dimension of the feasible set. However, such algorithms perform so-called generalized pro…

    Submitted 14 February, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: First revision: presentation improvements

  19. arXiv:2208.07951 [pdf, other]

    cs.LG math.DS math.OC stat.ML

    On the generalization of learning algorithms that do not converge

    Authors: Nisha Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka

    Abstract: Generalization analyses of deep learning typically assume that the training converges to a fixed point. However, recent results indicate that in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics…

    Submitted 19 August, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: 27 pages, under review

  20. arXiv:2204.10818 [pdf, ps, other]

    cs.LG math.DG math.PR

    Convergence of the Riemannian Langevin Algorithm

    Authors: Khashayar Gatmiry, Santosh S. Vempala

    Abstract: We study the Riemannian Langevin Algorithm for the problem of sampling from a distribution with density $ν$ with respect to the natural measure on a manifold with metric $g$. We assume that the target density satisfies a log-Sobolev inequality with respect to the metric and prove that the manifold generalization of the Unadjusted Langevin Algorithm converges rapidly to $ν$ for Hessian manifolds. T…

    Submitted 22 April, 2022; originally announced April 2022.

    MSC Class: 60K60; 58D17

    ACM Class: F.2; G.3

  21. arXiv:2008.03650 [pdf, ps, other]

    cs.LG math.ST stat.ML

    Testing Determinantal Point Processes

    Authors: Khashayar Gatmiry, Maryam Aliakbarpour, Stefanie Jegelka

    Abstract: Determinantal point processes (DPPs) are popular probabilistic models of diversity. In this paper, we investigate DPPs from a new perspective: property testing of distributions. Given sample access to an unknown distribution $q$ over the subsets of a ground set, we aim to distinguish whether $q$ is a DPP distribution, or $ε$-far from all DPP distributions in $\ell_1$-distance. In this work, we pro…

    Submitted 9 August, 2020; originally announced August 2020.

  22. arXiv:1811.07863 [pdf, other]

    cs.SI cs.DM cs.DS cs.LG stat.ML

    Non-submodular Function Maximization subject to a Matroid Constraint, with Applications

    Authors: Khashayar Gatmiry, Manuel Gomez-Rodriguez

    Abstract: The standard greedy algorithm has been recently shown to enjoy approximation guarantees for constrained non-submodular nondecreasing set function maximization. While these recent results allow us to better characterize the empirical success of the greedy algorithm, they are only applicable to simple cardinality constraints. In this paper, we study the problem of maximizing a non-submodular nondecreas…

    Submitted 8 October, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

    Comments: Added missing citations and changed strong submodularity ratio to generalized curvature

  23. arXiv:1811.07307 [pdf, ps, other]

    math.ST cs.IT

    Information Theoretic Bounds on Optimal Worst-case Error in Binary Mixture Identification

    Authors: Khashayar Gatmiry, Seyed Abolfazl Motahari

    Abstract: Identification of latent binary sequences from a pool of noisy observations has a wide range of applications in both statistical learning and population genetics. Each observed sequence is the result of passing one of the latent mother-sequences through a binary symmetric channel, which makes this configuration analogous to a special case of Bernoulli Mixture Models. This paper aims to attain an a…

    Submitted 27 November, 2018; v1 submitted 18 November, 2018; originally announced November 2018.