-
Allocating Variance to Maximize Expectation
Authors:
Renato Purita Paes Leme,
Cliff Stein,
Yifeng Teng,
Pratik Worah
Abstract:
We design efficient approximation algorithms for maximizing the expectation of the supremum of families of Gaussian random variables. In particular, let $\mathrm{OPT}:=\max_{σ_1,\cdots,σ_n}\mathbb{E}\left[\sum_{j=1}^{m}\max_{i\in S_j} X_i\right]$, where $X_i$ are Gaussian, $S_j\subset[n]$ and $\sum_iσ_i^2=1$, then our theoretical results include:
- We characterize the optimal variance allocation…
▽ More
We design efficient approximation algorithms for maximizing the expectation of the supremum of families of Gaussian random variables. In particular, let $\mathrm{OPT}:=\max_{σ_1,\cdots,σ_n}\mathbb{E}\left[\sum_{j=1}^{m}\max_{i\in S_j} X_i\right]$, where $X_i$ are Gaussian, $S_j\subset[n]$ and $\sum_iσ_i^2=1$, then our theoretical results include:
- We characterize the optimal variance allocation -- it concentrates on a small subset of variables as $|S_j|$ increases,
- A polynomial time approximation scheme (PTAS) for computing $\mathrm{OPT}$ when $m=1$, and
- An $O(\log n)$ approximation algorithm for computing $\mathrm{OPT}$ for general $m>1$.
Such expectation maximization problems occur in diverse applications, ranging from utility maximization in auctions markets to learning mixture models in quantitative genetics.
△ Less
Submitted 25 February, 2025;
originally announced February 2025.
-
Enhancing selectivity using Wasserstein distance based reweighing
Authors:
Pratik Worah
Abstract:
Given two labeled data-sets $\mathcal{S}$ and $\mathcal{T}$, we design a simple and efficient greedy algorithm to reweigh the loss function such that the limiting distribution of the neural network weights that result from training on $\mathcal{S}$ approaches the limiting distribution that would have resulted by training on $\mathcal{T}$.
On the theoretical side, we prove that when the metric en…
▽ More
Given two labeled data-sets $\mathcal{S}$ and $\mathcal{T}$, we design a simple and efficient greedy algorithm to reweigh the loss function such that the limiting distribution of the neural network weights that result from training on $\mathcal{S}$ approaches the limiting distribution that would have resulted by training on $\mathcal{T}$.
On the theoretical side, we prove that when the metric entropy of the input datasets is bounded, our greedy algorithm outputs a close to optimal reweighing, i.e., the two invariant distributions of network weights will be provably close in total variation distance. Moreover, the algorithm is simple and scalable, and we prove bounds on the efficiency of the algorithm as well.
As a motivating application, we train a neural net to recognize small molecule binders to MNK2 (a MAP Kinase, responsible for cell signaling) which are non-binders to MNK1 (a highly similar protein). In our example dataset, of the 43 distinct small molecules predicted to be most selective from the enamine catalog, 2 small molecules were experimentally verified to be selective, i.e., they reduced the enzyme activity of MNK2 below 50\% but not MNK1, at 10$μ$M -- a 5\% success rate.
△ Less
Submitted 25 February, 2025; v1 submitted 21 January, 2024;
originally announced January 2024.
-
Learning Rate Schedules in the Presence of Distribution Shift
Authors:
Matthew Fahrbach,
Adel Javanmard,
Vahab Mirrokni,
Pratik Worah
Abstract:
We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift and we…
▽ More
We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift and we give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
△ Less
Submitted 20 August, 2023; v1 submitted 27 March, 2023;
originally announced March 2023.
-
The Landscape of the Proximal Point Method for Nonconvex-Nonconcave Minimax Optimization
Authors:
Benjamin Grimmer,
Haihao Lu,
Pratik Worah,
Vahab Mirrokni
Abstract:
Minimax optimization has become a central tool in machine learning with applications in robust optimization, reinforcement learning, GANs, etc. These applications are often nonconvex-nonconcave, but the existing theory is unable to identify and deal with the fundamental difficulties this poses. In this paper, we study the classic proximal point method (PPM) applied to nonconvex-nonconcave minimax…
▽ More
Minimax optimization has become a central tool in machine learning with applications in robust optimization, reinforcement learning, GANs, etc. These applications are often nonconvex-nonconcave, but the existing theory is unable to identify and deal with the fundamental difficulties this poses. In this paper, we study the classic proximal point method (PPM) applied to nonconvex-nonconcave minimax problems. We find that a classic generalization of the Moreau envelope by Attouch and Wets provides key insights. Critically, we show this envelope not only smooths the objective but can convexify and concavify it based on the level of interaction present between the minimizing and maximizing variables. From this, we identify three distinct regions of nonconvex-nonconcave problems. When interaction is sufficiently strong, we derive global linear convergence guarantees. Conversely when the interaction is fairly weak, we derive local linear convergence guarantees with a proper initialization. Between these two settings, we show that PPM may diverge or converge to a limit cycle.
△ Less
Submitted 1 April, 2021; v1 submitted 15 June, 2020;
originally announced June 2020.