-
Frank-Wolfe meets Shapley-Folkman: a systematic approach for solving nonconvex separable problems with linear constraints
Authors:
Benjamin Dubois-Taine,
Alexandre d'Aspremont
Abstract:
We consider separable nonconvex optimization problems under affine constraints. For these problems, the Shapley-Folkman theorem provides an upper bound on the duality gap as a function of the nonconvexity of the objective functions, but does not provide a systematic way to construct primal solutions satisfying that bound. In this work, we develop a two-stage approach to do so. The first stage approximates the optimal dual value with a large set of primal feasible solutions. In the second stage, this set is trimmed down to a primal solution by computing (approximate) Carathéodory representations. The main computational requirement of our method is tractability of the Fenchel conjugates of the component functions and their (sub)gradients. When the function domains are convex, the method recovers the classical duality gap bounds obtained via Shapley-Folkman. When the function domains are nonconvex, the method also recovers classical duality gap bounds from the literature, based on a more general notion of nonconvexity.
Submitted 21 May, 2025; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Iteratively Reweighted Least Squares for Phase Unwrapping
Authors:
Benjamin Dubois-Taine,
Roland Akiki,
Alexandre d'Aspremont
Abstract:
The 2D phase unwrapping problem seeks to recover a phase image from its observation modulo $2π$, and is a crucial step in a variety of imaging applications. In particular, it is one of the most time-consuming steps in the interferometric synthetic aperture radar (InSAR) pipeline. In this work, we tackle the $L^1$-norm phase unwrapping problem. In optimization terms, this is a simple sparsity-inducing problem, albeit in very high dimension. To solve this high-dimensional problem, we iteratively solve a series of numerically simpler weighted least squares problems, which are themselves solved using a preconditioned conjugate gradient method. Our algorithm guarantees a sublinear rate of convergence in function values, is simple to implement, and can easily be ported to GPUs, where it significantly outperforms state-of-the-art phase unwrapping methods.
Submitted 18 January, 2024;
originally announced January 2024.
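The reweighting idea in the abstract above can be illustrated on a much smaller problem. The sketch below is a toy, not the paper's algorithm: it minimizes the scalar $L^1$ objective $\sum_i |x - v_i|$ (whose minimizer is the median), and the weighted least squares subproblem has a closed form, so no preconditioned conjugate gradient solver is needed. The function name `irls_l1_location`, the smoothing constant `eps`, and the sample data are all assumptions made for illustration.

```python
def irls_l1_location(values, num_iters=100, eps=1e-8):
    """Minimize sum_i |x - values[i]| over a scalar x by IRLS (toy example)."""
    # Start from the ordinary least-squares solution, i.e. the mean.
    x = sum(values) / len(values)
    for _ in range(num_iters):
        # Reweighting step: each |r_i| is majorized by a quadratic with
        # weight 1 / |r_i|; eps guards against division by zero.
        weights = [1.0 / max(abs(x - v), eps) for v in values]
        # Weighted least-squares subproblem, solved in closed form in 1D.
        x = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return x


# Hypothetical data with two large outliers; the L1 solution is the median.
estimate = irls_l1_location([0.0, 1.0, 2.0, 10.0, 100.0])
```

In the high-dimensional setting described in the abstract, the closed-form update would presumably be replaced by an iterative linear solve, which is where the preconditioned conjugate gradient method comes in.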
-
Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization
Authors:
Benjamin Dubois-Taine,
Francis Bach,
Quentin Berthet,
Adrien Taylor
Abstract:
We consider the problem of minimizing the sum of two convex functions. One of those functions has Lipschitz-continuous gradients, and can be accessed via stochastic oracles, whereas the other is "simple". We provide a Bregman-type algorithm with accelerated convergence in function values to a ball containing the minimum. The radius of this ball depends on problem-dependent constants, including the variance of the stochastic oracle. We further show that this algorithmic setup naturally leads to a variant of Frank-Wolfe achieving acceleration under parallelization. More precisely, when minimizing a smooth convex function on a bounded domain, we show that one can achieve an $ε$ primal-dual gap (in expectation) in $\tilde{O}(1/\sqrt{ε})$ iterations, by only accessing gradients of the original function and a linear maximization oracle with $O(1/\sqrt{ε})$ computing units in parallel. We illustrate this fast convergence on synthetic numerical experiments.
Submitted 25 November, 2024; v1 submitted 25 May, 2022;
originally announced May 2022.
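To make the oracle model in the abstract above concrete, here is a minimal sketch of the classical (non-accelerated, sequential) Frank-Wolfe method, which is only the baseline that the paper improves on and not its parallel variant. It assumes the feasible set is the probability simplex, where the linear oracle reduces to picking a coordinate, and uses the standard step size $2/(k+2)$. The names `frank_wolfe_simplex`, `grad`, and `target` are hypothetical.

```python
def frank_wolfe_simplex(grad, dim, num_iters=1000):
    """Classical Frank-Wolfe over the probability simplex in R^dim."""
    x = [1.0 / dim] * dim  # feasible starting point
    gap = float("inf")
    for k in range(num_iters):
        g = grad(x)
        # Linear oracle: over the simplex, <g, s> is minimized at the
        # vertex e_j with j = argmin_j g[j].
        j = min(range(dim), key=lambda i: g[i])
        # Frank-Wolfe gap <g, x - e_j>, which upper-bounds f(x) - min f
        # for convex f; it is measured before the update below.
        gap = sum(gi * xi for gi, xi in zip(g, x)) - g[j]
        step = 2.0 / (k + 2.0)
        # Convex combination of the current point and the chosen vertex.
        x = [(1.0 - step) * xi for xi in x]
        x[j] += step
    return x, gap


# Toy objective f(x) = 0.5 * ||x - target||^2 with target inside the simplex.
target = [0.2, 0.5, 0.3]
x, gap = frank_wolfe_simplex(
    lambda p: [pi - ti for pi, ti in zip(p, target)], dim=3
)
```

Each iteration here needs one gradient and one oracle call, and the classical analysis gives an $O(1/ε)$ iteration count; the abstract's claim is that spreading oracle calls over parallel computing units can reduce the number of sequential iterations.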
-
Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent
Authors:
Sharan Vaswani,
Benjamin Dubois-Taine,
Reza Babanezhad
Abstract:
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $σ^2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number $κ$, we prove that $T$ iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an $\tilde{O} \left(\exp \left( \frac{-T}{κ} \right) + \frac{σ^2}{T} \right)$ rate, without knowing $σ^2$. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower bounds) that SGD with SLS converges at the desired rate, but only to a neighbourhood of the solution. On the other hand, we prove that SGD with an offline estimate of the smoothness converges to the minimizer. However, its rate is slowed down in proportion to the estimation error. Next, we prove that SGD with Nesterov acceleration and exponential step-sizes (referred to as ASGD) can achieve the near-optimal $\tilde{O} \left(\exp \left( \frac{-T}{\sqrt{κ}} \right) + \frac{σ^2}{T} \right)$ rate, without knowledge of $σ^2$. When used with offline estimates of the smoothness and strong-convexity, ASGD still converges to the solution, albeit at a slower rate. We empirically demonstrate the effectiveness of exponential step-sizes coupled with a novel variant of SLS.
Submitted 20 June, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
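As a rough illustration of the exponentially decreasing step-sizes mentioned in the abstract above, the sketch below runs SGD on a one-dimensional quadratic. The exact schedule is not given in the abstract; the form $η_k = α^k / L$ with $α = (β/T)^{1/T}$ is an assumption borrowed from the exponential step-size literature, and the objective, noise level, and function names are hypothetical.

```python
import random


def exponential_step_sizes(num_iters, smoothness, beta=1.0):
    """Assumed schedule: eta_k = alpha**k / L, alpha = (beta / T)**(1 / T)."""
    alpha = (beta / num_iters) ** (1.0 / num_iters)
    return [alpha ** k / smoothness for k in range(1, num_iters + 1)]


def sgd_exponential(x0, num_iters=1000, noise_std=1.0, seed=0):
    """SGD on the toy objective f(x) = 0.5 * (x - 3)^2 with noisy gradients."""
    rng = random.Random(seed)
    x = x0
    # The smoothness constant of this toy objective is L = 1.
    for eta in exponential_step_sizes(num_iters, smoothness=1.0):
        # Exact gradient plus Gaussian noise; noise_std is never used
        # to set the step size, only to simulate the oracle.
        noisy_grad = (x - 3.0) + rng.gauss(0.0, noise_std)
        x -= eta * noisy_grad
    return x


steps = exponential_step_sizes(1000, smoothness=1.0)
x_final = sgd_exponential(x0=0.0)
```

Under this assumed schedule the step size decays from roughly $1/L$ to $β/(LT)$ over the run, so early iterations make fast progress while late iterations average out the gradient noise, without the schedule depending on $σ^2$.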
-
SVRG Meets AdaGrad: Painless Variance Reduction
Authors:
Benjamin Dubois-Taine,
Sharan Vaswani,
Reza Babanezhad,
Mark Schmidt,
Simon Lacoste-Julien
Abstract:
Variance reduction (VR) methods for finite-sum minimization typically require the knowledge of problem-dependent constants that are often unknown and difficult to estimate. To address this, we use ideas from adaptive gradient methods to propose AdaSVRG, which is a more robust variant of SVRG, a common VR method. AdaSVRG uses AdaGrad in the inner loop of SVRG, making it robust to the choice of step-size. When minimizing a sum of $n$ smooth convex functions, we prove that a variant of AdaSVRG requires $\tilde{O}(n + 1/ε)$ gradient evaluations to achieve an $O(ε)$-suboptimality, matching the typical rate, but without needing to know problem-dependent constants. Next, we leverage the properties of AdaGrad to propose a heuristic that adaptively determines the length of each inner loop in AdaSVRG. Via experiments on synthetic and real-world datasets, we validate the robustness and effectiveness of AdaSVRG, demonstrating its superior performance over standard and other "tune-free" VR methods.
Submitted 2 November, 2021; v1 submitted 18 February, 2021;
originally announced February 2021.
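The structure described in the abstract above, an SVRG outer loop with AdaGrad steps inside, might look roughly like the sketch below on a scalar least-squares problem. This is an illustration under assumptions, not the authors' algorithm: the scalar AdaGrad normalization, the reset of the accumulator at each outer iteration, the use of the averaged inner iterate as the next snapshot, and all names (`vr_gradient`, `ada_svrg`, `eta`) are choices made here for the example.

```python
import math
import random


def grad_i(a, b, i, x):
    """Gradient of the component f_i(x) = 0.5 * (a[i] * x - b[i])**2."""
    return a[i] * (a[i] * x - b[i])


def vr_gradient(a, b, i, x, w, full_grad_w):
    """SVRG estimator: unbiased for the full gradient at x, given snapshot w."""
    return grad_i(a, b, i, x) - grad_i(a, b, i, w) + full_grad_w


def ada_svrg(a, b, x0=0.0, eta=0.5, num_outer=20, inner_len=None, seed=0):
    """SVRG with a scalar AdaGrad inner loop (illustrative sketch)."""
    n = len(a)
    m = inner_len or n
    rng = random.Random(seed)
    w = x0
    for _ in range(num_outer):
        # Full gradient at the snapshot, computed once per outer iteration.
        full_grad_w = sum(grad_i(a, b, i, w) for i in range(n)) / n
        x, grad_sq_sum, iterates = w, 0.0, []
        for _ in range(m):
            i = rng.randrange(n)
            g = vr_gradient(a, b, i, x, w, full_grad_w)
            # AdaGrad: normalize by the root of accumulated squared gradients.
            grad_sq_sum += g * g
            if grad_sq_sum > 0.0:
                x -= eta * g / math.sqrt(grad_sq_sum)
            iterates.append(x)
        # Assumed snapshot rule: average of the inner iterates.
        w = sum(iterates) / m
    return w


# Hypothetical data for a four-term least-squares problem.
a, b = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 9.0]
w_out = ada_svrg(a, b)
```

The point of the design is that `eta` is the only tuning parameter and the effective step size adapts through `grad_sq_sum`, rather than being set from a smoothness constant as in standard SVRG.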