-
Adaptive Open-Loop Step-Sizes for Accelerated Convergence Rates of the Frank-Wolfe Algorithm
Authors:
Elias Wirth,
Javier Peña,
Sebastian Pokutta
Abstract:
Recent work has shown that in certain settings, the Frank-Wolfe algorithm (FW) with open-loop step-sizes $\eta_t = \frac{\ell}{t+\ell}$ for a fixed parameter $\ell \in \mathbb{N},\, \ell \geq 2$, attains a convergence rate faster than the traditional $O(t^{-1})$ rate. In particular, when a strong growth property holds, the convergence rate attainable with open-loop step-sizes $\eta_t = \frac{\ell}{t+\ell}$ is $O(t^{-\ell})$. In this setting there is no single value of the parameter $\ell$ that prevails as superior. This paper shows that FW with log-adaptive open-loop step-sizes $\eta_t = \frac{2+\log(t+1)}{t+2+\log(t+1)}$ attains a convergence rate that is at least as fast as that attainable with fixed-parameter open-loop step-sizes $\eta_t = \frac{\ell}{t+\ell}$ for any value of $\ell \in \mathbb{N},\,\ell\geq 2$. To establish our main convergence results, we extend our previous affine-invariant accelerated convergence results for FW to more general open-loop step-sizes of the form $\eta_t = g(t)/(t+g(t))$, where $g:\mathbb{N}\to\mathbb{R}_{\geq 0}$ is any non-decreasing function such that the sequence of step-sizes $(\eta_t)$ is non-increasing. This covers in particular the fixed-parameter case by choosing $g(t) = \ell$ and the log-adaptive case by choosing $g(t) = 2+ \log(t+1)$. To facilitate adoption of log-adaptive open-loop step-sizes, we have incorporated this rule into the FrankWolfe.jl software package.
Submitted 14 May, 2025;
originally announced May 2025.
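The step-size family $g(t)/(t+g(t))$ described in the abstract above can be illustrated with a minimal Python sketch. The function names (`open_loop_step`, `fixed_rule`, `log_adaptive_rule`) are hypothetical and are not taken from FrankWolfe.jl, and the natural logarithm is an assumption. The sketch only evaluates the two rules for the first few iterations.

```python
import math


def open_loop_step(t, g):
    """Return eta_t = g(t) / (t + g(t)) for a non-decreasing g with g(t) > 0."""
    gt = g(t)
    return gt / (t + gt)


def fixed_rule(ell):
    """Fixed-parameter case g(t) = ell (hypothetical helper name)."""
    return lambda t: float(ell)


def log_adaptive_rule(t):
    """Log-adaptive case g(t) = 2 + log(t + 1); natural log is assumed here."""
    return 2.0 + math.log(t + 1)


# Step-sizes for t = 0, ..., 5 under both rules.
etas_fixed = [open_loop_step(t, fixed_rule(2)) for t in range(6)]
etas_log = [open_loop_step(t, log_adaptive_rule) for t in range(6)]

for t, (ef, el) in enumerate(zip(etas_fixed, etas_log)):
    print(f"t={t}  fixed(ell=2)={ef:.4f}  log-adaptive={el:.4f}")
```

Both sequences start at 1 and are non-increasing on this range. The log-adaptive step-sizes are never smaller than those for $\ell = 2$, because $g(t) = 2 + \log(t+1) \geq 2$.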
-
Improved algorithms and novel applications of the FrankWolfe.jl library
Authors:
Mathieu Besançon,
Sébastien Designolle,
Jannis Halbey,
Deborah Hendrych,
Dominik Kuzinowicz,
Sebastian Pokutta,
Hannah Troppens,
Daniel Viladrich Herrmannsdoerfer,
Elias Wirth
Abstract:
Frank-Wolfe (FW) algorithms have emerged as an essential class of methods for constrained optimization, especially on large-scale problems. In this paper, we summarize the algorithmic design choices and progress made in the last years of the development of FrankWolfe.jl, a Julia package gathering high-performance implementations of state-of-the-art FW variants. We review key use cases of the library in the recent literature, which match its original dual purpose: first, becoming the de-facto toolbox for practitioners applying FW methods to their problem, and second, offering a modular ecosystem to algorithm designers who experiment with their own variants and implementations of algorithmic blocks. Finally, we demonstrate the performance of several FW variants on important problem classes in several experiments, which we curated in a separate repository for continuous benchmarking.
Submitted 24 January, 2025;
originally announced January 2025.
-
The Pivoting Framework: Frank-Wolfe Algorithms with Active Set Size Control
Authors:
Elias Wirth,
Mathieu Besançon,
Sebastian Pokutta
Abstract:
We propose the pivoting meta algorithm (PM) to enhance optimization algorithms that generate iterates as convex combinations of vertices of a feasible region $C\subseteq \mathbb{R}^n$, including Frank-Wolfe (FW) variants. PM guarantees that the active set (the set of vertices in the convex combination) of the modified algorithm remains as small as $\mathrm{dim}(C)+1$ as stipulated by Carathéodory's theorem. PM achieves this by reformulating the active set expansion task into an equivalent linear program, which can be efficiently solved using a single pivot step akin to the primal simplex algorithm; the convergence rates of the original algorithms are maintained. Furthermore, we establish the connection between PM and active set identification, in particular showing under mild assumptions that PM applied to the away-step Frank-Wolfe algorithm or the blended pairwise Frank-Wolfe algorithm bounds the active set size by the dimension of the optimal face plus $1$. We provide numerical experiments to illustrate practicality and efficacy on active set size reduction.
Submitted 28 October, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Fast convergence of Frank-Wolfe algorithms on polytopes
Authors:
Elias Wirth,
Javier Peña,
Sebastian Pokutta
Abstract:
We provide a template to derive convergence rates for the following popular versions of the Frank-Wolfe algorithm on polytopes: vanilla Frank-Wolfe, Frank-Wolfe with away steps, Frank-Wolfe with blended pairwise steps, and Frank-Wolfe with in-face directions. Our template shows how the convergence rates follow from two affine-invariant properties of the problem, namely, error bound and extended curvature. These properties depend solely on the polytope and objective function but not on any affine-dependent object like norms. For each one of the above algorithms, we derive rates of convergence ranging from sublinear to linear depending on the degree of the error bound.
Submitted 15 February, 2025; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Accelerated Affine-Invariant Convergence Rates of the Frank-Wolfe Algorithm with Open-Loop Step-Sizes
Authors:
Elias Wirth,
Javier Peña,
Sebastian Pokutta
Abstract:
Recent papers have shown that the Frank-Wolfe algorithm (FW) with open-loop step-sizes exhibits rates of convergence faster than the iconic $\mathcal{O}(t^{-1})$ rate. In particular, when the minimizer of a strongly convex function over a polytope lies in the relative interior of a feasible region face, FW with open-loop step-sizes $\eta_t = \frac{\ell}{t+\ell}$ for $\ell \in \mathbb{N}_{\geq 2}$ has accelerated convergence $\mathcal{O}(t^{-2})$ in contrast to the rate $\Omega(t^{-1-\epsilon})$ attainable with more complex line-search or short-step step-sizes. Given the relevance of this scenario in data science problems, research has grown to explore the settings enabling acceleration in open-loop FW. However, despite FW's well-known affine invariance, existing acceleration results for open-loop FW are affine-dependent. This paper closes this gap in the literature by merging two recent research trajectories: affine invariance (Wirth et al., 2023b) and open-loop step-sizes (Pena, 2021). In particular, we extend all known non-affine-invariant convergence rates for FW with open-loop step-sizes to affine-invariant results.
Submitted 20 January, 2025; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Accelerated and Sparse Algorithms for Approximate Personalized PageRank and Beyond
Authors:
David Martínez-Rubio,
Elias Wirth,
Sebastian Pokutta
Abstract:
It has recently been shown that ISTA, an unaccelerated optimization method, presents sparse updates for the $\ell_1$-regularized personalized PageRank problem, leading to cheap iteration complexity and providing the same guarantees as the approximate personalized PageRank algorithm (APPR) [FRS+19]. In this work, we design an accelerated optimization algorithm for this problem that also performs sparse updates, providing an affirmative answer to the COLT 2022 open question of [FY22]. Acceleration provides a reduced dependence on the condition number, while the dependence on the sparsity in our updates differs from the ISTA approach. Further, we design another algorithm by using conjugate directions to achieve an exact solution while exploiting sparsity. Both algorithms lead to faster convergence for certain parameter regimes. Our findings apply beyond PageRank and work for any quadratic objective whose Hessian is a positive-definite $M$-matrix.
Submitted 22 March, 2023;
originally announced March 2023.
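For context on the baseline named in the abstract above, the following is a minimal Python sketch of plain ISTA (proximal gradient with soft-thresholding) on an $\ell_1$-regularized quadratic $\frac{1}{2}x^\top Q x - b^\top x + \lambda\|x\|_1$. It is not the accelerated or conjugate-directions algorithm of the paper. The matrix `Q`, vector `b`, and the values of `lam` and `L` are made-up illustration data, with `Q` chosen as a small positive-definite $M$-matrix.

```python
def soft_threshold(z, tau):
    """Proximal operator of tau * |.| applied to the scalar z."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def ista(Q, b, lam, L, num_iters):
    """Plain ISTA for 0.5 * x^T Q x - b^T x + lam * ||x||_1, started at x = 0.

    L is an upper bound on the largest eigenvalue of Q (the gradient's
    Lipschitz constant), so 1 / L is a valid step-size.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(num_iters):
        # Gradient of the smooth part: Q x - b.
        grad = [sum(Q[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        # Gradient step followed by soft-thresholding.
        x = [soft_threshold(x[i] - grad[i] / L, lam / L) for i in range(n)]
    return x


# Illustration data (assumed): a 2x2 positive-definite M-matrix with eigenvalues 1 and 3.
Q = [[2.0, -1.0], [-1.0, 2.0]]
b = [1.0, 0.0]
x_ista = ista(Q, b, lam=0.1, L=3.0, num_iters=200)
print(x_ista)
```

For this toy instance both coordinates of the minimizer are positive, so the optimality condition reduces to $Qx = b - \lambda\mathbf{1}$, whose solution is $(17/30,\, 7/30)$. The iterates can be checked against that value.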
-
Approximate Vanishing Ideal Computations at Scale
Authors:
Elias Wirth,
Hiroshi Kera,
Sebastian Pokutta
Abstract:
The vanishing ideal of a set of points $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_m\}\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite subset of generators. In practice, to accommodate noise in the data, algorithms that construct generators of the approximate vanishing ideal are widely studied but their computational complexities remain expensive. In this paper, we scale up the oracle approximate vanishing ideal algorithm (OAVI), the only generator-constructing algorithm with known learning guarantees. We prove that the computational complexity of OAVI is not superlinear, as previously claimed, but linear in the number of samples $m$. In addition, we propose two modifications that accelerate OAVI's training time: Our analysis reveals that replacing the pairwise conditional gradients algorithm, one of the solvers used in OAVI, with the faster blended pairwise conditional gradients algorithm leads to an exponential speed-up in the number of features $n$. Finally, using a new inverse Hessian boosting approach, intermediate convex optimization problems can be solved almost instantly, improving OAVI's training time by multiple orders of magnitude in a variety of numerical experiments.
Submitted 10 February, 2023; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes
Authors:
Elias Wirth,
Thomas Kerdreux,
Sebastian Pokutta
Abstract:
Frank-Wolfe algorithms (FW) are popular first-order methods for solving constrained convex optimization problems that rely on a linear minimization oracle instead of potentially expensive projection-like oracles. Many works have identified accelerated convergence rates under various structural assumptions on the optimization problem and for specific FW variants when using line-search or short-step, requiring feedback from the objective function. Little is known about accelerated convergence regimes when utilizing open-loop step-size rules, a.k.a. FW with pre-determined step-sizes, which are algorithmically extremely simple and stable. Not only is FW with open-loop step-size rules not always subject to the same convergence rate lower bounds as FW with line-search or short-step, but in some specific cases, such as kernel herding in infinite dimensions, it has been empirically observed that FW with open-loop step-size rules enjoys faster convergence rates than FW with line-search or short-step. We propose a partial answer to this unexplained phenomenon in kernel herding, characterize a general setting for which FW with open-loop step-size rules converges non-asymptotically faster than with line-search or short-step, and derive several accelerated convergence results for FW with open-loop step-size rules. Finally, we demonstrate that FW with open-loop step-sizes can compete with momentum-based open-loop FW variants.
Submitted 15 September, 2023; v1 submitted 25 May, 2022;
originally announced May 2022.
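As a rough illustration of what the abstract above calls FW with open-loop step-size rules, here is a minimal Python sketch of vanilla Frank-Wolfe over the probability simplex with the pre-determined step-size $\eta_t = 2/(t+2)$. The feasible region, the quadratic objective, the target point `b`, and all function names are assumptions chosen for illustration and are not the paper's experimental setup.

```python
def lmo_simplex(gradient):
    """Linear minimization oracle for the probability simplex.

    Returns the vertex e_i minimizing <gradient, v>, i.e. the coordinate
    with the smallest gradient entry.
    """
    i = min(range(len(gradient)), key=lambda j: gradient[j])
    return [1.0 if j == i else 0.0 for j in range(len(gradient))]


def frank_wolfe(grad_f, x0, step_size, num_iters):
    """Vanilla Frank-Wolfe with an open-loop rule step_size(t).

    The step-size depends only on the iteration counter t, so no objective
    evaluations or line searches are needed.
    """
    x = list(x0)
    for t in range(num_iters):
        v = lmo_simplex(grad_f(x))
        eta = step_size(t)
        # Convex combination keeps the iterate inside the simplex.
        x = [(1.0 - eta) * xi + eta * vi for xi, vi in zip(x, v)]
    return x


# Toy problem (assumed): minimize 0.5 * ||x - b||^2 over the simplex,
# with b in the relative interior so the optimal value is 0.
b = [0.2, 0.5, 0.3]


def objective(x):
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))


def gradient(x):
    return [xi - bi for xi, bi in zip(x, b)]


x_final = frank_wolfe(gradient, [1.0, 0.0, 0.0], lambda t: 2.0 / (t + 2.0), 2000)
print(x_final, objective(x_final))
```

Swapping the `step_size` argument for another rule of the form $\ell/(t+\ell)$ requires no other change, which is the sense in which open-loop rules are algorithmically simple.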
-
Efficient Online-Bandit Strategies for Minimax Learning Problems
Authors:
Christophe Roux,
Elias Wirth,
Sebastian Pokutta,
Thomas Kerdreux
Abstract:
Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or learning with non-standard aggregated losses. More specifically, these problems are convex-linear problems where the minimization is carried out over the model parameters $w\in\mathcal{W}$ and the maximization over the empirical distribution $p\in\mathcal{K}$ of the training set indexes, where $\mathcal{K}$ is the simplex or a subset of it. To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm. We argue that the efficiency of such approaches critically depends on the structure of $\mathcal{K}$ and propose two properties of $\mathcal{K}$ that facilitate designing efficient algorithms. We focus on a specific family of sets $\mathcal{S}_{n,k}$ encompassing various learning applications and provide high-probability convergence guarantees to the minimax values.
Submitted 4 June, 2021; v1 submitted 28 May, 2021;
originally announced May 2021.