-
Adaptive Open-Loop Step-Sizes for Accelerated Convergence Rates of the Frank-Wolfe Algorithm
Authors:
Elias Wirth,
Javier Peña,
Sebastian Pokutta
Abstract:
Recent work has shown that in certain settings, the Frank-Wolfe algorithm (FW) with open-loop step-sizes $\eta_t = \frac{\ell}{t+\ell}$ for a fixed parameter $\ell \in \mathbb{N},\, \ell \geq 2$, attains a convergence rate faster than the traditional $O(t^{-1})$ rate. In particular, when a strong growth property holds, the convergence rate attainable with open-loop step-sizes $\eta_t = \frac{\ell}{t+\ell}$ is $O(t^{-\ell})$. In this setting, no single value of the parameter $\ell$ prevails as superior. This paper shows that FW with log-adaptive open-loop step-sizes $\eta_t = \frac{2+\log(t+1)}{t+2+\log(t+1)}$ attains a convergence rate that is at least as fast as that attainable with fixed-parameter open-loop step-sizes $\eta_t = \frac{\ell}{t+\ell}$ for any value of $\ell \in \mathbb{N},\,\ell\geq 2$. To establish our main convergence results, we extend our previous affine-invariant accelerated convergence results for FW to more general open-loop step-sizes of the form $\eta_t = g(t)/(t+g(t))$, where $g:\mathbb{N}\to\mathbb{R}_{\geq 0}$ is any non-decreasing function such that the sequence of step-sizes $(\eta_t)$ is non-increasing. This covers in particular the fixed-parameter case by choosing $g(t) = \ell$ and the log-adaptive case by choosing $g(t) = 2+ \log(t+1)$. To facilitate adoption of log-adaptive open-loop step-sizes, we have incorporated this rule into the FrankWolfe.jl software package.
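Both step-size rules are short to state in code. The following is a minimal sketch, not the FrankWolfe.jl implementation: it assumes the natural logarithm in $g(t) = 2+\log(t+1)$ and a hypothetical toy problem, minimizing $f(x) = \|x-b\|_2^2$ over the probability simplex, where the linear minimization oracle (LMO) returns the vertex $e_i$ with the smallest gradient entry. The helper names (`open_loop_step`, `fixed_g`, `log_adaptive_g`, `frank_wolfe_simplex`) are illustrative.

```python
import math


def open_loop_step(t, g):
    """Open-loop step size eta_t = g(t) / (t + g(t))."""
    gt = g(t)
    return gt / (t + gt)


def fixed_g(ell):
    """Fixed-parameter rule g(t) = ell, i.e. eta_t = ell / (t + ell)."""
    return lambda t: float(ell)


def log_adaptive_g(t):
    """Log-adaptive rule g(t) = 2 + log(t + 1) (natural log assumed)."""
    return 2.0 + math.log(t + 1)


def frank_wolfe_simplex(b, g, iterations):
    """Vanilla FW for the toy problem min ||x - b||^2 over the probability simplex."""
    n = len(b)
    x = [1.0] + [0.0] * (n - 1)                      # start at the vertex e_0
    for t in range(iterations):
        grad = [2.0 * (xi - bi) for xi, bi in zip(x, b)]
        i = min(range(n), key=grad.__getitem__)      # LMO: vertex e_i minimizing <grad, v>
        eta = open_loop_step(t, g)
        x = [(1.0 - eta) * xi for xi in x]           # x <- (1 - eta) x + eta e_i
        x[i] += eta
    return x


def objective(x, b):
    return sum((xi - bi) ** 2 for xi, bi in zip(x, b))
```

For example, `frank_wolfe_simplex([0.2, 0.3, 0.5], log_adaptive_g, 2000)` runs 2000 iterations with the log-adaptive rule; passing `fixed_g(4)` instead gives the fixed-parameter rule with $\ell = 4$. Both rules start with $\eta_0 = 1$.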
Submitted 14 May, 2025;
originally announced May 2025.
-
Fast Frank--Wolfe Algorithms with Adaptive Bregman Step-Size for Weakly Convex Functions
Authors:
Shota Takahashi,
Sebastian Pokutta,
Akiko Takeda
Abstract:
We propose a Frank--Wolfe (FW) algorithm with an adaptive Bregman step-size strategy for smooth adaptable (also called relatively smooth) (weakly) convex functions. This means that the gradient of the objective function need not be Lipschitz continuous; we only require the smooth adaptable property. Compared to existing FW algorithms, our assumptions are less restrictive. We establish convergence guarantees in various settings, with rates ranging from sublinear to linear depending on the assumptions, for both convex and nonconvex objective functions. Assuming that the objective function is weakly convex and satisfies the local quadratic growth condition, we provide both local sublinear and local linear convergence regarding the primal gap. We also propose a variant of the away-step FW algorithm using Bregman distances over polytopes. For this variant, we establish faster global convergence (up to linear) for convex optimization under the Hölder error bound condition, and local linear convergence for nonconvex optimization under the local quadratic growth condition. Numerical experiments demonstrate that our proposed FW algorithms outperform existing methods.
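For readers unfamiliar with Bregman distances, the sketch below computes $D_h(x,y) = h(x) - h(y) - \langle \nabla h(y), x-y\rangle$ for two standard kernels $h$. In the usual definition, $f$ is $L$-smooth adaptable (relatively smooth) with respect to $h$ when $f(x) \le f(y) + \langle\nabla f(y), x-y\rangle + L\,D_h(x,y)$, which replaces the Lipschitz-gradient descent lemma. This only illustrates the distance itself; the paper's adaptive step-size rule is not reproduced, and the function names are hypothetical.

```python
import math


def bregman_distance(h, grad_h, x, y):
    """D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(grad_h(y), x, y))


# Euclidean kernel h(x) = ||x||^2 / 2, for which D_h(x, y) = ||x - y||^2 / 2
def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)


def half_sq_norm_grad(x):
    return list(x)


# Negative-entropy kernel on positive vectors; on the simplex D_h is the KL divergence
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)


def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]
```

With the Euclidean kernel the Bregman framework reduces to the usual $L$-smooth setting; non-Euclidean kernels such as the negative entropy allow objectives whose gradients are not Lipschitz.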
Submitted 13 May, 2025; v1 submitted 5 April, 2025;
originally announced April 2025.
-
Scenario Reduction for Distributionally Robust Optimization
Authors:
Kevin-Martin Aigner,
Sebastian Denzler,
Frauke Liers,
Sebastian Pokutta,
Kartikey Sharma
Abstract:
Stochastic and (distributionally) robust optimization problems often become computationally challenging as the number of scenarios increases. Scenario reduction is therefore a key technique for improving tractability. We introduce a general scenario reduction method for distributionally robust optimization (DRO), which includes stochastic and robust optimization as special cases. Our approach constructs the reduced DRO problem by projecting the original ambiguity set onto a reduced set of scenarios. Under mild conditions, we establish bounds on the relative quality of the reduction. The methodology is applicable to random variables following either discrete or continuous probability distributions, with representative scenarios appropriately selected in both cases. Given the relevance of optimization problems with linear and quadratic objectives, we further refine our approach for these settings. Finally, we demonstrate its effectiveness through numerical experiments on mixed-integer benchmark instances from MIPLIB and portfolio optimization problems. Our results show that the proposed approximation significantly reduces solution time while maintaining high solution quality with only minor errors.
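As a simple point of reference for shrinking a discrete scenario set, the sketch below moves the probability of every scenario to its nearest representative and reports the cost of that transport plan, which upper-bounds the 1-Wasserstein distance between the original and reduced distributions. This is a generic textbook reduction assuming Euclidean distances between scenarios, not the ambiguity-set projection developed in the paper; `reduce_scenarios` is a hypothetical helper.

```python
import math


def reduce_scenarios(scenarios, probs, representatives):
    """Move each scenario's probability to its nearest representative (Euclidean distance).

    Returns the reduced probabilities and the cost of this transport plan, an upper
    bound on the 1-Wasserstein distance between the original and reduced distributions.
    """
    reduced = [0.0] * len(representatives)
    cost = 0.0
    for s, p in zip(scenarios, probs):
        dists = [math.dist(s, r) for r in representatives]
        j = min(range(len(representatives)), key=dists.__getitem__)
        reduced[j] += p
        cost += p * dists[j]
    return reduced, cost
```

The reduced problem then only has as many scenarios as representatives; how the representatives are chosen is a separate design decision.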
Submitted 17 March, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Mixed-Integer Optimization for Loopless Flux Distributions in Metabolic Networks
Authors:
Hannah Troppens,
Mathieu Besançon,
St. Elmo Wilken,
Sebastian Pokutta
Abstract:
Constraint-based metabolic models can be used to investigate the intracellular physiology of microorganisms. These models couple genes to reactions, and typically seek to predict metabolite fluxes that optimize some biologically important metric. Classical techniques, like Flux Balance Analysis (FBA), formulate the metabolism of a microbe as an optimization problem where growth rate is maximized. While FBA has found widespread use, it often leads to thermodynamically infeasible solutions that contain internal cycles (loops). To address this shortcoming, Loopless-Flux Balance Analysis (ll-FBA) seeks to predict flux distributions that do not contain these loops. ll-FBA is a disjunctive program, usually reformulated as a mixed-integer program, and is challenging to solve for biological models that often contain thousands of reactions and metabolites. In this paper, we compare various reformulations of ll-FBA and different solution approaches. Overall, combinatorial Benders' decomposition is the most promising of the tested approaches; with it, we could solve most instances. However, model size and numerical instability remain challenges for the combinatorial Benders' method.
Submitted 12 May, 2025; v1 submitted 2 February, 2025;
originally announced February 2025.
-
Secant Line Search for Frank-Wolfe Algorithms
Authors:
Deborah Hendrych,
Mathieu Besançon,
David Martínez-Rubio,
Sebastian Pokutta
Abstract:
We present a new step-size strategy based on the secant method for Frank-Wolfe algorithms. This strategy, which requires mild assumptions about the function under consideration, can be applied to any Frank-Wolfe algorithm. It is as effective as full line search and, in particular, allows for adapting to the local smoothness of the function, such as in (Pedregosa et al., 2020), but comes with a significantly reduced computational cost, leading to higher effective rates of convergence. We provide theoretical guarantees and demonstrate the effectiveness of the strategy through numerical experiments.
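The secant idea can be sketched on the one-dimensional function $\varphi(\gamma) = f(x + \gamma d)$ along a Frank-Wolfe direction $d = v - x$: the best step is a root of $\varphi'(\gamma)$ in $[0,1]$, and the secant method approximates it from derivative evaluations alone, without second-order information. The sketch assumes $\varphi$ is convex and differentiable, starts from the endpoints $0$ and $1$, and clips iterates to $[0,1]$; it is a plain secant iteration, not the paper's strategy, and all names are illustrative.

```python
def secant_line_search(dphi, tol=1e-10, max_iter=50):
    """Approximate a root of dphi on [0, 1] with the secant method.

    dphi(gamma) is the derivative of phi(gamma) = f(x + gamma * d); phi is assumed convex.
    """
    g0, g1 = 0.0, 1.0
    d0, d1 = dphi(g0), dphi(g1)
    if d0 >= 0.0:          # no descent along d: take no step
        return 0.0
    if d1 <= 0.0:          # phi still non-increasing at gamma = 1: take the full step
        return 1.0
    for _ in range(max_iter):
        if d1 == d0:
            break
        g2 = g1 - d1 * (g1 - g0) / (d1 - d0)      # secant update
        g2 = min(max(g2, 0.0), 1.0)               # stay inside the feasible step range
        g0, d0, g1, d1 = g1, d1, g2, dphi(g2)
        if abs(d1) < tol:
            break
    return g1


def directional_derivative(grad_f, x, d):
    """Return gamma -> phi'(gamma) = <grad f(x + gamma d), d>."""
    def dphi(gamma):
        y = [xi + gamma * di for xi, di in zip(x, d)]
        return sum(gi * di for gi, di in zip(grad_f(y), d))
    return dphi
```

For a quadratic objective, $\varphi'$ is affine, so the first secant step already lands on the exact line-search step; for instance, $f(x)=\|x-b\|^2$ with $x=e_0$, $v=e_1$ and $b=(0.2,0.3,0.5)$ gives $\gamma = 0.55$.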
Submitted 30 January, 2025;
originally announced January 2025.
-
Beyond Short Steps in Frank-Wolfe Algorithms
Authors:
David Martínez-Rubio,
Sebastian Pokutta
Abstract:
We introduce novel techniques to enhance Frank-Wolfe algorithms by leveraging function smoothness beyond traditional short steps. Our study focuses on Frank-Wolfe algorithms with step sizes that incorporate primal-dual guarantees, offering practical stopping criteria. We present a new Frank-Wolfe algorithm utilizing an optimistic framework and provide a primal-dual convergence proof. Additionally, we propose a generalized short-step strategy aimed at optimizing a computable primal-dual gap. Interestingly, this new generalized short-step strategy is also applicable to gradient descent algorithms beyond Frank-Wolfe methods. As a byproduct, our work revisits and refines primal-dual techniques for analyzing Frank-Wolfe algorithms, achieving tighter primal-dual convergence rates. Empirical results demonstrate that our optimistic algorithm outperforms existing methods, highlighting its practical advantages.
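For reference, the traditional short step that this work goes beyond can be written in a few lines. The sketch assumes an $L$-smooth objective and a vertex $v$ returned by the LMO; the Frank-Wolfe gap $\langle \nabla f(x), x - v\rangle$ it uses is also the standard computable primal-dual gap behind practical stopping criteria, since for convex $f$ and the LMO vertex it upper-bounds $f(x) - f^*$. The paper's generalized short step is not reproduced, and the names are illustrative.

```python
def fw_gap(grad_x, x, v):
    """Frank-Wolfe gap <grad f(x), x - v>; upper-bounds f(x) - f* for convex f and the LMO vertex v."""
    return sum(g * (xi - vi) for g, xi, vi in zip(grad_x, x, v))


def short_step(grad_x, x, v, L):
    """Traditional short step min(1, <grad f(x), x - v> / (L * ||x - v||^2))."""
    sq = sum((xi - vi) ** 2 for xi, vi in zip(x, v))
    if sq == 0.0:
        return 0.0
    return min(1.0, fw_gap(grad_x, x, v) / (L * sq))
```

The short step minimizes the quadratic upper bound given by $L$-smoothness along $v - x$, so it depends on a global constant $L$ rather than on the local behavior of $f$.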
Submitted 30 January, 2025;
originally announced January 2025.
-
Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?
Authors:
Konrad Mundinger,
Max Zimmer,
Aldo Kiem,
Christoph Spiegel,
Sebastian Pokutta
Abstract:
We demonstrate how neural networks can drive mathematical discovery through a case study of the Hadwiger-Nelson problem, a long-standing open problem from discrete geometry and combinatorics about coloring the plane while avoiding monochromatic unit-distance pairs. Using neural networks as approximators, we reformulate this mixed discrete-continuous geometric coloring problem as an optimization task with a probabilistic, differentiable loss function. This enables gradient-based exploration of admissible configurations which, most significantly, led to the discovery of two novel six-colorings, providing the first improvements in thirty years to the off-diagonal variant of the original problem (Mundinger et al., 2024a). Here, we establish the underlying machine learning approach used to obtain these results and demonstrate its broader applicability through additional results and numerical insights.
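One way to make a coloring objective differentiable, sketched here under our own assumptions, is to let each point carry a probability vector over colors and to measure the expected probability that a random unit-distance pair receives the same color. The sampling scheme (base points uniform in $[0,1]^2$, uniformly random directions) and the function name are hypothetical; in the paper, a neural network plays the role of `color_probs`, and the actual loss may differ.

```python
import math
import random


def monochromatic_probability(color_probs, num_samples=2000, seed=0):
    """Monte Carlo estimate of P[x and y get the same color] over random unit-distance pairs.

    color_probs(point) returns a probability vector over colors (a soft coloring);
    colors at x and y are treated as independent draws from these vectors.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = (rng.random(), rng.random())
        angle = rng.uniform(0.0, 2.0 * math.pi)
        y = (x[0] + math.cos(angle), x[1] + math.sin(angle))   # ||x - y|| = 1
        p, q = color_probs(x), color_probs(y)
        total += sum(pi * qi for pi, qi in zip(p, q))         # P[same color] for this pair
    return total / num_samples
```

Because the loss is smooth in the color probabilities, a parametrized `color_probs` can be trained by gradient descent; a valid coloring corresponds to a loss of zero.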
Submitted 30 January, 2025;
originally announced January 2025.
-
Implicit Riemannian Optimism with Applications to Min-Max Problems
Authors:
Christophe Roux,
David Martínez-Rubio,
Sebastian Pokutta
Abstract:
We introduce a Riemannian optimistic online learning algorithm for Hadamard manifolds based on inexact implicit updates. Unlike prior work, our method can handle in-manifold constraints, and matches the best known regret bounds in the Euclidean setting with no dependence on geometric constants, like the minimum curvature. Building on this, we develop algorithms for g-convex, g-concave smooth min-max problems on Hadamard manifolds. Notably, one of these methods is the first to nearly match the gradient oracle complexity of the lower bound for Euclidean problems.
Submitted 30 January, 2025;
originally announced January 2025.
-
Improved algorithms and novel applications of the FrankWolfe.jl library
Authors:
Mathieu Besançon,
Sébastien Designolle,
Jannis Halbey,
Deborah Hendrych,
Dominik Kuzinowicz,
Sebastian Pokutta,
Hannah Troppens,
Daniel Viladrich Herrmannsdoerfer,
Elias Wirth
Abstract:
Frank-Wolfe (FW) algorithms have emerged as an essential class of methods for constrained optimization, especially on large-scale problems. In this paper, we summarize the algorithmic design choices and progress made in recent years in the development of FrankWolfe.jl, a Julia package gathering high-performance implementations of state-of-the-art FW variants. We review key use cases of the library in the recent literature, which match its original dual purpose: first, becoming the de facto toolbox for practitioners applying FW methods to their problems, and second, offering a modular ecosystem to algorithm designers who experiment with their own variants and implementations of algorithmic blocks. Finally, we demonstrate the performance of several FW variants on important problem classes in several experiments, which we curated in a separate repository for continuous benchmarking.
Submitted 24 January, 2025;
originally announced January 2025.
-
State-of-the-art Methods for Pseudo-Boolean Solving with SCIP
Authors:
Gioni Mexi,
Dominik Kamp,
Yuji Shinano,
Shanwen Pu,
Alexander Hoen,
Ksenia Bestuzheva,
Christopher Hojny,
Matthias Walter,
Marc E. Pfetsch,
Sebastian Pokutta,
Thorsten Koch
Abstract:
The Pseudo-Boolean problem deals with linear or polynomial constraints with integer coefficients over Boolean variables. The task is to optimize a linear objective function, to find a feasible solution, or to find a solution that satisfies as many constraints as possible. In the 2024 Pseudo-Boolean competition, solvers incorporating the SCIP framework won five of the six categories in which they competed. Of a total of 1,207 instances, SCIP successfully solved 759, while its parallel version FiberSCIP solved 776. Based on the competition results, we further enhanced SCIP's Pseudo-Boolean capabilities. This article discusses the results and presents the winning algorithmic ideas.
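A standard preprocessing step for linear pseudo-Boolean constraints $\sum_i a_i \ell_i \ge b$ is to make all coefficients positive by switching literals, using $a\,\ell = a + |a|\,\bar\ell$ for $a < 0$, where $\bar\ell = 1 - \ell$. The sketch below shows this normalization together with a brute-force equivalence check; it illustrates the textbook transformation and is not taken from SCIP's code. The term encoding `(coefficient, variable, negated)` is our own assumption.

```python
from itertools import product


def normalize_pb(terms, rhs):
    """Rewrite sum(a * lit) >= rhs so that every coefficient is positive.

    terms: list of (a, var, negated); the literal is x[var], or 1 - x[var] if negated.
    For a < 0: a * lit = a + |a| * (1 - lit), so flip the literal and add |a| to rhs.
    """
    new_terms, new_rhs = [], rhs
    for a, var, neg in terms:
        if a < 0:
            new_terms.append((-a, var, not neg))
            new_rhs -= a
        elif a > 0:
            new_terms.append((a, var, neg))   # zero coefficients are dropped
    return new_terms, new_rhs


def satisfied(terms, rhs, x):
    lhs = sum(a * ((1 - x[v]) if neg else x[v]) for a, v, neg in terms)
    return lhs >= rhs


def equivalent(terms1, rhs1, terms2, rhs2, num_vars):
    """Brute-force check that two constraints have the same 0/1 solutions."""
    return all(satisfied(terms1, rhs1, x) == satisfied(terms2, rhs2, x)
               for x in product((0, 1), repeat=num_vars))
```

For example, $3x_0 - 2x_1 + \bar x_2 \ge 1$ becomes $3x_0 + 2\bar x_1 + \bar x_2 \ge 3$.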
Submitted 8 January, 2025; v1 submitted 6 January, 2025;
originally announced January 2025.
-
S-CFE: Simple Counterfactual Explanations
Authors:
Shpresim Sadiku,
Moritz Wagner,
Sai Ganesh Nagarajan,
Sebastian Pokutta
Abstract:
We study the problem of finding optimal sparse, manifold-aligned counterfactual explanations for classifiers. Canonically, this can be formulated as an optimization problem with multiple non-convex components, including classifier loss functions and manifold alignment (or \emph{plausibility}) metrics. Enforcing \emph{sparsity}, i.e., shorter explanations, complicates the problem further. Existing methods often focus on specific models and plausibility measures, relying on convex $\ell_1$ regularizers to enforce sparsity. In this paper, we tackle the canonical formulation using the accelerated proximal gradient (APG) method, a simple yet efficient first-order procedure capable of handling smooth non-convex objectives and non-smooth $\ell_p$ (where $0 \leq p < 1$) regularizers. This enables our approach to seamlessly incorporate various classifiers and plausibility measures while producing sparser solutions. Our algorithm only requires differentiable data-manifold regularizers and supports box constraints for bounded feature ranges, ensuring the generated counterfactuals remain \emph{actionable}. Finally, experiments on real-world datasets demonstrate that our approach effectively produces sparse, manifold-aligned counterfactual explanations while maintaining proximity to the factual data and computational efficiency.
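Inside an accelerated proximal gradient method, the non-smooth part enters only through its proximal operator. For the $p=0$ end of the range $0 \le p < 1$, combined with box constraints $lo \le x \le hi$ that contain $0$, the proximal operator of $\lambda\|x\|_0$ plus the box indicator is separable and can be evaluated exactly by comparing two candidates per coordinate. The sketch assumes unit step size (scale `lam` by the step otherwise) and is an illustration, not the S-CFE code.

```python
def prox_l0_box(v, lam, lo, hi):
    """Exact prox of lam * ||x||_0 + indicator{lo <= x <= hi}, assuming lo <= 0 <= hi.

    Per coordinate, compare x = 0 (cost v^2 / 2) with x = clip(v) (cost (clip - v)^2 / 2 + lam).
    """
    out = []
    for vi, l, h in zip(v, lo, hi):
        c = min(max(vi, l), h)                                  # best nonzero candidate in the box
        keep_cost = 0.5 * (c - vi) ** 2 + (lam if c != 0.0 else 0.0)
        zero_cost = 0.5 * vi ** 2
        out.append(c if keep_cost < zero_cost else 0.0)
    return out
```

Without the box this is hard thresholding at $\sqrt{2\lambda}$; an APG iteration would apply it to the extrapolated gradient step $y - t\nabla f(y)$.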
Submitted 28 January, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
Flexible block-iterative analysis for the Frank-Wolfe algorithm
Authors:
Gábor Braun,
Sebastian Pokutta,
Zev Woodstock
Abstract:
We prove that the block-coordinate Frank-Wolfe (BCFW) algorithm converges with state-of-the-art rates in both convex and nonconvex settings under a very mild "block-iterative" assumption, newly allowing for (I) progress without activating the most-expensive linear minimization oracle(s), LMO(s), at every iteration, (II) parallelized updates that do not require all LMOs, and therefore (III) deterministic parallel update strategies that take into account the numerical cost of the problem's LMOs. Our results apply to short-step BCFW as well as to an adaptive method for convex functions. New relationships between updated coordinates and primal progress are proven, and a favorable speedup is demonstrated using FrankWolfe.jl.
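To make the block-iterative idea concrete, the sketch below runs block-coordinate Frank-Wolfe with short steps on a hypothetical separable problem $f(x) = \sum_j \|x_j - b_j\|^2$ over a product of simplices. A `schedule` decides which blocks are updated at iteration $t$; here the second block, standing in for an expensive LMO, is only activated every third iteration. The problem, schedule, and names are illustrative assumptions, not the paper's setup.

```python
def block_fw(b_blocks, schedule, iterations, L=2.0):
    """Block-coordinate FW with short steps for sum_j ||x_j - b_j||^2 over a product of simplices.

    schedule(t) returns the block indices whose LMO is called at iteration t.
    Returns the final blocks and the objective value after each iteration.
    """
    xs = [[1.0] + [0.0] * (len(b) - 1) for b in b_blocks]
    history = []
    for t in range(iterations):
        for j in schedule(t):
            x, b = xs[j], b_blocks[j]
            grad = [2.0 * (xi - bi) for xi, bi in zip(x, b)]
            i = min(range(len(x)), key=grad.__getitem__)          # block LMO over the simplex
            v = [0.0] * len(x)
            v[i] = 1.0
            gap = sum(g * (xi - vi) for g, xi, vi in zip(grad, x, v))
            sq = sum((xi - vi) ** 2 for xi, vi in zip(x, v))
            gamma = 0.0 if sq == 0.0 else min(1.0, gap / (L * sq))  # short step on this block
            xs[j] = [xi + gamma * (vi - xi) for xi, vi in zip(x, v)]
        history.append(sum(sum((xi - bi) ** 2 for xi, bi in zip(x, b))
                           for x, b in zip(xs, b_blocks)))
    return xs, history


# Block 1 (the "expensive" LMO) runs only every third iteration
xs, hist = block_fw([[0.2, 0.3, 0.5], [0.6, 0.4]],
                    lambda t: [0, 1] if t % 3 == 0 else [0], 3000)
```

Because each short step can only decrease the objective, progress is monotone even though the expensive block is skipped on most iterations.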
Submitted 10 September, 2024;
originally announced September 2024.
-
Better bounds on Grothendieck constants of finite orders
Authors:
Sébastien Designolle,
Tamás Vértesi,
Sebastian Pokutta
Abstract:
Grothendieck constants $K_G(d)$ bound the advantage of $d$-dimensional strategies over $1$-dimensional ones in a specific optimisation task. They have applications ranging from approximation algorithms to quantum nonlocality. However, apart from $d=2$, their values are unknown. Here, we exploit a recent Frank-Wolfe approach to provide good candidates for lower bounding some of these constants. The complete proof relies on solving difficult binary quadratic optimisation problems. For $d\in\{3,4,5\}$, we construct specific rectangular instances that we can solve to certify better bounds than those previously known; by monotonicity, our lower bounds improve on the state of the art for $d\leqslant9$. For $d\in\{4,7,8\}$, we exploit elegant structures to build highly symmetric instances achieving even greater bounds; however, we can only solve them heuristically. We also recall the standard relation with violations of Bell inequalities and elaborate on it to interpret generalised Grothendieck constants $K_G(d\mapsto2)$ as the advantage of complex quantum mechanics over real quantum mechanics. Motivated by this connection, we also improve the bounds on $K_G(d\mapsto2)$.
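The ratio behind such lower bounds can be illustrated on the smallest nontrivial instance. For a real matrix $M$, the $1$-dimensional value $\max_{a,b\in\{\pm1\}}\sum_{ij}M_{ij}a_ib_j$ is a binary quadratic optimisation problem, and any choice of unit vectors in $\mathbb{R}^d$ gives a feasible $d$-dimensional value; their quotient is therefore a lower bound on $K_G(d)$. The sketch brute-forces the binary problem, which is only viable for tiny instances, and uses the CHSH matrix as an illustrative example.

```python
import math
from itertools import product


def classical_value(M):
    """1-dimensional value: max over a, b in {-1, +1} of sum_ij M[i][j] a_i b_j (brute force)."""
    m, n = len(M), len(M[0])
    return max(sum(M[i][j] * a[i] * b[j] for i in range(m) for j in range(n))
               for a in product((-1, 1), repeat=m)
               for b in product((-1, 1), repeat=n))


def vector_value(M, A, B):
    """Value sum_ij M[i][j] <A_i, B_j> of one feasible strategy with unit vectors A_i, B_j."""
    return sum(M[i][j] * sum(u * w for u, w in zip(A[i], B[j]))
               for i in range(len(M)) for j in range(len(M[0])))


# Illustrative instance: the CHSH matrix with a standard 2-dimensional strategy
CHSH = [[1, 1], [1, -1]]
s = 1.0 / math.sqrt(2.0)
A_2D = [(1.0, 0.0), (0.0, 1.0)]
B_2D = [(s, s), (s, -s)]
lower_bound_kg2 = vector_value(CHSH, A_2D, B_2D) / classical_value(CHSH)   # about 1.414
```

Here the bound equals $\sqrt{2}$, which matches the known value $K_G(2)=\sqrt{2}$; for larger $d$, the difficulty lies in certifying the denominator on much larger instances.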
Submitted 20 December, 2024; v1 submitted 5 September, 2024;
originally announced September 2024.
-
The Pivoting Framework: Frank-Wolfe Algorithms with Active Set Size Control
Authors:
Elias Wirth,
Mathieu Besançon,
Sebastian Pokutta
Abstract:
We propose the pivoting meta algorithm (PM) to enhance optimization algorithms that generate iterates as convex combinations of vertices of a feasible region $C\subseteq \mathbb{R}^n$, including Frank-Wolfe (FW) variants. PM guarantees that the active set (the set of vertices in the convex combination) of the modified algorithm remains as small as $\mathrm{dim}(C)+1$, as stipulated by Carathéodory's theorem. PM achieves this by reformulating the active set expansion task into an equivalent linear program, which can be efficiently solved using a single pivot step akin to the primal simplex algorithm; the convergence rates of the original algorithms are maintained. Furthermore, we establish the connection between PM and active set identification, in particular showing under mild assumptions that PM applied to the away-step Frank-Wolfe algorithm or the blended pairwise Frank-Wolfe algorithm bounds the active set size by the dimension of the optimal face plus $1$. We provide numerical experiments illustrating the practicality and efficacy of PM in reducing the active set size.
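The $\mathrm{dim}(C)+1$ bound comes from Carathéodory's theorem, whose classical constructive proof repeatedly finds an affine dependence $\sum_i \lambda_i p_i = 0$, $\sum_i \lambda_i = 0$ among the active vertices and shifts weight along it until one weight reaches zero. The sketch below implements that textbook reduction in plain Python for illustration; it is not PM itself, which instead solves a linear program with a single simplex-style pivot step. Names and tolerances are our own choices.

```python
def _affine_dependence(points, eps=1e-12):
    """Return a nonzero lam with sum(lam) = 0 and sum(lam_i * p_i) = 0 (needs len(points) > dim + 1)."""
    m, n = len(points), len(points[0])
    A = [[p[k] for p in points] for k in range(n)] + [[1.0] * m]   # (n + 1) x m system
    pivots, r = [], 0
    for c in range(m):                                             # Gauss-Jordan elimination
        if r == len(A):
            break
        p = max(range(r, len(A)), key=lambda i: abs(A[i][c]))
        if abs(A[p][c]) < eps:
            continue
        A[r], A[p] = A[p], A[r]
        piv = A[r][c]
        A[r] = [a / piv for a in A[r]]
        for i in range(len(A)):
            if i != r:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    free = next(c for c in range(m) if c not in pivots)            # exists since m > rank
    lam = [0.0] * m
    lam[free] = 1.0
    for row, c in enumerate(pivots):
        lam[c] = -A[row][free]
    return lam


def caratheodory_reduce(points, weights, eps=1e-12):
    """Rewrite sum(w_i p_i) with at most dim + 1 points and nonnegative weights summing to 1."""
    pts, w = list(points), list(weights)
    n = len(pts[0])
    while len(pts) > n + 1:
        lam = _affine_dependence(pts, eps)
        # largest shift keeping all weights nonnegative; lam[free] = 1 guarantees a candidate
        j = min((i for i in range(len(pts)) if lam[i] > eps), key=lambda i: w[i] / lam[i])
        alpha = w[j] / lam[j]
        w = [wi - alpha * li for wi, li in zip(w, lam)]
        w[j] = 0.0
        keep = [i for i in range(len(pts)) if w[i] > eps]
        pts, w = [pts[i] for i in keep], [w[i] for i in keep]
    return pts, w
```

Each pass preserves the represented point and the total weight, because $\sum_i \lambda_i p_i = 0$ and $\sum_i \lambda_i = 0$, while removing at least one point.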
Submitted 28 October, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Fast convergence of Frank-Wolfe algorithms on polytopes
Authors:
Elias Wirth,
Javier Peña,
Sebastian Pokutta
Abstract:
We provide a template to derive convergence rates for the following popular versions of the Frank-Wolfe algorithm on polytopes: vanilla Frank-Wolfe, Frank-Wolfe with away steps, Frank-Wolfe with blended pairwise steps, and Frank-Wolfe with in-face directions. Our template shows how the convergence rates follow from two affine-invariant properties of the problem, namely, error bound and extended curvature. These properties depend solely on the polytope and objective function but not on any affine-dependent object like norms. For each one of the above algorithms, we derive rates of convergence ranging from sublinear to linear depending on the degree of the error bound.
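Of the variants listed, Frank-Wolfe with away steps is easy to sketch on the probability simplex, where the vertices are the unit vectors and the active set is simply the support of $x$. The sketch below assumes the hypothetical objective $f(x) = \|x - b\|^2$, for which exact line search has a closed form; it shows the away-step mechanics only and does not reproduce the paper's template or rates.

```python
def away_step_fw_simplex(b, iterations):
    """Away-step FW for min ||x - b||^2 over the probability simplex (toy example)."""
    n = len(b)
    x = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        g = [2.0 * (xi - bi) for xi, bi in zip(x, b)]
        gx = sum(gi * xi for gi, xi in zip(g, x))
        s = min(range(n), key=g.__getitem__)                              # FW vertex
        a = max((i for i in range(n) if x[i] > 0.0), key=g.__getitem__)   # away vertex in the support
        fw_gap, away_gap = gx - g[s], g[a] - gx
        if fw_gap >= away_gap:
            d = [-xi for xi in x]
            d[s] += 1.0                                                   # d = e_s - x
            gamma_max = 1.0
        else:
            d = list(x)
            d[a] -= 1.0                                                   # d = x - e_a
            gamma_max = x[a] / (1.0 - x[a]) if x[a] < 1.0 else 0.0
        sq = sum(di * di for di in d)
        slope = sum(gi * di for gi, di in zip(g, d))                      # equals -max(fw_gap, away_gap)
        if sq == 0.0 or slope >= 0.0:
            break                                                         # both gaps vanish: x is optimal
        gamma = min(gamma_max, -slope / (2.0 * sq))                       # exact line search for this f
        x = [xi + gamma * di for xi, di in zip(x, d)]
        if fw_gap < away_gap:
            # drop step (gamma == gamma_max) removes vertex a; otherwise clamp round-off
            x[a] = 0.0 if gamma == gamma_max else max(x[a], 0.0)
    return x
```

Away steps let the method remove weight from poor vertices, which is what enables linear rates on polytopes under suitable error bounds.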
Submitted 15 February, 2025; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Extending the Continuum of Six-Colorings
Authors:
Konrad Mundinger,
Sebastian Pokutta,
Christoph Spiegel,
Max Zimmer
Abstract:
We present two novel six-colorings of the Euclidean plane that avoid monochromatic pairs of points at unit distance in five colors and monochromatic pairs at another specified distance $d$ in the sixth color. Such colorings have previously been known to exist for $0.41 < \sqrt{2} - 1 \le d \le 1 / \sqrt{5} < 0.45$. Our results significantly expand that range to $0.354 \le d \le 0.657$, the first improvement in 30 years. Notably, the constructions underlying this were derived by formalizing colorings suggested by a custom machine learning approach.
Submitted 8 April, 2024;
originally announced April 2024.
-
Neural Parameter Regression for Explicit Representations of PDE Solution Operators
Authors:
Konrad Mundinger,
Max Zimmer,
Sebastian Pokutta
Abstract:
We introduce Neural Parameter Regression (NPR), a novel framework specifically developed for learning solution operators in Partial Differential Equations (PDEs). Tailored for operator learning, this approach surpasses traditional DeepONets (Lu et al., 2021) by employing Physics-Informed Neural Network (PINN, Raissi et al., 2019) techniques to regress Neural Network (NN) parameters. By parametrizing each solution based on specific initial conditions, it effectively approximates a mapping between function spaces. Our method enhances parameter efficiency by incorporating low-rank matrices, thereby boosting computational efficiency and scalability. The framework shows remarkable adaptability to new initial and boundary conditions, allowing for rapid fine-tuning and inference, even in cases of out-of-distribution examples.
Submitted 19 March, 2024;
originally announced March 2024.
-
Norm-induced Cuts: Optimization with Lipschitzian Black-box Functions
Authors:
Adrian Göß,
Alexander Martin,
Sebastian Pokutta,
Kartikey Sharma
Abstract:
In this paper, we consider a finite-dimensional optimization problem minimizing a continuous objective on a compact domain subject to a multi-dimensional constraint function. For the latter, we only assume the availability of a Lipschitz property. In the recent literature, methods based on non-convex outer approximation have been proposed for tackling one-dimensional equality constraints that are Lipschitz with respect to the maximum norm, on bounded polyhedral domains. To the best of our knowledge, however, no non-convex outer approximation method exists for a general problem class. We introduce a meta-level solution framework to solve such problems and tackle the underlying theoretical foundations. Considering the feasible domain without the constraint function as manageable, our method relaxes the multi-dimensional constraint and iteratively refines the feasible region by means of norm-induced cuts, relying on an oracle for the resulting sub-problems. We show the method's correctness and investigate the problem complexity. To discuss functionality, limits, and extensions, we present computational examples including illustrations.
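The basic cutting mechanism can be sketched for a single equality constraint $g(x)=0$, where $g$ is $L$-Lipschitz with respect to the maximum norm: if $g(\hat x) \neq 0$, then $|g(y)| \ge |g(\hat x)| - L\|y-\hat x\|_\infty > 0$ for every $y$ with $\|y-\hat x\|_\infty < |g(\hat x)|/L$, so this open box contains no feasible point and can be cut off. The sketch below computes such cuts and tests points against them; the paper's framework handles multi-dimensional constraints and more general norms, and the function names are hypothetical.

```python
def cut_radius(g_value, lipschitz):
    """Radius |g(x_hat)| / L of the open max-norm box around x_hat that contains no root of g."""
    return abs(g_value) / lipschitz


def is_cut_off(y, cuts):
    """True if y lies in one of the open boxes ||y - center||_inf < radius."""
    return any(max(abs(yi - ci) for yi, ci in zip(y, center)) < radius
               for center, radius in cuts)


# Example: g(x) = x0 - x1 has Lipschitz constant 2 with respect to the max-norm
g = lambda x: x[0] - x[1]
x_hat = (1.0, 0.0)
cuts = [(x_hat, cut_radius(g(x_hat), 2.0))]
```

The union of such boxes is removed from the relaxation, so the remaining region is a non-convex outer approximation of the feasible set.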
Submitted 23 September, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point
Authors:
David Martínez-Rubio,
Christophe Roux,
Sebastian Pokutta
Abstract:
In this work, we analyze two of the most fundamental algorithms in geodesically convex optimization: Riemannian gradient descent and (possibly inexact) Riemannian proximal point. We quantify their rates of convergence and produce different variants with several trade-offs. Crucially, we show the iterates naturally stay in a ball around an optimizer, of radius depending on the initial distance and, in some cases, on the curvature. In contrast, except for limited cases, previous works bounded the maximum distance between iterates and an optimizer only by assumption, leading to incomplete analyses and unquantified rates. We also provide an implementable inexact proximal point algorithm yielding new results on min-max problems, and we prove several new useful properties of Riemannian proximal methods: they work when positive curvature is present, the proximal operator does not move points away from any optimizer, and we quantify the smoothness of its induced Moreau envelope. Further, we explore beyond our theory with empirical tests.
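A one-dimensional toy example shows the ingredients of Riemannian gradient descent. The positive reals with metric $\langle u, v\rangle_x = uv/x^2$ form a Hadamard manifold (isometric to $\mathbb{R}$ via $x \mapsto \log x$), with Riemannian gradient $x^2 f'(x)$, exponential map $\mathrm{Exp}_x(v) = x\,e^{v/x}$, and distance $|\log x - \log y|$. The function $f(x) = (\log x)^2$ is geodesically convex there, although it is not convex in the usual sense for $x > e$. The example, step size, and names are our own illustrative choices.

```python
import math


def riemannian_gd(df, x0, step, iterations):
    """Riemannian GD on (0, inf) with metric <u, v>_x = u * v / x**2; returns all iterates."""
    xs = [x0]
    x = x0
    for _ in range(iterations):
        rgrad = x * x * df(x)                 # Riemannian gradient x^2 f'(x)
        x = x * math.exp(-step * rgrad / x)   # Exp_x(-step * rgrad)
        xs.append(x)
    return xs


def geodesic_distance(x, y):
    return abs(math.log(x) - math.log(y))


df = lambda x: 2.0 * math.log(x) / x          # derivative of f(x) = (log x)^2, minimizer x* = 1
xs = riemannian_gd(df, 10.0, 0.25, 60)
```

With step $0.25$, each update halves $\log x$, so the geodesic distance to the optimizer $x^* = 1$ shrinks monotonically and the iterates stay in the ball of radius $\log 10$ around it.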
Submitted 15 March, 2024;
originally announced March 2024.
-
Network Design for the Traffic Assignment Problem with Mixed-Integer Frank-Wolfe
Authors:
Kartikey Sharma,
Deborah Hendrych,
Mathieu Besançon,
Sebastian Pokutta
Abstract:
We tackle the network design problem for centralized traffic assignment, which can be cast as a mixed-integer convex optimization (MICO) problem. For this task, we propose different formulations and solution methods in both a deterministic and a stochastic setting in which the demand is unknown in the design phase. We leverage the recently proposed Boscia framework, which can solve MICO problems when the main nonlinearity stems from a differentiable objective function. Boscia tackles these problems by branch-and-bound with continuous relaxations solved approximately with Frank-Wolfe algorithms. We compare different linear relaxations with the corresponding subproblems solved by Frank-Wolfe, as well as alternative problem formulations, to identify the situations in which each performs best. Our experiments evaluate the different approaches on instances from the Transportation Networks library and highlight the suitability of the mixed-integer Frank-Wolfe algorithm for this problem. In particular, we find that the Boscia framework is well suited to this problem and that a mixed-integer linear Frank-Wolfe subproblem performs well for the deterministic case, while a penalty-based approach, with decoupled feasible regions for the design and flow variables, dominates other approaches for stochastic instances with many scenarios.
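For intuition about the nonlinearity, the sketch below assumes the widely used BPR link cost $t_a(x) = t^0_a\,(1 + \alpha (x/c_a)^\beta)$ with the conventional $\alpha = 0.15$, $\beta = 4$ (an assumption here; instances may specify their own parameters) and the system-optimal objective $\sum_a x_a\, t_a(x_a)$ of centralized assignment. Frank-Wolfe needs this objective's gradient, the marginal link costs; in classical traffic-assignment Frank-Wolfe these serve as arc weights for a shortest-path LMO. Names are illustrative.

```python
def bpr_time(flow, t0, cap, alpha=0.15, beta=4):
    """BPR travel time on a link with free-flow time t0 and capacity cap."""
    return t0 * (1.0 + alpha * (flow / cap) ** beta)


def system_cost(flows, links, alpha=0.15, beta=4):
    """Total travel time sum_a x_a * t_a(x_a); links is a list of (t0, capacity) pairs."""
    return sum(x * bpr_time(x, t0, cap, alpha, beta) for x, (t0, cap) in zip(flows, links))


def system_gradient(flows, links, alpha=0.15, beta=4):
    """Marginal link costs d/dx [x t(x)] = t0 * (1 + alpha * (beta + 1) * (x / cap)^beta)."""
    return [t0 * (1.0 + alpha * (beta + 1) * (x / cap) ** beta)
            for x, (t0, cap) in zip(flows, links)]
```

Each term $x\,t(x)$ is convex for nonnegative flows, so the continuous relaxations solved inside branch-and-bound remain convex.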
Submitted 7 February, 2025; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Solving the Optimal Experiment Design Problem with Mixed-Integer Convex Methods
Authors:
Deborah Hendrych,
Mathieu Besançon,
Sebastian Pokutta
Abstract:
We tackle the Optimal Experiment Design Problem, which consists of choosing experiments to run or observations to select from a finite set to estimate the parameters of a system. The objective is to maximize some measure of information gained about the system from the observations, leading to a convex integer optimization problem. We leverage Boscia.jl, a recent algorithmic framework based on a nonlinear branch-and-bound algorithm with node relaxations solved to approximate optimality using Frank-Wolfe algorithms. One particular advantage of the method is its efficient use of the polytope formed by the original constraints, which the method preserves, unlike alternative methods relying on epigraph-based formulations. We assess our method against both generic and specialized convex mixed-integer approaches. Computational results highlight the performance of our proposed method, especially on large and challenging instances.
Submitted 13 May, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
The Four-Color Ramsey Multiplicity of Triangles
Authors:
Aldo Kiem,
Sebastian Pokutta,
Christoph Spiegel
Abstract:
We study a generalization of a famous result of Goodman and establish that asymptotically at least a $1/256$ fraction of all triangles needs to be monochromatic in any four-coloring of the edges of a complete graph. We also show that any large enough extremal construction must be based on a blow-up of one of the two $R(3,3,3)$ Ramsey-colorings of $K_{16}$. This result is obtained through an efficient flag algebra formulation by exploiting problem-specific combinatorial symmetries that also allows us to study some related problems.
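The constant $1/256$ is attained by a blow-up of the kind described above. The short counting argument below is a standard sketch added for illustration; it shows only that such a construction reaches $1/256$, not the paper's flag-algebra proof of the matching lower bound.

```latex
Let $c\colon E(K_{16}) \to \{1,2,3\}$ be a $3$-coloring with no monochromatic triangle.
Partition $V(K_n)$ into $V_1,\dots,V_{16}$ with $|V_i| = n/16$, color each edge between
$V_i$ and $V_j$ ($i \neq j$) by $c(ij)$, and color every edge inside a part with color $4$.
A triangle meeting three distinct parts inherits the coloring of a triangle of $K_{16}$
under $c$, and a triangle with exactly two vertices in one part has one edge of color $4$
and two edges with a color in $\{1,2,3\}$; neither is monochromatic. Hence only triangles
inside a single part are monochromatic, and their fraction is
\[
  \frac{16\binom{n/16}{3}}{\binom{n}{3}}
  \xrightarrow[n\to\infty]{} 16\cdot\Bigl(\frac{1}{16}\Bigr)^{3} = \frac{1}{256}.
\]
```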
Submitted 13 December, 2023;
originally announced December 2023.
-
Strong Convexity of Sets in Riemannian Manifolds
Authors:
Damien Scieur,
David Martínez-Rubio,
Thomas Kerdreux,
Alexandre d'Aspremont,
Sebastian Pokutta
Abstract:
Curvature properties of convex objects, such as strong convexity, are important in designing and analyzing convex optimization algorithms in the Hilbertian or Riemannian settings. In the case of the Hilbertian setting, strongly convex sets are well studied. Herein, we propose various definitions of strong convexity for uniquely geodesic sets in a Riemannian manifold. We study their relationship, propose tools to determine the geodesic strongly convex nature of sets, and analyze the convergence of optimization algorithms over those sets. In particular, we demonstrate that the Riemannian Frank-Wolfe algorithm enjoys a global linear convergence rate when the Riemannian scaling inequalities hold.
Submitted 13 December, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Challenges and Opportunities in Quantum Optimization
Authors:
Amira Abbas,
Andris Ambainis,
Brandon Augustino,
Andreas Bärtschi,
Harry Buhrman,
Carleton Coffrin,
Giorgio Cortiana,
Vedran Dunjko,
Daniel J. Egger,
Bruce G. Elmegreen,
Nicola Franco,
Filippo Fratini,
Bryce Fuller,
Julien Gacon,
Constantin Gonciulea,
Sander Gribling,
Swati Gupta,
Stuart Hadfield,
Raoul Heese,
Gerhard Kircher,
Thomas Kleinert,
Thorsten Koch,
Georgios Korpas,
Steve Lenk,
Jakub Marecek
, et al. (21 additional authors not shown)
Abstract:
Recent advances in quantum computers are demonstrating the ability to solve problems at a scale beyond brute force classical simulation. As such, a widespread interest in quantum algorithms has developed in many areas, with optimization being one of the most pronounced domains. Across computer science and physics, there are a number of different approaches for major classes of optimization problems, such as combinatorial optimization, convex optimization, non-convex optimization, and stochastic extensions. This work draws on multiple approaches to study quantum optimization. Provably exact versus heuristic settings are first explained using computational complexity theory, highlighting where quantum advantage is possible in each context. Then, the core building blocks for quantum optimization algorithms are outlined to subsequently define prominent problem classes and identify key open questions that, if answered, will advance the field. The effects of scaling relevant problems on noisy quantum devices are also outlined in detail, alongside meaningful benchmarking problems. We underscore the importance of benchmarking by proposing clear metrics to conduct appropriate comparisons with classical optimization techniques. Lastly, we highlight two domains, finance and sustainability, as rich sources of optimization problems that could be used to benchmark, and eventually validate, the potential real-world impact of quantum optimization.
Submitted 17 November, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
GSE: Group-wise Sparse and Explainable Adversarial Attacks
Authors:
Shpresim Sadiku,
Moritz Wagner,
Sebastian Pokutta
Abstract:
Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, shedding light on an even greater vulnerability of DNNs. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. We address this by presenting a two-phase algorithm that generates group-wise sparse attacks within semantically meaningful areas of an image. Initially, we optimize a quasinorm adversarial loss using the $1/2$-quasinorm proximal operator tailored for non-convex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2$-norm regularization applied to perturbation magnitudes. Rigorous evaluations on CIFAR-10 and ImageNet datasets demonstrate a remarkable increase in group-wise sparsity, e.g., $50.9\%$ on CIFAR-10 and $38.4\%$ on ImageNet (average case, targeted attack). This performance improvement is accompanied by significantly faster computation times, improved explainability, and a $100\%$ attack success rate.
Submitted 27 November, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Splitting the Conditional Gradient Algorithm
Authors:
Zev Woodstock,
Sebastian Pokutta
Abstract:
We propose a novel generalization of the conditional gradient (CG / Frank-Wolfe) algorithm for minimizing a smooth function $f$ over an intersection of compact convex sets, using a first-order oracle for $\nabla f$ and linear minimization oracles (LMOs) for the individual sets. Although this computational framework presents many advantages, only a small number of algorithms require just one LMO evaluation per set per iteration; furthermore, these algorithms require $f$ to be convex. Our algorithm appears to be the first in this class which is proven to also converge in the nonconvex setting. Our approach combines a penalty method and a product-space relaxation. We show that one conditional gradient step is a sufficient subroutine for our penalty method to converge, and we provide several analytical results on the product-space relaxation's properties and connections to other problems in optimization. We prove that our average Frank-Wolfe gap converges at a rate of $\mathcal{O}(\ln t/\sqrt{t})$, only a log factor worse than the vanilla CG algorithm with one set.
Submitted 10 October, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
-
The Frank-Wolfe algorithm: a short introduction
Authors:
Sebastian Pokutta
Abstract:
In this paper we provide an introduction to the Frank-Wolfe algorithm, a method for smooth convex optimization in the presence of (relatively) complicated constraints. We will present the algorithm, introduce key concepts, and establish important baseline results, such as primal and dual convergence. We will also discuss some of its properties and present a new adaptive step-size strategy as well as applications.
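As an illustration of the basic method introduced above, the following is a minimal sketch (not taken from the paper) of the vanilla Frank-Wolfe algorithm with the classical open-loop step-size $η_t = 2/(t+2)$, assuming a smooth convex quadratic objective over the probability simplex. The names `frank_wolfe` and `lmo_simplex` are hypothetical. Each iteration only calls a linear minimization oracle (LMO), and the Frank-Wolfe gap $\langle \nabla f(x_t), x_t - v_t \rangle$ is recorded as a computable upper bound on the primal gap.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def lmo_simplex(grad: Vector) -> Vector:
    """LMO for the probability simplex: argmin_v <grad, v> is the vertex e_i, i = argmin_i grad[i]."""
    i = min(range(len(grad)), key=lambda k: grad[k])
    vertex = [0.0] * len(grad)
    vertex[i] = 1.0
    return vertex


def frank_wolfe(
    grad_f: Callable[[Vector], Vector],
    lmo: Callable[[Vector], Vector],
    x0: Vector,
    iters: int = 1000,
) -> Tuple[Vector, List[float]]:
    """Vanilla Frank-Wolfe with open-loop step-size eta_t = 2 / (t + 2)."""
    x = list(x0)
    gaps: List[float] = []
    for t in range(iters):
        g = grad_f(x)
        v = lmo(g)
        # Frank-Wolfe gap <g, x - v>; for convex f it upper-bounds f(x) - min f.
        gaps.append(sum(gi * (xi - vi) for gi, xi, vi in zip(g, x, v)))
        eta = 2.0 / (t + 2.0)
        # A convex combination of feasible points stays feasible, so no projection is needed.
        x = [(1.0 - eta) * xi + eta * vi for xi, vi in zip(x, v)]
    return x, gaps


if __name__ == "__main__":
    b = [0.2, 0.5, 0.3]  # target inside the simplex, so the minimizer of 0.5 * ||x - b||^2 is b
    x, gaps = frank_wolfe(lambda z: [zi - bi for zi, bi in zip(z, b)], lmo_simplex, [1.0, 0.0, 0.0])
    print(x, min(gaps))
```

The iterate is always a convex combination of at most $t+1$ simplex vertices, which is the sparsity property often cited for Frank-Wolfe methods.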
Submitted 28 November, 2023; v1 submitted 9 November, 2023;
originally announced November 2023.
-
Symmetric multipartite Bell inequalities via Frank-Wolfe algorithms
Authors:
Sébastien Designolle,
Tamás Vértesi,
Sebastian Pokutta
Abstract:
In multipartite Bell scenarios, we study the nonlocality robustness of the Greenberger-Horne-Zeilinger (GHZ) state. When each party performs planar measurements forming a regular polygon, we exploit the symmetry of the resulting correlation tensor to drastically accelerate the computation of (i) a Bell inequality via Frank-Wolfe algorithms, and (ii) the corresponding local bound. The Bell inequalities obtained are facets of the symmetrised local polytope and they give the best known upper bounds on the nonlocality robustness of the GHZ state for three to ten parties. Moreover, for four measurements per party, we generalise our facets and hence show, for any number of parties, an improvement on Mermin's inequality in terms of noise robustness. We also compute the detection efficiency of our inequalities and show that some give rise to activation of nonlocality in star networks, a property that had previously been shown only with an infinite number of measurements.
Submitted 8 February, 2024; v1 submitted 31 October, 2023;
originally announced October 2023.
-
Accelerated Affine-Invariant Convergence Rates of the Frank-Wolfe Algorithm with Open-Loop Step-Sizes
Authors:
Elias Wirth,
Javier Pena,
Sebastian Pokutta
Abstract:
Recent papers have shown that the Frank-Wolfe algorithm (FW) with open-loop step-sizes exhibits rates of convergence faster than the iconic $\mathcal{O}(t^{-1})$ rate. In particular, when the minimizer of a strongly convex function over a polytope lies in the relative interior of a feasible region face, FW with open-loop step-sizes $η_t = \frac{\ell}{t+\ell}$ for $\ell \in \mathbb{N}_{\geq 2}$ has accelerated convergence $\mathcal{O}(t^{-2})$ in contrast to the rate $Ω(t^{-1-ε})$ attainable with more complex line-search or short-step step-sizes. Given the relevance of this scenario in data science problems, research has grown to explore the settings enabling acceleration in open-loop FW. However, despite FW's well-known affine invariance, existing acceleration results for open-loop FW are affine-dependent. This paper remedies this gap in the literature by merging two recent research trajectories: affine invariance (Wirth et al., 2023b) and open-loop step-sizes (Pena, 2021). In particular, we extend all known non-affine-invariant convergence rates for FW with open-loop step-sizes to affine-invariant results.
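The open-loop step-size family $η_t = \frac{\ell}{t+\ell}$ can be written down directly. The sketch below (hypothetical helper names `open_loop_step` and `step_schedule`) only illustrates properties that follow from the formula itself: every admissible $\ell$ gives $η_0 = 1$, the sequence is non-increasing in $t$, and larger $\ell$ decays more slowly. It makes no claim about the convergence rates discussed in the paper.

```python
from typing import List


def open_loop_step(t: int, ell: int = 2) -> float:
    """Open-loop step-size eta_t = ell / (t + ell) for an integer ell >= 2."""
    if ell < 2:
        raise ValueError("ell must be an integer >= 2")
    if t < 0:
        raise ValueError("t must be non-negative")
    return ell / (t + ell)


def step_schedule(ell: int, horizon: int) -> List[float]:
    """First `horizon` step-sizes; they depend only on t and ell, never on the objective."""
    return [open_loop_step(t, ell) for t in range(horizon)]


if __name__ == "__main__":
    for ell in (2, 4, 8):
        print(ell, [round(s, 3) for s in step_schedule(ell, 5)])
```

Because the schedule is fixed in advance, no function values or line searches are needed; the case $\ell = 2$ recovers the classical $2/(t+2)$ rule.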
Submitted 20 January, 2025; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Existence and Uniqueness of Solutions of the Koopman--von Neumann Equation on Bounded Domains
Authors:
Marian Stengl,
Patrick Gelß,
Stefan Klus,
Sebastian Pokutta
Abstract:
The Koopman--von Neumann equation describes the evolution of a complex-valued wavefunction corresponding to the probability distribution given by an associated classical Liouville equation. Typically, it is defined on the whole Euclidean space. The investigation of bounded domains, particularly in practical scenarios involving quantum-based simulations of dynamical systems, has received little attention so far. We consider the Koopman--von Neumann equation associated with an ordinary differential equation on a bounded domain whose trajectories are contained in the set's closure. Our main results are the construction of a strongly continuous semigroup together with the existence and uniqueness of solutions of the associated initial value problem. To this end, a functional-analytic framework connected to Sobolev spaces is proposed and analyzed. Moreover, the connection of the Koopman--von Neumann framework to transport equations is highlighted.
Submitted 1 October, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
Kissing polytopes
Authors:
Antoine Deza,
Shmuel Onn,
Sebastian Pokutta,
Lionel Pournin
Abstract:
We investigate the following question: how close can two disjoint lattice polytopes contained in a fixed hypercube be? This question stems from various contexts where the minimal distance between such polytopes appears in complexity bounds of optimization algorithms. We provide nearly matching lower and upper bounds on this distance and discuss its exact computation. We also give similar bounds in the case of disjoint rational polytopes whose binary encoding length is prescribed.
Submitted 29 November, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded Geometric Penalties
Authors:
David Martínez-Rubio,
Christophe Roux,
Christopher Criscitiello,
Sebastian Pokutta
Abstract:
In this work, we study optimization problems of the form $\min_x \max_y f(x, y)$, where $f(x, y)$ is defined on a product Riemannian manifold $\mathcal{M} \times \mathcal{N}$ and is $μ_x$-strongly geodesically convex (g-convex) in $x$ and $μ_y$-strongly g-concave in $y$, for $μ_x, μ_y \geq 0$. We design accelerated methods when $f$ is $(L_x, L_y, L_{xy})$-smooth and $\mathcal{M}$, $\mathcal{N}$ are Hadamard. To that end, we introduce new g-convex optimization results of independent interest: we show global linear convergence for metric-projected Riemannian gradient descent and improve existing accelerated methods by reducing geometric constants. Additionally, we complete the analysis of two previous works applying to the Riemannian min-max case by removing an assumption about iterates staying in a pre-specified compact set.
Submitted 30 October, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Learning Cuts via Enumeration Oracles
Authors:
Daniel Thuerck,
Boro Sofranac,
Marc E. Pfetsch,
Sebastian Pokutta
Abstract:
Cutting planes are one of the most important building blocks for solving large-scale integer programming (IP) problems to (near) optimality. The majority of cutting-plane approaches rely on explicit rules to derive valid inequalities that can separate the target point from the feasible set. Local cuts, on the other hand, seek to directly derive the facets of the underlying polyhedron and use them as cutting planes. However, current approaches rely on solving Linear Programming (LP) problems in order to derive such a hyperplane. In this paper, we present a novel generic approach for learning the facets of the underlying polyhedron by accessing it implicitly via an enumeration oracle in a reduced dimension. This is achieved by embedding the oracle in a variant of the Frank-Wolfe algorithm which is capable of generating strong cutting planes, effectively turning the enumeration oracle into a separation oracle. We demonstrate the effectiveness of our approach with a case study targeting the multidimensional knapsack problem (MKP).
Submitted 25 January, 2024; v1 submitted 20 May, 2023;
originally announced May 2023.
-
Data-driven Distributionally Robust Optimization over Time
Authors:
Kevin-Martin Aigner,
Andreas Bärmann,
Kristin Braun,
Frauke Liers,
Sebastian Pokutta,
Oskar Schneider,
Kartikey Sharma,
Sebastian Tschuppik
Abstract:
Stochastic Optimization (SO) is a classical approach for optimization under uncertainty that typically requires knowledge about the probability distribution of uncertain parameters. As the latter is often unknown, Distributionally Robust Optimization (DRO) provides a strong alternative that determines the best guaranteed solution over a set of distributions (ambiguity set). In this work, we present an approach for DRO over time that uses online learning and scenario observations arriving as a data stream to learn more about the uncertainty. Our robust solutions adapt over time and reduce the cost of protection with shrinking ambiguity. For various kinds of ambiguity sets, the robust solutions converge to the SO solution. Our algorithm achieves the optimization and learning goals without solving the DRO problem exactly at any step. We also provide a regret bound for the quality of the online strategy which converges at a rate of $\mathcal{O}(\log T / \sqrt{T})$, where $T$ is the number of iterations. Furthermore, we illustrate the effectiveness of our procedure by numerical experiments on mixed-integer optimization instances from popular benchmark libraries and give practical examples stemming from telecommunications and routing. Our algorithm is able to solve the DRO over time problem significantly faster than standard reformulations.
Submitted 11 April, 2023;
originally announced April 2023.
-
Online Learning for Scheduling MIP Heuristics
Authors:
Antonia Chmiela,
Ambros Gleixner,
Pawel Lichocki,
Sebastian Pokutta
Abstract:
Mixed Integer Programming (MIP) is NP-hard, and yet modern solvers often solve large real-world problems within minutes. This success can partially be attributed to heuristics. Since their behavior is highly instance-dependent, relying on hard-coded rules derived from empirical testing on a large, heterogeneous corpus of benchmark instances might lead to sub-optimal performance. In this work, we propose an online learning approach that adapts the application of heuristics towards the single instance at hand. We replace the commonly used static heuristic handling with an adaptive framework exploiting past observations about the heuristic's behavior to make future decisions. In particular, we model the problem of controlling Large Neighborhood Search and Diving, two broad and complex classes of heuristics, as a multi-armed bandit problem. Going beyond existing work in the literature, we control two different classes of heuristics simultaneously by a single learning agent. We verify our approach numerically and show consistent node reductions over the MIPLIB 2017 Benchmark set. For harder instances that take at least 1000 seconds to solve, we observe a speedup of 4%.
Submitted 4 April, 2023;
originally announced April 2023.
-
Accelerated and Sparse Algorithms for Approximate Personalized PageRank and Beyond
Authors:
David Martínez-Rubio,
Elias Wirth,
Sebastian Pokutta
Abstract:
It has recently been shown that ISTA, an unaccelerated optimization method, presents sparse updates for the $\ell_1$-regularized personalized PageRank problem, leading to cheap iteration complexity and providing the same guarantees as the approximate personalized PageRank algorithm (APPR) [FRS+19]. In this work, we design an accelerated optimization algorithm for this problem that also performs sparse updates, providing an affirmative answer to the COLT 2022 open question of [FY22]. Acceleration provides a reduced dependence on the condition number, while the dependence on the sparsity in our updates differs from the ISTA approach. Further, we design another algorithm by using conjugate directions to achieve an exact solution while exploiting sparsity. Both algorithms lead to faster convergence for certain parameter regimes. Our findings apply beyond PageRank and work for any quadratic objective whose Hessian is a positive-definite $M$-matrix.
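The unaccelerated baseline mentioned above, ISTA, alternates a gradient step on the smooth part with soft-thresholding for the $\ell_1$ term. The sketch below applies it to a generic objective $\frac{1}{2}x^\top Q x - b^\top x + λ\|x\|_1$ with $Q$ a small positive-definite $M$-matrix. This is an illustration under these assumptions only: it is not the paper's accelerated or conjugate-direction algorithm, and it does not reproduce the sparse-update bookkeeping that makes the per-iteration cost cheap. The step size $1/L$ uses the maximum absolute row sum of $Q$, which upper-bounds the largest eigenvalue of a symmetric matrix.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def soft_threshold(z: Vector, tau: float) -> Vector:
    """Proximal operator of tau * ||x||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(zi) - tau, 0.0), zi) for zi in z]


def ista(Q: Matrix, b: Vector, lam: float, iters: int = 500) -> Vector:
    """ISTA for min_x 0.5 * x^T Q x - b^T x + lam * ||x||_1 with symmetric positive-definite Q."""
    # The max absolute row sum bounds the spectral norm of Q, so 1/L is a safe step size.
    L = max(sum(abs(q) for q in row) for row in Q)
    x = [0.0] * len(b)
    for _ in range(iters):
        # Gradient of the smooth part: Q x - b.
        grad = [sum(qij * xj for qij, xj in zip(row, x)) - bi for row, bi in zip(Q, b)]
        # Forward (gradient) step followed by the backward (proximal) step.
        x = soft_threshold([xi - gi / L for xi, gi in zip(x, grad)], lam / L)
    return x


if __name__ == "__main__":
    Q = [[2.0, -1.0], [-1.0, 2.0]]  # positive-definite M-matrix
    print(ista(Q, [1.0, 0.0], lam=0.1))  # approx [0.5667, 0.2333], i.e. [17/30, 7/30]
```

Starting from $x = 0$, coordinates whose gradient entry never exceeds $λ$ in magnitude stay exactly zero, which is the mechanism behind the sparse updates discussed in the abstract.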
Submitted 22 March, 2023;
originally announced March 2023.
-
On a Frank-Wolfe Approach for Abs-smooth Functions
Authors:
Timo Kreimeier,
Sebastian Pokutta,
Andrea Walther,
Zev Woodstock
Abstract:
We propose an algorithm which appears to be the first bridge between the fields of conditional gradient methods and abs-smooth optimization. Our problem setting is motivated by various applications that lead to nonsmoothness, such as $\ell_1$ regularization, phase retrieval problems, or ReLU activation in machine learning. To handle the nonsmoothness in our problem, we propose a generalization of the traditional Frank-Wolfe gap and prove that first-order minimality is achieved when it vanishes. We derive a convergence rate for our algorithm which is {\em identical} to the smooth case. Although our algorithm necessitates the solution of a subproblem which is more challenging than in the smooth case, we provide an efficient numerical method for its partial solution, and we identify several applications where our approach fully solves the subproblem. Numerical and theoretical convergence is demonstrated, yielding several conjectures.
Submitted 19 July, 2023; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Improved local models and new Bell inequalities via Frank-Wolfe algorithms
Authors:
Sébastien Designolle,
Gabriele Iommazzo,
Mathieu Besançon,
Sebastian Knebel,
Patrick Gelß,
Sebastian Pokutta
Abstract:
In Bell scenarios with two outcomes per party, we algorithmically consider the two sides of the membership problem for the local polytope: constructing local models and deriving separating hyperplanes, that is, Bell inequalities. We take advantage of the recent developments in so-called Frank-Wolfe algorithms to significantly increase the convergence rate of existing methods. As an application, we study the threshold value for the nonlocality of two-qubit Werner states under projective measurements. Here, we improve on both the upper and lower bounds present in the literature. Importantly, our bounds are entirely analytical; moreover, they yield refined bounds on the value of the Grothendieck constant of order three: $1.4367\leqslant K_G(3)\leqslant1.4546$. We also demonstrate the efficiency of our approach in multipartite Bell scenarios, and present the first local models for all projective measurements with visibilities noticeably higher than the entanglement threshold. We make our entire code accessible as a Julia library called BellPolytopes.jl.
Submitted 18 October, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Alternating Linear Minimization: Revisiting von Neumann's alternating projections
Authors:
Gábor Braun,
Sebastian Pokutta,
Robert Weismantel
Abstract:
In 1933 von Neumann proved a beautiful result that one can approximate a point in the intersection of two convex sets by alternating projections, i.e., successively projecting on one set and then the other. This algorithm assumes that one has access to projection operators for both sets. In this note, we consider the much weaker setup where we have only access to linear minimization oracles over the convex sets and present an algorithm to find a point in the intersection of two convex sets.
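To make the LMO-only setting concrete, the sketch below uses one natural approach under stated assumptions, not necessarily the algorithm of the paper: run Frank-Wolfe on $f(x, y) = \frac{1}{2}\|x - y\|^2$ over the product $P \times Q$. The LMO of a product set splits into one call per set, so each iteration needs exactly one linear minimization over $P$ and one over $Q$, and when $P \cap Q \neq \emptyset$, $f$ vanishes exactly on pairs with $x = y \in P \cap Q$. The sets and function names here are hypothetical: the unit square and a vertical segment crossing it, both given by their vertices.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
LMO = Callable[[Vector], Vector]


def vertex_lmo(vertices: Sequence[Sequence[float]]) -> LMO:
    """LMO for conv(vertices): return a vertex minimizing <g, v>."""
    def lmo(g: Vector) -> Vector:
        return list(min(vertices, key=lambda v: sum(gi * vi for gi, vi in zip(g, v))))
    return lmo


def frank_wolfe_product(lmo_p: LMO, lmo_q: LMO, x0: Vector, y0: Vector,
                        iters: int = 2000) -> Tuple[Vector, Vector]:
    """Frank-Wolfe on f(x, y) = 0.5 * ||x - y||^2 over P x Q using only the two LMOs."""
    x, y = list(x0), list(y0)
    for t in range(iters):
        d = [xi - yi for xi, yi in zip(x, y)]  # grad_x f = d and grad_y f = -d
        v = lmo_p(d)                           # one linear minimization over P
        w = lmo_q([-di for di in d])           # one linear minimization over Q
        eta = 2.0 / (t + 2.0)
        x = [(1.0 - eta) * xi + eta * vi for xi, vi in zip(x, v)]
        y = [(1.0 - eta) * yi + eta * wi for yi, wi in zip(y, w)]
    return x, y


if __name__ == "__main__":
    square = vertex_lmo([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
    segment = vertex_lmo([(0.5, -1.0), (0.5, 2.0)])  # meets the square in {0.5} x [0, 1]
    x, y = frank_wolfe_product(square, segment, [0.0, 0.0], [0.5, 2.0])
    print(x, y)
```

Unlike alternating projections, neither set is ever projected onto; the price is the slower $O(1/t)$ decrease of $f$ typical of Frank-Wolfe with open-loop step-sizes.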
Submitted 17 February, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Accelerated Riemannian Optimization: Handling Constraints with a Prox to Bound Geometric Penalties
Authors:
David Martínez-Rubio,
Sebastian Pokutta
Abstract:
We propose a globally-accelerated, first-order method for the optimization of smooth and (strongly or not) geodesically-convex functions in a wide class of Hadamard manifolds. We achieve the same convergence rates as Nesterov's accelerated gradient descent, up to a multiplicative geometric penalty and log factors.
Crucially, we can force our method to stay within a compact set we define. Prior fully accelerated works \emph{resort to assuming} that the iterates of their algorithms stay in some pre-specified compact set, except for two previous methods of limited applicability. For our manifolds, this solves the open question in [KY22] about obtaining global general acceleration without iterates assumptively staying in the feasible set.
In our solution, we design an accelerated Riemannian inexact proximal point algorithm, a result that was unknown even with exact access to the proximal operator and that is of independent interest. For smooth functions, we show that we can implement the prox step inexactly with first-order methods in Riemannian balls whose diameter is large enough for global accelerated optimization.
Submitted 13 January, 2023; v1 submitted 26 November, 2022;
originally announced November 2022.
-
Conditional Gradient Methods
Authors:
Gábor Braun,
Alejandro Carderera,
Cyrille W. Combettes,
Hamed Hassani,
Amin Karbasi,
Aryan Mokhtari,
Sebastian Pokutta
Abstract:
The purpose of this survey is to serve both as a gentle introduction and a coherent overview of state-of-the-art Frank-Wolfe algorithms, also called conditional gradient algorithms, for function minimization. These algorithms are especially useful in convex optimization when linear optimization is cheaper than projections.
The selection of the material has been guided by the principle of highlighting crucial ideas as well as presenting new approaches that we believe might become important in the future, with ample citations even of old works imperative in the development of newer methods. Yet, our selection is sometimes biased and need not reflect the consensus of the research community, and we have certainly missed recent important contributions. After all, the research area of Frank-Wolfe is very active, making it a moving target. We apologize sincerely in advance for any such distortions and we fully acknowledge: we stand on the shoulders of giants.
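To make the trade-off between linear minimization and projection concrete, here is a minimal sketch of the vanilla Frank-Wolfe algorithm with the classical open-loop step-size $2/(t+2)$ over the probability simplex, where the linear minimization oracle reduces to picking the smallest gradient entry. The objective and all function names are hypothetical illustrations, not code from the survey.

```python
def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex: index i of the
    vertex e_i minimizing <grad, e_i>, i.e. the smallest gradient entry."""
    return min(range(len(grad)), key=lambda i: grad[i])


def frank_wolfe_simplex(grad_f, x0, iters=1000, tol=1e-8):
    """Vanilla Frank-Wolfe with open-loop step-sizes eta_t = 2 / (t + 2)."""
    x = list(x0)
    for t in range(iters):
        g = grad_f(x)
        i = lmo_simplex(g)
        # Frank-Wolfe gap <g, x - e_i>: upper-bounds the primal gap for convex f.
        gap = sum(gj * xj for gj, xj in zip(g, x)) - g[i]
        if gap <= tol:
            break
        eta = 2.0 / (t + 2.0)
        x = [(1.0 - eta) * xj for xj in x]
        x[i] += eta  # x <- (1 - eta) * x + eta * e_i stays in the simplex
    return x
```

The gap comes for free from the oracle call, which is why it is the usual stopping criterion for Frank-Wolfe methods.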
Submitted 28 March, 2025; v1 submitted 25 November, 2022;
originally announced November 2022.
-
Convex mixed-integer optimization with Frank-Wolfe methods
Authors:
Deborah Hendrych,
Hannah Troppens,
Mathieu Besançon,
Sebastian Pokutta
Abstract:
Mixed-integer nonlinear optimization encompasses a broad class of problems that present both theoretical and computational challenges. We propose a new type of method to solve these problems based on a branch-and-bound algorithm with convex node relaxations. These relaxations are solved with a Frank-Wolfe algorithm over the convex hull of mixed-integer feasible points, rather than over the continuous relaxation, using calls to a mixed-integer linear solver as the linear minimization oracle. The proposed method computes feasible solutions while working on a single representation of the polyhedral constraints, leveraging the full extent of mixed-integer linear solvers without an outer approximation scheme, and it can exploit inexact solutions of node subproblems.
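In the proposed method, each oracle call is a mixed-integer linear solve over the feasible set. The sketch below uses a hypothetical toy feasible set, the 0/1 vectors with exactly $k$ ones, for which the oracle has a closed form, and checks it against exhaustive enumeration standing in for the mixed-integer linear solver. It only illustrates why minimizing a linear function over the convex hull reduces to minimizing it over the integer points. It is not the authors' implementation.

```python
from itertools import combinations


def lmo_k_subset(grad, k):
    """LMO over conv{x in {0,1}^n : sum(x) = k}: put ones on the k smallest gradient entries."""
    order = sorted(range(len(grad)), key=lambda i: grad[i])
    v = [0] * len(grad)
    for i in order[:k]:
        v[i] = 1
    return v


def lmo_brute_force(grad, k):
    """Same oracle by enumerating every feasible integer point (stand-in for a MILP solver)."""
    n = len(grad)
    best_val, best_support = None, None
    for support in combinations(range(n), k):
        val = sum(grad[i] for i in support)
        if best_val is None or val < best_val:
            best_val, best_support = val, support
    return [1 if i in best_support else 0 for i in range(n)]
```

Because a linear function attains its minimum over a polytope at a vertex, both functions return an integer point even though Frank-Wolfe works over the convex hull.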
Submitted 18 July, 2024; v1 submitted 23 August, 2022;
originally announced August 2022.
-
Approximate Vanishing Ideal Computations at Scale
Authors:
Elias Wirth,
Hiroshi Kera,
Sebastian Pokutta
Abstract:
The vanishing ideal of a set of points $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_m\}\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite subset of generators. In practice, to accommodate noise in the data, algorithms that construct generators of the approximate vanishing ideal are widely studied, but they remain computationally expensive. In this paper, we scale up the oracle approximate vanishing ideal algorithm (OAVI), the only generator-constructing algorithm with known learning guarantees. We prove that the computational complexity of OAVI is not superlinear, as previously claimed, but linear in the number of samples $m$. In addition, we propose two modifications that accelerate OAVI's training time. First, our analysis reveals that replacing the pairwise conditional gradients algorithm, one of the solvers used in OAVI, with the faster blended pairwise conditional gradients algorithm leads to an exponential speed-up in the number of features $n$. Second, using a new inverse Hessian boosting approach, intermediate convex optimization problems can be solved almost instantly, improving OAVI's training time by multiple orders of magnitude in a variety of numerical experiments.
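As a small illustration of exact versus approximate vanishing, the sketch below evaluates the polynomial $x^2 + y^2 - 1$ on points of the unit circle and on perturbed copies of them. It assumes a mean-squared-evaluation threshold $\psi$ as the vanishing criterion, one common convention in this line of work. The threshold value and the helper names are hypothetical, and this is not OAVI itself.

```python
import math


def mse_evaluation(poly, points):
    """Mean squared value of a polynomial (given as a Python function) over a point set."""
    return sum(poly(p) ** 2 for p in points) / len(points)


def is_approx_vanishing(poly, points, psi):
    """Hypothetical criterion: poly psi-approximately vanishes if its MSE is at most psi."""
    return mse_evaluation(poly, points) <= psi


def circle(p):
    x, y = p
    return x * x + y * y - 1.0


# Eight points exactly on the unit circle, and a slightly distorted copy of them.
exact = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
noisy = [(1.01 * x, 0.99 * y) for x, y in exact]
```

On the distorted points the circle polynomial no longer vanishes exactly, but its mean squared value stays far below a threshold such as $\psi = 10^{-3}$, whereas a polynomial like $x$ does not.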
Submitted 10 February, 2023; v1 submitted 4 July, 2022;
originally announced July 2022.
-
New Ramsey Multiplicity Bounds and Search Heuristics
Authors:
Olaf Parczyk,
Sebastian Pokutta,
Christoph Spiegel,
Tibor Szabó
Abstract:
We study two related problems concerning the number of homogeneous subsets of given size in graphs that go back to questions of Erdős. Most notably, we improve the upper bounds on the Ramsey multiplicity of $K_4$ and $K_5$ and settle the minimum number of independent sets of size $4$ in graphs with clique number at most $4$. Motivated by the elusiveness of the symmetric Ramsey multiplicity problem, we also introduce an off-diagonal variant and obtain tight results when counting monochromatic $K_4$ or $K_5$ in only one of the colors and triangles in the other. The extremal constructions for each problem turn out to be blow-ups of a graph of constant size and were found through search heuristics. They are complemented by lower bounds established using flag algebras, resulting in a fully computer-assisted approach. For some of our theorems we can also derive that the extremal construction is stable in a very strong sense. More broadly, these problems lead us to the study of the region of possible pairs of clique and independent set densities that can be realized as the limit of some sequence of graphs.
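The quantity being bounded can be made concrete with a brute-force counter. Counting monochromatic copies of $K_k$ in a 2-edge-coloring of a complete graph amounts to counting $k$-cliques plus independent $k$-sets of the graph formed by one color class. The sketch below does this exhaustively for small graphs. It is only a hypothetical illustration of the counted quantity, not the search heuristics or flag-algebra computations used in the paper.

```python
from itertools import combinations


def count_homogeneous(n, edges, k):
    """Return (#k-cliques, #independent k-sets) of a graph on vertices 0..n-1."""
    adj = {frozenset(e) for e in edges}
    cliques = independent = 0
    for subset in combinations(range(n), k):
        present = [frozenset(pair) in adj for pair in combinations(subset, 2)]
        if all(present):
            cliques += 1
        elif not any(present):
            independent += 1
    return cliques, independent
```

For example, the 5-cycle has no triangles and no independent set of size 3, which is why it is the classical 2-coloring of $K_5$ without a monochromatic triangle.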
Submitted 13 September, 2024; v1 submitted 8 June, 2022;
originally announced June 2022.
-
Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes
Authors:
Elias Wirth,
Thomas Kerdreux,
Sebastian Pokutta
Abstract:
Frank-Wolfe algorithms (FW) are popular first-order methods for solving constrained convex optimization problems that rely on a linear minimization oracle instead of potentially expensive projection-like oracles. Many works have identified accelerated convergence rates under various structural assumptions on the optimization problem and for specific FW variants when using line-search or short-step, requiring feedback from the objective function. Little is known about accelerated convergence regimes when utilizing open-loop step-size rules, a.k.a. FW with pre-determined step-sizes, which are algorithmically extremely simple and stable. Not only is FW with open-loop step-size rules not always subject to the same convergence rate lower bounds as FW with line-search or short-step, but in some specific cases, such as kernel herding in infinite dimensions, it has been empirically observed that FW with open-loop step-size rules enjoys faster convergence rates than FW with line-search or short-step. We propose a partial answer to this unexplained phenomenon in kernel herding, characterize a general setting for which FW with open-loop step-size rules converges non-asymptotically faster than with line-search or short-step, and derive several accelerated convergence results for FW with open-loop step-size rules. Finally, we demonstrate that FW with open-loop step-sizes can compete with momentum-based open-loop FW variants.
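The difference between the two families of step-size rules is visible directly in their inputs. The sketch below assumes the commonly used open-loop family $\eta_t = \ell/(t+\ell)$, with $\ell = 2$ as the classical default, and the standard short-step rule for an $L$-smooth objective. The function names are hypothetical.

```python
def open_loop_step(t, ell=2):
    """Open-loop step-size eta_t = ell / (t + ell): depends only on the iteration counter."""
    return ell / (t + ell)


def short_step(grad, x, v, L):
    """Short-step rule: minimizes the quadratic upper bound given by L-smoothness,
    eta = min(1, <grad, x - v> / (L * ||x - v||^2)); needs gradient feedback and L."""
    d = [xi - vi for xi, vi in zip(x, v)]
    d_sq = sum(di * di for di in d)
    if d_sq == 0.0:
        return 0.0
    gap = sum(gi * di for gi, di in zip(grad, d))
    return min(1.0, max(0.0, gap / (L * d_sq)))
```

The open-loop rule never touches the objective, while the short-step rule needs the gradient, the current iterate, the Frank-Wolfe vertex, and the smoothness constant $L$. This is the feedback the abstract refers to.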
Submitted 15 September, 2023; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Compression-aware Training of Neural Networks using Frank-Wolfe
Authors:
Max Zimmer,
Christoph Spiegel,
Sebastian Pokutta
Abstract:
Many existing neural network pruning approaches rely on either retraining or inducing a strong bias in order to converge to a sparse solution throughout training. A third paradigm, 'compression-aware' training, aims to obtain state-of-the-art dense models that are robust to a wide range of compression ratios using a single dense training run while also avoiding retraining. We propose a framework centered around a versatile family of norm constraints and the Stochastic Frank-Wolfe (SFW) algorithm that encourage convergence to well-performing solutions while inducing robustness towards convolutional filter pruning and low-rank matrix decomposition. Our method outperforms existing compression-aware approaches and, in the case of low-rank matrix decomposition, it also requires significantly fewer computational resources than approaches based on nuclear-norm regularization. Our findings indicate that dynamically adjusting the learning rate of SFW, as suggested by Pokutta et al. (2020), is crucial for convergence and robustness of SFW-trained models, and we establish a theoretical foundation for that practice.
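A Stochastic Frank-Wolfe update only needs a linear minimization oracle for the constraint set, so every iterate is a convex combination of feasible points. The sketch below assumes two simple norm balls ($\ell_1$ and $\ell_2$) as hypothetical examples. The abstract does not specify which norms the framework uses, and the dynamic learning-rate adjustment it mentions is not reproduced here.

```python
import math


def lmo_l1_ball(grad, radius):
    """LMO over the L1 ball: a signed, scaled unit vector at the largest |grad| entry."""
    i = max(range(len(grad)), key=lambda j: abs(grad[j]))
    v = [0.0] * len(grad)
    v[i] = -radius * math.copysign(1.0, grad[i])
    return v


def lmo_l2_ball(grad, radius):
    """LMO over the L2 ball: -radius * grad / ||grad||."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [-radius * g / norm for g in grad]


def sfw_step(params, stoch_grad, lmo, radius, lr):
    """One SFW update: params <- (1 - lr) * params + lr * v, with v from the LMO."""
    v = lmo(stoch_grad, radius)
    return [(1 - lr) * p + lr * vi for p, vi in zip(params, v)]
```

With $0 \le$ `lr` $\le 1$ and feasible starting parameters, the update never leaves the ball, so no projection step is needed.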
Submitted 14 February, 2024; v1 submitted 24 May, 2022;
originally announced May 2022.
-
The complexity of geometric scaling
Authors:
Antoine Deza,
Sebastian Pokutta,
Lionel Pournin
Abstract:
Geometric scaling, introduced by Schulz and Weismantel in 2002, solves the integer optimization problem $\max \{c\mathord{\cdot}x: x \in P \cap \mathbb Z^n\}$ by means of primal augmentations, where $P \subset \mathbb R^n$ is a polytope. We restrict ourselves to the important case when $P$ is a $0/1$-polytope. Schulz and Weismantel showed that no more than $O(n \log n \|c\|_\infty)$ calls to an augmentation oracle are required. This upper bound can be improved to $O(n \log \|c\|_\infty)$ using the early-stopping policy proposed in 2018 by Le Bodic, Pavelka, Pfetsch, and Pokutta. Considering both the maximum ratio augmentation variant of the method as well as its approximate version, we show that these upper bounds are essentially tight by maximizing over an $n$-dimensional simplex with vectors $c$ such that $\|c\|_\infty$ is either $n$ or $2^n$.
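The basic geometric scaling loop can be sketched as follows, assuming an integer objective vector $c$, a feasible set given explicitly as a list of 0/1 points, and a brute-force augmentation oracle (all hypothetical simplifications). The oracle looks for a point $y$ with $c\cdot(y - x) > \mu\|y - x\|_1$, and when none exists, $\mu$ is halved. Since $c$ is integral and $\|y - x\|_1 \leq n$ on a $0/1$-polytope, any improving point has ratio at least $1/n$, so an unsuccessful call with $\mu < 1/n$ certifies optimality. The maximum ratio and approximate variants and the early-stopping policy studied in the paper are not shown.

```python
from itertools import product


def scaled_augmentation(points, c, x, mu):
    """Brute-force oracle: some y in points with c.(y - x) > mu * ||y - x||_1, or None."""
    for y in points:
        gain = sum(ci * (yi - xi) for ci, yi, xi in zip(c, y, x))
        dist = sum(abs(yi - xi) for yi, xi in zip(y, x))
        if dist > 0 and gain > mu * dist:
            return y
    return None


def geometric_scaling(points, c, x):
    """Maximize c.x over the listed 0/1 points, starting from the feasible point x."""
    n = len(c)
    mu = float(max(abs(ci) for ci in c))  # no augmentation can succeed at this scale
    while True:
        y = scaled_augmentation(points, c, x, mu)
        if y is not None:
            x = y  # augment, keep the current scaling parameter
        elif mu < 1.0 / n:
            return x  # an unsuccessful call below 1/n certifies optimality
        else:
            mu /= 2.0  # no augmentation at this scale: halve mu


# Toy feasible set: all 0/1 points of the 3-dimensional cube.
CUBE_3 = list(product((0, 1), repeat=3))
```

Each successful augmentation strictly increases $c\cdot x$, so the loop terminates on any finite point set.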
Submitted 3 July, 2023; v1 submitted 9 May, 2022;
originally announced May 2022.
-
Minimizing a low-dimensional convex function over a high-dimensional cube
Authors:
Christoph Hunkenschröder,
Sebastian Pokutta,
Robert Weismantel
Abstract:
For a matrix $W \in \mathbb{Z}^{m \times n}$, $m \leq n$, and a convex function $g: \mathbb{R}^m \rightarrow \mathbb{R}$, we are interested in minimizing $f(x) = g(Wx)$ over the set $\{0,1\}^n$. We study separable convex functions and sharp convex functions $g$. Moreover, the matrix $W$ is unknown to us; only the number of rows $m \leq n$ and $\|W\|_{\infty}$ are revealed. The composite function $f(x)$ is accessible only through zeroth- and first-order oracles. Our main result is a proximity theorem ensuring that, for separable convex and sharp convex functions, an integral minimum and a continuous minimum are always "close". This is a key ingredient in developing an algorithm for detecting an integer minimum that achieves a running time of roughly $(m \| W \|_{\infty})^{\mathcal{O}(m^3)} \cdot \text{poly}(n)$. In the special case when $(i)$ $W$ is given explicitly and $(ii)$ $g$ is separable convex, one can also adapt an algorithm of Hochbaum and Shanthikumar. The running time of this adapted algorithm matches that of our general algorithm.
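For contrast with the running time stated above, the naive baseline under the same oracle model simply queries $f$ at all $2^n$ points of $\{0,1\}^n$. The sketch below assumes only a zeroth-order (value) oracle and hides $W$ inside a closure. The helper names and the example $W$ and $g$ are hypothetical. The sketch is exponential in $n$ and is not the algorithm from the paper.

```python
from itertools import product


def make_oracle(W, g):
    """Hypothetical helper: expose f(x) = g(Wx) as a value oracle, hiding W from the caller."""
    def f(x):
        y = [sum(w * xj for w, xj in zip(row, x)) for row in W]
        return g(y)
    return f


def brute_force_min(f, n):
    """Query f at every point of {0,1}^n; return a minimizer and its value (2^n calls)."""
    best_x, best_val = None, float("inf")
    for x in product((0, 1), repeat=n):
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

For instance, with $W = \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 1 & -2 \end{pmatrix}$ and the separable convex $g(y) = (y_1 - 1)^2 + |y_2|$, the search returns a point with objective value $0$ after $16$ oracle calls.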
Submitted 16 August, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights
Authors:
Maxime Gasse,
Quentin Cappart,
Jonas Charfreitag,
Laurent Charlin,
Didier Chételat,
Antonia Chmiela,
Justin Dumouchelle,
Ambros Gleixner,
Aleksandr M. Kazachkov,
Elias Khalil,
Pawel Lichocki,
Andrea Lodi,
Miles Lubin,
Chris J. Maddison,
Christopher Morris,
Dimitri J. Papageorgiou,
Augustin Parjadis,
Sebastian Pokutta,
Antoine Prouvost,
Lara Scavuzzo,
Giulia Zarpellon,
Linxin Yang,
Sha Lai,
Akang Wang,
Xiaodong Luo
, et al. (16 additional authors not shown)
Abstract:
Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. In this context, the ML4CO competition aims to improve state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous to the contestants.
Submitted 17 March, 2022; v1 submitted 4 March, 2022;
originally announced March 2022.
-
Sparser Kernel Herding with Pairwise Conditional Gradients without Swap Steps
Authors:
Kazuma Tsuji,
Ken'ichiro Tanaka,
Sebastian Pokutta
Abstract:
The Pairwise Conditional Gradients (PCG) algorithm is a powerful extension of the Frank-Wolfe algorithm leading to particularly sparse solutions, which makes PCG very appealing for problems such as sparse signal recovery, sparse regression, and kernel herding. Unfortunately, PCG exhibits so-called swap steps that might not provide sufficient primal progress. The number of these bad steps is bounded by a function of the dimension, so known guarantees do not generalize to the infinite-dimensional case, which would be needed for kernel herding. We propose a new variant of PCG, the so-called Blended Pairwise Conditional Gradients (BPCG). This new algorithm does not exhibit any swap steps, is very easy to implement, and does not require any internal gradient alignment procedures. The convergence rate of BPCG is essentially that of PCG if no drop steps were to occur; as such, it is no worse than PCG, and in many cases it improves upon it and provides new rates. Moreover, we observe in numerical experiments that BPCG's solutions are much sparser than those of PCG. We apply BPCG to the kernel herding setting, where we derive nice quadrature rules and provide numerical results demonstrating the performance of our method.
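The blending of local pairwise steps and global Frank-Wolfe steps can be sketched on a toy problem: minimizing $\frac{1}{2}\|x - b\|^2$ over the probability simplex, whose vertices are the unit vectors $e_i$. The sketch assumes the step-selection rule as we read it. It takes a local pairwise step between the away vertex and the local Frank-Wolfe vertex (both from the active set) whenever that gap is at least the global Frank-Wolfe gap, and otherwise takes an ordinary Frank-Wolfe step. Exact line search is used because the objective is quadratic. This is a minimal, hypothetical illustration, not the reference implementation.

```python
def bpcg_simplex(b, start=0, iters=500, tol=1e-12):
    """BPCG-style loop for f(x) = 0.5 * ||x - b||^2 over the probability simplex.

    The active set maps a vertex index i (the unit vector e_i) to its convex weight.
    """
    n = len(b)
    weights = {start: 1.0}
    for _ in range(iters):
        x = [weights.get(i, 0.0) for i in range(n)]
        grad = [xi - bi for xi, bi in zip(x, b)]
        a = max(weights, key=lambda i: grad[i])  # away vertex (active set)
        s = min(weights, key=lambda i: grad[i])  # local FW vertex (active set)
        v = min(range(n), key=lambda i: grad[i])  # global FW vertex (full LMO)
        local_gap = grad[a] - grad[s]
        fw_gap = sum(g * xi for g, xi in zip(grad, x)) - grad[v]
        if fw_gap < tol:
            break
        if local_gap >= fw_gap:
            # Local pairwise step x <- x + gamma * (e_s - e_a); exact line search
            # for this quadratic gives gamma = local_gap / ||e_s - e_a||^2 = local_gap / 2.
            gamma = local_gap / 2.0
            if gamma >= weights[a]:
                gamma = weights[a]
                del weights[a]  # drop step: a leaves the active set
            else:
                weights[a] -= gamma
            weights[s] += gamma
        else:
            # Global FW step x <- (1 - gamma) * x + gamma * e_v with exact line search.
            d_sq = sum((xi - (1.0 if i == v else 0.0)) ** 2 for i, xi in enumerate(x))
            gamma = min(1.0, fw_gap / d_sq)
            if gamma >= 1.0:
                weights = {v: 1.0}
            else:
                weights = {i: (1.0 - gamma) * w for i, w in weights.items()}
                weights[v] = weights.get(v, 0.0) + gamma
    return [weights.get(i, 0.0) for i in range(n)]
```

Local steps only move weight between vertices already in the active set, so no new vertices are added unless the global Frank-Wolfe gap dominates. This is the mechanism behind the sparser solutions reported above.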
Submitted 8 February, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.