-
Convex mixed-integer optimization with Frank-Wolfe methods
Authors:
Deborah Hendrych,
Hannah Troppens,
Mathieu Besançon,
Sebastian Pokutta
Abstract:
Mixed-integer nonlinear optimization encompasses a broad class of problems that present both theoretical and computational challenges. We propose a new type of method to solve these problems based on a branch-and-bound algorithm with convex node relaxations. These relaxations are solved with a Frank-Wolfe algorithm over the convex hull of mixed-integer feasible points instead of the continuous rel…
▽ More
Mixed-integer nonlinear optimization encompasses a broad class of problems that present both theoretical and computational challenges. We propose a new type of method to solve these problems based on a branch-and-bound algorithm with convex node relaxations. These relaxations are solved with a Frank-Wolfe algorithm over the convex hull of mixed-integer feasible points instead of the continuous relaxation via calls to a mixed-integer linear solver as the linear minimization oracle. The proposed method computes feasible solutions while working on a single representation of the polyhedral constraints, leveraging the full extent of mixed-integer linear solvers without an outer approximation scheme and can exploit inexact solutions of node subproblems.
△ Less
Submitted 18 July, 2024; v1 submitted 23 August, 2022;
originally announced August 2022.
-
The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights
Authors:
Maxime Gasse,
Quentin Cappart,
Jonas Charfreitag,
Laurent Charlin,
Didier Chételat,
Antonia Chmiela,
Justin Dumouchelle,
Ambros Gleixner,
Aleksandr M. Kazachkov,
Elias Khalil,
Pawel Lichocki,
Andrea Lodi,
Miles Lubin,
Chris J. Maddison,
Christopher Morris,
Dimitri J. Papageorgiou,
Augustin Parjadis,
Sebastian Pokutta,
Antoine Prouvost,
Lara Scavuzzo,
Giulia Zarpellon,
Linxin Yang,
Sha Lai,
Akang Wang,
Xiaodong Luo
, et al. (16 additional authors not shown)
Abstract:
Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either dir…
▽ More
Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. Based on this context, the ML4CO aims at improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants.
△ Less
Submitted 17 March, 2022; v1 submitted 4 March, 2022;
originally announced March 2022.
-
Efficient Online-Bandit Strategies for Minimax Learning Problems
Authors:
Christophe Roux,
Elias Wirth,
Sebastian Pokutta,
Thomas Kerdreux
Abstract:
Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or learning with non-standard aggregated losses. More specifically, these problems are convex-linear problems where the minimization is carried out over the model parameters $w\in\mathcal{W}$ and the maximization over the empirical distribution $p\in\mathcal{K}$ of the training set indexes, w…
▽ More
Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or learning with non-standard aggregated losses. More specifically, these problems are convex-linear problems where the minimization is carried out over the model parameters $w\in\mathcal{W}$ and the maximization over the empirical distribution $p\in\mathcal{K}$ of the training set indexes, where $\mathcal{K}$ is the simplex or a subset of it. To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm. We argue that the efficiency of such approaches critically depends on the structure of $\mathcal{K}$ and propose two properties of $\mathcal{K}$ that facilitate designing efficient algorithms. We focus on a specific family of sets $\mathcal{S}_{n,k}$ encompassing various learning applications and provide high-probability convergence guarantees to the minimax values.
△ Less
Submitted 4 June, 2021; v1 submitted 28 May, 2021;
originally announced May 2021.
-
Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps
Authors:
Alejandro Carderera,
Mathieu Besançon,
Sebastian Pokutta
Abstract:
Generalized self-concordance is a key property present in the objective function of many important learning problems. We establish the convergence rate of a simple Frank-Wolfe variant that uses the open-loop step size strategy $γ_t = 2/(t+2)$, obtaining a $\mathcal{O}(1/t)$ convergence rate for this class of functions in terms of primal gap and Frank-Wolfe gap, where $t$ is the iteration count. Th…
▽ More
Generalized self-concordance is a key property present in the objective function of many important learning problems. We establish the convergence rate of a simple Frank-Wolfe variant that uses the open-loop step size strategy $γ_t = 2/(t+2)$, obtaining a $\mathcal{O}(1/t)$ convergence rate for this class of functions in terms of primal gap and Frank-Wolfe gap, where $t$ is the iteration count. This avoids the use of second-order information or the need to estimate local smoothness parameters of previous work. We also show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
△ Less
Submitted 8 April, 2024; v1 submitted 28 May, 2021;
originally announced May 2021.
-
Parameter-free Locally Accelerated Conditional Gradients
Authors:
Alejandro Carderera,
Jelena Diakonikolas,
Cheuk Yin Lin,
Sebastian Pokutta
Abstract:
Projection-free conditional gradient (CG) methods are the algorithms of choice for constrained optimization setups in which projections are often computationally prohibitive but linear optimization over the constraint set remains computationally feasible. Unlike in projection-based methods, globally accelerated convergence rates are in general unattainable for CG. However, a very recent work on Lo…
▽ More
Projection-free conditional gradient (CG) methods are the algorithms of choice for constrained optimization setups in which projections are often computationally prohibitive but linear optimization over the constraint set remains computationally feasible. Unlike in projection-based methods, globally accelerated convergence rates are in general unattainable for CG. However, a very recent work on Locally accelerated CG (LaCG) has demonstrated that local acceleration for CG is possible for many settings of interest. The main downside of LaCG is that it requires knowledge of the smoothness and strong convexity parameters of the objective function. We remove this limitation by introducing a novel, Parameter-Free Locally accelerated CG (PF-LaCG) algorithm, for which we provide rigorous convergence guarantees. Our theoretical results are complemented by numerical experiments, which demonstrate local acceleration and showcase the practical improvements of PF-LaCG over non-accelerated algorithms, both in terms of iteration count and wall-clock time.
△ Less
Submitted 15 June, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
CINDy: Conditional gradient-based Identification of Non-linear Dynamics -- Noise-robust recovery
Authors:
Alejandro Carderera,
Sebastian Pokutta,
Christof Schütte,
Martin Weiser
Abstract:
Governing equations are essential to the study of nonlinear dynamics, often enabling the prediction of previously unseen behaviors as well as the inclusion into control strategies. The discovery of governing equations from data thus has the potential to transform data-rich fields where well-established dynamical models remain unknown. This work contributes to the recent trend in data-driven sparse…
▽ More
Governing equations are essential to the study of nonlinear dynamics, often enabling the prediction of previously unseen behaviors as well as the inclusion into control strategies. The discovery of governing equations from data thus has the potential to transform data-rich fields where well-established dynamical models remain unknown. This work contributes to the recent trend in data-driven sparse identification of nonlinear dynamics of finding the best sparse fit to observational data in a large library of potential nonlinear models. We propose an efficient first-order Conditional Gradient algorithm for solving the underlying optimization problem. In comparison to the most prominent alternative algorithms, the new algorithm shows significantly improved performance on several essential issues like sparsity-induction, structure-preservation, noise robustness, and sample efficiency. We demonstrate these advantages on several dynamics from the field of synchronization, particle dynamics, and enzyme chemistry.
△ Less
Submitted 28 April, 2021; v1 submitted 7 January, 2021;
originally announced January 2021.
-
Second-order Conditional Gradient Sliding
Authors:
Alejandro Carderera,
Sebastian Pokutta
Abstract:
Constrained second-order convex optimization algorithms are the method of choice when a high accuracy solution to a problem is needed, due to their local quadratic convergence. These algorithms require the solution of a constrained quadratic subproblem at every iteration. We present the \emph{Second-Order Conditional Gradient Sliding} (SOCGS) algorithm, which uses a projection-free algorithm to so…
▽ More
Constrained second-order convex optimization algorithms are the method of choice when a high accuracy solution to a problem is needed, due to their local quadratic convergence. These algorithms require the solution of a constrained quadratic subproblem at every iteration. We present the \emph{Second-Order Conditional Gradient Sliding} (SOCGS) algorithm, which uses a projection-free algorithm to solve the constrained quadratic subproblems inexactly. When the feasible region is a polytope the algorithm converges quadratically in primal gap after a finite number of linearly convergent iterations. Once in the quadratic regime the SOCGS algorithm requires $\mathcal{O}(\log(\log 1/\varepsilon))$ first-order and Hessian oracle calls and $\mathcal{O}(\log (1/\varepsilon) \log(\log1/\varepsilon))$ linear minimization oracle calls to achieve an $\varepsilon$-optimal solution. This algorithm is useful when the feasible region can only be accessed efficiently through a linear optimization oracle, and computing first-order information of the function, although possible, is costly.
△ Less
Submitted 11 June, 2025; v1 submitted 20 February, 2020;
originally announced February 2020.
-
IPBoost -- Non-Convex Boosting via Integer Programming
Authors:
Marc E. Pfetsch,
Sebastian Pokutta
Abstract:
Recently non-convex optimization approaches for solving machine learning problems have gained significant attention. In this paper we explore non-convex boosting in classification by means of integer programming and demonstrate real-world practicability of the approach while circumventing shortcomings of convex boosting approaches. We report results that are comparable to or better than the curren…
▽ More
Recently non-convex optimization approaches for solving machine learning problems have gained significant attention. In this paper we explore non-convex boosting in classification by means of integer programming and demonstrate real-world practicability of the approach while circumventing shortcomings of convex boosting approaches. We report results that are comparable to or better than the current state-of-the-art.
△ Less
Submitted 11 February, 2020;
originally announced February 2020.
-
Locally Accelerated Conditional Gradients
Authors:
Jelena Diakonikolas,
Alejandro Carderera,
Sebastian Pokutta
Abstract:
Conditional gradients constitute a class of projection-free first-order algorithms for smooth convex optimization. As such, they are frequently used in solving smooth convex optimization problems over polytopes, for which the computational cost of orthogonal projections would be prohibitive. However, they do not enjoy the optimal convergence rates achieved by projection-based accelerated methods;…
▽ More
Conditional gradients constitute a class of projection-free first-order algorithms for smooth convex optimization. As such, they are frequently used in solving smooth convex optimization problems over polytopes, for which the computational cost of orthogonal projections would be prohibitive. However, they do not enjoy the optimal convergence rates achieved by projection-based accelerated methods; moreover, achieving such globally-accelerated rates is information-theoretically impossible for these methods. To address this issue, we present Locally Accelerated Conditional Gradients -- an algorithmic framework that couples accelerated steps with conditional gradient steps to achieve local acceleration on smooth strongly convex problems. Our approach does not require projections onto the feasible set, but only on (typically low-dimensional) simplices, thus keeping the computational cost of projections at bay. Further, it achieves the optimal accelerated local convergence. Our theoretical results are supported by numerical experiments, which demonstrate significant speedups of our framework over state of the art methods in both per-iteration progress and wall-clock time.
△ Less
Submitted 11 October, 2019; v1 submitted 18 June, 2019;
originally announced June 2019.
-
Blended Matching Pursuit
Authors:
Cyrille W. Combettes,
Sebastian Pokutta
Abstract:
Matching pursuit algorithms are an important class of algorithms in signal processing and machine learning. We present a blended matching pursuit algorithm, combining coordinate descent-like steps with stronger gradient descent steps, for minimizing a smooth convex function over a linear space spanned by a set of atoms. We derive sublinear to linear convergence rates according to the smoothness an…
▽ More
Matching pursuit algorithms are an important class of algorithms in signal processing and machine learning. We present a blended matching pursuit algorithm, combining coordinate descent-like steps with stronger gradient descent steps, for minimizing a smooth convex function over a linear space spanned by a set of atoms. We derive sublinear to linear convergence rates according to the smoothness and sharpness orders of the function and demonstrate computational superiority of our approach. In particular, we derive linear rates for a wide class of non-strongly convex functions, and we demonstrate in experiments that our algorithm enjoys very fast rates of convergence and wall-clock speed while maintaining a sparsity of iterates very comparable to that of the (much slower) orthogonal matching pursuit.
△ Less
Submitted 19 November, 2019; v1 submitted 28 April, 2019;
originally announced April 2019.
-
An Online-Learning Approach to Inverse Optimization
Authors:
Andreas Bärmann,
Alexander Martin,
Sebastian Pokutta,
Oskar Schneider
Abstract:
In this paper, we demonstrate how to learn the objective function of a decision-maker while only observing the problem input data and the decision-maker's corresponding decisions over multiple rounds. We present exact algorithms for this online version of inverse optimization which converge at a rate of $ \mathcal{O}(1/\sqrt{T}) $ in the number of observations~$T$ and compare their further propert…
▽ More
In this paper, we demonstrate how to learn the objective function of a decision-maker while only observing the problem input data and the decision-maker's corresponding decisions over multiple rounds. We present exact algorithms for this online version of inverse optimization which converge at a rate of $ \mathcal{O}(1/\sqrt{T}) $ in the number of observations~$T$ and compare their further properties. Especially, they all allow taking decisions which are essentially as good as those of the observed decision-maker already after relatively few iterations, but are suited best for different settings each. Our approach is based on online learning and works for linear objectives over arbitrary feasible sets for which we have a linear optimization oracle. As such, it generalizes previous approaches based on KKT-system decomposition and dualization. We also introduce several generalizations, such as the approximate learning of non-linear objective functions, dynamically changing as well as parameterized objectives and the case of suboptimal observed decisions. When applied to the stochastic offline case, our algorithms are able to give guarantees on the quality of the learned objectives in expectation. Finally, we show the effectiveness and possible applications of our methods in indicative computational experiments.
△ Less
Submitted 28 March, 2020; v1 submitted 30 October, 2018;
originally announced October 2018.
-
Principled Deep Neural Network Training through Linear Programming
Authors:
Daniel Bienstock,
Gonzalo Muñoz,
Sebastian Pokutta
Abstract:
Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident in recent years. In this work, using a unified framework, we show that there exists a polyhedron which encodes simultaneously all possible deep neural network training prob…
▽ More
Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident in recent years. In this work, using a unified framework, we show that there exists a polyhedron which encodes simultaneously all possible deep neural network training problems that can arise from a given architecture, activation functions, loss function, and sample-size. Notably, the size of the polyhedral representation depends only linearly on the sample-size, and a better dependency on several other network parameters is unlikely (assuming $P\neq NP$). Additionally, we use our polyhedral representation to obtain new and better computational complexity results for training problems of well-known neural network architectures. Our results provide a new perspective on training problems through the lens of polyhedral theory and reveal a strong structure arising from these problems.
△ Less
Submitted 1 March, 2022; v1 submitted 7 October, 2018;
originally announced October 2018.
-
Reinforcement Learning under Model Mismatch
Authors:
Aurko Roy,
Huan Xu,
Sebastian Pokutta
Abstract:
We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs to the model-free Reinforcement Learning setting, where we do not have access to the model parameters, but can only sample states from it. We define robust versions of…
▽ More
We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs to the model-free Reinforcement Learning setting, where we do not have access to the model parameters, but can only sample states from it. We define robust versions of Q-learning, SARSA, and TD-learning and prove convergence to an approximately optimal robust policy and approximate value function respectively. We scale up the robust algorithms to large MDPs via function approximation and prove convergence under two different settings. We prove convergence of robust approximate policy iteration and robust approximate value iteration for linear architectures (under mild assumptions). We also define a robust loss function, the mean squared robust projected Bellman error and give stochastic gradient descent algorithms that are guaranteed to converge to a local minimum.
△ Less
Submitted 8 November, 2017; v1 submitted 14 June, 2017;
originally announced June 2017.
-
Conditional Accelerated Lazy Stochastic Gradient Descent
Authors:
Guanghui Lan,
Sebastian Pokutta,
Yi Zhou,
Daniel Zink
Abstract:
In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with optimal number of calls to a stochastic first-order oracle and convergence rate $O\left(\frac{1}{\varepsilon^2}\right)$ improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012] with convergence rate $O\left(\frac{1}{\varepsilon^4}\right)$.
In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with optimal number of calls to a stochastic first-order oracle and convergence rate $O\left(\frac{1}{\varepsilon^2}\right)$ improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012] with convergence rate $O\left(\frac{1}{\varepsilon^4}\right)$.
△ Less
Submitted 15 February, 2018; v1 submitted 16 March, 2017;
originally announced March 2017.
-
Sequential Information Guided Sensing
Authors:
Ruiyang Song,
Yao Xie,
Sebastian Pokutta
Abstract:
We study the value of information in sequential compressed sensing by characterizing the performance of sequential information guided sensing in practical scenarios when information is inaccurate. In particular, we assume the signal distribution is parameterized through Gaussian or Gaussian mixtures with estimated mean and covariance matrices, and we can measure compressively through a noisy linea…
▽ More
We study the value of information in sequential compressed sensing by characterizing the performance of sequential information guided sensing in practical scenarios when information is inaccurate. In particular, we assume the signal distribution is parameterized through Gaussian or Gaussian mixtures with estimated mean and covariance matrices, and we can measure compressively through a noisy linear projection or using one-sparse vectors, i.e., observing one entry of the signal each time. We establish a set of performance bounds for the bias and variance of the signal estimator via posterior mean, by capturing the conditional entropy (which is also related to the size of the uncertainty), and the additional power required due to inaccurate information to reach a desired precision. Based on this, we further study how to estimate covariance based on direct samples or covariance sketching. Numerical examples also demonstrate the superior performance of Info-Greedy Sensing algorithms compared with their random and non-adaptive counterparts.
△ Less
Submitted 31 August, 2015;
originally announced September 2015.
-
Sequential Sensing with Model Mismatch
Authors:
Ruiyang Song,
Yao Xie,
Sebastian Pokutta
Abstract:
We characterize the performance of sequential information guided sensing, Info-Greedy Sensing, when there is a mismatch between the true signal model and the assumed model, which may be a sample estimate. In particular, we consider a setup where the signal is low-rank Gaussian and the measurements are taken in the directions of eigenvectors of the covariance matrix in a decreasing order of eigenva…
▽ More
We characterize the performance of sequential information guided sensing, Info-Greedy Sensing, when there is a mismatch between the true signal model and the assumed model, which may be a sample estimate. In particular, we consider a setup where the signal is low-rank Gaussian and the measurements are taken in the directions of eigenvectors of the covariance matrix in a decreasing order of eigenvalues. We establish a set of performance bounds when a mismatched covariance matrix is used, in terms of the gap of signal posterior entropy, as well as the additional amount of power required to achieve the same signal recovery precision. Based on this, we further study how to choose an initialization for Info-Greedy Sensing using the sample covariance matrix, or using an efficient covariance sketching scheme.
△ Less
Submitted 16 March, 2015; v1 submitted 25 January, 2015;
originally announced January 2015.
-
Info-Greedy sequential adaptive compressed sensing
Authors:
Gabor Braun,
Sebastian Pokutta,
Yao Xie
Abstract:
We present an information-theoretic framework for sequential adaptive compressed sensing, Info-Greedy Sensing, where measurements are chosen to maximize the extracted information conditioned on the previous measurements. We show that the widely used bisection approach is Info-Greedy for a family of $k$-sparse signals by connecting compressed sensing and blackbox complexity of sequential query algo…
▽ More
We present an information-theoretic framework for sequential adaptive compressed sensing, Info-Greedy Sensing, where measurements are chosen to maximize the extracted information conditioned on the previous measurements. We show that the widely used bisection approach is Info-Greedy for a family of $k$-sparse signals by connecting compressed sensing and blackbox complexity of sequential query algorithms, and present Info-Greedy algorithms for Gaussian and Gaussian Mixture Model (GMM) signals, as well as ways to design sparse Info-Greedy measurements. Numerical examples demonstrate the good performance of the proposed algorithms using simulated and real data: Info-Greedy Sensing shows significant improvement over random projection for signals with sparse and low-rank covariance matrices, and adaptivity brings robustness when there is a mismatch between the assumed and the true distributions.
△ Less
Submitted 2 February, 2015; v1 submitted 2 July, 2014;
originally announced July 2014.