-
Mission-Aligned Learning-Informed Control of Autonomous Systems: Formulation and Foundations
Authors:
Vyacheslav Kungurtsev,
Gustav Sir,
Akhil Anand,
Sebastien Gros,
Haozhe Tian,
Homayoun Hamedmoghadam
Abstract:
Research, innovation and practical capital investment have been increasing rapidly toward the realization of autonomous physical agents. This includes industrial and service robots, unmanned aerial vehicles, embedded control devices, and a number of other realizations of cybernetic/mechatronic implementations of intelligent autonomous devices. In this paper, we consider a stylized version of robotic care, which would normally involve a two-level Reinforcement Learning procedure that trains a policy for both lower level physical movement decisions as well as higher level conceptual tasks and their sub-components. In order to deliver greater safety and reliability in the system, we present the general formulation of this as a two-level optimization scheme which incorporates control at the lower level, and classical planning at the higher level, integrated with a capacity for learning. This synergistic integration of multiple methodologies -- control, classical planning, and RL -- presents an opportunity for greater insight for algorithm development, leading to more efficient and reliable performance. Here, the notion of reliability pertains to physical safety and interpretability into an otherwise black box operation of autonomous agents, concerning users and regulators. This work presents the necessary background and general formulation of the optimization framework, detailing each component and its integration with the others.
Submitted 6 July, 2025;
originally announced July 2025.
-
Direct-search methods for decentralized blackbox optimization
Authors:
El Houcine Bergou,
Youssef Diouane,
Vyacheslav Kungurtsev,
Clément W. Royer
Abstract:
Derivative-free optimization algorithms are particularly useful for tackling blackbox optimization problems where the objective function arises from complex and expensive procedures that preclude the use of classical gradient-based methods. In contemporary decentralized environments, such functions are defined locally on different computational nodes due to technical or privacy constraints, introducing additional challenges within the optimization process.
In this paper, we adapt direct-search methods, a classical technique in derivative-free optimization, to the decentralized setting. In contrast with zeroth-order algorithms, our algorithms rely on positive spanning sets to define suitable search directions, while still possessing global convergence guarantees thanks to carefully chosen stepsizes. Numerical experiments highlight the advantages of direct-search techniques over gradient-approximation-based strategies.
Submitted 5 April, 2025;
originally announced April 2025.
-
Probabilistic Iterative Hard Thresholding for Sparse Learning
Authors:
Matteo Bergamaschi,
Andrea Cristofari,
Vyacheslav Kungurtsev,
Francesco Rinaldi
Abstract:
For statistical modeling wherein the data regime is unfavorable in terms of dimensionality relative to the sample size, finding hidden sparsity in the ground truth can be critical in formulating an accurate statistical model. The so-called "l0 norm", which counts the number of non-zero components in a vector, is a strong and reliable mechanism for enforcing sparsity when incorporated into an optimization problem. However, in big data settings wherein noisy estimates of the gradient must be evaluated out of computational necessity, the literature is scant on methods that reliably converge. In this paper we present an approach towards solving expectation objective optimization problems with cardinality constraints. We prove convergence of the underlying stochastic process, and demonstrate the performance on two Machine Learning problems.
Submitted 2 September, 2024;
originally announced September 2024.
-
Empirical Bayes for Dynamic Bayesian Networks Using Generalized Variational Inference
Authors:
Vyacheslav Kungurtsev,
Apaar,
Aarya Khandelwal,
Parth Sandeep Rastogi,
Bapi Chatterjee,
Jakub Mareček
Abstract:
In this work, we demonstrate the Empirical Bayes approach to learning a Dynamic Bayesian Network. By starting with several point estimates of structure and weights, we can use a data-driven prior to subsequently obtain a model to quantify uncertainty. This approach uses a recent development of Generalized Variational Inference, and indicates the potential of sampling the uncertainty of a mixture of DAG structures as well as a parameter posterior.
Submitted 28 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Learning Dynamic Bayesian Networks from Data: Foundations, First Principles and Numerical Comparisons
Authors:
Vyacheslav Kungurtsev,
Fadwa Idlahcen,
Petr Rysavy,
Pavel Rytir,
Ales Wodecki
Abstract:
In this paper, we present a guide to the foundations of learning Dynamic Bayesian Networks (DBNs) from data in the form of multiple samples of trajectories for some length of time. We present the formalism for a generic as well as a set of common types of DBNs for particular variable distributions. We present the analytical form of the models, with a comprehensive discussion on the interdependence between structure and weights in a DBN model and their implications for learning. Next, we give a broad overview of learning methods and describe and categorize them based on the most important statistical features, and how they treat the interplay between learning structure and weights. We give the analytical form of the likelihood and Bayesian score functions, emphasizing the distinction from the static case. We discuss functions used in optimization to enforce structural requirements. We briefly discuss more complex extensions and representations. Finally we present a set of comparisons in different settings for various distinct but representative algorithms across the variants.
Submitted 30 August, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Optimal Control of Two-Phase Membrane Problem
Authors:
Farid Bozorgnia,
Vyacheslav Kungurtsev
Abstract:
We consider an optimal control problem where the state is governed by a free boundary problem called the two-phase membrane problem and the control appears in the coefficients of the characteristic function of the positivity and negativity parts of the solution. Our investigation focuses on various properties associated with the control-to-state map. Due to the non-differentiability of this map, we regularize the state equation. The existence, uniqueness, and characterization of the optimal pairs are established.
Submitted 17 May, 2024;
originally announced May 2024.
-
Spectral Methods for Quantum Optimal Control: Artificial Boundary Conditions
Authors:
Ales Wodecki,
Jakub Marecek,
Vyacheslav Kungurtsev,
Pavel Eichler,
Georgios Korpas,
Philip Intallura
Abstract:
The problem of quantum state preparation is one of the main challenges in achieving the quantum advantage. Furthermore, classically, for multi-level problems, our ability to solve the corresponding quantum optimal control problems is rather limited. The ability of the latter to feed into the former may result in significant progress in quantum computing. To address this challenge, we propose a formulation of quantum optimal control that makes use of artificial boundary conditions for the Schrödinger equation in combination with spectral methods. The resulting formulations are well suited for investigating periodic potentials and lend themselves to direct numerical treatment using conventional methods for bounded domains.
Submitted 21 March, 2024;
originally announced March 2024.
-
The stochastic Ravine accelerated gradient method with general extrapolation coefficients
Authors:
Hedy Attouch,
Jalal Fadili,
Vyacheslav Kungurtsev
Abstract:
In a real Hilbert space domain setting, we study the convergence properties of the stochastic Ravine accelerated gradient method for convex differentiable optimization. We consider the general form of this algorithm where the extrapolation coefficients can vary with each iteration, and where the evaluation of the gradient is subject to random errors. This general treatment models a breadth of practical algorithms and numerical implementations. We show that, under a proper tuning of the extrapolation parameters, and when the error variance associated with the gradient evaluations or the step-size sequences vanish sufficiently fast, the Ravine method provides fast convergence of the values both in expectation and almost surely. We also improve the convergence rates from O(.) to o(.). Moreover, we show an almost sure summability property of the gradients, which implies the fast convergence of the gradients towards zero. This property reflects the fact that the high-resolution ODE of the Ravine method includes a Hessian-driven damping term. When the space is also separable, our analysis also allows us to establish almost sure weak convergence of the sequence of iterates provided by the algorithm. We finally specialize the analysis to consider different parameter choices, including vanishing and constant (heavy ball method with friction) damping parameter, and present a comprehensive landscape of the tradeoffs in speed and accuracy associated with these parameter choices and statistical properties on the sequence of errors in the gradient computations. We provide a thorough discussion of the similarities and differences with the Nesterov accelerated gradient which satisfies similar asymptotic convergence rates.
Submitted 21 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Truss topology design under harmonic loads: Peak power minimization with semidefinite programming
Authors:
Shenyuan Ma,
Jakub Marecek,
Vyacheslav Kungurtsev,
Marek Tyburec
Abstract:
Designing lightweight yet stiff structures that can withstand vibrations is a crucial task in structural optimization. Here, we present a novel framework for truss topology optimization under undamped harmonic oscillations. Our approach minimizes the peak power of the structure under harmonic loads, overcoming the limitations of single-frequency and in-phase assumptions found in previous methods. For this, we leverage the concept of semidefinite representable (SDr) functions, demonstrating that while compliance readily conforms to an SDr representation, peak power requires a derivation based on the non-negativity of trigonometric functions. Finally, we introduce convex relaxations for the minimization problem and provide promising computational results.
Submitted 19 February, 2025; v1 submitted 29 January, 2024;
originally announced January 2024.
-
The Effects of Transmission-Rights Pricing on Multi-Stage Electricity Markets
Authors:
Erwann de Belloy de Saint-Lienard,
Jakub Marecek,
Vyacheslav Kungurtsev
Abstract:
Cross-border transmission infrastructure is pivotal in balancing modern power systems, but requires fair allocation of cross-border transmission capacity, possibly via fair pricing thereof. This requirement can be implemented using multi-stage market mechanisms for Physical Transmission Rights (PTRs). We analyse the related dynamics, and show that a prisoner's dilemma arises. Understanding these dynamics enables the development of novel market-settlement mechanisms to enhance market efficiency and incentivize renewable energy use.
Submitted 28 January, 2024;
originally announced January 2024.
-
Mini-batch stochastic subgradient for functional constrained optimization
Authors:
Nitesh Kumar Singh,
Ion Necoara,
Vyacheslav Kungurtsev
Abstract:
In this paper we consider finite sum composite convex optimization problems with many functional constraints. The objective function is expressed as a finite sum of two terms, one of which admits easy computation of (sub)gradients while the other is amenable to proximal evaluations. We assume a generalized bounded gradient condition on the objective which allows us to simultaneously tackle both smooth and nonsmooth problems. We also consider the cases of both with and without a strong convexity property. Further, we assume that each constraint set is given as the level set of a convex but not necessarily differentiable function. We reformulate the constrained finite sum problem into a stochastic optimization problem for which the stochastic subgradient projection method from [17] specializes to a collection of mini-batch variants, with different mini-batch sizes for the objective function and functional constraints, respectively. More specifically, at each iteration, our algorithm takes a mini-batch stochastic proximal subgradient step aimed at minimizing the objective function and then a subsequent mini-batch subgradient projection step minimizing the feasibility violation. By specializing different mini-batching strategies, we derive exact expressions for the stepsizes as a function of the mini-batch size and in some cases we also derive insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime. We also prove sublinear convergence rates for the mini-batch subgradient projection algorithm which depend explicitly on the mini-batch sizes and on the properties of the objective function. Numerical results also show a better performance of our mini-batch scheme over its single-batch counterpart.
Submitted 1 December, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Scheduling a Multi-Product Pipeline: A Discretized MILP Formulation
Authors:
Ales Wodecki,
Pavel Rytir,
Vyacheslav Kungurtsev,
Jakub Marecek
Abstract:
Multi-product pipelines are a highly efficient means of transporting liquids. Traditionally used to transport petroleum, its products and derivatives, they are now being repurposed to transport liquified natural gas admixed with hydrogen of various colors. We propose a novel mixed-integer linear programming (MILP) formulation, which optimizes efficiency while satisfying a wide range of real-world constraints developed to meet the needs of the Czech national pipeline operator CEPRO. We provide tests on well-known synthetic (path-graph) networks and demonstrate the formulation's scaling properties using open-source and commercial MILP solvers.
Submitted 18 December, 2023;
originally announced December 2023.
-
Parallel variational quantum algorithms with gradient-informed restart to speed up optimisation in the presence of barren plateaus
Authors:
Daniel Mastropietro,
Georgios Korpas,
Vyacheslav Kungurtsev,
Jakub Marecek
Abstract:
Inspired by the Fleming-Viot stochastic process, we propose a parallel implementation of variational quantum algorithms with the aim of reducing the time spent by the algorithm in barren plateaus, where optimization direction is unclear. In the Fleming-Viot tradition, parallel searches are called particles. In the proposed approach, the search by a Fleming-Viot particle is stopped when it encounters a region where the gradient is too small or noisy, suggesting a barren plateau area. The stopped particle continues the search after being regenerated at another location of the parameter space, potentially taking the exploration away from barren plateaus. We first analyze the behavior of the Fleming-Viot particles from a theoretical standpoint. We show that, when simulated annealing optimizers are used as particles, the Fleming-Viot system is expected to find the global optimum faster than a single simulated annealing optimizer, with a relative efficiency that increases proportionally to the percentage of barren plateaus in the domain. This result is corroborated by numerical experiments carried out on synthetic problems as well as on instances of the Max-Cut problem, which show that our method performs better than plain simulated annealing when large barren plateaus are present in the domain.
Submitted 15 December, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Inexact Direct-Search Methods for Bilevel Optimization Problems
Authors:
Youssef Diouane,
Vyacheslav Kungurtsev,
Francesco Rinaldi,
Damiano Zeffiro
Abstract:
In this work, we introduce new direct search schemes for the solution of bilevel optimization (BO) problems. Our methods rely on a fixed accuracy black box oracle for the lower-level problem, and deal both with smooth and potentially nonsmooth true objectives. We thus analyze for the first time in the literature direct search schemes in these settings, giving convergence guarantees to approximate stationary points, as well as complexity bounds in the smooth case. We also propose the first adaptation of mesh adaptive direct search schemes for BO. Some preliminary numerical results on a standard set of bilevel optimization problems show the effectiveness of our new approaches.
Submitted 13 September, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Stochastic Approximation for Expectation Objective and Expectation Inequality-Constrained Nonconvex Optimization
Authors:
Francisco Facchinei,
Vyacheslav Kungurtsev
Abstract:
Stochastic Approximation has been a prominent set of tools for solving problems with noise and uncertainty. Increasingly, it becomes important to solve optimization problems wherein there is noise in both a set of constraints that a practitioner requires the system to adhere to, as well as the objective, which typically involves some empirical loss. We present the first stochastic approximation approach for solving this class of problems using the Ghost framework of incorporating penalty functions for analysis of a sequential convex programming approach together with a Monte Carlo estimator of nonlinear maps. We provide almost sure convergence guarantees and demonstrate the performance of the procedure on some representative examples.
Submitted 6 July, 2023;
originally announced July 2023.
-
Hybrid Methods in Polynomial Optimisation
Authors:
Johannes Aspman,
Gilles Bareilles,
Vyacheslav Kungurtsev,
Jakub Marecek,
Martin Takáč
Abstract:
The Moment/Sum-of-squares hierarchy provides a way to compute the global minimizers of polynomial optimization problems (POP), at the cost of solving a sequence of increasingly large semidefinite programs (SDPs). We consider large-scale POPs, for which interior-point methods are no longer able to solve the resulting SDPs. We propose an algorithm that combines a first-order method for solving the SDP relaxation, and a second-order method on a non-convex problem obtained from the POP. The switch from the first to the second-order method is based on a quantitative criterion, whose satisfaction ensures that Newton's method converges quadratically from its first iteration. This criterion leverages the point-estimation theory of Smale and the active-set identification. We illustrate the methodology to obtain global minimizers of large-scale optimal power flow problems.
Submitted 12 September, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
A Stochastic-Gradient-based Interior-Point Algorithm for Solving Smooth Bound-Constrained Optimization Problems
Authors:
Frank E. Curtis,
Vyacheslav Kungurtsev,
Daniel P. Robinson,
Qi Wang
Abstract:
A stochastic-gradient-based interior-point algorithm for minimizing a continuously differentiable objective function (that may be nonconvex) subject to bound constraints is presented, analyzed, and demonstrated through experimental results. The algorithm is unique from other interior-point methods for solving smooth nonconvex optimization problems since the search directions are computed using stochastic gradient estimates. It is also unique in its use of inner neighborhoods of the feasible region -- defined by a positive and vanishing neighborhood-parameter sequence -- in which the iterates are forced to remain. It is shown that with a careful balance between the barrier, step-size, and neighborhood sequences, the proposed algorithm satisfies convergence guarantees in both deterministic and stochastic settings. The results of numerical experiments show that in both settings the algorithm can outperform projection-based methods.
Submitted 13 March, 2024; v1 submitted 28 April, 2023;
originally announced April 2023.
-
A Survey of Quantum Alternatives to Randomized Algorithms: Monte Carlo Integration and Beyond
Authors:
Philip Intallura,
Georgios Korpas,
Sudeepto Chakraborty,
Vyacheslav Kungurtsev,
Jakub Marecek
Abstract:
Monte Carlo sampling is a powerful toolbox of algorithmic techniques widely used for a number of applications wherein some noisy quantity, or summary statistic thereof, is sought to be estimated. In this paper, we survey the literature for implementing Monte Carlo procedures using quantum circuits, focusing on the potential to obtain a quantum advantage in the computational speed of these procedures. We revisit the quantum algorithms that could replace classical Monte Carlo and then consider both the existing quantum algorithms and the potential quantum realizations that include adaptive enhancements as alternatives to the classical procedure.
Submitted 8 March, 2023;
originally announced March 2023.
-
A Sequential Quadratic Programming Method for Optimization with Stochastic Objective Functions, Deterministic Inequality Constraints and Robust Subproblems
Authors:
Songqiang Qiu,
Vyacheslav Kungurtsev
Abstract:
In this paper, a robust sequential quadratic programming method for constrained optimization is generalized to problems with an expectation objective function and deterministic equality and inequality constraints. A stochastic line search scheme is employed to globalize the steps. We show theoretically that sequences generated by the algorithm converge almost surely to a Karush-Kuhn-Tucker point under the assumption of the extended Mangasarian-Fromovitz constraint qualification. Encouraging numerical results are reported.
Submitted 4 October, 2024; v1 submitted 15 February, 2023;
originally announced February 2023.
-
Riemannian Stochastic Approximation for Minimizing Tame Nonsmooth Objective Functions
Authors:
Johannes Aspman,
Vyacheslav Kungurtsev,
Reza Roohi Seraji
Abstract:
In many learning applications, the parameters in a model are structurally constrained in a way that can be modeled as them lying on a Riemannian manifold. Riemannian optimization, wherein procedures constrain an iterative minimizing sequence to the manifold, is used to train such models. At the same time, tame geometry has become a significant topological description of nonsmooth functions that appear in the landscapes of training neural networks and other important models with structural compositions of continuous nonlinear functions with nonsmooth maps. In this paper, we study the properties of such stratifiable functions on a manifold and the behavior of retracted stochastic gradient descent, with diminishing stepsizes, for minimizing such functions.
Submitted 29 December, 2024; v1 submitted 1 February, 2023;
originally announced February 2023.
-
Jump-Diffusion Langevin Dynamics for Multimodal Posterior Sampling
Authors:
Jacopo Guidolin,
Vyacheslav Kungurtsev,
Ondřej Kuželka
Abstract:
Bayesian methods of sampling from a posterior distribution are becoming increasingly popular due to their ability to precisely display the uncertainty of a model fit. Classical methods based on iterative random sampling and posterior evaluation, such as Metropolis-Hastings, are known to have desirable long-run mixing properties, but are slow to converge. Gradient-based methods, such as Langevin Dynamics (and its stochastic gradient counterpart), exhibit favorable dimension-dependence and fast mixing times for log-concave and "close" to log-concave distributions, but also have long escape times from local minimizers. Many contemporary applications such as Bayesian Neural Networks are both high-dimensional and highly multimodal. In this paper we investigate the performance of a hybrid Metropolis and Langevin sampling method akin to Jump Diffusion on a range of synthetic and real data, indicating that careful calibration of mixing sampling jumps with gradient-based chains significantly outperforms both pure gradient-based and pure sampling-based schemes.
Submitted 2 November, 2022;
originally announced November 2022.
-
Time-Varying Semidefinite Programming: Path Following a Burer-Monteiro Factorization
Authors:
Antonio Bellon,
Mareike Dressler,
Vyacheslav Kungurtsev,
Jakub Marecek,
André Uschmajew
Abstract:
We present an online algorithm for time-varying semidefinite programs (TV-SDPs), based on the tracking of the solution trajectory of a low-rank matrix factorization, also known as the Burer-Monteiro factorization, in a path-following procedure. There, a predictor-corrector algorithm solves a sequence of linearized systems. This requires the introduction of a horizontal space constraint to ensure the local injectivity of the low-rank factorization. The method produces a sequence of approximate solutions for the original TV-SDP problem, which we show stay close to the optimal solution path if properly initialized. Numerical experiments for a time-varying max-cut SDP relaxation demonstrate the computational advantages of the proposed method for tracking TV-SDPs in terms of runtime compared to off-the-shelf interior point methods.
Submitted 9 January, 2024; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Iteration Complexity of Variational Quantum Algorithms
Authors:
Vyacheslav Kungurtsev,
Georgios Korpas,
Jakub Marecek,
Elton Yechao Zhu
Abstract:
There has been much recent interest in near-term applications of quantum computers, i.e., using quantum circuits that have short decoherence times due to hardware limitations. Variational quantum algorithms (VQA), wherein an optimization algorithm implemented on a classical computer evaluates a parametrized quantum circuit as an objective function, are a leading framework in this space. An enormous breadth of algorithms in this framework have been proposed for solving a range of problems in machine learning, forecasting, applied physics, and combinatorial optimization, among others.
In this paper, we analyze the iteration complexity of VQA, that is, the number of steps that VQA requires until its iterates satisfy a surrogate measure of optimality. We argue that although VQA procedures incorporate algorithms that can, in the idealized case, be modeled as classic procedures in the optimization literature, the particular nature of noise in near-term devices invalidates the claim of applicability of off-the-shelf analyses of these algorithms. Specifically, noise makes the evaluations of the objective function via quantum circuits biased. Commonly used optimization procedures, such as SPSA and the parameter shift rule, can thus be seen as derivative-free optimization algorithms with biased function evaluations, for which there are currently no iteration complexity guarantees in the literature. We derive the missing guarantees and find that the rate of convergence is unaffected. However, the level of bias contributes unfavorably to both the constant therein, and the asymptotic distance to stationarity, i.e., the more bias, the farther one is guaranteed, at best, to reach a stationary point of the VQA objective.
Submitted 8 September, 2024; v1 submitted 21 September, 2022;
originally announced September 2022.
-
Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces
Authors:
Saeed Vahidian,
Mahdi Morafah,
Weijia Wang,
Vyacheslav Kungurtsev,
Chen Chen,
Mubarak Shah,
Bill Lin
Abstract:
Clustered federated learning (FL) has been shown to produce promising results by grouping clients into clusters. This is especially effective in scenarios where separate groups of clients have significant differences in the distributions of their local data. Existing clustered FL algorithms are essentially trying to group together clients with similar distributions so that clients in the same cluster can leverage each other's data to better perform federated learning. However, prior clustered FL algorithms attempt to learn these distribution similarities indirectly during training, which can be quite time consuming as many rounds of federated learning may be required until the formation of clusters is stabilized. In this paper, we propose a new approach to federated learning that directly aims to efficiently identify distribution similarities among clients by analyzing the principal angles between the client data subspaces. Each client applies a truncated singular value decomposition (SVD) step on its local data in a single-shot manner to derive a small set of principal vectors, which provides a signature that succinctly captures the main characteristics of the underlying distribution. This small set of principal vectors is provided to the server so that the server can directly identify distribution similarities among the clients to form clusters. This is achieved by comparing the similarities of the principal angles between the client data subspaces spanned by those principal vectors. The approach provides a simple, yet effective clustered FL framework that addresses a broad range of data heterogeneity issues beyond simpler forms of Non-IIDness like label skews. Our clustered FL approach also enables convergence guarantees for non-convex objectives. Our code is available at https://github.com/MMorafah/PACFL.
Submitted 21 September, 2022;
originally announced September 2022.
-
Stochastic Langevin Differential Inclusions with Applications to Machine Learning
Authors:
Fabio V. Difonzo,
Vyacheslav Kungurtsev,
Jakub Marecek
Abstract:
Stochastic differential equations of Langevin-diffusion form have received significant attention, thanks to their foundational role in both Bayesian sampling algorithms and optimization in machine learning. In the latter, they serve as a conceptual model of the stochastic gradient flow in training over-parameterized models. However, the literature typically assumes smoothness of the potential, whose gradient is the drift term. Nevertheless, there are many problems for which the potential function is not continuously differentiable, and hence the drift is not Lipschitz continuous everywhere. This is exemplified by robust losses and Rectified Linear Units in regression problems. In this paper, we show some foundational results regarding the flow and asymptotic properties of Langevin-type Stochastic Differential Inclusions under assumptions appropriate to the machine-learning settings. In particular, we show strong existence of the solution, as well as an asymptotic minimization of the canonical free-energy functional.
Submitted 12 May, 2024; v1 submitted 23 June, 2022;
originally announced June 2022.
-
Retraction based Direct Search Methods for Derivative Free Riemannian Optimization
Authors:
Vyacheslav Kungurtsev,
Francesco Rinaldi,
Damiano Zeffiro
Abstract:
Direct search methods represent a robust and reliable class of algorithms for solving black-box optimization problems.
In this paper, we explore the application of those strategies to Riemannian optimization, wherein minimization is to be performed with respect to variables restricted to lie on a manifold. More specifically, we consider classic and line search extrapolated variants of direct search, and, by making use of retractions, we devise tailored strategies for the minimization of both smooth and nonsmooth functions.
As such we analyze, for the first time in the literature, a class of retraction based algorithms for minimizing nonsmooth objectives on a Riemannian manifold without having access to (sub)derivatives. Along with convergence guarantees we provide a set of numerical performance illustrations on a standard set of problems.
Submitted 22 February, 2022;
originally announced February 2022.
-
Stochastic Model Predictive Control, Iterated Function Systems, and Stability
Authors:
Vyacheslav Kungurtsev,
Jakub Marecek,
Robert Shorten
Abstract:
We present the observation that the process of stochastic model predictive control can be formulated in the framework of iterated function systems. The latter has a rich ergodic theory that can be applied to study the system's long-run behavior. We show how such a framework can be realized for specific problems and illustrate the required conditions for the application of relevant theoretical guarantees.
Submitted 13 October, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
On the Ergodic Control of Ensembles in the Presence of Non-linear Filters
Authors:
Vyacheslav Kungurtsev,
Jakub Marecek,
Ramen Ghosh,
Robert N. Shorten
Abstract:
In many sharing-economy applications, as well as in conventional economy applications, one wishes to regulate the behaviour of an ensemble of agents with guarantees on both the regulation of the ensemble in aggregate and the revenue or quality of service associated with each agent. Previous work [Automatica, Volume 108, 108483, arXiv:1807.03256] has developed guarantees of unique ergodicity when there are linear filters. Here, we extend the guarantees to systems including non-linear elements, such as non-linear filters.
Submitted 12 September, 2022; v1 submitted 13 December, 2021;
originally announced December 2021.
-
Randomized Algorithms for Monotone Submodular Function Maximization on the Integer Lattice
Authors:
Alberto Schiabel,
Vyacheslav Kungurtsev,
Jakub Marecek
Abstract:
Optimization problems with set submodular objective functions have many real-world applications. In discrete scenarios, where the same item can be selected more than once, the domain is generalized from a 2-element set to a bounded integer lattice. In this work, we consider the problem of maximizing a monotone submodular function on the bounded integer lattice subject to a cardinality constraint. In particular, we focus on maximizing DR-submodular functions, i.e., functions defined on the integer lattice that exhibit the diminishing returns property. Given any epsilon > 0, we present a randomized algorithm with probabilistic guarantees of O(1 - 1/e - epsilon) approximation, using a framework inspired by a Stochastic Greedy algorithm developed for set submodular functions by Mirzasoleiman et al. We then show that, on synthetic DR-submodular functions, applying our proposed algorithm on the integer lattice is faster than the alternatives, including reducing a target problem to the set domain and then applying the fastest known set submodular maximization algorithm.
Submitted 19 November, 2021;
originally announced November 2021.
-
Decentralized Asynchronous Non-convex Stochastic Optimization on Directed Graphs
Authors:
Vyacheslav Kungurtsev,
Mahdi Morafah,
Tara Javidi,
Gesualdo Scutari
Abstract:
Distributed Optimization is an increasingly important subject area with the rise of multi-agent control and optimization. We consider a decentralized stochastic optimization problem where the agents on a graph aim to asynchronously optimize a collective (additive) objective function consisting of agents' individual (possibly non-convex) local objective functions. Each agent only has access to a noisy estimate of the gradient of its own function (one component of the sum of objective functions). We propose an asynchronous distributed algorithm for such a class of problems. The algorithm combines stochastic gradients with tracking in an asynchronous push-sum framework and obtains the standard sublinear convergence rate for general non-convex functions, matching the rate of centralized stochastic gradient descent (SGD).
Our experiments on a non-convex image classification task using a convolutional neural network validate the convergence of our proposed algorithm across different numbers of nodes and graph connectivity percentages.
Submitted 20 October, 2021;
originally announced October 2021.
-
Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo
Authors:
Vyacheslav Kungurtsev,
Adam Cobb,
Tara Javidi,
Brian Jalaian
Abstract:
Federated learning performed by a decentralized network of agents is becoming increasingly important with the prevalence of embedded software on autonomous devices. Bayesian approaches to learning benefit from offering more information as to the uncertainty of a random quantity, and Langevin and Hamiltonian methods are effective at realizing sampling from an uncertain distribution with large parameter dimensions. Such methods have only recently appeared in the decentralized setting, and either exclusively use stochastic gradient Langevin and Hamiltonian Monte Carlo approaches that require a diminishing stepsize to asymptotically sample from the posterior and are known in practice to characterize uncertainty less faithfully than constant stepsize methods with a Metropolis adjustment, or assume strong convexity properties of the potential function. We present the first approach to incorporating constant stepsize Metropolis-adjusted HMC in the decentralized sampling framework, show theoretical guarantees for consensus and probability distance to the posterior stationary distribution, and demonstrate their effectiveness numerically on standard real-world problems, including decentralized learning of neural networks, which is known to be highly non-convex.
Submitted 15 July, 2021;
originally announced July 2021.
-
Regularized quasi-monotone method for stochastic optimization
Authors:
Vyacheslav Kungurtsev,
Vladimir Shikhman
Abstract:
We adapt the quasi-monotone method from [2] for composite convex minimization in the stochastic setting. For the proposed numerical scheme we derive the optimal convergence rate in terms of the last iterate, rather than on average, as is standard for subgradient methods. The theoretical guarantee for individual convergence of the regularized quasi-monotone method is confirmed by numerical experiments on l1-regularized robust linear regression.
Submitted 8 July, 2021;
originally announced July 2021.
-
On the effect of perturbations in first-order optimization methods with inertia and Hessian driven damping
Authors:
Hedy Attouch,
Jalal Fadili,
Vyacheslav Kungurtsev
Abstract:
Second-order continuous-time dissipative dynamical systems with viscous and Hessian driven damping have inspired effective first-order algorithms for solving convex optimization problems. While preserving the fast convergence properties of the Nesterov-type acceleration, the Hessian driven damping makes it possible to significantly attenuate the oscillations. To study the stability of these algorithms with respect to perturbations, we analyze the behaviour of the corresponding continuous systems when the gradient computation is subject to exogenous additive errors. We provide a quantitative analysis of the asymptotic behaviour of two types of systems, those with implicit and explicit Hessian driven damping. We consider convex, strongly convex, and non-smooth objective functions defined on a real Hilbert space and show that, depending on the formulation, different integrability conditions on the perturbations are sufficient to maintain the convergence rates of the systems. We highlight the differences between the implicit and explicit Hessian damping, and in particular point out that the assumptions on the objective and perturbations needed in the implicit case are more stringent than in the explicit case.
Submitted 17 March, 2022; v1 submitted 30 June, 2021;
originally announced June 2021.
-
Trilevel and Multilevel Optimization using Monotone Operator Theory
Authors:
Allahkaram Shafiei,
Vyacheslav Kungurtsev,
Jakub Marecek
Abstract:
We consider a rather general class of multi-level optimization problems, where a convex objective function is to be minimized subject to constraints of optimality of nested convex optimization problems. As a special case, we consider a trilevel optimization problem, where the objective of the two lower layers consists of a sum of a smooth and a non-smooth term. Based on fixed-point theory and related arguments, we present a natural first-order algorithm and analyze its convergence and rates of convergence in several regimes of parameters.
Submitted 19 October, 2023; v1 submitted 19 May, 2021;
originally announced May 2021.
-
Parametric Semidefinite Programming: Geometry of the Trajectory of Solutions
Authors:
Antonio Bellon,
Didier Henrion,
Vyacheslav Kungurtsev,
Jakub Marecek
Abstract:
In many applications, solutions of convex optimization problems are updated on-line, as functions of time. In this paper, we consider parametric semidefinite programs, which are linear optimization problems in the semidefinite cone whose coefficients (input data) depend on a time parameter. We are interested in the geometry of the solution (output data) trajectory, defined as the set of solutions depending on the parameter. We propose an exhaustive description of the geometry of the solution trajectory. As our main result, we show that only six distinct behaviors can be observed at a neighborhood of a given point along the solution trajectory. Each possible behavior is then illustrated by an example.
Submitted 10 October, 2023; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Decentralized Langevin Dynamics over a Directed Graph
Authors:
Alexander Kolesov,
Vyacheslav Kungurtsev
Abstract:
The prevalence of technologies in the space of the Internet of Things and use of multi-processing computing platforms to aid in the computation required to perform learning and inference from large volumes of data has necessitated the extensive study of algorithms on decentralized platforms. In these settings, computing nodes send and receive data across graph-structured communication links, and using a combination of local computation and consensus-seeking communication, cooperatively solve a problem of interest. Recently, Langevin dynamics as a tool for high dimensional sampling and posterior Bayesian inference has been studied in the context of a decentralized operation. However, this work has been limited to undirected graphs, wherein all communication is two-sided, i.e., if node A can send data to node B, then node B can also send data to node A. We extend the state of the art in considering Langevin dynamics on directed graphs.
Submitted 6 March, 2021;
originally announced March 2021.
-
Asynchronous Optimization over Graphs: Linear Convergence under Error Bound Conditions
Authors:
Loris Cannelli,
Francisco Facchinei,
Gesualdo Scutari,
Vyacheslav Kungurtsev
Abstract:
We consider convex and nonconvex constrained optimization with a partially separable objective function: agents minimize the sum of local objective functions, each of which is known only by the associated agent and depends on the variables of that agent and those of a few others. This partitioned setting arises in several applications of practical interest. We propose what is, to the best of our knowledge, the first distributed, asynchronous algorithm with rate guarantees for this class of problems. When the objective function is nonconvex, the algorithm provably converges to a stationary solution at a sublinear rate, whereas a linear rate is achieved when the objective satisfies the renowned Luo-Tseng error bound condition (which is less stringent than strong convexity). Numerical results on matrix completion and LASSO problems show the effectiveness of our method.
Submitted 18 October, 2020;
originally announced October 2020.
-
Sensitivity Assisted Alternating Directions Method of Multipliers for Distributed Optimization and Statistical Learning
Authors:
Dinesh Krishnamoorthy,
Vyacheslav Kungurtsev
Abstract:
This paper considers the problem of distributed model fitting using the alternating directions method of multipliers (ADMM). ADMM splits the learning problem into several smaller subproblems, usually by partitioning the data samples. The different subproblems can be solved in parallel by a set of worker computing nodes coordinated by a master node, and the subproblems are repeatedly solved until convergence. At each iteration, the worker nodes must solve a convex optimization problem whose difficulty increases with the size of the problem. In this paper, we propose a sensitivity-assisted ADMM algorithm that leverages the parametric sensitivities such that the subproblem solutions can be approximated using a tangential predictor, thus easing the computational burden to computing one linear solve. We study the convergence properties of the proposed sensitivity-assisted ADMM algorithm. The numerical performance of the algorithm is illustrated on a nonlinear parameter estimation problem and a multilayer perceptron learning problem.
Submitted 2 March, 2022; v1 submitted 12 September, 2020;
originally announced September 2020.
-
A Nonmonotone Matrix-Free Algorithm for Nonlinear Equality-Constrained Least-Squares Problems
Authors:
E. Bergou,
Y. Diouane,
V. Kungurtsev,
C. W. Royer
Abstract:
Least squares form one of the most prominent classes of optimization problems, with numerous applications in scientific computing and data fitting. When such formulations aim at modeling complex systems, the optimization process must account for nonlinear dynamics by incorporating constraints. In addition, these systems often incorporate a large number of variables, which increases the difficulty of the problem, and motivates the need for efficient algorithms amenable to large-scale implementations.
In this paper, we propose and analyze a Levenberg-Marquardt algorithm for nonlinear least squares subject to nonlinear equality constraints. Our algorithm is based on inexact solves of linear least-squares problems, that only require Jacobian-vector products. Global convergence is guaranteed by the combination of a composite step approach and a nonmonotone step acceptance rule. We illustrate the performance of our method on several test cases from data assimilation and inverse problems: our algorithm is able to reach the vicinity of a solution from an arbitrary starting point, and can outperform the most natural alternatives for these classes of problems.
Submitted 28 May, 2021; v1 submitted 29 June, 2020;
originally announced June 2020.
-
Convergence and Complexity Analysis of a Levenberg-Marquardt Algorithm for Inverse Problems
Authors:
E. Bergou,
Y. Diouane,
V. Kungurtsev
Abstract:
The Levenberg-Marquardt algorithm is one of the most popular algorithms for finding the solution of nonlinear least squares problems. Across different modified variations of the basic procedure, the algorithm enjoys global convergence, a competitive worst case iteration complexity rate, and a guaranteed rate of local convergence for both zero and nonzero small residual problems, under suitable assumptions. We introduce a novel Levenberg-Marquardt method that matches, simultaneously, the state of the art in all of these convergence properties with a single seamless algorithm. Numerical experiments confirm the theoretical behavior of our proposed algorithm.
Submitted 6 April, 2020;
originally announced April 2020.
-
Complexity iteration analysis for strongly convex multi-objective optimization using a Newton path-following procedure
Authors:
E. Bergou,
Y. Diouane,
V. Kungurtsev
Abstract:
In this note we consider the iteration complexity of solving strongly convex multi-objective optimization problems. We discuss the precise meaning of this problem and indicate that it is loosely defined, but the most natural notion is to find a set of Pareto optimal points across a grid of scalarized problems. We derive that in most cases, performing sensitivity-based path-following after obtaining one solution is the optimal strategy for this task in terms of iteration complexity.
Submitted 6 April, 2020;
originally announced April 2020.
-
Zero Order Stochastic Weakly Convex Composite Optimization
Authors:
V. Kungurtsev,
F. Rinaldi
Abstract:
In this paper we consider stochastic weakly convex composite problems for which no stochastic subgradient oracle is available. We present a derivative-free algorithm that uses a two-point approximation to compute a gradient estimate of the smoothed function. We prove convergence at a rate similar to that of state-of-the-art methods, albeit with a larger constant, and report some numerical results showing the effectiveness of the approach.
Submitted 19 February, 2020;
originally announced February 2020.
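The two-point approximation mentioned in the abstract above can be illustrated with a generic Gaussian-smoothing estimator. This is a sketch under stated assumptions, not the paper's algorithm: the symmetric difference form, the smoothing radius `mu`, and the helper names `two_point_gradient` and `averaged_estimate` are all hypothetical choices made for illustration.

```python
import random

def two_point_gradient(f, x, mu, rng):
    """One two-point estimate of the gradient of a Gaussian-smoothed f at x."""
    # Random Gaussian direction along which f is probed.
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    # Only two function values are used; no derivative of f is needed.
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]

def averaged_estimate(f, x, mu=1e-3, samples=20000, seed=0):
    """Average many two-point estimates to reduce their variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        g = two_point_gradient(f, x, mu, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]
```

For the smooth test function `f(v) = sum(t * t for t in v)`, the averaged estimate at `[1.0, -2.0]` should lie near the true gradient `[2.0, -4.0]`. A single estimate is much noisier, which is consistent with the larger constant mentioned in the abstract.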
-
Decentralized Langevin Dynamics
Authors:
Vyacheslav Kungurtsev
Abstract:
Langevin MCMC gradient optimization is a class of increasingly popular methods for estimating a posterior distribution. This paper addresses the algorithm as applied in a decentralized setting, wherein data is distributed across a network of agents which act to cooperatively solve the problem using peer-to-peer gossip communication. We establish theoretical results on 1) the time complexity to reach $ε$-consensus for the continuous-time stochastic differential equation, 2) the convergence rate in $L^2$ norm to consensus for the discrete implementation defined by the Euler-Maruyama discretization, and 3) the convergence rate in the Wasserstein metric to the optimal stationary distribution for the discretized dynamics.
Submitted 19 September, 2020; v1 submitted 2 January, 2020;
originally announced January 2020.
-
Distributed Stochastic Nonsmooth Nonconvex Optimization
Authors:
Vyacheslav Kungurtsev
Abstract:
Distributed consensus optimization has received considerable attention in recent years; several distributed consensus-based algorithms have been proposed for (nonsmooth) convex and (smooth) nonconvex objective functions. However, the behavior of these distributed algorithms on nonconvex, nonsmooth and stochastic objective functions is not understood. This class of functions and distributed setting are motivated by several applications, including problems in machine learning and signal processing.
This paper presents the first convergence analysis of the decentralized stochastic subgradient method for such classes of problems, over networks modeled as undirected, fixed graphs.
Submitted 3 November, 2019;
originally announced November 2019.
-
Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees
Authors:
Vyacheslav Kungurtsev,
Malcolm Egan,
Bapi Chatterjee,
Dan Alistarh
Abstract:
Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees exist beyond cases where closed-form proximal operator solutions are available. As most popular contemporary deep neural networks lead to nonsmooth and nonconvex objectives, there is now a pressing need for such convergence guarantees. In this paper, we analyze for the first time the convergence of stochastic asynchronous optimization for this general class of objectives. In particular, we focus on stochastic subgradient methods allowing for block variable partitioning, where the shared-memory-based model is asynchronously updated by concurrent processes. To this end, we first introduce a probabilistic model which captures key features of real asynchronous scheduling between concurrent processes; under this model, we establish convergence with probability one to an invariant set for stochastic subgradient methods with momentum.
From the practical perspective, one issue with the family of methods we consider is that it is not efficiently supported by machine learning frameworks, as they mostly focus on distributed data-parallel strategies. To address this, we propose a new implementation strategy for shared-memory based training of deep neural networks, whereby concurrent parameter servers are utilized to train a partitioned but shared model in single- and multi-GPU settings. Based on this implementation, we achieve on average 1.2x speed-up in comparison to state-of-the-art training methods for popular image classification tasks without compromising accuracy.
Submitted 11 July, 2020; v1 submitted 28 May, 2019;
originally announced May 2019.
-
A Comparison of Nonsmooth, Nonconvex, Constrained Optimization Solvers for the Design of Time-Delay Compensators
Authors:
Vyacheslav Kungurtsev,
Tim Mitchell,
Tomas Vyhlidal
Abstract:
We present a detailed set of performance comparisons of two state-of-the-art solvers for the application of designing time-delay compensators, an important problem in the field of robust control. Formulating such robust control mechanics as constrained optimization problems often involves objective and constraint functions that are both nonconvex and nonsmooth, both of which present significant challenges to many solvers and their end-users hoping to obtain good solutions to these problems. In our particular engineering task, the main difficulty in the optimization arises in a nonsmooth and nonconvex stability constraint, which states that the infinite spectrum of zeros of the so-called shaper should remain in the open left half-plane. To perform our evaluation, we make use of $β$-relative minimization profiles, recently introduced visualization tools that are particularly suited for benchmarking solvers on nonsmooth, nonconvex, constrained optimization problems. Furthermore, we also introduce new visualization tools, called Global-Local Profiles, which for a given problem and a fixed computational budget, assess the tradeoffs of distributing the budget over few or many starting points, with the former getting more budget per point and the latter less.
Submitted 30 December, 2018;
originally announced December 2018.
-
A Subsampling Line-Search Method with Second-Order Results
Authors:
El-houcine Bergou,
Youssef Diouane,
Vladimir Kunc,
Vyacheslav Kungurtsev,
Clément W. Royer
Abstract:
In many contemporary optimization problems such as those arising in machine learning, it can be computationally challenging or even infeasible to evaluate an entire function or its derivatives. This motivates the use of stochastic algorithms that sample problem data, which can jeopardize the guarantees obtained through classical globalization techniques in optimization such as a trust region or a line search. Using subsampled function values is particularly challenging for the latter strategy, which relies upon multiple evaluations. On top of that, there has been an increasing interest in nonconvex formulations of data-related problems, such as training deep learning models. For such instances, one aims at developing methods that converge to second-order stationary points quickly, i.e., escape saddle points efficiently. This is particularly delicate to ensure when one only accesses subsampled approximations of the objective and its derivatives.
In this paper, we describe a stochastic algorithm based on negative curvature and Newton-type directions that are computed for a subsampling model of the objective. A line-search technique is used to enforce suitable decrease for this model, and for a sufficiently large sample, a similar amount of reduction holds for the true objective. By using probabilistic reasoning, we can then obtain worst-case complexity guarantees for our framework, leading us to discuss appropriate notions of stationarity in a subsampling context. Our analysis encompasses the deterministic regime, and allows us to identify sampling requirements for second-order line-search paradigms. As we illustrate through real data experiments, these worst-case estimates need not be satisfied for our method to be competitive with first-order strategies in practice.
Submitted 30 June, 2021; v1 submitted 16 October, 2018;
originally announced October 2018.
-
Second-order Guarantees of Distributed Gradient Algorithms
Authors:
Amir Daneshmand,
Gesualdo Scutari,
Vyacheslav Kungurtsev
Abstract:
We consider distributed smooth nonconvex unconstrained optimization over networks, modeled as a connected graph. We examine the behavior of distributed gradient-based algorithms near strict saddle points. Specifically, we establish that (i) the renowned Distributed Gradient Descent (DGD) algorithm likely converges to a neighborhood of a Second-order Stationary (SoS) solution; and (ii) the more recent class of distributed algorithms based on gradient tracking--implementable also over digraphs--likely converges to exact SoS solutions, thus avoiding (strict) saddle points. Furthermore, new convergence rate results to first-order critical points are established for the latter class of algorithms.
△ Less
Submitted 25 May, 2020; v1 submitted 23 September, 2018;
originally announced September 2018.
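The DGD update referred to in the abstract above, in which each agent averages its neighbors' iterates and then takes a local gradient step, can be sketched on a toy problem. The example below is an assumption-laden illustration rather than the paper's setting: it uses convex scalar quadratics `f_i(x) = 0.5 * (x - b[i])**2` instead of nonconvex objectives, and the names `dgd`, `weights`, and `alpha` are hypothetical.

```python
def dgd(b, weights, alpha=0.05, iters=500):
    """Distributed gradient descent for f_i(x) = 0.5 * (x - b[i])**2 (toy example)."""
    n = len(b)
    x = [0.0] * n  # one scalar iterate per agent
    for _ in range(iters):
        # Consensus step: each agent mixes its neighbors' iterates.
        mixed = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local step: each agent descends along its own gradient at its old iterate.
        x = [mixed[i] - alpha * (x[i] - b[i]) for i in range(n)]
    return x

# Doubly stochastic mixing matrix for three fully connected agents (assumed).
W = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
x_final = dgd([0.0, 3.0, 6.0], W)
```

With this constant step size the agents settle at approximately 2.8125, 3.0 and 3.1875, which is close to the minimizer 3.0 of the summed objective but not at exact consensus. This illustrates, in a simple convex case, why DGD is described as converging only to a neighborhood of a solution.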
-
A stochastic Levenberg-Marquardt method using random models with complexity results
Authors:
E. Bergou,
Y. Diouane,
V. Kungurtsev,
C. W. Royer
Abstract:
Globally convergent variants of the Gauss-Newton algorithm are often the methods of choice to tackle nonlinear least-squares problems. Among such frameworks, Levenberg-Marquardt and trust-region methods are two well-established, similar paradigms. Both schemes have been studied when the Gauss-Newton model is replaced by a random model that is only accurate with a given probability. Trust-region schemes have also been applied to problems where the objective value is subject to noise: this setting is of particular interest in fields such as data assimilation, where efficient methods that can adapt to noise are needed to account for the intrinsic uncertainty in the input data.
In this paper, we describe a stochastic Levenberg-Marquardt algorithm that handles noisy objective function values and random models, provided sufficient accuracy is achieved in probability. Our method relies on a specific scaling of the regularization parameter, that allows us to leverage existing results for trust-region algorithms. Moreover, we exploit the structure of our objective through the use of a family of stationarity criteria tailored to least-squares problems. Provided the probability of accurate function estimates and models is sufficiently large, we bound the expected number of iterations needed to reach an approximate stationary point, which generalizes results based on using deterministic models or noiseless function values. We illustrate the link between our approach and several applications related to inverse problems and machine learning.
Submitted 18 November, 2021; v1 submitted 5 July, 2018;
originally announced July 2018.
-
Algorithms for solving optimization problems arising from deep neural net models: nonsmooth problems
Authors:
Vyacheslav Kungurtsev,
Tomas Pevny
Abstract:
Machine Learning models incorporating multiple layered learning networks have been seen to provide effective models for various classification problems. The resulting optimization problem to solve for the optimal vector minimizing the empirical risk is, however, highly nonconvex. This alone presents a challenge to the application and development of appropriate optimization algorithms for solving the problem. In addition, there are a number of interesting problems for which the objective function is nonsmooth and nonseparable. In this paper, we summarize the primary challenges involved and the state of the art, and present some numerical results on an interesting and representative class of problems.
△ Less
Submitted 30 June, 2018;
originally announced July 2018.