-
IPAS: An Adaptive Sample Size Method for Weighted Finite Sum Problems with Linear Equality Constraints
Authors:
Nataša Krejić,
Nataša Krklec Jerinkić,
Sanja Rapajić,
Luka Rutešić
Abstract:
Optimization problems with the objective function in the form of a weighted sum and linear equality constraints are considered. Given that the number of local cost functions can be large as well as the number of constraints, a stochastic optimization method is proposed. The method belongs to the class of variable sample size first order methods, where the sample size is adaptive and governed by the additional sampling technique earlier proposed in the unconstrained optimization framework. The resulting algorithm may be a mini-batch method, an increasing sample size method, or even deterministic in the sense that it eventually reaches the full sample size, depending on the problem and the similarity of the local cost functions. Regarding the constraints, the method uses controlled but inexact projections onto the feasible set, yielding possibly infeasible iterates. Almost sure convergence is proved under some standard assumptions for the stochastic framework, without imposing convexity. Numerical results on relevant problems from the CUTEst collection and real-world data sets for logistic regression show the stability and the efficiency of the proposed method when compared to the state-of-the-art methods.
Submitted 28 April, 2025;
originally announced April 2025.
-
SMOP: Stochastic trust region method for multi-objective problems
Authors:
Nataša Krejić,
Nataša Krklec Jerinkić,
Luka Rutešić
Abstract:
The problem considered is a multi-objective optimization problem, in which the goal is to find an optimal value of a vector function representing various criteria. The aim of this work is to develop an algorithm which utilizes the trust region framework with probabilistic model functions, able to cope with noisy problems, using inaccurate functions and gradients. We prove the almost sure convergence of the proposed algorithm to a Pareto critical point if the model functions are good approximations in a probabilistic sense. Numerical results demonstrate the effectiveness of the probabilistic trust region by comparing it to competitive stochastic multi-objective solvers. The application in supervised machine learning is showcased by training non-discriminatory logistic regression models on data groups of different sizes. Additionally, we use several test examples with irregularly shaped fronts to exhibit the efficiency of the algorithm.
Submitted 10 January, 2025;
originally announced January 2025.
-
Parallel Inexact Levenberg-Marquardt Method for Nearly-Separable Nonlinear Least Squares
Authors:
Lidija Fodor,
Dusan Jakovetic,
Natasa Krejic,
Greta Malaspina
Abstract:
Motivated by localization problems such as cadastral map refinement, we consider a generic Nonlinear Least Squares (NLS) problem of minimizing an aggregate squared fit across all nonlinear equations (measurements) with respect to the set of unknowns, e.g., coordinates of the unknown points' locations. In a number of scenarios, NLS problems exhibit a nearly-separable structure: the set of measurements can be partitioned into disjoint groups (blocks), such that the unknowns that correspond to different blocks are only loosely coupled. We propose an efficient parallel method, termed Parallel Inexact Levenberg-Marquardt (PILM), to solve such generic large scale NLS problems. PILM builds upon the classical Levenberg-Marquardt (LM) method, with the main novelty being that the nearly-block separable structure is leveraged in order to obtain a scalable parallel method. Therein, the problem-wide system of linear equations that needs to be solved at every LM iteration is tackled iteratively. At each (inner) iteration, the block-wise systems of linear equations are solved in parallel, while the problem-wide system is then handled via sparse, inexpensive inter-block communication. We establish strong convergence guarantees of PILM that are analogous to those of the classical LM; provide a PILM implementation in a master-worker parallel compute environment; and demonstrate its efficiency on huge scale cadastral map refinement problems.
Submitted 14 January, 2025; v1 submitted 14 December, 2023;
originally announced December 2023.
-
A non-monotone trust-region method with noisy oracles and additional sampling
Authors:
Natasa Krejic,
Natasa Krklec Jerinkic,
Angeles Martinez,
Mahsa Yousefi
Abstract:
In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies which yield noisy approximations of the finite sum objective function and its gradient. To effectively control the resulting approximation error, we introduce an adaptive sample size strategy based on inexpensive additional sampling. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batch to full sample functions. We provide convergence analysis for all possible scenarios and show that the proposed method achieves almost sure convergence under standard assumptions for the trust-region framework. We report numerical experiments showing that the proposed algorithm outperforms its state-of-the-art counterpart in deep neural network training for image classification and regression tasks while requiring a significantly smaller number of gradient evaluations.
Submitted 17 January, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
SLiSeS: Subsampled Line Search Spectral Gradient Method for Finite Sums
Authors:
Stefania Bellavia,
Nataša Krejić,
Nataša Krklec Jerinkić,
Marcos Raydan
Abstract:
The spectral gradient method is known to be a powerful low-cost tool for solving large-scale optimization problems. In this paper, our goal is to exploit its advantages in the stochastic optimization framework, especially in the case of mini-batch subsampling that is often used in big data settings. To allow the spectral coefficient to properly explore the underlying approximate Hessian spectrum, we keep the same subsample for several iterations before subsampling again. We analyze the required algorithmic features and the conditions for almost sure convergence, and present initial numerical results that show the advantages of the proposed method.
Submitted 8 October, 2024; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Distributed Inexact Newton Method with Adaptive Step Sizes
Authors:
Dusan Jakovetic,
Natasa Krejic,
Greta Malaspina
Abstract:
We consider two formulations for distributed optimization wherein $N$ agents in a generic connected network solve a problem of common interest: distributed personalized optimization and consensus optimization. A new method termed DINAS (Distributed Inexact Newton method with Adaptive Stepsize) is proposed. DINAS employs large adaptively computed step-sizes, requires less knowledge of global parameters than existing alternatives, and can operate without any local Hessian inverse calculations or Hessian communications. When solving personalized distributed learning formulations, DINAS achieves quadratic convergence with respect to computational cost and linear convergence with respect to communication cost, the latter rate being independent of the local functions' condition numbers and of the network topology. When solving consensus optimization problems, DINAS is shown to converge to the global solution. Extensive numerical experiments demonstrate significant improvements of DINAS over existing alternatives. As a result of independent interest, we provide for the first time a convergence analysis of the Newton method with the adaptive Polyak step-size when the Newton direction is computed inexactly in a centralized environment.
Submitted 14 January, 2025; v1 submitted 23 May, 2023;
originally announced May 2023.
-
A low-cost alternating projection approach for a continuous formulation of convex and cardinality constrained optimization
Authors:
Nataša Krejić,
Evelin H. M. Krulikovski,
Marcos Raydan
Abstract:
We consider convex constrained optimization problems that also include a cardinality constraint. In general, optimization problems with cardinality constraints are difficult mathematical programs which are usually solved by global techniques from discrete optimization. We assume that the region defined by the convex constraints can be written as the intersection of a finite collection of convex sets, such that it is easy and inexpensive to project onto each one of them (e.g., boxes, hyper-planes, or half-spaces).
Taking advantage of a recently developed continuous reformulation that relaxes the cardinality constraint, we propose a specialized penalty gradient projection scheme combined with alternating projection ideas to solve these problems. To illustrate the combined scheme, we focus on the standard mean-variance portfolio optimization problem for which we can only invest in a preestablished limited number of assets.
For these portfolio problems with cardinality constraints we present a numerical study on a variety of data sets involving real-world capital market indices from major stock markets. On those data sets we illustrate the computational performance of the proposed scheme to produce the effective frontiers for different values of the limited number of allowed assets.
Submitted 6 September, 2022;
originally announced September 2022.
-
Splitted Levenberg-Marquardt Method for Large-Scale Sparse Problems
Authors:
Natasa Krejic,
Greta Malaspina,
Lense Swaenen
Abstract:
We consider large-scale nonlinear least squares problems with sparse residuals, each of them depending on a small number of variables. A decoupling procedure which results in a splitting of the original problem into a sequence of independent problems of smaller sizes is proposed and analysed. The smaller size problems are modified in a way that offsets the error made by disregarding the dependencies, which is what allows us to split the original problem. The resulting method is a modification of the Levenberg-Marquardt method with smaller computational costs. Global convergence is proved as well as local linear convergence under suitable assumptions on sparsity. The method is tested on simulated network localization problems with up to one million variables and its efficiency is demonstrated.
Submitted 11 January, 2023; v1 submitted 10 June, 2022;
originally announced June 2022.
-
A Hessian inversion-free exact second order method for distributed consensus optimization
Authors:
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
We consider a standard distributed consensus optimization problem where a set of agents connected over an undirected network minimize the sum of their individual local strongly convex costs. The Alternating Direction Method of Multipliers (ADMM) and the Proximal Method of Multipliers (PMM) have been proved to be effective frameworks for the design of exact distributed second order methods involving calculation of local cost Hessians. However, existing methods involve explicit calculation of local Hessian inverses at each iteration, which may be very costly when the dimension of the optimization variable is large. In this paper we develop a novel method termed INDO (Inexact Newton method for Distributed Optimization) that alleviates the need for Hessian inverse calculation. INDO follows the PMM framework but, unlike existing work, approximates the Newton direction through a generic fixed point method, e.g., Jacobi Overrelaxation, that does not involve Hessian inverses. We prove exact global linear convergence of INDO and provide analytical studies on how the degree of inexactness in the Newton direction calculation affects the overall method's convergence factor. Numerical experiments on several real data sets demonstrate that INDO's iteration-wise speed is on par with or better than that of state-of-the-art methods, hence it has a comparable communication cost. At the same time, for sufficiently large optimization problem dimensions $n$ (even at $n$ on the order of a couple of hundred), INDO achieves savings in computational cost of at least an order of magnitude.
Submitted 6 April, 2022;
originally announced April 2022.
-
Spectral Projected Subgradient Method for Nonsmooth Convex Optimization Problems
Authors:
Natasa Krejic,
Natasa Krklec Jerinkic,
Tijana Ostojic
Abstract:
We consider constrained optimization problems with a nonsmooth objective function in the form of mathematical expectation. The Sample Average Approximation (SAA) is used to estimate the objective function and variable sample size strategy is employed. The proposed algorithm combines an SAA subgradient with the spectral coefficient in order to provide a suitable direction which improves the performance of the first order method as shown by numerical results. The step sizes are chosen from the predefined interval and the almost sure convergence of the method is proved under the standard assumptions in stochastic environment. To enhance the performance of the proposed algorithm, we further specify the choice of the step size by introducing an Armijo-like procedure adapted to this framework. Considering the computational cost on machine learning problems, we conclude that the line search improves the performance significantly. Numerical experiments conducted on finite sums problems also reveal that the variable sample strategy outperforms the full sample approach.
Submitted 8 August, 2022; v1 submitted 23 March, 2022;
originally announced March 2022.
-
A harmonic framework for stepsize selection in gradient methods
Authors:
Giulia Ferrandi,
Michiel E. Hochstenbach,
Natasa Krejic
Abstract:
We study the use of inverse harmonic Rayleigh quotients with target for the stepsize selection in gradient methods for nonlinear unconstrained optimization problems. This provides not only an elegant and flexible framework to parametrize and reinterpret existing stepsize schemes, but also gives inspiration for new flexible and tunable families of steplengths. In particular, we analyze and extend the adaptive Barzilai-Borwein method to a new family of stepsizes. While this family exploits negative values for the target, we also consider positive targets. We present a convergence analysis for quadratic problems extending results by Dai and Liao (2002), and carry out experiments outlining the potential of the approaches.
Submitted 20 October, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
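As background for the stepsize families discussed in this abstract, the following is a minimal sketch of a gradient method with the classical Barzilai-Borwein (BB1) stepsize on a strictly convex quadratic. It is not the harmonic-target family from the paper; the function name `bb_gradient`, its parameters, and the test problem are hypothetical and chosen only for illustration.

```python
def bb_gradient(A, b, x0, alpha0=0.1, max_iter=100, tol=1e-10):
    """Minimize 0.5*x^T A x - b^T x with the classical BB1 stepsize.

    Illustrative sketch only: A is a symmetric positive definite matrix
    given as a list of rows; all names here are assumptions.
    """
    n = len(x0)

    def grad(x):
        # Gradient of the quadratic: A x - b
        return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]

    x = list(x0)
    g = grad(x)
    alpha = alpha0
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break  # gradient is numerically zero
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        s = [a - c for a, c in zip(x_new, x)]  # step difference
        y = [a - c for a, c in zip(g_new, g)]  # gradient difference
        sy = sum(si * yi for si, yi in zip(s, y))
        x, g = x_new, g_new
        if sy > 0.0:
            # BB1 stepsize: an inverse Rayleigh quotient of A along s
            alpha = sum(si * si for si in s) / sy
        # otherwise keep the previous stepsize as a simple safeguard
    return x


# Example: the minimizer of this quadratic solves A x = b, i.e. x = (1, 1).
x_min = bb_gradient([[2.0, 0.0], [0.0, 10.0]], [2.0, 10.0], [0.0, 0.0])
```

The BB1 stepsize equals the reciprocal of a Rayleigh quotient of the Hessian along the last step, which is the quantity that harmonic and targeted variants modify.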
-
A stochastic first-order trust-region method with inexact restoration for finite-sum minimization
Authors:
Stefania Bellavia,
Natasa Krejic,
Benedetta Morini,
Simone Rebegoldi
Abstract:
We propose a stochastic first-order trust-region method with inexact function and gradient evaluations for solving finite-sum minimization problems. Using a suitable reformulation of the given problem, our method combines the inexact restoration approach for constrained optimization with the trust-region procedure and random models. Unlike other recent stochastic trust-region schemes, our proposed algorithm improves feasibility and optimality in a modular way. We provide the expected number of iterations for reaching a near-stationary point by imposing some probability accuracy requirements on random functions and gradients which are, in general, less stringent than the corresponding ones in the literature. We validate the proposed algorithm on some nonconvex optimization problems arising in binary classification and regression, showing that it performs well in terms of cost and accuracy, and reduces the burdensome tuning of the hyper-parameters involved.
Submitted 22 October, 2022; v1 submitted 7 July, 2021;
originally announced July 2021.
-
An inexact restoration-nonsmooth algorithm with variable accuracy for stochastic nonsmooth convex optimization problems in machine learning and stochastic linear complementarity problems
Authors:
Natasa Krejic,
Natasa Krklec Jerinkic,
Tijana Ostojic
Abstract:
We study unconstrained optimization problems with a nonsmooth and convex objective function in the form of a mathematical expectation. The proposed method approximates the expected objective function with a sample average function, where the sample size is chosen adaptively based on Inexact Restoration. The algorithm uses line search and assumes descent directions with respect to the current approximate function. We prove almost sure convergence under standard assumptions. Numerical results for two types of problems, machine learning loss functions for training classifiers and stochastic linear complementarity problems, demonstrate the efficiency of the proposed scheme.
Submitted 2 November, 2022; v1 submitted 25 March, 2021;
originally announced March 2021.
-
EFIX: Exact Fixed Point Methods for Distributed Optimization
Authors:
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
We consider strongly convex distributed consensus optimization over connected networks. EFIX, the proposed method, is derived using a quadratic penalty approach. In more detail, we use the standard reformulation (transforming the original problem into a constrained problem in a higher dimensional space) to define a sequence of suitable quadratic penalty subproblems with increasing penalty parameters. For quadratic objectives, the corresponding sequence consists of quadratic penalty subproblems. For the generic strongly convex case, the objective function is approximated with a quadratic model and hence the sequence of the resulting penalty subproblems is again quadratic. EFIX is then derived by solving each of the quadratic penalty subproblems via a fixed point (R)-linear solver, e.g., the Jacobi Over-Relaxation method. The exact convergence is proved as well as the worst case complexity of order $O(\epsilon^{-1})$ for the quadratic case. In the case of strongly convex generic functions, a standard result for penalty methods is obtained. Numerical results indicate that the method is highly competitive with state-of-the-art exact first order methods, requires smaller computational and communication effort, and is robust to the choice of algorithm parameters.
Submitted 10 December, 2020;
originally announced December 2020.
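The abstract names Jacobi Over-Relaxation as one possible fixed point solver for the quadratic subproblems. As a rough illustration of that building block only (a centralized version, not the distributed EFIX method itself), here is a minimal sketch for a linear system $Ax = b$; the function name, the relaxation parameter value, and the test system are assumptions made for this example.

```python
def jacobi_over_relaxation(A, b, omega=0.9, iters=200):
    """Approximate the solution of A x = b by Jacobi Over-Relaxation (JOR).

    Illustrative sketch only: A is a list of rows with nonzero diagonal.
    The iteration converges, for example, when A is strictly diagonally
    dominant and 0 < omega <= 1.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x_next = []
        for i in range(n):
            # Plain Jacobi update for component i, using the old iterate
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            jacobi_i = (b[i] - off_diag) / A[i][i]
            # Relaxation: convex combination of the old value and the Jacobi update
            x_next.append((1.0 - omega) * x[i] + omega * jacobi_i)
        x = x_next
    return x


# Example: a strictly diagonally dominant 2x2 system with solution (1/11, 7/11).
sol = jacobi_over_relaxation([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Each component update uses only row $i$ of the system and the previous iterate, which is the property that makes this kind of solver attractive when each row is held by a different agent.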
-
Economic inexact restoration for derivative-free expensive function minimization and applications
Authors:
Ernesto G. Birgin,
Natasa Krejić,
José Mario Martínez
Abstract:
The Inexact Restoration approach has proved to be an adequate tool for handling the problem of minimizing an expensive function within an arbitrary feasible set by using different degrees of precision in the objective function. The Inexact Restoration framework allows one to obtain suitable convergence and complexity results for an approach that rationally combines low- and high-precision evaluations. In the present research, it is recognized that many problems with expensive objective functions are nonsmooth and, sometimes, even discontinuous. Having this in mind, the Inexact Restoration approach is extended to the nonsmooth or discontinuous case. Although optimization phases that rely on smoothness cannot be used in this case, basic convergence and complexity results are recovered. A derivative-free optimization phase is defined and the subproblems that arise at this phase are solved using a regularization approach that takes advantage of different notions of stationarity. The new methodology is applied to the problem of reproducing a controlled experiment that mimics the failure of a dam.
Submitted 3 June, 2021; v1 submitted 18 September, 2020;
originally announced September 2020.
-
LSOS: Line-search Second-Order Stochastic optimization methods for nonconvex finite sums
Authors:
Daniela di Serafino,
Nataša Krejić,
Nataša Krklec Jerinkić,
Marco Viola
Abstract:
We develop a line-search second-order algorithmic framework for minimizing finite sums. We do not make any convexity assumptions, but require the terms of the sum to be continuously differentiable and have Lipschitz-continuous gradients. The methods fitting into this framework combine line searches and suitably decaying step lengths. A key issue is a two-step sampling at each iteration, which allows us to control the error present in the line-search procedure. Stationarity of limit points is proved in the almost-sure sense, while almost-sure convergence of the sequence of approximations to the solution holds with the additional hypothesis that the functions are strongly convex. Numerical experiments, including comparisons with state-of-the-art stochastic optimization methods, show the efficiency of our approach.
Submitted 27 June, 2022; v1 submitted 31 July, 2020;
originally announced July 2020.
-
Linear Convergence Rate Analysis of a Class of Exact First-Order Distributed Methods for Weight-Balanced Time-Varying Networks and Uncoordinated Step Sizes
Authors:
Greta Malaspina,
Dusan Jakovetic,
Natasa Krejic
Abstract:
We analyze a class of exact distributed first order methods under a general setting on the underlying network and step-sizes. In more detail, we allow simultaneously for time-varying uncoordinated stepsizes and time-varying directed weight-balanced networks, jointly connected over bounded intervals. The analyzed class of methods subsumes several existing algorithms like the unified Extra and unified DIGing (Jakovetic, 2019), or the exact spectral gradient method (Jakovetic, Krejic, Krklec Jerinkic, 2019) that have been analyzed before under more restrictive assumptions. Under the assumed setting, we establish R-linear convergence of the methods and present several implications that our results have on the literature. Most notably, we show that the unification strategy in (Jakovetic, 2019) and the spectral step-size selection strategy in (Jakovetic, Krejic, Krklec Jerinkic, 2019) exhibit a high degree of robustness to uncoordinated time-varying step sizes and to time-varying networks.
Submitted 12 May, 2023; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Distributed Fixed Point Method for Solving Systems of Linear Algebraic Equations
Authors:
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic,
Greta Malaspina,
Alessandra Micheletti
Abstract:
We present a class of iterative fully distributed fixed point methods to solve a system of linear equations, such that each agent in the network holds one of the equations of the system. Under a generic directed, strongly connected network, we prove a convergence result analogous to the one for fixed point methods in the classical, centralized, framework: the proposed method converges to the solution of the system of linear equations at a linear rate. We further explicitly quantify the rate in terms of the linear system and the network parameters. Next, we show that the algorithm provably works under time-varying directed networks provided that the underlying graph is connected over bounded iteration intervals, and we establish a linear convergence rate for this setting as well. A set of numerical results is presented, demonstrating practical benefits of the method over existing alternatives.
Submitted 12 January, 2020;
originally announced January 2020.
-
Inexact restoration with subsampled trust-region methods for finite-sum minimization
Authors:
Stefania Bellavia,
Natasa Krejic,
Benedetta Morini
Abstract:
Convex and nonconvex finite-sum minimization arises in many scientific computing and machine learning applications. Recently, first-order and second-order methods where objective functions, gradients and Hessians are approximated by randomly sampling components of the sum have received great attention. We propose a new trust-region method which employs suitable approximations of the objective function, gradient and Hessian built via random subsampling techniques. The choice of the sample size is deterministic and ruled by the inexact restoration approach. We discuss local and global properties for finding approximate first- and second-order optimal points and function evaluation complexity results. Numerical experience shows that the new procedure is more efficient, in terms of overall computational cost, than the standard trust-region scheme with subsampled Hessians.
Submitted 10 May, 2020; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Exact Spectral-Like Gradient Method for Distributed Optimization
Authors:
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
Since the initial proposal in the late 80s, spectral gradient methods continue to receive significant attention, especially due to their excellent numerical performance on various large scale applications. However, to date, they have not been sufficiently explored in the context of distributed optimization. In this paper, we consider unconstrained distributed optimization problems where $n$ nodes constitute an arbitrary connected network and collaboratively minimize the sum of their local convex cost functions. In this setting, building on existing exact distributed gradient methods, we propose a novel exact distributed gradient method wherein nodes' step-sizes are designed according to novel rules akin to those in spectral gradient methods. We refer to the proposed method as the Distributed Spectral Gradient method (DSG).
The method exhibits R-linear convergence under standard assumptions on the nodes' local costs and safeguarding of the algorithm step-sizes. We illustrate the method's performance through simulation examples.
Submitted 17 January, 2019;
originally announced January 2019.
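For readers unfamiliar with the spectral step-sizes mentioned in the abstract above, the following is a minimal centralized sketch of a safeguarded spectral (Barzilai-Borwein) gradient iteration. It is not the DSG method itself: the separable quadratic cost, the function name `spectral_gradient`, the initial step `alpha0` and the safeguard interval `[a_min, a_max]` are all assumptions chosen for illustration.

```python
def spectral_gradient(c, b, x0, alpha0=0.1, a_min=1e-3, a_max=1e3,
                      iters=500, tol=1e-10):
    """Minimize f(x) = sum_i (0.5 * c[i] * x[i]**2 - b[i] * x[i]), all c[i] > 0.

    Hypothetical test problem; the step-size is the safeguarded
    Barzilai-Borwein value s^T s / s^T y.
    """
    def grad(x):
        return [ci * xi - bi for ci, xi, bi in zip(c, x, b)]

    x = list(x0)
    g = grad(x)
    alpha = alpha0
    for _ in range(iters):
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        s = [xn - xo for xn, xo in zip(x_new, x)]   # change in the iterate
        y = [gn - go for gn, go in zip(g_new, g)]   # change in the gradient
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 0.0:
            ss = sum(si * si for si in s)
            # Safeguarding: keep the spectral step inside [a_min, a_max].
            alpha = min(max(ss / sy, a_min), a_max)
        x, g = x_new, g_new
    return x


x_star = spectral_gradient([1.0, 4.0, 10.0], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

For this quadratic the minimizer is `b[i] / c[i]` in each coordinate. The step-size uses only the two most recent iterates and gradients, which is what makes the rule cheap enough to consider at the level of individual nodes.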
-
Subsampled Inexact Newton methods for minimizing large sums of convex functions
Authors:
Stefania Bellavia,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
This paper deals with the minimization of a large sum of convex functions by Inexact Newton (IN) methods employing subsampled functions, gradients and Hessian approximations. The Conjugate Gradient method is used to compute the inexact Newton step and global convergence is enforced by a nonmonotone line search procedure. The aim is to obtain methods with affordable costs and fast convergence. Assuming strongly convex functions, R-linear convergence and worst-case iteration complexity of the procedure are investigated when functions and gradients are approximated with increasing accuracy. A set of rules for the forcing parameters and subsample Hessian sizes is derived that ensures local q-linear/superlinear convergence of the proposed method.
The random choice of the Hessian subsample is also considered and convergence in the mean square, both for finite and infinite sums of functions, is proved. Finally, global convergence with asymptotic R-linear rate of IN methods is extended to the case of a sum of convex functions with a strongly convex objective function. Numerical results on well-known binary classification problems are also given. Adaptive strategies for selecting forcing terms and Hessian subsample size, stemming from the theoretical analysis, are employed and the numerical results show that they yield effective IN methods.
Submitted 14 November, 2018;
originally announced November 2018.
-
Distributed second order methods with increasing number of working nodes
Authors:
Natasa Krklec Jerinkic,
Dusan Jakovetic,
Natasa Krejic,
Dragana Bajovic
Abstract:
Recently, an idling mechanism has been introduced in the context of distributed \emph{first order} methods for minimization of a sum of nodes' local convex costs over a generic, connected network. With the idling mechanism, each node $i$, at each iteration $k$, is active -- updates its solution estimate and exchanges messages with its network neighborhood -- with probability $p_k$, and it stays idle with probability $1-p_k$, while the activations are independent both across nodes and across iterations. In this paper, we demonstrate that the idling mechanism can also be successfully incorporated in \emph{distributed second order methods}. Specifically, we apply the idling mechanism to the recently proposed Distributed Quasi Newton method (DQN). We first show theoretically that, when $p_k$ grows to one across iterations in a controlled manner, DQN with idling exhibits convergence and convergence rate properties very similar to those of the standard DQN method, thus achieving the same order of convergence rate (R-linear) as the standard DQN, but with significantly cheaper updates. Simulation examples confirm the benefits of incorporating the idling mechanism, demonstrate the method's flexibility with respect to the choice of the $p_k$'s, and compare the proposed idling method with related algorithms from the literature.
Submitted 20 September, 2018; v1 submitted 5 September, 2017;
originally announced September 2017.
-
Newton-like method with diagonal correction for distributed optimization
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
We consider distributed optimization problems where networked nodes cooperatively minimize the sum of their locally known convex costs. A popular class of methods for solving these problems is the distributed gradient methods, which are attractive due to their inexpensive iterations but suffer from slow convergence rates. This motivates the incorporation of second-order information in the distributed methods, but this task is challenging: although the Hessians which arise in the algorithm design respect the sparsity of the network, their inverses are dense, which makes distributed implementations difficult. We overcome this challenge and propose a class of distributed Newton-like methods, which we refer to as Distributed Quasi Newton (DQN). The DQN family approximates the Hessian inverse by: 1) splitting the Hessian into its diagonal and off-diagonal part, 2) inverting the diagonal part, and 3) approximating the inverse of the off-diagonal part through a weighted linear function. The approximation is parameterized by the tuning variables which correspond to different splittings of the Hessian and to different weightings of the off-diagonal Hessian part. Specific choices of the tuning variables give rise to different variants of the proposed general DQN method -- dubbed DQN-0, DQN-1 and DQN-2 -- which trade off communication and computational costs against convergence. Simulations demonstrate the effectiveness of the proposed DQN methods.
Submitted 20 February, 2017; v1 submitted 5 September, 2015;
originally announced September 2015.
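The three-step splitting idea in the abstract above can be illustrated with a generic first-order (Neumann-type) correction. This is a sketch under stated assumptions, not the DQN formulas from the paper: writing $H = D + R$ with $D$ diagonal and $R$ off-diagonal, it uses $H^{-1} \approx (I - \theta D^{-1} R) D^{-1}$, where the scalar weight `theta` and the function names are hypothetical.

```python
def split_inverse_direction(H, g, theta):
    """Approximate the Newton direction -H^{-1} g without inverting H.

    Assumed scheme: H = D + R (diagonal plus off-diagonal part) and
    H^{-1} ~ (I - theta * D^{-1} R) D^{-1}. Not the paper's exact method.
    """
    n = len(g)
    # Steps 1 and 2: take the diagonal part of H and invert it.
    d_inv = [1.0 / H[i][i] for i in range(n)]
    u = [d_inv[i] * g[i] for i in range(n)]          # u = D^{-1} g
    # Step 3: weighted linear correction using the off-diagonal part R.
    Ru = [sum(H[i][j] * u[j] for j in range(n) if j != i) for i in range(n)]
    return [-(u[i] - theta * d_inv[i] * Ru[i]) for i in range(n)]


def residual_norm(H, g, d):
    """Euclidean norm of H d + g; zero for the exact Newton direction."""
    n = len(g)
    r = [sum(H[i][j] * d[j] for j in range(n)) + g[i] for i in range(n)]
    return sum(ri * ri for ri in r) ** 0.5


H = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
g = [1.0, 2.0, 3.0]
d_diag = split_inverse_direction(H, g, theta=0.0)   # diagonal scaling only
d_corr = split_inverse_direction(H, g, theta=1.0)   # with off-diagonal correction
```

With `theta=0.0` the direction uses only the diagonal, while a nonzero weight adds one multiplication by the off-diagonal part. When that part has the sparsity pattern of the network, row `i` of the product involves only the neighbors of node `i`, which is the property the abstract relies on.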
-
Distributed Gradient Methods with Variable Number of Working Nodes
Authors:
Dusan Jakovetic,
Dragana Bajovic,
Natasa Krejic,
Natasa Krklec-Jerinkic
Abstract:
We consider distributed optimization where $N$ nodes in a connected network minimize the sum of their local costs subject to a common constraint set. We propose a distributed projected gradient method where each node, at each iteration $k$, performs an update (is active) with probability $p_k$, and stays idle (is inactive) with probability $1-p_k$. Whenever active, each node performs an update by weight-averaging its solution estimate with the estimates of its active neighbors, taking a negative gradient step with respect to its local cost, and performing a projection onto the constraint set; inactive nodes perform no updates. Assuming that nodes' local costs are strongly convex, with Lipschitz continuous gradients, we show that, as long as the activation probability $p_k$ grows to one asymptotically, our algorithm converges in the mean square sense (MSS) to the same solution as the standard distributed gradient method, i.e., as if all the nodes were active at all iterations. Moreover, when $p_k$ grows to one linearly, with an appropriately set convergence factor, the algorithm converges linearly in the MSS, with practically the same factor as the standard distributed gradient method. Simulations on both synthetic and real-world data sets demonstrate that, when compared with the standard distributed gradient method, the proposed algorithm significantly reduces the overall number of per-node communications and per-node gradient evaluations (computational cost) for the same required accuracy.
Submitted 10 March, 2016; v1 submitted 15 April, 2015;
originally announced April 2015.
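The update rule described in the abstract above (random activation, weight-averaging with active neighbors, gradient step, projection) can be sketched on a toy problem. Everything concrete here is an assumption made for illustration: scalar costs $f_i(x) = \frac{1}{2}(x - a_i)^2$, a ring network with weights $1/3$, an interval constraint, the schedule $p_k = 1 - 0.5 \cdot 0.9^k$, and the choice to move an inactive neighbor's weight onto the node's own estimate.

```python
import random


def project(x, lo, hi):
    """Projection of a scalar onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def idling_projected_gradient(a, lo, hi, alpha=0.05, iters=400, seed=0):
    """Distributed projected gradient with randomly idling nodes on a ring.

    Node i holds the hypothetical local cost 0.5 * (x - a[i])**2.
    Requires at least three nodes so that ring neighbors are distinct.
    """
    rng = random.Random(seed)
    n = len(a)
    x = [0.0] * n
    for k in range(iters):
        p_k = 1.0 - 0.5 * 0.9 ** k          # activation probability, grows to 1
        active = [rng.random() < p_k for _ in range(n)]
        x_new = list(x)                      # inactive nodes keep their estimate
        for i in range(n):
            if not active[i]:
                continue
            avg = x[i] / 3.0
            for j in ((i - 1) % n, (i + 1) % n):
                # Assumption: the weight of an idle neighbor goes to x[i].
                avg += (x[j] if active[j] else x[i]) / 3.0
            grad = x[i] - a[i]               # gradient of the local cost
            x_new[i] = project(avg - alpha * grad, lo, hi)
        x = x_new
    return x


estimates = idling_projected_gradient([1.0, 2.0, 3.0, 4.0, 5.0], 0.0, 10.0)
```

With a constant step-size `alpha` the estimates settle in a neighborhood of the minimizer of the summed cost (here the mean of `a`, which lies inside the interval) rather than exactly on it. Early iterations skip roughly half of the node updates, which is where the savings in communication and gradient evaluations come from.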