Search | arXiv e-print repository

Fast Stochastic Second-Order Adagrad for Nonconvex Bound-Constrained Optimization

Authors: S. Bellavia, S. Gratton, B. Morini, Ph. L. Toint

Abstract: ADAGB2, a generalization of the Adagrad algorithm for stochastic optimization, is introduced, which is also applicable to bound-constrained problems and capable of using second-order information when available. It is shown that, given $δ\in(0,1)$ and $ε\in(0,1]$, the ADAGB2 algorithm needs at most $\mathcal{O}(ε^{-2})$ iterations to ensure an $ε$-approximate first-order critical point of the bound-constrained problem with probability at least $1-δ$, provided the average root mean square error of the gradient oracle is sufficiently small. Should this condition fail, it is also shown that the optimality level of iterates is bounded above by this average. The relation between the approximate and true classical projected-gradient-based optimality measures for bound-constrained problems is also investigated, and it is shown that merely assuming unbiased gradient oracles may be insufficient to ensure convergence in $\mathcal{O}(ε^{-2})$ iterations.

Submitted 9 May, 2025; originally announced May 2025.

MSC Class: 49M37; 65K05; 68Q17; 68W40; 90C30 ACM Class: F.2.1; G.1.6; I.1.2

arXiv:2505.04807 [pdf, other]

A Fast Newton Method Under Local Lipschitz Smoothness

Authors: Serge Gratton, Sadok Jerad, Philippe L. Toint

Abstract: A new, fast second-order method is proposed that achieves the optimal $\mathcal{O}\left(|\log(ε)|ε^{-3/2}\right)$ complexity to obtain first-order $ε$-stationary points. Crucially, this is deduced without assuming the standard global Lipschitz Hessian continuity condition, but only using an appropriate local smoothness requirement. The algorithm exploits Hessian information to compute a Newton step and a negative curvature step when needed, in an approach similar to that of the AN2C method. Inexact versions of the Newton step and negative curvature are proposed in order to reduce the cost of evaluating second-order information. Details are given of such an iterative implementation using Krylov subspaces. An extended algorithm for finding second-order critical points is also developed and its complexity is again shown to be within a log factor of the optimal one. Initial numerical experiments are discussed for both factorised and Krylov variants, which demonstrate the competitiveness of the proposed algorithm.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2502.08308 [pdf, ps, other]

prunAdag: an adaptive pruning-aware gradient method

Authors: Margherita Porcelli, Giovanni Seraghiti, Philippe L. Toint

Abstract: A pruning-aware adaptive gradient method is proposed which classifies the variables in two sets before updating them using different strategies. This technique extends the "relevant/irrelevant" approach of Ding (2019) and Zimmer et al. (2022) and allows a posteriori sparsification of the solution of model parameter fitting problems. The new method is proved to be convergent with a global rate of decrease of the averaged gradient's norm of the form $\mathcal{O}(\log(k)/\sqrt{k+1})$. Numerical experiments on several applications show that it is competitive.

Submitted 12 February, 2025; originally announced February 2025.

MSC Class: 49M05; 49M15; 65K10; 68Q25; 90C26 ACM Class: F.2.1; I.2.6; G.1.6

arXiv:2409.16047 [pdf, other]

Examples of slow convergence for adaptive regularization optimization methods are not isolated

Authors: Philippe L. Toint

Abstract: The adaptive regularization algorithm for unconstrained nonconvex optimization was shown in Nesterov and Polyak (2006) and Cartis, Gould and Toint (2011) to require, under standard assumptions, at most $\mathcal{O}(ε^{-3/(3-q)})$ evaluations of the objective function and its derivatives of degrees one and two to produce an $ε$-approximate critical point of order $q\in\{1,2\}$. This bound was shown to be sharp for $q \in\{1,2\}$. This note revisits these results and shows that the example for which slow convergence is exhibited is not isolated, but that this behaviour occurs for a subset of univariate functions of nonzero measure.

Submitted 24 September, 2024; originally announced September 2024.

Comments: 11 pages, 1 figure

MSC Class: 49M37; 65K05; 68Q17; 68W40; 90C30 ACM Class: F.2.1; G.1.6; I.1.2

arXiv:2408.09124 [pdf, other]

Refining asymptotic complexity bounds for nonconvex optimization methods, including why steepest descent is $o(ε^{-2})$ rather than $\mathcal{O}(ε^{-2})$

Authors: Serge Gratton, Chee-Khian Sim, Philippe L. Toint

Abstract: We revisit the standard "telescoping sum" argument ubiquitous in the final steps of analyzing evaluation complexity of algorithms for smooth nonconvex optimization, and obtain a refined formulation of the resulting bound as a function of the requested accuracy $ε$. While bounds obtained using the standard argument typically are of the form $\mathcal{O}(ε^{-α})$ for some positive $α$, the refined results are of the form $o(ε^{-α})$. We then explore to which known algorithms our refined bounds are applicable and finally describe an example showing how close the standard and refined bounds can be.

Submitted 17 August, 2024; originally announced August 2024.

Comments: 10 pages, 1 figure

MSC Class: 49M37; 65K05; 68Q17; 68W40; 90C30 ACM Class: F.2.1; G.1.6; I.1.2
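
For readers unfamiliar with the "telescoping sum" argument mentioned in the abstract above, the following is a textbook sketch for fixed-step steepest descent, not a reproduction of the paper's refined analysis. It assumes (these are illustrative assumptions, not taken from the listing) that $f$ has an $L$-Lipschitz continuous gradient, is bounded below by $f_{\rm low}$, and that the iteration is $x_{k+1} = x_k - \frac{1}{L}\nabla f(x_k)$.

```latex
% Descent lemma for the step x_{k+1} = x_k - (1/L) \nabla f(x_k):
f(x_{k+1}) \le f(x_k) - \frac{1}{2L}\,\|\nabla f(x_k)\|^2 .

% Suppose \|\nabla f(x_k)\| > \epsilon for k = 0, \dots, K-1.
% Summing the decreases makes the function values telescope:
f(x_0) - f_{\rm low}
  \ge \sum_{k=0}^{K-1} \bigl( f(x_k) - f(x_{k+1}) \bigr)
  \ge \sum_{k=0}^{K-1} \frac{1}{2L}\,\|\nabla f(x_k)\|^2
  \ge \frac{K\,\epsilon^2}{2L} .

% Rearranging gives the standard O(\epsilon^{-2}) bound:
K \le 2L\,\bigl( f(x_0) - f_{\rm low} \bigr)\,\epsilon^{-2} .
```

The last inequality in the chain replaces every gradient norm by the threshold $ε$, which is the step where a cruder estimate is accepted in exchange for a closed-form bound.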

arXiv:2407.08018 [pdf, other]

A Stochastic Objective-Function-Free Adaptive Regularization Method with Optimal Complexity

Authors: Serge Gratton, Sadok Jerad, Philippe L. Toint

Abstract: A fully stochastic second-order adaptive-regularization method for unconstrained nonconvex optimization is presented which never computes the objective-function value, yet achieves the optimal $\mathcal{O}(ε^{-3/2})$ complexity bound for finding first-order critical points. The method is noise-tolerant and the inexactness conditions required for convergence depend on the history of past steps. Applications to cases where derivative evaluation is inexact and to minimization of finite sums by sampling are discussed. Numerical experiments on large binary classification problems illustrate the potential of the new method.

Submitted 21 January, 2025; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: 32 pages, 9 figures

MSC Class: 49M37; 65K05; 68Q17; 68W40; 90C30 ACM Class: F.2.1; G.1.6; I.1.2

arXiv:2407.07812 [pdf, ps, other]

S2MPJ and CUTEst optimization problems for Matlab, Python and Julia

Authors: Serge Gratton, Philippe L. Toint

Abstract: A new decoder for the SIF test problems of the CUTEst collection is described, which produces problem files allowing the computation of values and derivatives of the objective function and constraints of most CUTEst problems directly within "native" Matlab, Python or Julia, without any additional installation or interfacing with MEX files or Fortran programs. When used with Matlab, the new problem files optionally support reduced-precision computations.

Submitted 10 July, 2024; originally announced July 2024.

MSC Class: 49N99; 65K05; 65Y20; 68N99; 90C30 ACM Class: G.1.6

arXiv:2406.15793 [pdf, ps, other]

Complexity of Adagrad and other first-order methods for nonconvex optimization problems with bounds constraints

Authors: Serge Gratton, Sadok Jerad, Philippe L. Toint

Abstract: A parametric class of trust-region algorithms for constrained nonconvex optimization is analyzed, where the objective function is never computed. By defining appropriate first-order stationarity criteria, we are able to extend the Adagrad method to the newly considered problem and retrieve the standard complexity rate of the projected gradient method that uses both the gradient and objective function values. Furthermore, we propose an additional iteration-dependent scaling with slightly inferior theoretical guarantees. In both cases, the bounds are essentially sharp, and curvature information can be used to compute the stepsize. Initial experimental results for noisy bound-constrained instances illustrate the benefits of the objective-free approach.

Submitted 1 November, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

MSC Class: 90C60; 90C30; 90C15; 90C26; 49N30 ACM Class: F.2.1; G.1.6
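
To make the idea of an objective-function-free, Adagrad-style iteration under bound constraints more concrete, here is a minimal sketch. It is a generic projected Adagrad loop, not the algorithm analyzed in the paper above: the function names (`project`, `projected_adagrad`, `criticality`), the step parameter, the damping constant `eps`, and the test problem are all illustrative assumptions. Note that only gradients are used; the objective value is never evaluated.

```python
import math


def project(x, lower, upper):
    """Clip each component of x into the box [lower_i, upper_i]."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def criticality(x, g, lower, upper):
    """Projected-gradient optimality measure ||P(x - g) - x||."""
    p = project([xi - gi for xi, gi in zip(x, g)], lower, upper)
    return math.sqrt(sum((pi - xi) ** 2 for pi, xi in zip(p, x)))


def projected_adagrad(grad, x0, lower, upper, step=1.0, eps=1e-8, iters=200):
    """Generic Adagrad-scaled gradient step followed by projection on the box.

    Hypothetical helper for illustration only: the objective function
    itself is never called, only its gradient `grad`.
    """
    x = project(x0, lower, upper)
    accum = [0.0] * len(x)  # running sum of squared gradient components
    for _ in range(iters):
        g = grad(x)
        accum = [a + gi * gi for a, gi in zip(accum, g)]
        trial = [xi - step * gi / math.sqrt(eps + a)
                 for xi, gi, a in zip(x, g, accum)]
        x = project(trial, lower, upper)
    return x


# Illustrative problem: f(x) = 0.5*((x0 - 2)^2 + (x1 + 1)^2) on the box [0, 1]^2.
def grad(x):
    return [x[0] - 2.0, x[1] + 1.0]


lower, upper = [0.0, 0.0], [1.0, 1.0]
x_final = projected_adagrad(grad, [0.5, 0.5], lower, upper)
measure = criticality(x_final, grad(x_final), lower, upper)
```

On this small problem the unconstrained minimizer (2, -1) lies outside the box, so the iterates settle at the corner (1, 0), where the gradient is nonzero but the projected-gradient measure vanishes. This is why a stationarity criterion based on the projection, rather than on the raw gradient norm, is needed in the bound-constrained setting.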

arXiv:2310.16580 [pdf, ps, other]

An optimally fast objective-function-free minimization algorithm using random subspaces

Authors: S. Bellavia, S. Gratton, B. Morini, Ph. L. Toint

Abstract: An algorithm for unconstrained non-convex optimization is described, which does not evaluate the objective function and in which minimization is carried out, at each iteration, within a randomly selected subspace. It is shown that this random approximation technique does not affect the method's convergence nor its evaluation complexity for the search of an $ε$-approximate first-order critical point, which is $\mathcal{O}(ε^{-(p+1)/p})$, where $p$ is the order of derivatives used. A variant of the algorithm using approximate Hessian matrices is also analysed and shown to require at most $\mathcal{O}(ε^{-2})$ evaluations. Preliminary numerical tests show that the random-subspace technique can significantly improve performance when used with $p=2$ in the correct context, making it very competitive when compared to standard first-order algorithms.

Submitted 30 January, 2025; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: 23 pages

MSC Class: 60G99; 65K05; 68M20; 68Q17; 90C26 ACM Class: G.6.1; F.2.1

arXiv:2308.00720 [pdf, ps, other]

Divergence of the ADAM algorithm with fixed-stepsize: a (very) simple example

Authors: Ph. L. Toint

Abstract: A very simple unidimensional function with Lipschitz continuous gradient is constructed such that the ADAM algorithm with constant stepsize, started from the origin, diverges when applied to minimize this function in the absence of noise on the gradient. Divergence occurs irrespective of the choice of the method parameters.

Submitted 1 August, 2023; originally announced August 2023.

MSC Class: 65K10; 90C26; 90C30 ACM Class: G.6.1; I.2.6

arXiv:2305.14477 [pdf, other]

A Block-Coordinate Approach of Multi-level Optimization with an Application to Physics-Informed Neural Networks

Authors: Serge Gratton, Valentin Mercier, Elisa Riccietti, Philippe L. Toint

Abstract: Multi-level methods are widely used for the solution of large-scale problems, because of their computational advantages and exploitation of the complementarity between the involved sub-problems. After a re-interpretation of multi-level methods from a block-coordinate point of view, we propose a multi-level algorithm for the solution of nonlinear optimization problems and analyze its evaluation complexity. We apply it to the solution of partial differential equations using physics-informed neural networks (PINNs) and show on a few test problems that the approach results in better solutions and significant computational savings.

Submitted 25 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2302.10065 [pdf, ps, other]

Yet another fast variant of Newton's method for nonconvex optimization

Authors: Serge Gratton, Sadok Jerad, Philippe L. Toint

Abstract: A class of second-order algorithms is proposed for minimizing smooth nonconvex functions that alternates between regularized Newton and negative curvature steps in an iteration-dependent subspace. In most cases, the Hessian matrix is regularized with the square root of the current gradient and an additional term taking moderate negative curvature into account, a negative curvature step being taken only exceptionally. Practical variants have been detailed where the subspaces are chosen to be the full space, or Krylov subspaces. In the first case, the proposed method only requires the solution of a single linear system at nearly all iterations. We establish that at most $\mathcal{O}\left(|\logε|\,ε^{-3/2}\right)$ evaluations of the problem's objective function and derivatives are needed for algorithms in the new class to obtain an $ε$-approximate first-order minimizer, and at most $\mathcal{O}\left(|\logε|\,ε^{-3}\right)$ to obtain a second-order one. Encouraging initial numerical experiments with two full-space and two Krylov-subspaces variants are finally presented.

Submitted 20 August, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: 32 pages, 2 figures, 4 tables

MSC Class: 49M37; 65K05; 90C26; 90C30 ACM Class: G.1.6; F.2.1

arXiv:2302.07049 [pdf, other]

Multilevel Objective-Function-Free Optimization with an Application to Neural Networks Training

Authors: S. Gratton, A. Kopanicakova, Ph. L. Toint

Abstract: A class of multi-level algorithms for unconstrained nonlinear optimization is presented which does not require the evaluation of the objective function. The class contains the momentum-less AdaGrad method as a particular (single-level) instance. The choice of avoiding the evaluation of the objective function is intended to make the algorithms of the class less sensitive to noise, while the multi-level feature aims at reducing their computational cost. The evaluation complexity of these algorithms is analyzed and their behaviour in the presence of noise is then illustrated in the context of training deep neural networks for supervised learning applications.

Submitted 14 February, 2023; originally announced February 2023.

Comments: 29 pages, 4 figures

MSC Class: 49K20; 65M55; 65Y20; 68Q25; 68T05; 90C26; 90C30 ACM Class: F.2.1; G.1.8; I.2.5

arXiv:2203.09947 [pdf, ps, other]

Convergence properties of an Objective-Function-Free Optimization regularization algorithm, including an $\mathcal{O}(ε^{-3/2})$ complexity bound

Authors: S. Gratton, S. Jerad, Ph. L. Toint

Abstract: An adaptive regularization algorithm for unconstrained nonconvex optimization is presented in which the objective function is never evaluated, but only derivatives are used. This algorithm belongs to the class of adaptive regularization methods, for which optimal worst-case complexity results are known for the standard framework where the objective function is evaluated. It is shown in this paper that these excellent complexity bounds are also valid for the new algorithm, despite the fact that significantly less information is used. In particular, it is shown that, if derivatives of degree one to $p$ are used, the algorithm will find an $ε_1$-approximate first-order minimizer in at most $O(ε_1^{-(p+1)/p})$ iterations, and an $(ε_1,ε_2)$-approximate second-order minimizer in at most $O(\max[ε_1^{-(p+1)/p},ε_2^{-(p+1)/(p-1)}])$ iterations. As a special case, the new algorithm using first and second derivatives, when applied to functions with Lipschitz continuous Hessian, will find an iterate $x_k$ at which the gradient's norm is less than $ε_1$ in at most $O(ε_1^{-3/2})$ iterations.

Submitted 4 May, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

MSC Class: 90C60; 90C30; 90C15; 90C26; 49N30 ACM Class: F.2.1; G.1.6

arXiv:2203.03351 [pdf, ps, other]

OFFO minimization algorithms for second-order optimality and their complexity

Authors: S. Gratton, Ph. L. Toint

Abstract: An Adagrad-inspired class of algorithms for smooth unconstrained optimization is presented in which the objective function is never evaluated and yet the gradient norms decrease at least as fast as $\mathcal{O}(1/\sqrt{k+1})$ while second-order optimality measures converge to zero at least as fast as $\mathcal{O}(1/(k+1)^{1/3})$. This latter rate of convergence is shown to be essentially sharp and is identical to that known for more standard algorithms (like trust-region or adaptive-regularization methods) using both function and derivatives' evaluations. A related "divergent stepsize" method is also described, whose essentially sharp rate of convergence is slightly inferior. It is finally discussed how to obtain weaker second-order optimality guarantees at a (much) reduced computational cost.

Submitted 7 March, 2022; originally announced March 2022.

MSC Class: 90C60; 90C30; 90C26; 90C15; 49N30 ACM Class: F.2.1; G.1.6

Journal ref: Computational Optimization and Applications, 84, pages 573 - 607, 2023

arXiv:2203.01757 [pdf, other]

Complexity and performance for two classes of noise-tolerant first-order algorithms

Authors: S. Gratton, S. Jerad, Ph. L. Toint

Abstract: Two classes of algorithms for optimization in the presence of noise are presented, that do not require the evaluation of the objective function. The first generalizes the well-known Adagrad method. Its complexity is then analyzed as a function of its parameters. A second class of algorithms is then derived whose complexity is at least as good as that of the first class. Initial numerical experiments on finite-sum problems arising from deep-learning applications suggest that methods of the second class may outperform those of the first.

Submitted 29 January, 2025; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: 3 figures. arXiv admin note: substantial text overlap with arXiv:2203.01647

MSC Class: 90C60; 90C30; 90C15; 90C26; 49N30 ACM Class: F.2.1; G.1.6

arXiv:2203.01647 [pdf, ps, other]

Complexity of a Class of First-Order Objective-Function-Free Optimization Algorithms

Authors: S. Gratton, S. Jerad, Ph. L. Toint

Abstract: A parametric class of trust-region algorithms for unconstrained nonconvex optimization is considered where the value of the objective function is never computed. The class contains a deterministic version of the first-order Adagrad method typically used for minimization of noisy functions, but also allows the use of (possibly approximate) second-order information when available. The rate of convergence of methods in the class is analyzed and is shown to be identical to that known for first-order optimization methods using both function and gradient values, recovering existing results for purely first-order variants and improving the explicit dependence on problem dimension. This rate is shown to be essentially sharp. A new class of methods is also presented, for which a slightly worse and essentially sharp complexity result holds. Limited numerical experiments show that the new methods' performance may be comparable to that of standard steepest descent, despite using significantly less information, and that this performance is relatively insensitive to noise.

Submitted 6 June, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

MSC Class: 90C60; 90C30; 90C15; 90C26; 49N30 ACM Class: F.2.1; G.1.6

arXiv:2112.06176 [pdf, ps, other]

Trust-region algorithms: probabilistic complexity and intrinsic noise with applications to subsampling techniques

Authors: S. Bellavia, G. Gurioli, B. Morini, Ph. L. Toint

Abstract: A trust-region algorithm is presented for finding approximate minimizers of smooth unconstrained functions whose values and derivatives are subject to random noise. It is shown that, under suitable probabilistic assumptions, the new method finds (in expectation) an $ε$-approximate minimizer of arbitrary order $ q \geq 1$ in at most $\mathcal{O}(ε^{-(q+1)})$ inexact evaluations of the function and its derivatives, providing the first such result for general optimality orders. The impact of intrinsic noise limiting the validity of the assumptions is also discussed and it is shown that difficulties are unlikely to occur in the first-order version of the algorithm for sufficiently large gradients. Conversely, should these assumptions fail for specific realizations, then "degraded" optimality guarantees are shown to hold when failure occurs. These conclusions are then discussed and illustrated in the context of subsampling methods for finite-sum optimization.

Submitted 30 December, 2021; v1 submitted 12 December, 2021; originally announced December 2021.

MSC Class: 65K05; 65C50; 90C26 ACM Class: F.2.1; G.1.6

arXiv:2112.05636 [pdf, ps, other]

OPM, a collection of Optimization Problems in Matlab

Authors: Serge Gratton, Philippe L. Toint

Abstract: OPM is a small collection of CUTEst unconstrained and bound-constrained nonlinear optimization problems, which can be used in Matlab for testing optimization algorithms directly (i.e. without installing additional software).

Submitted 16 January, 2025; v1 submitted 10 December, 2021; originally announced December 2021.

MSC Class: 90C25; 90C26; 90C30 ACM Class: G.1.6; G.4

arXiv:2111.14098 [pdf, ps, other]

An adaptive regularization algorithm for unconstrained optimization with inexact function and derivatives values

Authors: N. I. M. Gould, Ph. L. Toint

Abstract: An adaptive regularization algorithm for unconstrained nonconvex optimization is proposed that is capable of handling inexact objective-function and derivative values, and also of providing approximate minimizers of arbitrary order. In comparison with a similar algorithm proposed in Cartis, Gould, Toint (2021), its distinguishing feature is that it is based on controlling the relative error between the model and objective values. A sharp evaluation complexity bound is derived for the new algorithm.

Submitted 28 November, 2021; originally announced November 2021.

MSC Class: 49M37; 90C26; 90C30; 90C56 ACM Class: F.2.2; G.1.6

arXiv:2105.07765 [pdf, other]

Adaptive Regularization Minimization Algorithms with Non-Smooth Norms and Euclidean Curvature

Authors: Serge Gratton, Philippe L. Toint

Abstract: A regularization algorithm (AR1pGN) for unconstrained nonlinear minimization is considered, which uses a model consisting of a Taylor expansion of arbitrary degree and a regularization term involving a possibly non-smooth norm. It is shown that the non-smoothness of the norm does not affect the $O(ε_1^{-(p+1)/p})$ upper bound on evaluation complexity for finding first-order $ε_1$-approximate minimizers using $p$ derivatives, and that this result does not hinge on the equivalence of norms in $\Re^n$. It is also shown that, if $p=2$, the bound of $O(ε_2^{-3})$ evaluations for finding second-order $ε_2$-approximate minimizers still holds for a variant of AR1pGN named AR2GN, despite the possibly non-smooth nature of the regularization term. Moreover, the adaptation of the existing theory for handling the non-smoothness results in an interesting modification of the subproblem termination rules, leading to an even more compact complexity analysis. In particular, it is shown when the Newton step is acceptable for an adaptive regularization method. The approximate minimization of quadratic polynomials regularized with non-smooth norms is then discussed, and a new approximate second-order necessary optimality condition is derived for this case. A specialized algorithm is then proposed to enforce the first- and second-order conditions that are strong enough to ensure the existence of a suitable step in AR1pGN (when $p=2$) and in AR2GN, and its iteration complexity is analyzed.

Submitted 27 May, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: A correction will be available soon

MSC Class: 90C60; 90C26; 49J52; 49M37 ACM Class: G.1.6; F.2.1

arXiv:2104.02564 [pdf, ps, other]

Hölder Gradient Descent and Adaptive Regularization Methods in Banach Spaces for First-Order Points

Authors: Serge Gratton, Sadok Jerad, Philippe L. Toint

Abstract: This paper considers optimization of smooth nonconvex functionals in smooth infinite dimensional spaces. A Hölder gradient descent algorithm is first proposed for finding approximate first-order points of regularized polynomial functionals. This method is then applied to analyze the evaluation complexity of an adaptive regularization method which searches for approximate first-order points of functionals with $β$-Hölder continuous derivatives. It is shown that finding an $ε$-approximate first-order point requires at most $O(ε^{-\frac{p+β}{p+β-1}})$ evaluations of the functional and its first $p$ derivatives.

Submitted 6 April, 2021; originally announced April 2021.

MSC Class: 49K27; 49M37; 49M05; 49M20; 90C48; 90C26; 90C30 ACM Class: F.2.1; G.1.6

arXiv:2104.02519 [pdf, ps, other]

The Impact of Noise on Evaluation Complexity: The Deterministic Trust-Region Case

Authors: Stefania Bellavia, Gianmarco Gurioli, Benedetta Morini, Philippe L. Toint

Abstract: Intrinsic noise in objective function and derivatives evaluations may cause premature termination of optimization algorithms. Evaluation complexity bounds taking this situation into account are presented in the framework of a deterministic trust-region method. The results show that the presence of intrinsic noise may dominate these bounds, in contrast with what is known for methods in which the inexactness in function and derivatives' evaluations is fully controllable. Moreover, the new analysis provides estimates of the optimality level achievable, should noise cause early termination. It finally sheds some light on the impact of inexact computer arithmetic on evaluation complexity.

Submitted 6 April, 2021; originally announced April 2021.

MSC Class: 90C26; 90C30; 90C56; 90C59; 49M37; 49M05 ACM Class: F.2.1; G.1.6

arXiv:2104.00592 [pdf, ps, other]

Quadratic and Cubic Regularisation Methods with Inexact function and Random Derivatives for Finite-Sum Minimisation

Authors: Stefania Bellavia, Gianmarco Gurioli, Benedetta Morini, Philippe L. Toint

Abstract: This paper focuses on regularisation methods using models up to the third order to search for up to second-order critical points of a finite-sum minimisation problem. The variant presented belongs to the framework of [3]: it employs random models with accuracy guaranteed with a sufficiently large prefixed probability and deterministic inexact function evaluations within a prescribed level of accuracy. Without assuming unbiased estimators, the expected number of iterations is $\mathcal{O}\bigl(ε_1^{-2}\bigr)$ or $\mathcal{O}\bigl(ε_1^{-{3/2}}\bigr)$ when searching for a first-order critical point using a second or third order model, respectively, and of $\mathcal{O}\bigl(\max[ε_1^{-{3/2}},ε_2^{-3}]\bigr)$ when seeking second-order critical points with a third order model, in which $ε_j$, $j\in\{1,2\}$, is the $j$th-order tolerance. These results match the worst-case optimal complexity for the deterministic counterpart of the method. Preliminary numerical tests for first-order optimality in the context of nonconvex binary classification in imaging, with and without Artificial Neural Networks (ANNs), are presented and discussed.

Submitted 2 April, 2021; v1 submitted 30 March, 2021; originally announced April 2021.

Comments: 9 pages

arXiv:2011.00854 [pdf, ps, other]

Strong Evaluation Complexity of An Inexact Trust-Region Algorithm for Arbitrary-Order Unconstrained Nonconvex Optimization

Authors: C. Cartis, N. I. M. Gould, Ph. L. Toint

Abstract: A trust-region algorithm using inexact function and derivatives values is introduced for solving unconstrained smooth optimization problems. This algorithm uses high-order Taylor models and allows the search of strong approximate minimizers of arbitrary order. The evaluation complexity of finding a $q$-th approximate minimizer using this algorithm is then shown, under standard conditions, to be $\mathcal{O}\big(\min_{j\in\{1,\ldots,q\}}ε_j^{-(q+1)}\big)$ where the $ε_j$ are the order-dependent requested accuracy thresholds. Remarkably, this order is identical to that of classical trust-region methods using exact information.

Submitted 12 October, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

MSC Class: 65Y20; 90C30; 90C60 ACM Class: F.2.1; G.1.6

arXiv:2005.04639 [pdf, ps, other]

Adaptive Regularization for Nonconvex Optimization Using Inexact Function Values and Randomly Perturbed Derivatives

Authors: S. Bellavia, G. Gurioli, B. Morini, Ph. L. Toint

Abstract: A regularization algorithm allowing random noise in derivatives and inexact function values is proposed for computing approximate local critical points of any order for smooth unconstrained optimization problems. For an objective function with Lipschitz continuous $p$-th derivative and given an arbitrary optimality order $q \leq p$, it is shown that this algorithm will, in expectation, compute such a point in at most $O\left(\left(\min_{j\in\{1,\ldots,q\}}ε_j\right)^{-\frac{p+1}{p-q+1}}\right)$ inexact evaluations of $f$ and its derivatives whenever $q\in\{1,2\}$, where $ε_j$ is the tolerance for $j$th order accuracy. This bound becomes at most $O\left(\left(\min_{j\in\{1,\ldots,q\}}ε_j\right)^{-\frac{q(p+1)}{p}}\right)$ inexact evaluations if $q>2$ and all derivatives are Lipschitz continuous. Moreover these bounds are sharp in the order of the accuracy tolerances. An extension to convexly constrained problems is also outlined.

Submitted 6 April, 2021; v1 submitted 10 May, 2020; originally announced May 2020.

Comments: 22 pages

MSC Class: 49K10; 49M37; 65K05; 68W40; 90C15 ACM Class: G.1.6; F.2.1

arXiv:2001.10802 [pdf, ps, other]

Strong Evaluation Complexity Bounds for Arbitrary-Order Optimization of Nonconvex Nonsmooth Composite Functions

Authors: Coralia Cartis, Nick Gould, Philippe L. Toint

Abstract: We introduce the concept of strong high-order approximate minimizers for nonconvex optimization problems. These apply in both standard smooth and composite non-smooth settings, and additionally allow convex or inexpensive constraints. An adaptive regularization algorithm is then proposed to find such approximate minimizers. Under suitable Lipschitz continuity assumptions, whenever the feasible set is convex, it is shown that using a model of degree $p$, this algorithm will find a strong approximate $q$-th-order minimizer in at most ${\cal O}\left(\max_{1\leq j\leq q}ε_j^{-(p+1)/(p-j+1)}\right)$ evaluations of the problem's functions and their derivatives, where $ε_j$ is the $j$-th order accuracy tolerance; this bound applies when either $q=1$ or the problem is not composite with $q \leq 2$. For general non-composite problems, even when the feasible set is nonconvex, the bound becomes ${\cal O}\left(\max_{1\leq j\leq q}ε_j^{-q(p+1)/p}\right)$ evaluations. If the problem is composite, and either $q > 1$ or the feasible set is not convex, the bound is then ${\cal O}\left(\max_{1\leq j\leq q}ε_j^{-(q+1)}\right)$ evaluations. These results not only provide, to our knowledge, the first known bound for (unconstrained or inexpensively-constrained) composite problems for optimality orders exceeding one, but also give the first sharp bounds for high-order strong approximate $q$-th order minimizers of standard (unconstrained and inexpensively constrained) smooth problems, thereby complementing known results for weak minimizers.

Submitted 29 January, 2020; originally announced January 2020.

Comments: 32 pages, 1 figure

MSC Class: 90C60; 90C46; 90C30; 90C26; 65K10; 49M37 ACM Class: F.2.1; G.1.6

arXiv:2001.04801 [pdf, ps, other]

Exploiting problem structure in derivative free optimization

Authors: Margherita Porcelli, Philippe L. Toint

Abstract: A structured version of derivative-free random pattern search optimization algorithms is introduced which is able to exploit coordinate partially separable structure (typically associated with sparsity) often present in unconstrained and bound-constrained optimization problems. This technique improves performance by orders of magnitude and makes it possible to solve large problems that otherwise are totally intractable by other derivative-free methods. A library of interpolation-based modelling tools is also described, which can be associated with the structured or unstructured versions of the initial pattern search algorithm. The use of the library further enhances performance, especially when associated with structure. The significant gains in performance associated with these two techniques are illustrated using a new freely-available release of the BFO (Brute Force Optimizer) package first introduced in [Porcelli, Toint, ACM TOMS, 2017], which incorporates them. An interesting conclusion of the numerical results presented is that providing global structural information on a problem can result in significantly fewer evaluations of the objective function than attempting to build local Taylor-like models.

Submitted 12 January, 2021; v1 submitted 14 January, 2020; originally announced January 2020.

MSC Class: 65K05; 90C56; 90C90

arXiv:1909.04991 [pdf, other]

An algorithm for optimization with disjoint linear constraints and its application for predicting rain

Authors: Tijana Janjic, Yvonne Ruckstuhl, Philippe L. Toint

Abstract: A specialized algorithm for quadratic optimization (QO, or, formerly, QP) with disjoint linear constraints is presented. In the considered class of problems, a subset of variables are subject to linear equality constraints, while variables in a different subset are constrained to remain in a convex set. The proposed algorithm exploits the structure by combining steps in the nullspace of the equality constraint's matrix with projections onto the convex set. The algorithm is motivated by application in weather forecasting. Numerical results on a simple model designed for predicting rain show that the algorithm is an improvement on current practice and that it reduces the computational burden compared to a more general interior point QO method. In particular, if constraints are disjoint and the rank of the set of linear equality constraints is small, further reduction in computational costs can be achieved, making it possible to apply this algorithm in high dimensional weather forecasting problems.

Submitted 11 September, 2019; originally announced September 2019.

Comments: 13 pages, 2 figures

MSC Class: 65K05; 90C20; 86A10 ACM Class: G.1.6; J.1

arXiv:1902.10767 [pdf, ps, other]

High-Order Evaluation Complexity for Convexly-Constrained Optimization with Non-Lipschitzian Group Sparsity Terms

Authors: X. Chen, Ph. L. Toint

Abstract: This paper studies high-order evaluation complexity for partially separable convexly-constrained optimization involving non-Lipschitzian group sparsity terms in a nonconvex objective function. We propose a partially separable adaptive regularization algorithm using a $p$-th order Taylor model and show that the algorithm can produce an $(ε,δ)$-approximate $q$-th-order stationary point in at most $O(ε^{-(p+1)/(p-q+1)})$ evaluations of the objective function and its first $p$ derivatives (whenever they exist). Our model uses the underlying rotational symmetry of the Euclidean norm function to build a Lipschitzian approximation for the non-Lipschitzian group sparsity terms, which are defined by the group $\ell_2$-$\ell_a$ norm with $a\in(0,1)$. The new result shows that the partially-separable structure and non-Lipschitzian group sparsity terms in the objective function may not affect the worst-case evaluation complexity order.

Submitted 27 February, 2019; originally announced February 2019.

Comments: 27 pages

MSC Class: 90C30; 90C46; 65K05

arXiv:1902.10406 [pdf, ps, other]

Minimization of nonsmooth nonconvex functions using inexact evaluations and its worst-case complexity

Authors: S. Gratton, E. Simon, Ph. L. Toint

Abstract: An adaptive regularization algorithm using inexact function and derivatives evaluations is proposed for the solution of composite nonsmooth nonconvex optimization. It is shown that this algorithm needs at most $O(|\log(ε)|\,ε^{-2})$ evaluations of the problem's functions and their derivatives for finding an $ε$-approximate first-order stationary point. This complexity bound therefore generalizes that provided by [Bellavia, Gurioli, Morini and Toint, 2018] for inexact methods for smooth nonconvex problems, and is within a factor $|\log(ε)|$ of the optimal bound known for smooth and nonsmooth nonconvex minimization with exact evaluations. A practically more restrictive variant of the algorithm with worst-case complexity $O(|\log(ε)|+ε^{-2})$ is also presented.

Submitted 27 February, 2019; originally announced February 2019.

Comments: 19 pages

MSC Class: 49K10; 49M37; 65K05; 68T05; 68W40 ACM Class: F.1.3; F.2.1; G.1.6; I.2.6

arXiv:1902.03056 [pdf, ps, other]

Bernstein Concentration Inequalities for Tensors via Einstein Products

Authors: Z. Luo, L. Qi, Ph. L. Toint

Abstract: A generalization of the Bernstein matrix concentration inequality to random tensors of general order is proposed. This generalization is based on the use of Einstein products between tensors, from which a strong link can be established between matrices and tensors, in turn allowing exploitation of existing results for the former.

Submitted 8 February, 2019; originally announced February 2019.

Comments: 12 pages

MSC Class: 15A52; 15A72; 49J55; 60H25 ACM Class: F.2.1; G.1.3; G.3

Journal ref: Frontiers of Mathematics in China, vol. 5(2), pp. 367-384, 2020

arXiv:1812.03467 [pdf, ps, other]

A note on solving nonlinear optimization problems in variable precision

Authors: S. Gratton, Ph. L. Toint

Abstract: This short note considers an efficient variant of the trust-region algorithm with dynamic accuracy proposed by Carter (1993) and Conn, Gould and Toint (2000) as a tool for very high-performance computing, an area where it is critical to allow multi-precision computations for keeping the energy dissipation under control. Numerical experiments are presented indicating that the use of the considered method can bring substantial savings in objective function's and gradient's evaluation "energy costs" by efficiently exploiting multi-precision computations.

Submitted 12 April, 2019; v1 submitted 9 December, 2018; originally announced December 2018.

Comments: 11 pages, 2 figures

MSC Class: 90C26; 90C30; 65K05 ACM Class: G.1.6; F.2.1; B.2.3; B.2.4; I.2.5

arXiv:1811.07057 [pdf, ps, other]

Universal regularization methods - varying the power, the smoothness and the accuracy

Authors: Coralia Cartis, Nicholas I. M. Gould, Philippe L. Toint

Abstract: Adaptive cubic regularization methods have emerged as a credible alternative to linesearch and trust-region for smooth nonconvex optimization, with optimal complexity amongst second-order methods. Here we consider a general/new class of adaptive regularization methods, that use first- or higher-order local Taylor models of the objective regularized by a(ny) power of the step size and applied to convexly-constrained optimization problems. We investigate the worst-case evaluation complexity/global rate of convergence of these algorithms, when the level of sufficient smoothness of the objective may be unknown or may even be absent. We find that the methods accurately reflect in their complexity the degree of smoothness of the objective and satisfy increasingly better bounds with improving accuracy of the models. The bounds vary continuously and robustly with respect to the regularization power and accuracy of the model and the degree of smoothness of the objective.

Submitted 16 November, 2018; originally announced November 2018.

Report number: Technical report, Oxford University, Numerical Analysis Group, 2017

arXiv:1811.03831 [pdf, ps, other]

Adaptive Regularization Algorithms with Inexact Evaluations for Nonconvex Optimization

Authors: S. Bellavia, G. Gurioli, B. Morini, Ph. L. Toint

Abstract: A regularization algorithm using inexact function values and inexact derivatives is proposed and its evaluation complexity analyzed. This algorithm is applicable to unconstrained problems and to problems with inexpensive constraints (that is, constraints whose evaluation and enforcement have negligible cost) under the assumption that the derivative of highest degree is $β$-Hölder continuous. It features a very flexible adaptive mechanism for determining the inexactness which is allowed, at each iteration, when computing objective function values and derivatives. The complexity analysis covers arbitrary optimality order and arbitrary degree of available approximate derivatives. It extends results of Cartis, Gould and Toint (2018) on the evaluation complexity to the inexact case: if a $q$th order minimizer is sought using approximations to the first $p$ derivatives, it is proved that a suitable approximate minimizer within $ε$ is computed by the proposed algorithm in at most $O(ε^{-\frac{p+β}{p-q+β}})$ iterations and at most $O(|\log(ε)|ε^{-\frac{p+β}{p-q+β}})$ approximate evaluations. An algorithmic variant, although more rigid in practice, can be proved to find such an approximate minimizer in $O(|\log(ε)|+ε^{-\frac{p+β}{p-q+β}})$ evaluations. While the proposed framework remains so far conceptual for high degrees and orders, it is shown to yield simple and computationally realistic inexact methods when specialized to the unconstrained and bound-constrained first- and second-order cases. The deterministic complexity results are finally extended to the stochastic context, yielding adaptive sample-size rules for subsampling methods typical of machine learning.

Submitted 19 April, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

Comments: 32 pages

MSC Class: 49K10; 49M37; 65K05; 68T05; 68W40 ACM Class: F.1.3; F.2.1; G.1.6; I.2.6

arXiv:1811.01220 [pdf, ps, other]

Sharp worst-case evaluation complexity bounds for arbitrary-order nonconvex optimization with inexpensive constraints

Authors: Coralia Cartis, Nick I. M. Gould, Philippe L. Toint

Abstract: We provide sharp worst-case evaluation complexity bounds for nonconvex minimization problems with general inexpensive constraints, i.e. problems where the cost of evaluating/enforcing the (possibly nonconvex or even disconnected) constraints, if any, is negligible compared to that of evaluating the objective function. These bounds unify, extend or improve all known upper and lower complexity bounds for unconstrained and convexly-constrained problems. It is shown that, given an accuracy level $ε$, a degree of highest available Lipschitz continuous derivatives $p$ and a desired optimality order $q$ between one and $p$, a conceptual regularization algorithm requires no more than $O(ε^{-\frac{p+1}{p-q+1}})$ evaluations of the objective function and its derivatives to compute a suitably approximate $q$-th order minimizer. With an appropriate choice of the regularization, a similar result also holds if the $p$-th derivative is merely Hölder rather than Lipschitz continuous. We provide an example that shows that the above complexity bound is sharp for unconstrained and a wide class of constrained problems; we also give reasons for the optimality of regularization methods from a worst-case complexity point of view, within a large class of algorithms that use the same derivative information.

Submitted 3 November, 2018; originally announced November 2018.

Comments: 30 pages

MSC Class: 49K10; 49M37; 65K05; 65Y20; 68T05; 68W40 ACM Class: F.1.3, F.2.1, G.1.6, I.2.6

Journal ref: SIAM Journal on Optimization, vol. 30(1), pp. 513-541, 2020

arXiv:1807.07476 [pdf, other]

Minimizing convex quadratic with variable precision conjugate gradients

Authors: S. Gratton, E. Simon, D. Titley-Peloquin, Ph. L. Toint

Abstract: We investigate the method of conjugate gradients, exploiting inaccurate matrix-vector products, for the solution of convex quadratic optimization problems. Theoretical performance bounds are derived, and the necessary quantities occurring in the theoretical bounds estimated, leading to a practical algorithm. Numerical experiments suggest that this approach has significant potential, including in the steadily more important context of multi-precision computations.

Submitted 21 September, 2020; v1 submitted 17 July, 2018; originally announced July 2018.

MSC Class: 90C20; 65F10; 65G99; ACM Class: F.2.1; G.1.3; B.2.3; B.2.4

arXiv:1711.09407 [pdf, ps, other]

A note on using performance and data profiles for training algorithms

Authors: Margherita Porcelli, Philippe L. Toint

Abstract: It is shown how to use the performance and data profile benchmarking tools to improve algorithms' performance. An illustration for the BFO derivative-free optimizer suggests that the obtained gains are potentially significant.

Submitted 26 November, 2017; originally announced November 2017.

Comments: 8 pages, 4 tables, 4 figures

MSC Class: 65K05; 90C56; 90C90 ACM Class: G.4; D.2.2; D.2.8; G.1.6

Journal ref: ACM Transactions on Mathematical Software, vol. 45(2), 2019

arXiv:1709.09031 [pdf, other]

doi 10.1002/qj.3262

A note on preconditioning weighted linear least squares, with consequences for weakly-constrained variational data assimilation

Authors: Serge Gratton, Selime Gürol, Ehouarn Simon, Philippe L. Toint

Abstract: The effect of preconditioning linear weighted least-squares using an approximation of the model matrix is analyzed, showing the interplay of the eigenstructures of both the model and weighting matrices. A small example is given illustrating the resulting potential inefficiency of such preconditioners. Consequences of these results in the context of the weakly-constrained 4D-Var data assimilation problem are finally discussed.

Submitted 26 September, 2017; originally announced September 2017.

Comments: 10 pages, 2 figures

MSC Class: 86A5; 86A10; 90C06; 90C30; 15A12 ACM Class: G.1.3; G.1.6

Journal ref: Quarterly Journal of the Royal Meteorological Society, vol. 144(172), pp. 934--940, 2018

arXiv:1709.07180 [pdf, ps, other]

Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization

Authors: Coralia Cartis, Nick I. M. Gould, Philippe L. Toint

Abstract: We establish or refute the optimality of inexact second-order methods for unconstrained nonconvex optimization from the point of view of worst-case evaluation complexity, improving and generalizing the results of Cartis, Gould and Toint (2010, 2011). To this aim, we consider a new general class of inexact second-order algorithms for unconstrained optimization that includes regularization and trust-region variations of Newton's method as well as of their linesearch variants. For each method in this class and arbitrary accuracy threshold $ε\in (0,1)$, we exhibit a smooth objective function with bounded range, whose gradient is globally Lipschitz continuous and whose Hessian is $α$-Hölder continuous (for given $α\in [0,1]$), for which the method in question takes at least $\lfloorε^{-(2+α)/(1+α)}\rfloor$ function evaluations to generate a first iterate whose gradient is smaller than $ε$ in norm. Moreover, we also construct another function on which Newton's method takes $\lfloorε^{-2}\rfloor$ evaluations, but whose Hessian is Lipschitz continuous on the path of iterates. These examples provide lower bounds on the worst-case evaluation complexity of methods in our class when applied to smooth problems satisfying the relevant assumptions. Furthermore, for $α=1$, this lower bound is of the same order in $ε$ as the upper bound on the worst-case evaluation complexity of the cubic and other methods in a class of methods proposed in Curtis, Robinson and Samadi (2017) or in Royer and Wright (2017), thus implying that these methods have optimal worst-case evaluation complexity within a wider class of second-order methods, and that Newton's method is suboptimal.

Submitted 21 September, 2017; originally announced September 2017.

Report number: naXys, University of Namur, 2017 MSC Class: 90C60

Journal ref: Mathematical Programming, vol. 163(1), pp. 359-368, 2017

arXiv:1709.06383 [pdf, ps, other]

doi 10.1002/qj.3355

On the use of the saddle formulation in weakly-constrained 4D-VAR data assimilation

Authors: S. Gratton, S. Gürol, E. Simon, Ph. L. Toint

Abstract: This paper discusses the practical use of the saddle variational formulation for the weakly-constrained 4D-VAR method in data assimilation. It is shown that the method, in its original form, may produce erratic results or diverge because of the inherent lack of monotonicity of the produced objective function values. Convergent, variationally coherent variants of the algorithm are then proposed whose practical performance is compared to that of other formulations. This comparison is conducted on two data assimilation instances (Burgers equation and the Quasi-Geostrophic model), using two different assumptions on parallel computing environment. Because these variants essentially retain the parallelization advantages of the original proposal, they often, but not always, perform best, even for moderate numbers of computing processes.

Submitted 19 September, 2017; originally announced September 2017.

Journal ref: Quarterly Journal of the Royal Meteorological Society, 144(717), pp. 2592-2602, 2018

arXiv:1708.04044 [pdf, ps, other]

Improved second-order evaluation complexity for unconstrained nonlinear optimization using high-order regularized models

Authors: Coralia Cartis, Nicholas I. M. Gould, Philippe L. Toint

Abstract: The unconstrained minimization of a sufficiently smooth objective function $f(x)$ is considered, for which derivatives up to order $p$, $p\geq 2$, are assumed to be available. An adaptive regularization algorithm is proposed that uses Taylor models of the objective of order $p$ and that is guaranteed to find a first- and second-order critical point in at most $O \left(\max\left( ε_1^{-\frac{p+1}{p}}, ε_2^{-\frac{p+1}{p-1}} \right) \right)$ function and derivative evaluations, where $ε_1 > 0$ and $ε_2 > 0$ are prescribed first- and second-order optimality tolerances. Our approach extends the method in Birgin et al. (2016) to finding second-order critical points, and establishes the novel complexity bound for second-order criticality under the same problem assumptions as for first-order, namely, that the $p$-th derivative tensor is Lipschitz continuous and that $f(x)$ is bounded from below. The evaluation-complexity bound for second-order criticality improves on all previously known results.

Submitted 14 August, 2017; originally announced August 2017.

Journal ref: Optimization Methods and Software, vol. 35(2), pp. 243-256, 2020

arXiv:1705.07285 [pdf, ps, other]

Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization

Authors: C. Cartis, N. I. M. Gould, Ph. L. Toint

Abstract: Necessary conditions for high-order optimality in smooth nonlinear constrained optimization are explored and their inherent intricacy discussed. A two-phase minimization algorithm is proposed which can achieve approximate first-, second- and third-order criticality and its evaluation complexity is analyzed as a function of the choice (among existing methods) of an inner algorithm for solving subproblems in each of the two phases. The relation between high-order criticality and penalization techniques is finally considered, showing that standard algorithmic approaches will fail if approximate constrained high-order critical points are sought.

Submitted 7 January, 2018; v1 submitted 20 May, 2017; originally announced May 2017.

Comments: 32 pages, 3 figures

MSC Class: 90C26; 90C46; 90C30 ACM Class: F.2.1; G.1.6

Journal ref: Journal of Complexity, vol. 53, pp. 68-94, 2019

arXiv:1705.04895 [pdf, other]

Evaluation complexity bounds for smooth constrained nonlinear optimisation using scaled KKT conditions, high-order models and the criticality measure $χ$

Authors: Coralia Cartis, Nick Gould, Philippe L Toint

Abstract: Evaluation complexity for convexly constrained optimization is considered and it is shown first that the complexity bound of $O(ε^{-3/2})$ proved by Cartis, Gould and Toint (IMAJNA 32(4) 2012, pp.1662-1695) for computing an $ε$-approximate first-order critical point can be obtained under significantly weaker assumptions. Moreover, the result is generalized to the case where high-order derivatives are used, resulting in a bound of $O(ε^{-(p+1)/p})$ evaluations whenever derivatives of order $p$ are available. It is also shown that the bound of $O(ε_P^{-1/2}ε_D^{-3/2})$ evaluations ($ε_P$ and $ε_D$ being primal and dual accuracy thresholds) suggested by Cartis, Gould and Toint (SINUM, 2015) for the general nonconvex case involving both equality and inequality constraints can be generalized to a bound of $O(ε_P^{-1/p}ε_D^{-(p+1)/p})$ evaluations under similarly weakened assumptions. This paper is a variant of a companion report (NTR-11-2015, University of Namur, Belgium) which uses a different first-order criticality measure to obtain the same complexity bounds.

Submitted 13 May, 2017; originally announced May 2017.

Showing 1–44 of 44 results for author: Toint, P L