-
A Hybrid Subgradient Method for Nonsmooth Nonconvex Bilevel Optimization
Authors:
Nachuan Xiao,
Xiaoyin Hu,
Xin Liu,
Kim-Chuan Toh
Abstract:
In this paper, we focus on the nonconvex-nonconvex bilevel optimization problem (BLO), where both upper-level and lower-level objectives are nonconvex, with the upper-level problem potentially being nonsmooth. We develop a two-timescale momentum-accelerated subgradient method (TMG) that employs two-timescale stepsizes, and establish its local convergence when initialized within a sufficiently small neighborhood of the feasible region. To develop a globally convergent algorithm for (BLO), we introduce a feasibility restoration scheme (FRG) that drives iterates toward the feasible region. Both (TMG) and (FRG) only require the first-order derivatives of the upper-level and lower-level objective functions, ensuring efficient computations in practice. We then develop a novel hybrid method that alternates between (TMG) and (FRG) and adaptively estimates its hyperparameters. Under mild conditions, we establish the global convergence properties of our proposed algorithm. Preliminary numerical experiments demonstrate the high efficiency and promising potential of our proposed algorithm.
Submitted 28 May, 2025;
originally announced May 2025.
-
An Exact Penalty Approach for Equality Constrained Optimization over a Convex Set
Authors:
Nachuan Xiao,
Tianyun Tang,
Shiwei Wang,
Kim-Chuan Toh
Abstract:
In this paper, we consider the nonlinear constrained optimization problem (NCP) with constraint set $\{x \in \mathcal{X}: c(x) = 0\}$, where $\mathcal{X}$ is a closed convex subset of $\mathbb{R}^n$. We propose an exact penalty approach, named constraint dissolving approach, that transforms (NCP) into its corresponding constraint dissolving problem (CDP). The transformed problem (CDP) admits $\mathcal{X}$ as its feasible region with a locally Lipschitz smooth objective function. We prove that (NCP) and (CDP) share the same first-order stationary points, second-order stationary points, second-order sufficient condition (SOSC) points, and strong SOSC points, in a neighborhood of the feasible region. Moreover, we prove that these equivalences extend globally under a particular error bound condition. Therefore, our proposed constraint dissolving approach enables direct implementations of optimization approaches over $\mathcal{X}$ and inherits their convergence properties to solve problems that take the form of (NCP). Preliminary numerical experiments illustrate the high efficiency of directly applying existing solvers for optimization over $\mathcal{X}$ to solve (NCP) through (CDP). These numerical results further demonstrate the practical potential of our proposed constraint dissolving approach.
Submitted 5 May, 2025;
originally announced May 2025.
-
Stochastic optimization over expectation-formulated generalized Stiefel manifold
Authors:
Linshuo Jiang,
Nachuan Xiao,
Xin Liu
Abstract:
In this paper, we consider a class of stochastic optimization problems over the expectation-formulated generalized Stiefel manifold (SOEGS), where the objective function $f$ is continuously differentiable. We propose a novel constraint dissolving penalty function with a customized penalty term (CDFCP), which maintains the same order of differentiability as $f$. Our theoretical analysis establishes the global equivalence between CDFCP and SOEGS in the sense that they share the same first-order and second-order stationary points under mild conditions. These results on equivalence enable the direct implementation of various stochastic optimization approaches to solve SOEGS. In particular, we develop a stochastic gradient algorithm and its accelerated variant by incorporating an adaptive step size strategy. Furthermore, we prove their $\mathcal{O}(\varepsilon^{-4})$ sample complexity for finding an $\varepsilon$-stationary point of CDFCP. Comprehensive numerical experiments show the efficiency and robustness of our proposed algorithms.
Submitted 27 December, 2024;
originally announced December 2024.
-
A Double Tracking Method for Optimization with Decentralized Generalized Orthogonality Constraints
Authors:
Lei Wang,
Nachuan Xiao,
Xin Liu
Abstract:
In this paper, we consider the decentralized optimization problems with generalized orthogonality constraints, where both the objective function and the constraint exhibit a distributed structure. Such optimization problems, albeit ubiquitous in practical applications, remain unsolvable by existing algorithms in the presence of distributed constraints. To address this issue, we convert the original problem into an unconstrained penalty model by resorting to the recently proposed constraint-dissolving operator. However, this transformation compromises the essential property of separability in the resulting penalty function, rendering existing algorithms inapplicable. We overcome this difficulty by introducing a novel algorithm that tracks the gradient of the objective function and the Jacobian of the constraint mapping simultaneously. The global convergence guarantee is rigorously established with an iteration complexity. To substantiate the effectiveness and efficiency of our proposed algorithm, we present numerical results on both synthetic and real-world datasets.
Submitted 8 September, 2024;
originally announced September 2024.
-
A Minimization Approach for Minimax Optimization with Coupled Constraints
Authors:
Xiaoyin Hu,
Kim-Chuan Toh,
Shiwei Wang,
Nachuan Xiao
Abstract:
In this paper, we focus on the nonconvex-strongly-concave minimax optimization problem (MCC), where the inner maximization subproblem contains constraints that couple the primal variable of the outer minimization problem. We prove that by introducing the dual variable of the inner maximization subproblem, (MCC) has the same first-order minimax points as a nonconvex-strongly-concave minimax optimization problem without coupled constraints (MOL). We then extend our focus to a class of nonconvex-strongly-concave minimax optimization problems (MM) that generalize (MOL). By performing the partial forward-backward envelope to the primal variable of the inner maximization subproblem, we propose a minimization problem (MMPen), where its objective function is explicitly formulated. We prove that the first-order stationary points of (MMPen) coincide with the first-order minimax points of (MM). Therefore, various efficient minimization methods and their convergence guarantees can be directly employed to solve (MM), hence solving (MCC) through (MOL). Preliminary numerical experiments demonstrate the great potential of our proposed approach.
Submitted 30 August, 2024;
originally announced August 2024.
-
Learning-rate-free Momentum SGD with Reshuffling Converges in Nonsmooth Nonconvex Optimization
Authors:
Xiaoyin Hu,
Nachuan Xiao,
Xin Liu,
Kim-Chuan Toh
Abstract:
In this paper, we propose a generalized framework for developing learning-rate-free momentum stochastic gradient descent (SGD) methods in the minimization of nonsmooth nonconvex functions, especially in training nonsmooth neural networks. Our framework adaptively generates learning rates based on the historical data of stochastic subgradients and iterates. Under mild conditions, we prove that our proposed framework enjoys global convergence to the stationary points of the objective function in the sense of the conservative field, hence providing convergence guarantees for training nonsmooth neural networks. Based on our proposed framework, we propose a novel learning-rate-free momentum SGD method (LFM). Preliminary numerical experiments reveal that LFM performs comparably to the state-of-the-art learning-rate-free methods (which have not been shown theoretically to be convergent) across well-known neural network training benchmarks.
Submitted 26 June, 2024;
originally announced June 2024.
-
Tests for principal eigenvalues and eigenvectors
Authors:
Jianqing Fan,
Yingying Li,
Ningning Xia,
Xinghua Zheng
Abstract:
We establish central limit theorems for principal eigenvalues and eigenvectors under a large factor model setting, and develop two-sample tests of both principal eigenvalues and principal eigenvectors. One important application is to detect structural breaks in large factor models. Compared with existing methods for detecting structural breaks, our tests provide unique insights into the source of structural breaks because they can distinguish between individual principal eigenvalues and/or eigenvectors. We demonstrate the application by comparing the principal eigenvalues and principal eigenvectors of S&P 500 Index constituents' daily returns over different years.
Submitted 11 May, 2024;
originally announced May 2024.
-
Developing Lagrangian-based Methods for Nonsmooth Nonconvex Optimization
Authors:
Nachuan Xiao,
Kuangyu Ding,
Xiaoyin Hu,
Kim-Chuan Toh
Abstract:
In this paper, we consider the minimization of a nonsmooth nonconvex objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of $\mathbb{R}^n$, with additional nonsmooth nonconvex constraints $c(x) = 0$. We develop a unified framework for developing Lagrangian-based methods, which takes a single-step update to the primal variables by some subgradient methods in each iteration. These subgradient methods are "embedded" into our framework, in the sense that they are incorporated as black-box updates to the primal variables. We prove that our proposed framework inherits the global convergence guarantees from these embedded subgradient methods under mild conditions. In addition, we show that our framework can be extended to solve constrained optimization problems with expectation constraints. Based on the proposed framework, we show that a wide range of existing stochastic subgradient methods, including the proximal SGD, proximal momentum SGD, and proximal ADAM, can be embedded into Lagrangian-based methods. Preliminary numerical experiments on deep learning tasks illustrate that our proposed framework yields efficient variants of Lagrangian-based methods with convergence guarantees for nonconvex nonsmooth constrained optimization problems.
Submitted 14 April, 2024;
originally announced April 2024.
-
Convergence of Decentralized Stochastic Subgradient-based Methods for Nonsmooth Nonconvex functions
Authors:
Siyuan Zhang,
Nachuan Xiao,
Xin Liu
Abstract:
In this paper, we focus on the decentralized stochastic subgradient-based methods in minimizing nonsmooth nonconvex functions without Clarke regularity, especially in the decentralized training of nonsmooth neural networks. We propose a general framework that unifies various decentralized subgradient-based methods, such as decentralized stochastic subgradient descent (DSGD), DSGD with gradient-tracking technique (DSGD-T), and DSGD with momentum (DSGD-M). To establish the convergence properties of our proposed framework, we relate the discrete iterates to the trajectories of a continuous-time differential inclusion, which is assumed to have a coercive Lyapunov function with a stable set $\mathcal{A}$. We prove the asymptotic convergence of the iterates to the stable set $\mathcal{A}$ with sufficiently small and diminishing step-sizes. These results provide the first convergence guarantees for some well-recognized decentralized stochastic subgradient-based methods without Clarke regularity of the objective function. Preliminary numerical experiments demonstrate that our proposed framework yields highly efficient decentralized stochastic subgradient-based methods with convergence guarantees in the training of nonsmooth neural networks.
Submitted 9 May, 2025; v1 submitted 18 March, 2024;
originally announced March 2024.
-
An Inexact Preconditioned Zeroth-order Proximal Method for Composite Optimization
Authors:
Shanglin Liu,
Lei Wang,
Nachuan Xiao,
Xin Liu
Abstract:
In this paper, we consider the composite optimization problem, where the objective function integrates a continuously differentiable loss function with a nonsmooth regularization term. Moreover, only the function values for the differentiable part of the objective function are available. To efficiently solve this composite optimization problem, we propose a preconditioned zeroth-order proximal gradient method in which the gradients and preconditioners are estimated by finite-difference schemes based on the function values at the same trial points. We establish the global convergence and worst-case complexity for our proposed method. Numerical experiments exhibit the superiority of our developed method.
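The idea of a zeroth-order proximal gradient step can be illustrated with a minimal sketch. It is not the method from the paper: it omits the preconditioner entirely, uses a plain forward-difference gradient estimate, and assumes an $\ell_1$ regularizer so that the proximal operator is soft-thresholding. The names `fd_gradient`, `soft_threshold`, `zo_proximal_gradient`, and `loss` are hypothetical.

```python
def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient estimate that uses only values of f."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        grad.append((f(shifted) - fx) / h)
    return grad


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied componentwise."""
    return [(1.0 if vi > 0 else -1.0) * max(abs(vi) - tau, 0.0) for vi in v]


def zo_proximal_gradient(f, x0, lam, step, iters=200):
    """Minimize f(x) + lam * ||x||_1 using function values of f only."""
    x = list(x0)
    for _ in range(iters):
        g = fd_gradient(f, x)                               # estimated gradient
        forward = [xi - step * gi for xi, gi in zip(x, g)]  # gradient step
        x = soft_threshold(forward, step * lam)             # proximal step
    return x


def loss(x):
    # Hypothetical smooth loss, chosen only to make the sketch checkable.
    return 0.5 * ((x[0] - 3.0) ** 2 + (x[1] - 0.2) ** 2)


solution = zo_proximal_gradient(loss, [0.0, 0.0], lam=0.5, step=0.5)
```

For this separable toy problem the exact minimizer is the soft-thresholded point (2.5, 0), so `solution` should land close to it, up to the small bias of the forward difference.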
Submitted 7 January, 2024;
originally announced January 2024.
-
Adam-family Methods with Decoupled Weight Decay in Deep Learning
Authors:
Kuangyu Ding,
Nachuan Xiao,
Kim-Chuan Toh
Abstract:
In this paper, we investigate the convergence properties of a wide class of Adam-family methods for minimizing quadratically regularized nonsmooth nonconvex optimization problems, especially in the context of training nonsmooth neural networks with weight decay. Motivated by the AdamW method, we propose a novel framework for Adam-family methods with decoupled weight decay. Within our framework, the estimators for the first-order and second-order moments of stochastic subgradients are updated independently of the weight decay term. Under mild assumptions and with non-diminishing stepsizes for updating the primary optimization variables, we establish the convergence properties of our proposed framework. In addition, we show that our proposed framework encompasses a wide variety of well-known Adam-family methods, hence offering convergence guarantees for these methods in the training of nonsmooth neural networks. More importantly, we show that our proposed framework asymptotically approximates the SGD method, thereby providing an explanation for the empirical observation that decoupled weight decay enhances generalization performance for Adam-family methods. As a practical application of our proposed framework, we propose a novel Adam-family method named Adam with Decoupled Weight Decay (AdamD), and establish its convergence properties under mild conditions. Numerical experiments demonstrate that AdamD outperforms Adam and is comparable to AdamW in terms of both generalization performance and efficiency.
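To make "updated independently of the weight decay term" concrete, here is a minimal sketch of a generic AdamW-style step in plain Python. It is an illustration under assumptions, not the AdamD update from the paper: the function name `decoupled_adam_step`, the bias correction, and all default hyperparameters are hypothetical choices. The point it shows is that the moment estimates see only the (sub)gradient `g`, while the decay term `wd * x` enters the parameter update directly.

```python
import math


def decoupled_adam_step(x, g, m, v, t, lr=0.1, wd=0.5,
                        beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step in which weight decay bypasses the moment estimates."""
    new_x, new_m, new_v = [], [], []
    for xi, gi, mi, vi in zip(x, g, m, v):
        # Moment estimates are built from the gradient only, never from wd * xi.
        mi = beta1 * mi + (1.0 - beta1) * gi
        vi = beta2 * vi + (1.0 - beta2) * gi * gi
        m_hat = mi / (1.0 - beta1 ** t)   # bias-corrected first moment
        v_hat = vi / (1.0 - beta2 ** t)   # bias-corrected second moment
        adaptive = m_hat / (math.sqrt(v_hat) + eps)
        # Decoupled decay: added to the update without adaptive rescaling.
        new_x.append(xi - lr * (adaptive + wd * xi))
        new_m.append(mi)
        new_v.append(vi)
    return new_x, new_m, new_v


# With a zero gradient the moments stay at zero and only the decay acts.
x1, m1, v1 = decoupled_adam_step([1.0, -2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], t=1)
```

In the zero-gradient call above, each parameter simply shrinks by the factor `1 - lr * wd`, which is what distinguishes decoupled decay from adding an $\ell_2$ penalty to the gradient before the adaptive scaling.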
Submitted 13 October, 2023;
originally announced October 2023.
-
Stochastic Subgradient Methods with Guaranteed Global Stability in Nonsmooth Nonconvex Optimization
Authors:
Nachuan Xiao,
Xiaoyin Hu,
Kim-Chuan Toh
Abstract:
In this paper, we focus on providing convergence guarantees for stochastic subgradient methods in minimizing nonsmooth nonconvex functions. We first investigate the global stability of a general framework for stochastic subgradient methods, where the corresponding differential inclusion admits a coercive Lyapunov function. We prove that, for any sequence of sufficiently small stepsizes and approximation parameters, coupled with sufficiently controlled noises, the iterates are uniformly bounded and asymptotically stabilize around the stable set of its corresponding differential inclusion. Moreover, we develop an improved analysis to apply our proposed framework to establish the global stability of a wide range of stochastic subgradient methods, where the corresponding Lyapunov functions are possibly non-coercive. These theoretical results illustrate the promising potential of our proposed framework for establishing the global stability of various stochastic subgradient methods.
Submitted 12 October, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees
Authors:
Nachuan Xiao,
Xiaoyin Hu,
Xin Liu,
Kim-Chuan Toh
Abstract:
In this paper, we present a comprehensive study on the convergence properties of Adam-family methods for nonsmooth optimization, especially in the training of nonsmooth neural networks. We introduce a novel two-timescale framework that adopts a two-timescale updating scheme, and prove its convergence properties under mild assumptions. Our proposed framework encompasses various popular Adam-family methods, providing convergence guarantees for these methods in training nonsmooth neural networks. Furthermore, we develop stochastic subgradient methods that incorporate gradient clipping techniques for training nonsmooth neural networks with heavy-tailed noise. Through our framework, we show that our proposed methods converge even when the evaluation noises are only assumed to be integrable. Extensive numerical experiments demonstrate the high efficiency and robustness of our proposed methods.
Submitted 19 February, 2024; v1 submitted 6 May, 2023;
originally announced May 2023.
-
A Riemannian Dimension-reduced Second Order Method with Application in Sensor Network Localization
Authors:
Tianyun Tang,
Kim-Chuan Toh,
Nachuan Xiao,
Yinyu Ye
Abstract:
In this paper, we propose a cubic-regularized Riemannian optimization method (RDRSOM), which partially exploits the second-order information and achieves the iteration complexity of $\mathcal{O}(1/\epsilon^{3/2})$. In order to reduce the per-iteration computational cost, we further propose a practical version of (RDRSOM), which is an extension of the well-known Barzilai-Borwein method and achieves the iteration complexity of $\mathcal{O}(1/\epsilon^{3/2})$. We apply our method to solve a nonlinear formulation of the wireless sensor network localization problem whose feasible set is a Riemannian manifold that has not been considered in the literature before. Numerical experiments are conducted to verify the high efficiency of our algorithm compared to state-of-the-art Riemannian optimization methods and other nonlinear solvers.
Submitted 24 April, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
A Partial Exact Penalty Function Approach for Constrained Optimization
Authors:
Nachuan Xiao,
Xin Liu,
Kim-Chuan Toh
Abstract:
In this paper, we focus on a class of constrained nonlinear optimization problems (NLP), where some of its equality constraints define a closed embedded submanifold $\mathcal{M}$ in $\mathbb{R}^n$. Although NLP can be solved directly by various existing approaches for constrained optimization in Euclidean space, these approaches usually fail to recognize the manifold structure of $\mathcal{M}$. To achieve better efficiency by utilizing the manifold structure of $\mathcal{M}$ in directly applying these existing optimization approaches, we propose a partial penalty function approach for NLP. In our proposed penalty function approach, we transform NLP into the corresponding constraint dissolving problem (CDP) in the Euclidean space, where the constraints that define $\mathcal{M}$ are eliminated through exact penalization. We establish the relationships on the constraint qualifications between NLP and CDP, and prove that NLP and CDP have the same stationary points and KKT points in a neighborhood of the feasible region under mild conditions. Therefore, various existing optimization approaches developed for constrained optimization in the Euclidean space can be directly applied to solve NLP through CDP. Preliminary numerical experiments demonstrate that by dissolving the constraints that define $\mathcal{M}$, CDP gains superior computational efficiency when compared to directly applying existing optimization approaches to solve NLP, especially in high dimensional scenarios.
Submitted 3 April, 2023;
originally announced April 2023.
-
CDOpt: A Python Package for a Class of Riemannian Optimization
Authors:
Nachuan Xiao,
Xiaoyin Hu,
Xin Liu,
Kim-Chuan Toh
Abstract:
Optimization over the embedded submanifold defined by constraints $c(x) = 0$ has attracted much interest over the past few decades due to its wide applications in various areas. Plenty of related optimization packages have been developed based on Riemannian optimization approaches, which rely on some basic geometrical materials of Riemannian manifolds, including retractions, vector transports, etc. These geometrical materials can be challenging to determine in general. Existing packages only accommodate a few well-known manifolds whose geometrical materials are easily accessible. For other manifolds which are not contained in these packages, the users have to develop the geometric materials by themselves. In addition, it is not always tractable to adopt advanced features from various state-of-the-art unconstrained optimization solvers to Riemannian optimization approaches.
We introduce CDOpt (available at https://cdopt.github.io/), a user-friendly Python package for a class of Riemannian optimization problems. Based on constraint dissolving approaches, Riemannian optimization problems are transformed into their equivalent unconstrained counterparts in CDOpt. Therefore, solving Riemannian optimization problems through CDOpt directly benefits from various existing solvers and the rich expertise gained over decades for unconstrained optimization. Moreover, all the computations in CDOpt related to any manifold in question are conducted on its constraint expression, hence users can easily define new manifolds in CDOpt without any background in differential geometry. Furthermore, CDOpt extends the neural layers from PyTorch and Flax, thus allowing users to train manifold-constrained neural networks directly with solvers for unconstrained optimization. Extensive numerical experiments demonstrate that CDOpt is highly efficient and robust in solving various classes of Riemannian optimization problems.
Submitted 12 October, 2024; v1 submitted 5 December, 2022;
originally announced December 2022.
-
An Improved Unconstrained Approach for Bilevel Optimization
Authors:
Xiaoyin Hu,
Nachuan Xiao,
Xin Liu,
Kim-Chuan Toh
Abstract:
In this paper, we focus on the nonconvex-strongly-convex bilevel optimization problem (BLO). In this BLO, the objective function of the upper-level problem is nonconvex and possibly nonsmooth, and the lower-level problem is smooth and strongly convex with respect to the underlying variable $y$. We show that the feasible region of BLO is a Riemannian manifold. Then we transform BLO to its corresponding unconstrained constraint dissolving problem (CDB), whose objective function is explicitly formulated from the objective functions in BLO. We prove that BLO is equivalent to the unconstrained optimization problem CDB. Therefore, various efficient unconstrained approaches, together with their theoretical results, can be directly applied to BLO through CDB. We propose a unified framework for developing subgradient-based methods for CDB. Remarkably, we show that several existing efficient algorithms can fit the unified framework and be interpreted as descent algorithms for CDB. These examples further demonstrate the great potential of our proposed approach.
Submitted 23 December, 2022; v1 submitted 1 August, 2022;
originally announced August 2022.
-
A Constraint Dissolving Approach for Nonsmooth Optimization over the Stiefel Manifold
Authors:
Xiaoyin Hu,
Nachuan Xiao,
Xin Liu,
Kim-Chuan Toh
Abstract:
This paper focuses on the minimization of a possibly nonsmooth objective function over the Stiefel manifold. The existing approaches either lack efficiency or can only tackle prox-friendly objective functions. We propose a constraint dissolving function named NCDF and show that it has the same first-order stationary points and local minimizers as the original problem in a neighborhood of the Stiefel manifold. Furthermore, we show that the Clarke subdifferential of NCDF is easy to obtain from the Clarke subdifferential of the objective function. Therefore, various existing approaches for unconstrained nonsmooth optimization can be directly applied to nonsmooth optimization problems over the Stiefel manifold. We propose a framework for developing subgradient-based methods and establish their convergence properties based on prior works. Furthermore, based on our proposed framework, we can develop efficient approaches for optimization over the Stiefel manifold. Preliminary numerical experiments further highlight that the proposed constraint dissolving approach yields efficient and direct implementations of various unconstrained approaches to nonsmooth optimization problems over the Stiefel manifold.
Submitted 20 January, 2023; v1 submitted 21 May, 2022;
originally announced May 2022.
-
Dissolving Constraints for Riemannian Optimization
Authors:
Nachuan Xiao,
Xin Liu,
Kim-Chuan Toh
Abstract:
In this paper, we consider optimization problems over closed embedded submanifolds of $\mathbb{R}^n$, which are defined by the constraints $c(x) = 0$. We propose a class of constraint dissolving approaches for these Riemannian optimization problems. In these proposed approaches, solving a Riemannian optimization problem is transformed into the unconstrained minimization of a constraint dissolving function named CDF. Unlike existing exact penalty functions, the exact gradient and Hessian of CDF are easy to compute. We study the theoretical properties of CDF and prove that the original problem and CDF have the same first-order and second-order stationary points, local minimizers, and Łojasiewicz exponents in a neighborhood of the feasible region. Remarkably, the convergence properties of our proposed constraint dissolving approaches can be directly inherited from the existing rich results in unconstrained optimization. Therefore, the proposed constraint dissolving approaches build shortcuts from unconstrained optimization to Riemannian optimization. Several illustrative examples further demonstrate the potential of our proposed constraint dissolving approaches.
Submitted 14 October, 2022; v1 submitted 19 March, 2022;
originally announced March 2022.
-
Solving Optimization Problems over the Stiefel Manifold by Smooth Exact Penalty Function
Authors:
Nachuan Xiao,
Xin Liu
Abstract:
In this paper, we present a novel penalty model called ExPen for optimization over the Stiefel manifold. Unlike existing penalty functions for orthogonality constraints, ExPen adopts a smooth penalty function without using any first-order derivative of the objective function. We show that all the first-order stationary points of ExPen with a sufficiently large penalty parameter are either feasible, namely, are the first-order stationary points of the original optimization problem, or far from the Stiefel manifold. Besides, the original problem and ExPen share the same second-order stationary points. Remarkably, the exact gradient and Hessian of ExPen are easy to compute. As a consequence, the many existing algorithms for unconstrained optimization can be applied directly to solve ExPen.
Submitted 18 December, 2022; v1 submitted 17 October, 2021;
originally announced October 2021.
-
A Penalty-free Infeasible Approach for a Class of Nonsmooth Optimization Problems over the Stiefel Manifold
Authors:
Nachuan Xiao,
Xin Liu,
Ya-xiang Yuan
Abstract:
Transforming into an exact penalty function model with convex compact constraints yields efficient infeasible approaches for optimization problems with orthogonality constraints. For smooth and $\ell_{2,1}$-norm regularized cases, these infeasible approaches adopt a simple, orthonormalization-free updating scheme and show high efficiency in the test examples. However, to avoid orthonormalization while enforcing the feasibility of the final solution, these infeasible approaches introduce a quadratic penalty term, where an inappropriate penalty parameter can lead to numerical inefficiency. Inspired by penalty-free approaches for smooth optimization problems, we propose a proximal first-order algorithm for a class of optimization problems with orthogonality constraints and a nonsmooth regularization term. The resulting algorithm, named sequential linearized proximal gradient method (SLPG), alternately takes tangential steps and normal steps to improve the optimality and feasibility, respectively. In SLPG, the orthonormalization process is invoked only once at the last step if high precision in feasibility is needed, so the main iterations in SLPG are orthonormalization-free. Besides, neither the tangential steps nor the normal steps involve the penalty parameter, and thus SLPG is penalty-free and avoids the inefficiency caused by an inappropriate penalty parameter. We analyze the global convergence properties of SLPG where the tangential steps are inexactly computed. By inexactly computing tangential steps, for smooth cases and $\ell_{2,1}$-norm regularized cases, SLPG has a closed-form updating scheme, which leads to its cheap tangential steps. Numerical experiments illustrate the numerical advantages of SLPG when compared with existing first-order methods.
Submitted 28 March, 2021; v1 submitted 5 March, 2021;
originally announced March 2021.
-
Physics driven reduced order model for real time blood flow simulations
Authors:
Sethuraman Sankaran,
David Lesage,
Rhea Tombropoulos,
Nan Xiao,
Hyun Jin Kim,
David Spain,
Michiel Schaap,
Charles A. Taylor
Abstract:
Predictive modeling of blood flow and pressure has numerous applications ranging from non-invasive assessment of functional significance of disease to planning invasive procedures. While several such predictive modeling techniques have been proposed, their use in the clinic has been limited due in part to the significant time required to perform virtual interventions and compute the resultant changes in hemodynamic conditions. We propose a fast hemodynamic assessment method based on first constructing an exploration space of geometries, tailored to each patient, and subsequently building a physics driven reduced order model in this space. We demonstrate that this method can predict fractional flow reserve derived from coronary computed tomography angiography in response to changes to a patient-specific lumen geometry in real time while achieving high accuracy when compared to computational fluid dynamics simulations. We validated this method on over 1300 patients who received a coronary CT scan and demonstrated a correlation coefficient of 0.98 with an error of 0.005 ± 0.015 (95% confidence interval: (-0.020, 0.031)) as compared to three-dimensional blood flow calculations.
Submitted 4 November, 2019;
originally announced November 2019.
-
On the estimation of high-dimensional integrated covariance matrix based on high-frequency data with multiple transactions
Authors:
Moming Wang,
Ningning Xia,
You Zhou
Abstract:
Due to the mechanism of recording, the presence of multiple transactions at each recording time has become a common feature of high-frequency data in financial markets. Using random matrix theory, this paper considers the estimation of integrated covariance (ICV) matrices of high-dimensional diffusion processes based on multiple high-frequency observations. We start by studying the estimator, the time-variation adjusted realized covariance (TVA) matrix, proposed in Zheng and Li (2011) without microstructure noise. We show that in the high-dimensional case, for a class C of diffusion processes, the limiting spectral distribution (LSD) of averaged TVA depends not only on that of ICV, but also on the numbers of multiple transactions at each recording time. However, in practice, the observed prices are always contaminated by the market microstructure noise. Thus the limiting behavior of pre-averaging averaged TVA matrices is studied based on the noisy multiple observations. We show that for processes in class C, the pre-averaging averaged TVA has the desirable properties that it eliminates the effects of microstructure noise and multiple transactions, and that its LSD depends solely on that of the ICV matrix. Further, three types of nonlinear shrinkage estimators of ICV are proposed based on high-frequency noisy multiple observations. Simulation studies support our theoretical results and show the finite sample performance of the proposed estimators. Finally, the high-frequency portfolio strategies are evaluated under these estimators in real data analysis.
Submitted 5 September, 2019; v1 submitted 23 August, 2019;
originally announced August 2019.
-
Shrinkage estimation of covariance matrix for portfolio choice with high frequency data
Authors:
Cheng Liu,
Ningning Xia,
Jun Yu
Abstract:
This paper examines the usefulness of high frequency data in estimating the covariance matrix for portfolio choice when the portfolio size is large. A computationally convenient nonlinear shrinkage estimator for the integrated covariance (ICV) matrix of financial assets is developed in two steps. The eigenvectors of the ICV are first constructed from a designed time variation adjusted realized covariance matrix of noise-free log-returns of relatively low frequency data. Then the regularized eigenvalues of the ICV are estimated by quasi-maximum likelihood based on high frequency data. The estimator is always positive definite and its inverse is the estimator of the inverse of ICV. It minimizes the limit of the out-of-sample variance of portfolio returns within the class of rotation-equivalent estimators. It works when the number of underlying assets is larger than the number of time series observations in each asset and when the asset price follows a general stochastic process. Our theoretical results are derived under the assumption that the number of assets ($p$) and the sample size ($n$) satisfy $p/n \to y > 0$ as $n \to \infty$. The advantages of our proposed estimator are demonstrated using real data.
Submitted 21 November, 2016;
originally announced November 2016.
-
Convergence rate of eigenvector empirical spectral distribution of large Wigner matrices
Authors:
Ningning Xia,
Zhidong Bai
Abstract:
In this paper, we adopt the eigenvector empirical spectral distribution (VESD) to investigate the limiting behavior of eigenvectors of a large dimensional Wigner matrix $W_n$. In particular, we derive the optimal bound for the rate of convergence of the expected VESD of $W_n$ to the semicircle law, which is of order $O(n^{-1/2})$ under the assumption of a finite 10th moment. We further show that the convergence rates in probability and almost surely of the VESD are $O(n^{-1/4})$ and $O(n^{-1/6})$, respectively, under a finite 8th moment condition. Numerical studies demonstrate that the convergence rate does not depend on the choice of unit vector involved in the VESD function, and the best possible bound for the rate of convergence of the VESD is of order $O(n^{-1/2})$.
Submitted 21 November, 2016;
originally announced November 2016.
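As a rough illustration of the object studied in the entry above, the following sketch computes the VESD of one simulated Wigner matrix and its largest deviation from the semicircle law on a grid. It is not taken from the paper: the function names, the Gaussian entries, the random unit vector, and the grid are all assumptions made for illustration, and the grid maximum only approximates the Kolmogorov distance. It also uses a single realization, whereas the $O(n^{-1/2})$ bound in the abstract concerns the expected VESD.

```python
import numpy as np


def sample_wigner(n, rng):
    """Real symmetric Gaussian matrix whose off-diagonal entries have variance 1/n."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2.0 * n)


def vesd(matrix, unit_vector, grid):
    """VESD at each grid point t: sum of |<x, u_i>|^2 over eigenvalues lambda_i <= t."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    weights = (eigvecs.T @ unit_vector) ** 2  # nonnegative weights that sum to 1
    return np.array([weights[eigvals <= t].sum() for t in grid])


def semicircle_cdf(t):
    """Distribution function of the semicircle law supported on [-2, 2]."""
    t = np.clip(t, -2.0, 2.0)
    return 0.5 + t * np.sqrt(4.0 - t**2) / (4.0 * np.pi) + np.arcsin(t / 2.0) / np.pi


def vesd_distance(n, seed=0):
    """Largest gap on a grid between one simulated VESD and the semicircle law."""
    rng = np.random.default_rng(seed)
    w = sample_wigner(n, rng)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)  # hypothetical choice: a random unit vector
    grid = np.linspace(-2.5, 2.5, 501)
    return float(np.max(np.abs(vesd(w, x, grid) - semicircle_cdf(grid))))


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, vesd_distance(n))
```

Rerunning with a different unit vector (for example, a coordinate vector) is one way to probe the abstract's remark that the rate does not depend on the choice of unit vector.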
-
On the inference about the spectral distribution of high-dimensional covariance matrix based on high-frequency noisy observations
Authors:
Ningning Xia,
Xinghua Zheng
Abstract:
In practice, observations are often contaminated by noise, making the resulting sample covariance matrix a signal-plus-noise sample covariance matrix. Aiming to make inferences about the spectral distribution of the population covariance matrix under such a situation, we establish an asymptotic relationship that describes how the limiting spectral distribution of (signal) sample covariance matrices depends on that of signal-plus-noise-type sample covariance matrices. As an application, we consider inferences about the spectral distribution of integrated covolatility (ICV) matrices of high-dimensional diffusion processes based on high-frequency data with microstructure noise. The (slightly modified) pre-averaging estimator is a signal-plus-noise sample covariance matrix, and the aforementioned result, together with a (generalized) connection between the spectral distribution of signal sample covariance matrices and that of the population covariance matrix, enables us to propose a two-step procedure to consistently estimate the spectral distribution of ICV for a class of diffusion processes. An alternative approach is further proposed, which possesses several desirable properties: it is more robust, it eliminates the effects of microstructure noise, and the asymptotic relationship that enables consistent estimation of the spectral distribution of ICV is the standard Marčenko-Pastur equation. The performance of the two approaches is examined via simulation studies under both synchronous and asynchronous observation settings.
Submitted 1 March, 2017; v1 submitted 12 April, 2016;
originally announced April 2016.
-
Mean Square Capacity of Power Constrained Fading Channels with Causal Encoders and Decoders
Authors:
Liang Xu,
Lihua Xie,
Nan Xiao
Abstract:
This paper is concerned with the mean square stabilization problem of discrete-time LTI systems over a power constrained fading channel. Unlike existing work, the channel considered in this paper suffers from both fading and additive noise. We allow any form of causal channel encoders/decoders, unlike the linear encoders/decoders commonly studied in the literature. Sufficient conditions and necessary conditions for the mean square stabilizability are given in terms of channel parameters such as transmission power and fading and additive noise statistics in relation to the unstable eigenvalues of the open-loop system matrix. The corresponding mean square capacity of the power constrained fading channel under causal encoders/decoders is given. It is proved that this mean square capacity is smaller than the corresponding Shannon channel capacity. Finally, numerical examples are presented, which demonstrate that the causal encoders/decoders render less restrictive stabilizability conditions than those under linear encoders/decoders studied in the existing works.
Submitted 15 September, 2015;
originally announced September 2015.
-
On the inference about the spectra of high-dimensional covariance matrix based on noisy observations-with applications to integrated covolatility matrix inference in the presence of microstructure noise
Authors:
Ningning Xia,
Xinghua Zheng
Abstract:
In practice, observations are often contaminated by noise, making the resulting sample covariance matrix an information-plus-noise-type covariance matrix. Aiming to make inferences about the spectra of the underlying true covariance matrix under such a situation, we establish an asymptotic relationship that describes how the limiting spectral distribution of (true) sample covariance matrices depends on that of information-plus-noise-type sample covariance matrices. As an application, we consider the inference about the spectra of integrated covolatility (ICV) matrices of high-dimensional diffusion processes based on high-frequency data with microstructure noise. The (slightly modified) pre-averaging estimator is an information-plus-noise-type covariance matrix, and the aforementioned result, together with a (generalized) connection between the spectral distribution of true sample covariance matrices and that of the population covariance matrix, enables us to propose a two-step procedure to estimate the spectral distribution of ICV for a class of diffusion processes. An alternative estimator is further proposed, which possesses two desirable properties: it eliminates the impact of microstructure noise, and its limiting spectral distribution depends only on that of the ICV through the standard Marčenko-Pastur equation. Numerical studies demonstrate that our proposed methods can be used to estimate the spectra of the underlying covariance matrix based on noisy observations.
Submitted 22 August, 2015; v1 submitted 7 September, 2014;
originally announced September 2014.
-
Interval-based parameter identification for structural static problems
Authors:
Naijia Xiao,
Francesco Fedele,
Rafi Muhanna
Abstract:
We present an interval-based approach for parameter identification in structural static inverse problems. The proposed inverse formulation exploits the Interval Finite Element Method (IFEM) combined with adjoint-based optimization. The inversion consists of a two-step algorithm: first, an estimate of the parameters is obtained by means of a deterministic iterative solver. Then, the algorithm switches to the interval extension of the previous solver, using the deterministic estimate of the parameters as an initial guess. The iterations are terminated based on a new containment-stopping criterion, which is intrinsic to intervals. Various numerical examples show that the proposed method provides guaranteed interval enclosures of the parameters.
Submitted 4 September, 2014; v1 submitted 14 August, 2014;
originally announced August 2014.
-
Convergence rates of eigenvector empirical spectral distribution of large dimensional sample covariance matrix
Authors:
Ningning Xia,
Yingli Qin,
Zhidong Bai
Abstract:
The eigenvector Empirical Spectral Distribution (VESD) is adopted to investigate the limiting behavior of eigenvectors and eigenvalues of covariance matrices. In this paper, we show that the Kolmogorov distance between the expected VESD of the sample covariance matrix and the Marčenko-Pastur distribution function is of order $O(N^{-1/2})$. Given that the ratio of data dimension $n$ to sample size $N$ is bounded between 0 and 1, this convergence rate is established under a finite 10th moment condition on the underlying distribution. It is also shown that, for any fixed $\eta>0$, the convergence rates of VESD are $O(N^{-1/4})$ in probability and $O(N^{-1/4+\eta})$ almost surely, requiring a finite 8th moment of the underlying distribution.
Submitted 22 November, 2013; v1 submitted 20 November, 2013;
originally announced November 2013.