-
Stochastic Diagonal Estimation Based on Matrix Quadratic Form Oracles
Authors:
Haishan Ye,
Xiangyu Chang
Abstract:
We study the problem of estimating the diagonal of an implicitly given matrix $\mathbf{A}$. For such a matrix we have access to an oracle that allows us to evaluate the matrix quadratic form $\mathbf{u}^\top \mathbf{A} \mathbf{u}$. Based on this query oracle, we propose a stochastic diagonal estimation method with the random vector $\mathbf{u}$ drawn from the standard Gaussian distribution. We provide the element-wise and norm-wise sample complexities of the proposed method. Our numerical experiments on matrices of different types and dimensions demonstrate the effectiveness of our method and validate the tightness of the theoretical results.
Submitted 18 June, 2025;
originally announced June 2025.
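As an illustration of the quadratic-form oracle described in the abstract above, the following is a minimal sketch of one possible Gaussian estimator. It relies on the identity $E[(\mathbf{u}^\top \mathbf{A} \mathbf{u})(u_i^2 - 1)] = 2 A_{ii}$ for $\mathbf{u} \sim N(0, I)$, and it is not claimed to be the estimator analyzed in the paper. The names `estimate_diagonal` and `quad_oracle` are hypothetical.

```python
import numpy as np


def estimate_diagonal(quad_oracle, dim, num_samples, rng):
    """Estimate diag(A) using only queries u -> u^T A u with u ~ N(0, I).

    Assumption (not taken from the paper): we use the Gaussian moment
    identity E[(u^T A u) * (u_i^2 - 1)] = 2 * A_ii, which holds for any
    square matrix A.
    """
    acc = np.zeros(dim)
    for _ in range(num_samples):
        u = rng.standard_normal(dim)   # standard Gaussian query vector
        q = quad_oracle(u)             # single scalar oracle answer
        acc += q * (u ** 2 - 1.0)      # unbiased for 2 * diag(A)
    return acc / (2.0 * num_samples)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.array([[2.0, 0.3], [0.3, -1.0]])
    # The oracle hides A; only the quadratic form is exposed.
    est = estimate_diagonal(lambda u: u @ A @ u, dim=2, num_samples=20000, rng=rng)
    print("estimate:", est, "true diagonal:", np.diag(A))
```

The estimate is noisy for small sample sizes; how many queries are needed for a given accuracy is exactly the sample-complexity question the abstract refers to.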
-
Code Retrieval for MILP Instance Generation
Authors:
Tianxing Yang,
Huigen Ye,
Hua Xu
Abstract:
Mixed-Integer Linear Programming (MILP) is widely used in fields such as scheduling, logistics, and planning. Enhancing the performance of MILP solvers, particularly learning-based solvers, requires substantial amounts of high-quality data. However, existing methods for MILP instance generation typically necessitate training a separate model for each problem class and are computationally intensive when generating new instances. To address these limitations, we reformulate the MILP Instance Generation task as a MILP Code Generation task, enabling efficient, flexible, and interpretable instance generation through code. Since MILP instances generated from code can vary significantly in scale, we introduce MILP-EmbedSim, a new similarity metric that accurately measures the similarity between instances of varying sizes within the same problem class. Leveraging this metric, we propose MILP-Retrieval, a pipeline that retrieves generation code from a library to produce MILP instances highly similar to a target instance. MILP-Retrieval outperforms baselines in both MILP Code Generation and Instance Generation tasks, provides a novel perspective on MILP instance generation, and opens new possibilities for learning-based solvers.
Submitted 11 May, 2025;
originally announced May 2025.
-
$f$-vectors and $F$-invariant in generalized cluster algebras
Authors:
Huihui Ye,
Changjian Fu
Abstract:
We establish certain fundamental properties of $f$-vectors and $F$-matrices for generalized cluster algebras, including the initial and final seed mutation formulas, the compatibility property and the symmetry property. Along the way, we also generalize the construction of $F$-invariant for generalized cluster algebras without assuming positivity and prove certain basic properties.
Submitted 31 May, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Global Dynamics of Nonlocal Diffusion Systems on Time-Varying Domains
Authors:
Xiandong Lin,
Hailong Ye,
Xiao-Qiang Zhao
Abstract:
We propose a class of nonlocal diffusion systems on time-varying domains, and fully characterize their asymptotic dynamics in the asymptotically fixed, time-periodic and unbounded cases. The kernel is not necessarily symmetric or compactly supported, provoking anisotropic diffusion or convective effects. Due to the nonlocal diffusion on time-varying domains in our systems, some significant challenges arise, such as the lack of regularizing effects of the semigroup generated by the nonlocal operator, as well as the time-dependent inherent coupling structure in the kernel. By investigating a general nonautonomous nonlocal diffusion system in the space of bounded and measurable functions, we establish a comprehensive and unified framework to rigorously examine the threshold dynamics of the original system on asymptotically fixed and time-periodic domains. In the case of an asymptotically unbounded domain, we introduce a key auxiliary function to separate vanishing coefficients from nonlocal diffusions. This enables us to construct appropriate sub-solutions and derive the global threshold dynamics via the comparison principle. The findings may be of independent interest and the developed techniques, which do not rely on the existence of the principal eigenvalue, are expected to find further applications in related nonlocal diffusion problems. We also conduct numerical simulations based on a practical model to illustrate our analytical results.
Submitted 9 February, 2025;
originally announced February 2025.
-
An Enhanced Zeroth-Order Stochastic Frank-Wolfe Framework for Constrained Finite-Sum Optimization
Authors:
Haishan Ye,
Yinghui Huang,
Hao Di,
Xiangyu Chang
Abstract:
We propose an enhanced zeroth-order stochastic Frank-Wolfe framework to address constrained finite-sum optimization problems, a structure prevalent in large-scale machine-learning applications. Our method introduces a novel double variance reduction framework that effectively reduces the gradient approximation variance induced by zeroth-order oracles and the stochastic sampling variance from finite-sum objectives. By leveraging this framework, our algorithm achieves significant improvements in query efficiency, making it particularly well-suited for high-dimensional optimization tasks. Specifically, for convex objectives, the algorithm achieves a query complexity of $O(d\sqrt{n}/ε)$ to find an $ε$-suboptimal solution, where $d$ is the dimensionality and $n$ is the number of functions in the finite-sum objective. For non-convex objectives, it achieves a query complexity of $O(d^{3/2}\sqrt{n}/ε^2)$ without requiring the computation of $d$ partial derivatives at each iteration. These complexities are the best known among zeroth-order stochastic Frank-Wolfe algorithms that avoid explicit gradient calculations. Empirical experiments on convex and non-convex machine learning tasks, including sparse logistic regression, robust classification, and adversarial attacks on deep networks, validate the computational efficiency and scalability of our approach. Our algorithm demonstrates superior performance in both convergence rate and query complexity compared to existing methods.
Submitted 22 January, 2025; v1 submitted 13 January, 2025;
originally announced January 2025.
-
Distributionally Robust Fault Detection Trade-off Design with Prior Fault Information
Authors:
Yulin Feng,
Hailang Jin,
Steven X. Ding,
Hao Ye,
Chao Shang
Abstract:
The robustness of fault detection algorithms against uncertainty is crucial in the real-world industrial environment. Recently, a new probabilistic design scheme called distributionally robust fault detection (DRFD) has emerged and received immense interest. Despite its robustness against unknown distributions in practice, current DRFD focuses on the overall detectability of all possible faults rather than the detectability of critical faults that are a priori known. Hence, a new DRFD trade-off design scheme is put forward in this work by utilizing prior fault information. The key contributions include a novel distributional robustness metric of detecting a known fault and a new soft distributionally robust chance constraint that ensures robust detectability. Then a new trade-off design scheme of fault detection under unknown probability distributions is proposed, and this offers a flexible balance between the robustness of detecting known critical faults and the overall detectability against all possible faults. To solve the resulting problem, an exact reformulation is derived and a customized solution algorithm is developed, which includes a sequential optimization procedure and an initialization strategy. Finally, case studies on a simulated three-tank system and a real-world battery cell are carried out to showcase the usefulness of our DRFD method.
Submitted 11 April, 2025; v1 submitted 28 December, 2024;
originally announced December 2024.
-
NeuralQP: A General Hypergraph-based Optimization Framework for Large-scale QCQPs
Authors:
Zhixiao Xiong,
Fangyu Zong,
Huigen Ye,
Hua Xu
Abstract:
Machine Learning (ML) optimization frameworks have gained attention for their ability to accelerate the optimization of large-scale Quadratically Constrained Quadratic Programs (QCQPs) by learning shared problem structures. However, existing ML frameworks often rely heavily on strong problem assumptions and large-scale solvers. This paper introduces NeuralQP, a general hypergraph-based framework for large-scale QCQPs. NeuralQP features two main components: Hypergraph-based Neural Prediction, which generates embeddings and predicted solutions for QCQPs without problem assumptions, and Parallel Neighborhood Optimization, which employs a McCormick relaxation-based repair strategy to identify and correct illegal variables, iteratively improving the solution with a small-scale solver. We further prove that our framework UniEGNN with our hypergraph representation is equivalent to the Interior-Point Method (IPM) for quadratic programming. Experiments on two benchmark problems and large-scale real-world instances from QPLIB demonstrate that NeuralQP outperforms state-of-the-art solvers (e.g., Gurobi and SCIP) in both solution quality and time efficiency, further validating the efficiency of ML optimization frameworks for QCQPs.
Submitted 28 September, 2024;
originally announced October 2024.
-
Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient
Authors:
Hao Di,
Haishan Ye,
Yueling Zhang,
Xiangyu Chang,
Guang Dai,
Ivor W. Tsang
Abstract:
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands $\mathcal{O}(d)$ function evaluations ($d$ is the dimension size), which incurs substantial computational costs and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances. Compared to prior methods, ZPDVR relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) in expectation $\mathcal{O}(1)$ times per iteration, and achieves the optimal $\mathcal{O}(d(n + κ)\log(\frac{1}{ε}))$ SZO query complexity in the strongly convex and smooth setting, where $κ$ represents the condition number and $ε$ is the desired accuracy. Empirical results validate ZPDVR's linear convergence and demonstrate its superior performance over other related methods.
Submitted 27 May, 2024;
originally announced May 2024.
-
Bounded geometry for PCF-special subvarieties
Authors:
Laura DeMarco,
Niki Myrto Mavraki,
Hexi Ye
Abstract:
For each integer $d\geq 2$, let $M_d$ denote the moduli space of maps $f: \mathbb{P}^1\to \mathbb{P}^1$ of degree $d$. We study the geometric configurations of subsets of postcritically finite (or PCF) maps in $M_d$. A complex-algebraic subvariety $Y \subset M_d$ is said to be PCF-special if it contains a Zariski-dense set of PCF maps. Here we prove that there are only finitely many positive-dimensional irreducible PCF-special subvarieties in $M_d$ with degree $\leq D$. In addition, there exist constants $N = N(D,d)$ and $B = B(D,d)$ so that for any complex algebraic subvariety $X \subset M_d$ of degree $\leq D$, the Zariski closure $\overline{X\cap\mathrm{PCF}}$ has at most $N$ irreducible components, each with degree $\leq B$. We also prove generalizations of these results for points with small critical height in $M_d(\bar{\mathbb{Q}})$.
Submitted 4 November, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity
Authors:
Qihao Zhou,
Haishan Ye,
Luo Luo
Abstract:
This paper considers distributed convex-concave minimax optimization under second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure in the objective by involving mini-batch client sampling and variance reduction. We prove SVOGS can achieve an $\varepsilon$-duality gap within communication rounds of ${\mathcal O}(δD^2/\varepsilon)$, communication complexity of ${\mathcal O}(n+\sqrt{n}δD^2/\varepsilon)$, and local gradient calls of $\tilde{\mathcal O}(n+(\sqrt{n}δ+L)D^2/\varepsilon\log(1/\varepsilon))$, where $n$ is the number of nodes, $δ$ is the degree of the second-order similarity, $L$ is the smoothness parameter and $D$ is the diameter of the constraint set. We can verify that all of the above complexities (nearly) match the corresponding lower bounds. For the specific $μ$-strongly-convex-$μ$-strongly-concave case, our algorithm has upper bounds on communication rounds, communication complexity, and local gradient calls of $\mathcal O(δ/μ\log(1/\varepsilon))$, ${\mathcal O}((n+\sqrt{n}δ/μ)\log(1/\varepsilon))$, and $\tilde{\mathcal O}((n+(\sqrt{n}δ+L)/μ)\log(1/\varepsilon))$ respectively, which are also nearly tight. Furthermore, we conduct numerical experiments to show the empirical advantages of the proposed method.
Submitted 25 May, 2024;
originally announced May 2024.
-
Anderson Acceleration Without Restart: A Novel Method with $n$-Step Super Quadratic Convergence Rate
Authors:
Haishan Ye,
Dachao Lin,
Xiangyu Chang,
Zhihua Zhang
Abstract:
In this paper, we propose a novel Anderson's acceleration method to solve nonlinear equations, which does \emph{not} require a restart strategy to achieve numerical stability. We propose the greedy and random versions of our algorithm. Specifically, the greedy version selects the direction to maximize a certain measure of progress for approximating the current Jacobian matrix. In contrast, the random version chooses a random Gaussian vector as the direction to update the approximate Jacobian. Furthermore, our algorithm, including both greedy and random versions, has an $n$-step super quadratic convergence rate, where $n$ is the dimension of the objective problem. For example, the explicit convergence rate of the random version can be presented as $\|\mathbf{x}_{k+n+1} - \mathbf{x}_*\| / \|\mathbf{x}_k - \mathbf{x}_*\|^2 = \mathcal{O}\left(\left(1-\frac{1}{n}\right)^{kn}\right)$ for any $k\geq 0$, where $\mathbf{x}_*$ is the optimum of the objective problem. This kind of convergence rate is new to Anderson's acceleration and quasi-Newton methods. The experiments also validate the fast convergence rate of our algorithm.
Submitted 25 March, 2024;
originally announced March 2024.
-
On $F$-Polynomials for Generalized Quantum Cluster Algebras and Gupta's Formula
Authors:
Changjian Fu,
Liangang Peng,
Huihui Ye
Abstract:
We show the polynomial property of $F$-polynomials for generalized quantum cluster algebras and obtain the associated separation formulas under a mild condition. Along the way, we obtain Gupta's formulas of $F$-polynomials for generalized quantum cluster algebras. These formulas specialize to Gupta's formulas for quantum cluster algebras and cluster algebras respectively. Finally, a generalization of Gupta's formula is also discussed in the setting of generalized cluster algebras.
Submitted 3 September, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
The Restricted Edge-Connectivity of Strong Product Graphs
Authors:
Hazhe Ye,
Yingzhi Tian
Abstract:
The restricted edge-connectivity of a connected graph $G$, denoted by $λ^{\prime}(G)$, if it exists, is the minimum cardinality of a set of edges whose deletion leaves $G$ disconnected with each component having at least 2 vertices. It was proved that if $G$ is not a star and $|V(G)|\geq 4$, then $λ^{\prime}(G)$ exists and $λ^{\prime}(G)\leq ξ(G)$, where $ξ(G)$ is the minimum edge-degree of $G$. Thus a graph $G$ is called maximally restricted edge-connected if $λ^{\prime}(G)=ξ(G)$; and a graph $G$ is called super restricted edge-connected if each minimum restricted edge-cut isolates an edge of $G$. The strong product of graphs $G$ and $H$, denoted by $G\boxtimes H$, is the graph with vertex set $V(G)\times V(H)$ and edge set $\{(x_1,y_1)(x_2,y_2)\ |\ x_1=x_2$ and $y_1y_2\in E(H)$; or $y_1=y_2$ and $x_1x_2\in E(G)$; or $x_1x_2\in E(G)$ and $y_1y_2\in E(H)\}$. In this paper, we determine, for any nontrivial connected graph $G$, the restricted edge-connectivity of $G\boxtimes P_n$, $G\boxtimes C_n$ and $G\boxtimes K_n$, where $P_n$, $C_n$ and $K_n$ are the path, the cycle and the complete graph on $n$ vertices, respectively. As corollaries, we give sufficient conditions for these strong product graphs $G\boxtimes P_n$, $G\boxtimes C_n$ and $G\boxtimes K_n$ to be maximally restricted edge-connected and super restricted edge-connected.
Submitted 27 January, 2024;
originally announced January 2024.
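The strong product defined in the abstract above can be built directly from its three edge conditions. The following is a minimal sketch for simple undirected graphs given as vertex and edge lists; the function name `strong_product` and the list-based graph representation are illustrative assumptions, not something taken from the paper.

```python
from itertools import combinations, product


def strong_product(vertices_g, edges_g, vertices_h, edges_h):
    """Return (vertices, edges) of the strong product of G and H.

    Graphs are assumed simple and undirected; each edge is a pair of
    vertices, stored internally as a frozenset so orientation is ignored.
    """
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    vertices = list(product(vertices_g, vertices_h))
    edges = set()
    for (x1, y1), (x2, y2) in combinations(vertices, 2):
        g_adj = frozenset((x1, x2)) in eg  # x1 x2 is an edge of G
        h_adj = frozenset((y1, y2)) in eh  # y1 y2 is an edge of H
        if (x1 == x2 and h_adj) or (y1 == y2 and g_adj) or (g_adj and h_adj):
            edges.add(frozenset(((x1, y1), (x2, y2))))
    return vertices, edges


if __name__ == "__main__":
    # Strong product of two single-edge paths: all four vertices are adjacent.
    v, e = strong_product([0, 1], [(0, 1)], ["a", "b"], [("a", "b")])
    print(len(v), "vertices,", len(e), "edges")
```

The edge count follows $|E(G\boxtimes H)| = |V(G)||E(H)| + |V(H)||E(G)| + 2|E(G)||E(H)|$, which gives a quick sanity check on small inputs.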
-
Poisson approximation for stochastic processes summed over amenable groups
Authors:
Haoyu Ye,
Peter Orbanz,
Morgane Austern
Abstract:
We generalize the Poisson limit theorem to binary functions of random objects whose law is invariant under the action of an amenable group. Examples include stationary random fields, exchangeable sequences, and exchangeable graphs. A celebrated result of E. Lindenstrauss shows that normalized sums over certain increasing subsets of such groups approximate expectations. Our results clarify that the corresponding unnormalized sums of binary statistics are asymptotically Poisson, provided suitable mixing conditions hold. They extend further to randomly subsampled sums and also show that strict invariance of the distribution is not needed if the requisite mixing condition defined by the group holds. We illustrate the results with applications to random fields, Cayley graphs, and Poisson processes on groups.
Submitted 18 January, 2024;
originally announced January 2024.
-
Optimal Decentralized Composite Optimization for Convex Functions
Authors:
Haishan Ye,
Xiangyu Chang
Abstract:
In this paper, we focus on decentralized composite optimization for convex functions. Because of advantages such as robustness to the network and the absence of a communication bottleneck at a central server, decentralized optimization has attracted much research attention in the signal processing, control, and optimization communities. Many optimal algorithms have been proposed in the past years for the case where the objective function is smooth and (strongly) convex. However, it is still an open question whether one can design an optimal algorithm when there is a non-smooth regularization term. In this paper, we fill the gap between smooth decentralized optimization and decentralized composite optimization and propose the first algorithm which can achieve both the optimal computation and communication complexities. Our experiments also validate the effectiveness and efficiency of our algorithm both in computation and communication.
Submitted 12 July, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
Characterizing the forbidden pairs for graphs to be super-edge-connected
Authors:
Hazhe Ye,
Yingzhi Tian
Abstract:
Let $\mathcal{H}$ be a set of given connected graphs. A graph $G$ is said to be $\mathcal{H}$-free if $G$ contains no $H$ as an induced subgraph for any $H\in \mathcal{H}$. The graph $G$ is super-edge-connected if each minimum edge-cut isolates a vertex in $G$. In this paper, except for some special graphs, we characterize all forbidden subgraph sets $\mathcal{H}$ such that every $\mathcal{H}$-free graph is super-edge-connected for $|\mathcal{H}|=1$ and $2$.
Submitted 2 September, 2023;
originally announced September 2023.
-
Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold
Authors:
Jun Chen,
Haishan Ye,
Mengmeng Wang,
Tianxin Huang,
Guang Dai,
Ivor W. Tsang,
Yong Liu
Abstract:
The conjugate gradient method is a crucial first-order optimization method that generally converges faster than the steepest descent method, and its computational cost is much lower than that of second-order methods. However, while various types of conjugate gradient methods have been studied in Euclidean spaces and on Riemannian manifolds, they have received little study in distributed scenarios. This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method that aims at minimizing a global function over the Stiefel manifold. The optimization problem is distributed among a network of agents, where each agent is associated with a local function, and the communication between agents occurs over an undirected connected graph. Since the Stiefel manifold is a non-convex set, a global function is represented as a finite sum of possibly non-convex (but smooth) local functions. The proposed method is free from expensive Riemannian geometric operations such as retractions, exponential maps, and vector transports, thereby reducing the computational complexity required by each agent. To the best of our knowledge, DRCGD is the first decentralized Riemannian conjugate gradient algorithm to achieve global convergence over the Stiefel manifold.
Submitted 12 March, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Mirror Natural Evolution Strategies
Authors:
Haishan Ye
Abstract:
Zeroth-order optimization has been widely used in machine learning applications. However, the theoretical study of zeroth-order optimization focuses on algorithms which approximate (first-order) gradients using (zeroth-order) function value differences along a random direction. The theory of algorithms which approximate the gradient and Hessian information by zeroth-order queries is much less studied. In this paper, we focus on the theory of zeroth-order optimization which utilizes both the first-order and second-order information approximated by the zeroth-order queries. We first propose a novel reparameterized objective function with parameters $(μ, Σ)$. This reparameterized objective function achieves its optimum at the minimizer and the Hessian inverse of the original objective function respectively, but with small perturbations. Accordingly, we propose a new algorithm to minimize our proposed reparameterized objective, which we call \texttt{MiNES} (mirror descent natural evolution strategy). We show that the estimated covariance matrix of \texttt{MiNES} converges to the inverse of the Hessian matrix of the objective function with a convergence rate $\widetilde{\mathcal{O}}(1/k)$, where $k$ is the iteration number and $\widetilde{\mathcal{O}}(\cdot)$ hides the constant and $\log$ terms. We also provide the explicit convergence rate of \texttt{MiNES} and how the covariance matrix promotes the convergence rate.
Submitted 1 August, 2023;
originally announced August 2023.
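The abstract above contrasts its approach with the common practice of approximating gradients from function value differences along a random direction. As a point of reference, the following is a minimal sketch of that standard forward-difference Gaussian estimator. It is not the \texttt{MiNES} algorithm, and the names `zo_gradient`, `smoothing` and `num_dirs` are hypothetical.

```python
import numpy as np


def zo_gradient(f, x, smoothing, num_dirs, rng):
    """Approximate grad f(x) from function values only.

    Averages (f(x + s*u) - f(x)) / s * u over Gaussian directions u,
    a standard zeroth-order estimator (an assumption for illustration,
    not the method proposed in the paper).
    """
    x = np.asarray(x, dtype=float)
    fx = f(x)                                # one base query, reused
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)     # random search direction
        g += (f(x + smoothing * u) - fx) / smoothing * u
    return g / num_dirs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = np.diag([1.0, 4.0])
    f = lambda x: 0.5 * x @ Q @ x            # gradient is Q @ x
    x0 = np.array([1.0, -2.0])
    print("estimate:", zo_gradient(f, x0, 1e-3, 5000, rng))
    print("true gradient:", Q @ x0)
```

This estimator only recovers first-order information; estimating curvature as well, as the abstract describes, requires a different construction.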
-
Accelerated Nonconvex ADMM with Self-Adaptive Penalty for Rank-Constrained Model Identification
Authors:
Qingyuan Liu,
Zhengchao Huang,
Hao Ye,
Dexian Huang,
Chao Shang
Abstract:
The alternating direction method of multipliers (ADMM) has been widely adopted in low-rank approximation and low-order model identification tasks; however, the performance of nonconvex ADMM is highly reliant on the choice of penalty parameter. To accelerate ADMM for solving rank-constrained identification problems, this paper proposes a new self-adaptive strategy for automatic penalty update. Guided by first-order analysis of the increment of the augmented Lagrangian, the self-adaptive penalty updating enables effective and balanced minimization of both primal and dual residuals and thus ensures a stable convergence. Moreover, improved efficiency can be obtained within the Anderson acceleration scheme. Numerical examples show that the proposed strategy significantly accelerates the convergence of nonconvex ADMM while alleviating the critical reliance on tedious tuning of penalty parameters.
Submitted 8 September, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis
Authors:
Dachao Lin,
Yuze Han,
Haishan Ye,
Zhihua Zhang
Abstract:
We study finite-sum distributed optimization problems involving a master node and $n-1$ local nodes under the popular $δ$-similarity and $μ$-strong convexity conditions. We propose two new algorithms, SVRS and AccSVRS, motivated by previous works. The non-accelerated SVRS method combines the techniques of gradient sliding and variance reduction and achieves a better communication complexity of $\tilde{\mathcal{O}}(n {+} \sqrt{n}δ/μ)$ compared to existing non-accelerated algorithms. Applying the framework proposed in Katyusha X, we also develop a directly accelerated version named AccSVRS with the $\tilde{\mathcal{O}}(n {+} n^{3/4}\sqrt{δ/μ})$ communication complexity. In contrast to existing results, our complexity bounds are entirely smoothness-free and exhibit superiority in ill-conditioned cases. Furthermore, we establish a nearly matched lower bound to verify the tightness of our AccSVRS method.
Submitted 30 October, 2023; v1 submitted 15 April, 2023;
originally announced April 2023.
-
Snap-Shot Decentralized Stochastic Gradient Tracking Methods
Authors:
Haishan Ye,
Xiangyu Chang
Abstract:
In decentralized optimization, $m$ agents form a network and only communicate with their neighbors, which gives advantages in data ownership, privacy, and scalability. At the same time, decentralized stochastic gradient descent (\texttt{SGD}) methods, as popular decentralized algorithms for training large-scale machine learning models, have shown their superiority over centralized counterparts. Distributed stochastic gradient tracking~(\texttt{DSGT})~\citep{pu2021distributed} has been recognized as the popular and state-of-the-art decentralized \texttt{SGD} method due to its proper theoretical guarantees. However, the theoretical analysis of \texttt{DSGT}~\citep{koloskova2021improved} shows that its iteration complexity is $\tilde{\mathcal{O}} \left(\frac{\barσ^2}{mμ\varepsilon} + \frac{\sqrt{L}\barσ}{μ(1 - λ_2(W))^{1/2} C_W \sqrt{\varepsilon} }\right)$, where $W$ is a doubly stochastic mixing matrix that represents the network topology and $ C_W $ is a parameter that depends on $W$. This indicates that the convergence of \texttt{DSGT} is heavily affected by the topology of the communication network. To overcome this weakness of \texttt{DSGT}, we resort to the snap-shot gradient tracking technique and propose two novel algorithms. We further show that the two proposed algorithms are more robust to the topology of the communication network, under a similar algorithmic structure and the same communication strategy as \texttt{DSGT}. Compared with \texttt{DSGT}, their iteration complexities are $\mathcal{O}\left( \frac{\barσ^2}{mμ\varepsilon} + \frac{\sqrt{L}\barσ}{μ(1 - λ_2(W))\sqrt{\varepsilon}} \right)$ and $\mathcal{O}\left( \frac{\barσ^2}{mμ\varepsilon} + \frac{\sqrt{L}\barσ}{μ(1 - λ_2(W))^{1/2}\sqrt{\varepsilon}} \right)$, which reduce the impact of the network topology (no $C_W$).
Submitted 10 December, 2022;
originally announced December 2022.
-
An Efficient Stochastic Algorithm for Decentralized Nonconvex-Strongly-Concave Minimax Optimization
Authors:
Lesi Chen,
Haishan Ye,
Luo Luo
Abstract:
This paper studies the stochastic nonconvex-strongly-concave minimax optimization over a multi-agent network. We propose an efficient algorithm, called Decentralized Recursive gradient descEnt Ascent Method (DREAM), which achieves the best-known theoretical guarantee for finding the $ε$-stationary points. Concretely, it requires $\mathcal{O}(\min (κ^3ε^{-3},κ^2 \sqrt{N} ε^{-2} ))$ stochastic first-order oracle (SFO) calls and $\tilde{\mathcal{O}}(κ^2 ε^{-2})$ communication rounds, where $κ$ is the condition number and $N$ is the total number of individual functions. Our numerical experiments also validate the superiority of DREAM over previous methods.
Submitted 14 May, 2024; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Adaptive Constraint Partition based Optimization Framework for Large-scale Integer Linear Programming (Student Abstract)
Authors:
Huigen Ye,
Hongyan Wang,
Hua Xu,
Chengming Wang,
Yu Jiang
Abstract:
Integer programming problems (IPs) are challenging to solve efficiently due to their NP-hardness, especially at large scale. To solve such IPs, large neighborhood search (LNS) uses an initial feasible solution and iteratively improves it by searching a large neighborhood around the current solution. However, LNS easily gets stuck in local optima and ignores the correlation between the variables to be optimized, leading to compromised performance. This paper presents a general adaptive constraint partition-based optimization framework (ACP) for large-scale IPs that can efficiently use any existing optimization solver as a subroutine. Specifically, ACP first randomly partitions the constraints into blocks, where the number of blocks is adaptively adjusted to avoid local optima. Then, ACP uses a subroutine solver to optimize the decision variables in a randomly selected block of constraints to enhance the variable correlation. ACP is compared with the LNS framework with different subroutine solvers on four IPs and a real-world IP. The experimental results demonstrate that, within a specified wall-clock time, ACP shows better performance than SCIP and Gurobi.
Submitted 18 November, 2022;
originally announced November 2022.
-
A Unified Analysis of Multi-task Functional Linear Regression Models with Manifold Constraint and Composite Quadratic Penalty
Authors:
Shiyuan He,
Hanxuan Ye,
Kejun He
Abstract:
This work studies the multi-task functional linear regression models where both the covariates and the unknown regression coefficients (called slope functions) are curves. For slope function estimation, we employ penalized splines to balance bias, variance, and computational complexity. The power of multi-task learning is brought in by imposing additional structures over the slope functions. We propose a general model with double regularization over the spline coefficient matrix: i) a matrix manifold constraint, and ii) a composite penalty as a summation of quadratic terms. Many multi-task learning approaches can be treated as special cases of this proposed model, such as a reduced-rank model and a graph Laplacian regularized model. We show the composite penalty induces a specific norm, which helps to quantify the manifold curvature and determine the corresponding proper subset in the manifold tangent space. The complexity of tangent space subset is then bridged to the complexity of geodesic neighbor via generic chaining. A unified convergence upper bound is obtained and specifically applied to the reduced-rank model and the graph Laplacian regularized model. The phase transition behaviors for the estimators are examined as we vary the configurations of model parameters.
Submitted 31 July, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
On the Complexity of Decentralized Smooth Nonconvex Finite-Sum Optimization
Authors:
Luo Luo,
Yunyan Bai,
Lesi Chen,
Yuxing Liu,
Haishan Ye
Abstract:
We study the decentralized optimization problem $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{m}\sum_{i=1}^m f_i({\bf x})$, where the local function on the $i$-th agent has the form of $f_i({\bf x})\triangleq \frac{1}{n}\sum_{j=1}^n f_{i,j}({\bf x})$ and every individual $f_{i,j}$ is smooth but possibly nonconvex. We propose a stochastic algorithm called DEcentralized probAbilistic Recursive gradiEnt deScenT (DEAREST) method, which achieves an $ε$-stationary point at each agent with the communication rounds of $\tilde{\mathcal O}(Lε^{-2}/\sqrtγ\,)$, the computation rounds of $\tilde{\mathcal O}(n+(L+\min\{nL, \sqrt{n/m}\bar L\})ε^{-2})$, and the local incremental first-order oracle calls of ${\mathcal O}(mn + {\min\{mnL, \sqrt{mn}\bar L\}}{ε^{-2}})$, where $L$ is the smoothness parameter of the objective function, $\bar L$ is the mean-squared smoothness parameter of all individual functions, and $γ$ is the spectral gap of the mixing matrix associated with the network. We then establish the lower bounds to show that the proposed method is near-optimal. Notice that the smoothness parameters $L$ and $\bar L$ used in our algorithm design and analysis are global, leading to sharper complexity bounds than existing results that depend on the local smoothness. We further extend DEAREST to solve the decentralized finite-sum optimization problem under the Polyak-Łojasiewicz condition, also achieving the near-optimal complexity bounds.
Submitted 11 January, 2025; v1 submitted 25 October, 2022;
originally announced October 2022.
-
A Modern Theory for High-dimensional Cox Regression Models
Authors:
Xianyang Zhang,
Huijuan Zhou,
Hanxuan Ye
Abstract:
The proportional hazards model has been extensively used in many fields such as biomedicine to estimate and perform statistical significance testing on the effects of covariates influencing the survival time of patients. The classical theory of maximum partial-likelihood estimation (MPLE) is used by most software packages to produce inference, e.g., the coxph function in R and the PHREG procedure in SAS. In this paper, we investigate the asymptotic behavior of the MPLE in the regime in which the number of parameters p is of the same order as the number of samples n. The main results are (i) existence of the MPLE undergoes a sharp 'phase transition'; (ii) the classical MPLE theory leads to invalid inference in the high-dimensional regime. We show that the asymptotic behavior of the MPLE is governed by a new asymptotic theory. These findings are further corroborated through numerical studies. The main technical tool in our proofs is the Convex Gaussian Min-max Theorem (CGMT), which has not been previously used in the analysis of partial likelihood. Our results thus extend the scope of CGMT and shed new light on the use of CGMT for examining the existence of MPLE and non-separable objective functions.
Submitted 3 April, 2022;
originally announced April 2022.
-
Decentralized Stochastic Variance Reduced Extragradient Method
Authors:
Luo Luo,
Haishan Ye
Abstract:
This paper studies decentralized convex-concave minimax optimization problems of the form $\min_x\max_y f(x,y) \triangleq\frac{1}{m}\sum_{i=1}^m f_i(x,y)$, where $m$ is the number of agents and each local function can be written as $f_i(x,y)=\frac{1}{n}\sum_{j=1}^n f_{i,j}(x,y)$. We propose a novel decentralized optimization algorithm, called multi-consensus stochastic variance reduced extragradient, which achieves the best known stochastic first-order oracle (SFO) complexity for this problem. Specifically, each agent requires $\mathcal O((n+κ\sqrt{n})\log(1/\varepsilon))$ SFO calls for strongly-convex-strongly-concave problems and $\mathcal O((n+\sqrt{n}L/\varepsilon)\log(1/\varepsilon))$ SFO calls for general convex-concave problems to achieve an $\varepsilon$-accurate solution in expectation, where $κ$ is the condition number and $L$ is the smoothness parameter. The numerical experiments show that the proposed method performs better than the baselines.
Submitted 13 February, 2022; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums
Authors:
Rui Pan,
Haishan Ye,
Tong Zhang
Abstract:
Learning rate schedulers have been widely adopted in training deep neural networks. Despite their practical importance, there is a discrepancy between their practice and their theoretical analysis. For instance, it is not known what schedules of SGD achieve the best convergence, even for simple problems such as optimizing quadratic objectives. In this paper, we propose Eigencurve, the first family of learning rate schedules that can achieve minimax optimal convergence rates (up to a constant) for SGD on quadratic objectives when the eigenvalue distribution of the underlying Hessian matrix is skewed. The condition is quite common in practice. Experimental results show that Eigencurve can significantly outperform step decay in image classification tasks on CIFAR-10, especially when the number of epochs is small. Moreover, the theory inspires two simple learning rate schedulers for practical applications that can approximate Eigencurve. For some problems, the optimal shape of the proposed schedulers resembles that of cosine decay, which sheds light on the success of cosine decay in such situations. For other situations, the proposed schedulers are superior to cosine decay.
Submitted 14 June, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
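For readers unfamiliar with the two baseline schedules named in the Eigencurve abstract, the following is a minimal sketch of step decay and cosine decay in their commonly used textbook forms. It is not the Eigencurve schedule itself, whose formula is not given in the abstract; the function names and the default values (`drop`, `every`, `lr_min`) are illustrative assumptions.

```python
import math


def step_decay(t, lr0, drop=0.1, every=30):
    """Step decay: multiply the base rate lr0 by `drop` once every `every` steps.

    `drop` and `every` are hypothetical defaults chosen for illustration.
    """
    return lr0 * drop ** (t // every)


def cosine_decay(t, total, lr_max, lr_min=0.0):
    """Cosine decay: anneal from lr_max at t=0 to lr_min at t=total."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / total))


if __name__ == "__main__":
    # Print both schedules at a few points of a 90-step run.
    for t in (0, 30, 45, 60, 90):
        print(t, step_decay(t, 0.1), cosine_decay(t, 90, 0.1))
```

Step decay is piecewise constant, while cosine decay shrinks smoothly, slowly at first and near the end and fastest in the middle of training.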
-
Greedy and Random Broyden's Methods with Explicit Superlinear Convergence Rates in Nonlinear Equations
Authors:
Haishan Ye,
Dachao Lin,
Zhihua Zhang
Abstract:
In this paper, we propose greedy and random variants of Broyden's method for solving nonlinear equations. Specifically, the greedy method greedily selects the direction to maximize a certain measure of progress for approximating the current Jacobian matrix, while the random method randomly chooses a direction. We establish explicit (local) superlinear convergence rates of both methods if the initial point and approximate Jacobian are close enough to a solution and the corresponding Jacobian. Our two novel variants of Broyden's method enjoy two important advantages: the approximate Jacobians of our algorithms converge to the exact ones, and the convergence rates of our algorithms are asymptotically faster than that of the original Broyden's method. Our work is the first to achieve these two advantages theoretically. Our experiments also empirically validate the advantages of our algorithms.
Submitted 16 October, 2021;
originally announced October 2021.
-
Explicit Superlinear Convergence Rates of Broyden's Methods in Nonlinear Equations
Authors:
Dachao Lin,
Haishan Ye,
Zhihua Zhang
Abstract:
In this paper, we study the explicit superlinear convergence rates of quasi-Newton methods. We particularly focus on the classical Broyden's method for solving nonlinear equations. We establish its explicit (local) superlinear convergence rate when the initial point is close enough to a solution and the initial Jacobian approximation is also close enough to the exact Jacobian related to the solution. Our results present the explicit superlinear convergence rates of Broyden's "good" and "bad" update schemes. These explicit convergence rates in turn provide some important insights on the performance difference between the "good" and "bad" schemes, which are also validated empirically.
Submitted 10 September, 2022; v1 submitted 4 September, 2021;
originally announced September 2021.
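As background for the abstract above, here is a minimal sketch of the classical Broyden "good" update for solving $F(x) = 0$, using the standard textbook rank-one formula $B \leftarrow B + \frac{(y - Bs)s^\top}{s^\top s}$. It illustrates the method being analyzed, not the paper's convergence analysis; the function name `broyden_good`, the tolerance, and the test system are illustrative assumptions.

```python
import numpy as np


def broyden_good(F, x0, B0, tol=1e-10, max_iter=100):
    """Classical Broyden "good" method for F(x) = 0.

    B0 is an initial Jacobian approximation; local convergence needs
    x0 and B0 to be close enough to a solution and its Jacobian.
    Returns the final iterate and the number of iterations used.
    """
    x = np.asarray(x0, dtype=float)
    B = np.array(B0, dtype=float)  # copy so the caller's matrix is untouched
    Fx = F(x)
    for k in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            return x, k
        s = np.linalg.solve(B, -Fx)  # quasi-Newton step
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx
        # Rank-one secant update: the new B satisfies B @ s == y.
        B += np.outer(y - B @ s, s) / (s @ s)
        x, Fx = x_new, F_new
    return x, max_iter


def example_system(x):
    """Hypothetical test system with a root at (1, 1)."""
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0, x[0] - x[1]])


if __name__ == "__main__":
    start = np.array([1.2, 0.9])
    jac0 = np.array([[2.4, 1.8], [1.0, -1.0]])  # exact Jacobian at the start
    root, iters = broyden_good(example_system, start, jac0)
    print(root, iters)
```

The update needs only function values, so no Jacobian is evaluated after initialization, which is what makes the quality of the Jacobian approximation central to the convergence rate.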
-
From Generalized Gauss Bounds to Distributionally Robust Fault Detection with Unimodality Information
Authors:
Chao Shang,
Hao Ye,
Dexian Huang,
Steven X. Ding
Abstract:
Probabilistic methods have attracted much interest in fault detection design, but their need for complete distributional knowledge is seldom fulfilled. This has spurred endeavors in distributionally robust fault detection (DRFD) design, which secures robustness against inexact distributions by using moment-based ambiguity sets as a prime modelling tool. However, with the worst-case distribution being implausibly discrete, the resulting design suffers from over-pessimism and can mask the true fault. This paper aims at developing a new DRFD design scheme with reduced conservatism, by assuming unimodality of the true distribution, a property commonly encountered in real-life practice. To tackle the chance constraint on false alarms, we first attain a new generalized Gauss bound on the probability outside an ellipsoid, which is less conservative than known Chebyshev bounds. As a result, analytical solutions to DRFD design problems are obtained, which are less conservative than known ones disregarding unimodality. We further encode bounded support information into ambiguity sets, derive a tightened multivariate Gauss bound, and develop approximate reformulations of design problems as convex programs. Moreover, the derived generalized Gauss bounds are broadly applicable to versatile change detection tasks for setting alarm thresholds. Results on a laboratory system show that the incorporation of unimodality information helps reduce the conservatism of distributionally robust design and leads to a better tradeoff between robustness and sensitivity.
Submitted 19 July, 2021;
originally announced July 2021.
-
Explicit Superlinear Convergence Rates of The SR1 Algorithm
Authors:
Haishan Ye,
Dachao Lin,
Zhihua Zhang,
Xiangyu Chang
Abstract:
We study the convergence rate of the famous Symmetric Rank-1 (SR1) algorithm, which has wide applications in different scenarios. Although it has been extensively investigated, SR1 still lacks a non-asymptotic superlinear rate compared with other quasi-Newton methods such as DFP and BFGS. In this paper we address this problem. Inspired by the recent work on explicit convergence analysis of quasi-Newton methods, we obtain the first explicit non-asymptotic rates of superlinear convergence for the vanilla SR1 method with a correction strategy that ensures numerical stability. Specifically, the vanilla SR1 with the correction strategy achieves rates of the form $\left(\frac{4n\ln(eκ) }{k}\right)^{k/2}$ for general smooth strongly-convex functions, where $k$ is the iteration counter, $κ$ is the condition number of the objective function and $n$ is the dimension of the problem. For quadratic functions, the vanilla SR1 algorithm finds the optimum of the objective function in at most $n$ steps.
Submitted 3 June, 2021; v1 submitted 15 May, 2021;
originally announced May 2021.
-
Explicit Convergence Rates of Greedy and Random Quasi-Newton Methods
Authors:
Dachao Lin,
Haishan Ye,
Zhihua Zhang
Abstract:
Optimization is important in machine learning problems, and quasi-Newton methods have a reputation as the most efficient numerical schemes for smooth unconstrained optimization. In this paper, we consider the explicit superlinear convergence rates of quasi-Newton methods and address two open problems mentioned by Rodomanov and Nesterov. First, we extend Rodomanov and Nesterov's results to random quasi-Newton methods, which include the common DFP, BFGS, and SR1 methods. Such random methods adopt a random direction for updating the approximate Hessian matrix in each iteration. Second, we focus on two specific quasi-Newton methods: SR1 and BFGS. We provide improved versions of greedy and random methods with provably better explicit (local) superlinear convergence rates. Our analysis is closely related to the approximation of a given Hessian matrix, unconstrained quadratic objective, as well as the general strongly convex, smooth, and strongly self-concordant functions.
Submitted 10 September, 2022; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Bounded weak and strong time periodic solutions to a three-dimensional chemotaxis-Stokes model with porous medium diffusion
Authors:
Hailong Ye,
Chunhua Jin
Abstract:
In this paper, we study the time periodic problem to a three-dimensional chemotaxis-Stokes model with porous medium diffusion $Δn^m$ and inhomogeneous mixed boundary conditions. By using a double-level approximation method and some iterative techniques, we obtain the existence and time-space uniform boundedness of weak time periodic solutions for any $m>1$. Moreover, we improve the regularity for $m\le\frac{4}{3}$ and show that the obtained periodic solutions are in fact strong periodic solutions.
Submitted 12 April, 2021; v1 submitted 7 March, 2021;
originally announced March 2021.
-
Stationary Distribution Convergence of the Offered Waiting Processes in Heavy Traffic under General Patience Time Scaling
Authors:
Chihoon Lee,
Amy R. Ward,
Heng-Qing Ye
Abstract:
We study a sequence of single server queues with customer abandonment (GI/GI/1+GI) under heavy traffic. The patience time distributions vary with the sequence, which allows for a wider scope of applications. It is known ([20, 18]) that the sequence of scaled offered waiting time processes converges weakly to a reflecting diffusion process with non-linear drift, as the traffic intensity approaches one. In this paper, we further show that the sequence of stationary distributions and moments of the offered waiting times, with diffusion scaling, converge to those of the limit diffusion process. This justifies the stationary performance of the diffusion limit as a valid approximation for the stationary performance of the GI/GI/1+GI queue. Consequently, we also derive the approximation for the abandonment probability for the GI/GI/1+GI queue in the stationary state.
Submitted 9 February, 2021;
originally announced February 2021.
-
DeEPCA: Decentralized Exact PCA with Linear Convergence Rate
Authors:
Haishan Ye,
Tong Zhang
Abstract:
Due to the rapid growth of smart agents such as weakly connected computational nodes and sensors, developing decentralized algorithms that can perform computations on local agents has become a major research direction. This paper considers the problem of decentralized principal component analysis (PCA), which is a statistical method widely used for data analysis. We introduce a technique called subspace tracking to reduce the communication cost, and apply it to power iterations. This leads to a decentralized PCA algorithm called \texttt{DeEPCA}, which has a convergence rate similar to that of the centralized PCA, while achieving the best communication complexity among existing decentralized PCA algorithms. \texttt{DeEPCA} is the first decentralized PCA algorithm with the number of communication rounds for each power iteration independent of target precision. Compared to existing algorithms, the proposed method is easier to tune in practice, with an improved overall communication cost. Our experiments validate the advantages of \texttt{DeEPCA} empirically.
Submitted 7 February, 2021;
originally announced February 2021.
-
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction
Authors:
Haishan Ye,
Wei Xiong,
Tong Zhang
Abstract:
This paper considers the decentralized composite optimization problem. We propose a novel decentralized variance-reduction proximal-gradient algorithmic framework, called PMGT-VR, which is based on a combination of several techniques including multi-consensus, gradient tracking, and variance reduction. The proposed framework relies on an imitation of centralized algorithms and we demonstrate that algorithms under this framework achieve convergence rates similar to that of their centralized counterparts. We also describe and analyze two representative algorithms, PMGT-SAGA and PMGT-LSVRG, and compare them to existing state-of-the-art proximal algorithms. To the best of our knowledge, PMGT-VR is the first linearly convergent decentralized stochastic algorithm that can solve decentralized composite optimization problems. Numerical experiments are provided to demonstrate the effectiveness of the proposed algorithms.
Submitted 5 June, 2021; v1 submitted 29 December, 2020;
originally announced December 2020.
-
An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization
Authors:
Yimin Huang,
Yujun Li,
Hanrong Ye,
Zhenguo Li,
Zhihua Zhang
Abstract:
The evaluation of hyperparameters, neural architectures, or data augmentation policies becomes a critical model selection problem in advanced deep learning with a large hyperparameter search space. In this paper, we propose an efficient and robust bandit-based algorithm called Sub-Sampling (SS) in the scenario of hyperparameter search evaluation. It evaluates the potential of hyperparameters by the sub-samples of observations and is theoretically proved to be optimal under the criterion of cumulative regret. We further combine SS with Bayesian Optimization and develop a novel hyperparameter optimization algorithm called BOSS. Empirical studies validate our theoretical arguments of SS and demonstrate the superior performance of BOSS on a number of applications, including Neural Architecture Search (NAS), Data Augmentation (DA), Object Detection (OD), and Reinforcement Learning (RL).
Submitted 16 December, 2020; v1 submitted 10 July, 2020;
originally announced July 2020.
-
Multi-consensus Decentralized Accelerated Gradient Descent
Authors:
Haishan Ye,
Luo Luo,
Ziang Zhou,
Tong Zhang
Abstract:
This paper considers the decentralized convex optimization problem, which has a wide range of applications in large-scale machine learning, sensor networks, and control theory. We propose novel algorithms that achieve optimal computation complexity and near-optimal communication complexity. Our theoretical results give affirmative answers to the open problem on whether there exists an algorithm that can achieve a communication complexity (nearly) matching the lower bound depending on the global condition number instead of the local one. Furthermore, the linear convergence of our algorithms only depends on the strong convexity of the global objective, and it does \emph{not} require the local functions to be convex. The design of our methods relies on a novel integration of well-known techniques including Nesterov's acceleration, multi-consensus, and gradient tracking. Empirical studies show that our methods outperform competing methods in machine learning applications.
Submitted 10 October, 2023; v1 submitted 2 May, 2020;
originally announced May 2020.
-
Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems
Authors:
Luo Luo,
Haishan Ye,
Zhichao Huang,
Tong Zhang
Abstract:
We consider nonconvex-concave minimax optimization problems of the form $\min_{\bf x}\max_{\bf y\in{\mathcal Y}} f({\bf x},{\bf y})$, where $f$ is strongly-concave in $\bf y$ but possibly nonconvex in $\bf x$ and ${\mathcal Y}$ is a convex and compact set. We focus on the stochastic setting, where we can only access an unbiased stochastic gradient estimate of $f$ at each iteration. This formulation includes many machine learning applications as special cases, such as robust optimization and adversarial training. We are interested in finding an ${\mathcal O}(\varepsilon)$-stationary point of the function $Φ(\cdot)=\max_{\bf y\in{\mathcal Y}} f(\cdot, {\bf y})$. The most popular algorithm to solve this problem is stochastic gradient descent ascent, which requires $\mathcal O(κ^3\varepsilon^{-4})$ stochastic gradient evaluations, where $κ$ is the condition number. In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction. This method achieves the best known stochastic gradient complexity of ${\mathcal O}(κ^3\varepsilon^{-3})$, and its dependency on $\varepsilon$ is optimal for this problem.
Submitted 23 October, 2020; v1 submitted 11 January, 2020;
originally announced January 2020.
-
Common preperiodic points for quadratic polynomials
Authors:
Laura DeMarco,
Holly Krieger,
Hexi Ye
Abstract:
Let $f_c(z) = z^2+c$ for $c \in \mathbb{C}$. We show there exists a uniform bound on the number of points in $\mathbb{P}^1(\mathbb{C})$ that can be preperiodic for both $f_{c_1}$ and $f_{c_2}$ with $c_1\not= c_2$ in $\mathbb{C}$. The proof combines arithmetic ingredients with complex-analytic ones; we estimate an adelic energy pairing when the parameters lie in $\bar{\mathbb{Q}}$, building on the quantitative arithmetic equidistribution theorem of Favre and Rivera-Letelier, and we use distortion theorems in complex analysis to control the size of the intersection of distinct Julia sets. The proof is effective, and we provide explicit constants for each of the results.
Submitted 28 November, 2021; v1 submitted 6 November, 2019;
originally announced November 2019.
-
Mirror Natural Evolution Strategies
Authors:
Haishan Ye,
Tong Zhang
Abstract:
Evolution Strategies such as CMA-ES (covariance matrix adaptation evolution strategy) and NES (natural evolution strategy) have been widely used in machine learning applications, where an objective function is optimized without using its derivatives. However, the convergence behaviors of these algorithms have not been carefully studied. In particular, there is no rigorous analysis of the convergence of the estimated covariance matrix, and it is unclear how the estimated covariance matrix helps the convergence of the algorithm. The relationship between Evolution Strategies and derivative-free optimization algorithms is also not clear. In this paper, we propose a new algorithm closely related to NES, which we call MiNES (mirror descent natural evolution strategy), for which we can establish rigorous convergence results. We show that the estimated covariance matrix of MiNES converges to the inverse of the Hessian matrix of the objective function with a sublinear convergence rate. Moreover, we show that some derivative-free optimization algorithms are special cases of MiNES. Our empirical studies demonstrate that MiNES is a query-efficient optimization algorithm competitive with classical algorithms including NES and CMA-ES.
Submitted 24 October, 2019;
originally announced October 2019.
-
Benders' decomposition of the unit commitment problem with semidefinite relaxation of AC power flow constraints
Authors:
M. Paredes,
L. S. A. Martins,
S. Soares,
Hongxing Ye
Abstract:
In this paper we present a formulation of the unit commitment problem with AC power flow constraints. It is solved by a Benders decomposition in which the unit commitment master problem is formulated as a mixed-integer problem with linearization of the power generation constraints for improved convergence. Semidefinite programming relaxation of the rectangular AC optimal power flow is used in the subproblem, providing somewhat conservative cuts. Numerical case studies, including a 6-bus and the IEEE 118-bus network, are provided to test the effectiveness of our proposal. We show in our numerical experiments that the use of such a strategy improves the quality of feasibility and optimality cuts generated by the solution of the convex relaxation of the subproblem, thereby reducing the number of iterations required for algorithm convergence.
Submitted 23 November, 2020; v1 submitted 6 March, 2019;
originally announced March 2019.
-
Uniform Manin-Mumford for a family of genus 2 curves
Authors:
Laura DeMarco,
Holly Krieger,
Hexi Ye
Abstract:
We introduce a general strategy for proving quantitative and uniform bounds on the number of common points of height zero for a pair of inequivalent height functions on $\mathbb{P}^1(\overline{\mathbb{Q}}).$ We apply this strategy to prove a conjecture of Bogomolov, Fu, and Tschinkel asserting uniform bounds on the number of common torsion points of elliptic curves in the case of two Legendre curves over $\mathbb{C}$. As a consequence, we obtain two uniform bounds for a two-dimensional family of genus 2 curves: a uniform Manin-Mumford bound for the family over $\mathbb{C}$, and a uniform Bogomolov bound for the family over $\overline{\mathbb{Q}}.$
Submitted 2 December, 2019; v1 submitted 28 January, 2019;
originally announced January 2019.
-
A convergence result on the second boundary value problem for parabolic equations
Authors:
R. L. Huang,
Y. H. Ye
Abstract:
We establish a Schnürer-type convergence result and then apply it to obtain the existence of solutions to the second boundary value problem for a family of special Lagrangian equations.
Submitted 1 June, 2018;
originally announced June 2018.
-
The Dynamical Manin-Mumford Conjecture and the Dynamical Bogomolov Conjecture for endomorphisms of (P^1)^n
Authors:
Dragos Ghioca,
Khoa D. Nguyen,
Hexi Ye
Abstract:
We prove Zhang's Dynamical Manin-Mumford Conjecture and Dynamical Bogomolov Conjecture for dominant endomorphisms of (P^1)^n. We use the equidistribution theorem for points of small height with respect to an algebraic dynamical system, combined with an analysis of the symmetries of the Julia set for a rational function.
Submitted 13 May, 2017;
originally announced May 2017.
-
The sharp existence of constrained minimizers for the $L^2$-critical Schrödinger-Poisson system and Schrödinger equations
Authors:
Hongyu Ye
Abstract:
In this paper, we study the existence of minimizers for a class of constrained minimization problems derived from the Schrödinger-Poisson equations: $$-Δu+V(x)u+(|x|^{-1}*u^2)u-|u|^\frac{4}{3}u=λu,~~x\in\R^3$$ on the $L^2$-spheres $\widetilde{S}(c)=\{u\in H^1(\R^3)|~\int_{\R^3}V(x)u^2dx<+\infty,~|u|_2^2=c>0\}$. If $V(x)\equiv0$, then by a different method from Jeanjean and Luo [Z. Angew. Math. Phys. 64 (2013), 937-954], we show that there is no minimizer for all $c>0$; if $0\leq V(x)\in L^{\infty}_{loc}(\R^3)$ and $\lim\limits_{|x|\rightarrow+\infty}V(x)=+\infty$, then a minimizer exists if and only if $0<c\leq c^*=|Q|_2^2$, where $Q$ is the unique positive radial solution of $-Δu+u=|u|^{\frac{4}{3}}u,$ $x\in\R^3$. Our results are sharp. We also extend some results to constrained minimization problems on $\widetilde{S}(c)$ derived from Schrödinger operators: $$F_μ(u)=\frac{1}{2}\ds\int_{\R^N}|\nabla u|^2-\frac{μ}{2}\ds\int_{\R^N}V(x)u^2-\frac{N}{2N+4}|u|^\frac{2N+4}{N}$$ where $0\leq V(x)\in L^{\infty}_{loc}(\R^N)$ and $\lim\limits_{|x|\rightarrow+\infty}V(x)=0$. We show that if $μ>μ_1$ for some $μ_1>0$, then a minimizer exists for each $c\in(0,c^*)$.
Submitted 3 May, 2017;
originally announced May 2017.
-
Bounded height in families of dynamical systems
Authors:
Laura DeMarco,
Dragos Ghioca,
Holly Krieger,
Khoa D. Nguyen,
Thomas J. Tucker,
Hexi Ye
Abstract:
Let a and b be algebraic numbers such that exactly one of a and b is an algebraic integer, and let f_t(z):=z^2+t be a family of polynomials parametrized by t. We prove that the set of all algebraic numbers t for which there exist positive integers m and n such that f_t^m(a)=f_t^n(b) has bounded Weil height. This is a special case of a more general result supporting a new bounded height conjecture in dynamics. Our results fit into the general setting of the principle of unlikely intersections in arithmetic dynamics.
Submitted 15 March, 2017;
originally announced March 2017.
-
Approximate Newton Methods
Authors:
Haishan Ye,
Luo Luo,
Zhihua Zhang
Abstract:
Many machine learning models involve solving optimization problems. Thus, it is important to deal with large-scale optimization problems in big data applications. Recently, subsampled Newton methods have attracted much attention due to their efficiency at each iteration, which rectifies a weakness of the ordinary Newton method: it suffers a high cost in each iteration while commanding a high convergence rate. Other efficient stochastic second-order methods have also been proposed. However, the convergence properties of these methods are still not well understood. There are also several important gaps between the current convergence theory and the performance in real applications. In this paper, we aim to fill these gaps. We propose a unifying framework to analyze both local and global convergence properties of second-order methods. Based on this framework, we present our theoretical results, which match the performance in real applications well.
Submitted 21 March, 2020; v1 submitted 26 February, 2017;
originally announced February 2017.
-
Robust Coordinated Transmission and Generation Expansion Planning Considering Ramping Requirements and Construction Periods
Authors:
Jia Li,
Zuyi Li,
Feng Liu,
Hongxing Ye,
Xuemin Zhang,
Shengwei Mei,
Naichao Chang
Abstract:
Two critical issues have arisen in transmission expansion planning with the rapid growth of wind power generation. First, severe power ramping events in daily operation due to the high variability of wind power generation pose great challenges to multi-year planning decision making. Second, the long construction periods of transmission lines may not be able to keep pace with the fast-growing uncertainty due to the increasing integration of renewable energy generation. To address these issues, we propose a comprehensive robust planning model considering different resources, namely, transmission lines, generators, and FACTS devices. Various factors are taken into account, including flexibility requirement, construction period, and cost. We construct the hourly net load ramping uncertainty (HLRU) set to characterize the variation of hourly net load including wind power generation, and the annual net load duration curve uncertainty (LDCU) set for the uncertainty of the normal annual net load duration curve. This results in a two-stage robust optimization model with two different types of uncertainty sets, which are decoupled into two different sets of subproblems to make the entire solution process tractable. Numerical simulations with real-world data show that the proposed model and solution method are effective in coordinating different flexible resources, yielding robust expansion planning strategies.
Submitted 1 December, 2016;
originally announced December 2016.