-
A point cloud reconstruction method based on uncertainty feature enhancement for aerodynamic shape optimization
Authors:
Junlin Li,
Yang Zhang,
Bo Pang,
Junqiang Bai,
Jiakuan Xu
Abstract:
The precision of shape representation and the dimensionality of the design space significantly influence the cost and outcomes of aerodynamic optimization. The design space can be represented more compactly by maintaining geometric precision while reducing dimensions, hence enhancing the cost-effectiveness of the optimization process. This research presents a new point cloud autoencoder architecture, called AE-BUFE, designed to attain efficient and precise generalized representations of 3D aircraft through uncertainty analysis of the deformation relationships among surface grid points. The deep learning architecture consists of two components: the uncertainty index-based feature enhancement module and the point cloud autoencoder module. It learns the shape features of the point cloud geometric representation to establish a low-dimensional latent space. To evaluate the efficiency of the method, we compared it with the prevailing point cloud autoencoder architecture and with the proper orthogonal decomposition (POD) linear dimensionality reduction method under conditions of complex shape deformation. The results showed that the new architecture significantly improved the extraction of the low-dimensional latent space. We then developed an SBO optimization framework based on the AE-BUFE parameterization method and completed a multi-objective aerodynamic optimization design for a wide-speed-range vehicle under volume and moment constraints. Take-off and landing performance was preserved while aerodynamic performance improved under transonic and hypersonic conditions, which verifies the efficiency and engineering practicability of the method.
Submitted 2 April, 2025; v1 submitted 29 March, 2025;
originally announced March 2025.
-
A Novel Methodology in Credit Spread Prediction Based on Ensemble Learning and Feature Selection
Authors:
Yu Shao,
Jiawen Bai,
Yingze Hou,
Xia'an Zhou,
Zhanhao Pan
Abstract:
The credit spread is a key indicator in bond investments, offering valuable insights for fixed-income investors to devise effective trading strategies. This study proposes a novel credit spread forecasting model leveraging ensemble learning techniques. To enhance predictive accuracy, a feature selection method based on mutual information is incorporated. Empirical results demonstrate that the proposed methodology delivers superior accuracy in credit spread predictions. Additionally, we present a forecast of future credit spread trends using current data, providing actionable insights for investment decision-making.
Submitted 12 December, 2024;
originally announced December 2024.
-
A Novel Property of Generalized Fibonacci Sequence in Grids
Authors:
Zixian Yang,
Jianchao Bai
Abstract:
The Fibonacci sequence, generated by summing the preceding two terms, is a classical sequence renowned for its elegant properties. In this paper, leveraging properties of generalized Fibonacci sequences and formulas for consecutive sums of equidistant subsequences, we investigate the ratio of the sums of the numbers along the main diagonal and the sub-diagonal of odd-order grids containing generalized Fibonacci sequences. We show that this ratio depends solely on the order of the grid, which yields a concise identity.
Submitted 7 July, 2024;
originally announced July 2024.
-
Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov Arnold Networks
Authors:
Yizheng Wang,
Jia Sun,
Jinshuai Bai,
Cosmin Anitescu,
Mohammad Sadegh Eshaghi,
Xiaoying Zhuang,
Timon Rabczuk,
Yinghua Liu
Abstract:
AI for partial differential equations (PDEs) has garnered significant attention, particularly with the emergence of Physics-informed neural networks (PINNs). The recent advent of Kolmogorov-Arnold Network (KAN) indicates that there is potential to revisit and enhance the previously MLP-based PINNs. Compared to MLPs, KANs offer interpretability and require fewer parameters. PDEs can be described in various forms, such as strong form, energy form, and inverse form. While mathematically equivalent, these forms are not computationally equivalent, making the exploration of different PDE formulations significant in computational physics. Thus, we propose different PDE forms based on KAN instead of MLP, termed Kolmogorov-Arnold-Informed Neural Network (KINN) for solving forward and inverse problems. We systematically compare MLP and KAN in various numerical examples of PDEs, including multi-scale, singularity, stress concentration, nonlinear hyperelasticity, heterogeneous, and complex geometry problems. Our results demonstrate that KINN significantly outperforms MLP regarding accuracy and convergence speed for numerous PDEs in computational solid mechanics, except for the complex geometry problem. This highlights KINN's potential for more efficient and accurate PDE solutions in AI for PDEs.
Submitted 4 August, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
-
On the Reachability of 3-Dimensional Paths with a Prescribed Curvature Bound
Authors:
Juho Bae,
Ji Hoon Bai,
Byung-Yoon Lee,
Jun-Yong Lee,
Chang-Hun Lee
Abstract:
This paper presents the reachability analysis of curves in $\mathbb{R}^3$ with a prescribed curvature bound. Based on the Pontryagin Maximum Principle, we apply the existing knowledge on the structure of solutions to minimum-time problems, i.e., the Markov-Dubins problem, to reachability considerations. Based on this development, two types of reachability are discussed. First, we prove that any boundary point of the reachability set, with the directional component taken into account as well as the geometric coordinates, can be reached via curves of type H, CSC, CCC, or their respective subsegments, where H denotes a helicoidal arc, C a circular arc with maximum curvature, and S a straight segment. Second, we show that the reachability set when the directional component is not considered (the position reachability set) is simply a solid of revolution of its two-dimensional counterpart, the Dubins car. These findings extend the developments presented in the literature on the Dubins car to spatial curves in $\mathbb{R}^3$.
Submitted 26 March, 2025; v1 submitted 27 March, 2024;
originally announced March 2024.
-
A Unified Inexact Stochastic ADMM for Composite Nonconvex and Nonsmooth Optimization
Authors:
Yuxuan Zeng,
Jianchao Bai,
Shengjia Wang,
Zhiguo Wang
Abstract:
In this paper, we propose a unified framework of inexact stochastic Alternating Direction Method of Multipliers (ADMM) for solving nonconvex problems subject to linear constraints, whose objective comprises an average of finite-sum smooth functions and a nonsmooth but possibly nonconvex function. The new framework is highly versatile. First, it not only covers several existing algorithms such as SADMM, SVRG-ADMM, and SPIDER-ADMM but also guides us to design a novel accelerated hybrid stochastic ADMM algorithm, which uses a new hybrid estimator to trade off variance and bias. Second, it enables us to exploit a more flexible dual stepsize in the convergence analysis. Under some mild conditions, our unified framework preserves $\mathcal{O}(1/T)$ sublinear convergence. Additionally, we establish linear convergence under error bound conditions. Finally, numerical experiments demonstrate the efficacy of the new algorithm for some nonsmooth and nonconvex problems.
Submitted 4 March, 2024;
originally announced March 2024.
-
Maximal exponent of the Lorentz cones
Authors:
Guillaume Aubrun,
Jing Bai
Abstract:
We show that the maximal exponent (i.e., the minimum number of iterations required for a primitive map to become strictly positive) of the n-dimensional Lorentz cone is equal to n. As a byproduct, we show that the optimal exponent in the quantum Wielandt inequality for qubit channels is equal to 3.
Submitted 12 December, 2023; v1 submitted 30 November, 2023;
originally announced November 2023.
-
An Accelerated Stochastic ADMM for Nonconvex and Nonsmooth Finite-Sum Optimization
Authors:
Yuxuan Zeng,
Zhiguo Wang,
Jianchao Bai,
Xiaojing Shen
Abstract:
The nonconvex and nonsmooth finite-sum optimization problem with linear constraints has attracted much attention in the fields of artificial intelligence, computer science, and mathematics, due to its wide applications in machine learning and the lack of efficient algorithms with convincing convergence theories. A popular approach to solving it is the stochastic Alternating Direction Method of Multipliers (ADMM), but most stochastic ADMM-type methods focus on convex models. In addition, variance reduction (VR) and acceleration techniques are useful tools in the development of stochastic methods due to their simplicity and practicability in providing acceleration characteristics for various machine learning models. However, it remains unclear whether the accelerated SVRG-ADMM algorithm (ASVRG-ADMM), which extends SVRG-ADMM by incorporating momentum techniques, exhibits a comparable acceleration characteristic or convergence rate in the nonconvex setting. To fill this gap, we consider a general nonconvex nonsmooth optimization problem and study the convergence of ASVRG-ADMM. By utilizing a well-defined potential energy function, we establish its sublinear convergence rate $O(1/T)$, where $T$ denotes the iteration number. Furthermore, under the additional Kurdyka-Lojasiewicz (KL) property, which is less stringent than the conditions frequently used to show linear convergence rates, such as strong convexity, we show that the ASVRG-ADMM sequence has a finite length and converges to a stationary solution with a linear convergence rate. Several experiments on solving the graph-guided fused lasso problem and the regularized logistic regression problem validate that the proposed ASVRG-ADMM performs better than the state-of-the-art methods.
Submitted 3 July, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Grassmann Tensors and their applications in geometry
Authors:
Changqing Xu,
Kaijie Xu,
Jun Wang,
Jingxuan Bai
Abstract:
In this paper, we introduce the Grassmann tensor via the tensor product of vectors, along with some basic terminology in tensor theory. Some basic properties of Grassmann tensors are investigated, and the tensor language is used to rewrite some relations and correspondences in multiview geometry. Finally, we show that a polytope in the Euclidean space $\mathbb{R}^{n}$ can also be concisely expressed as the Grassmann tensor generated by its vertices.
Submitted 3 September, 2022; v1 submitted 23 August, 2022;
originally announced August 2022.
-
On complete hypersurfaces with constant scalar curvature $n(n-1)$ in the unit sphere
Authors:
Jinchuan Bai,
Yong Luo
Abstract:
Let $M^n$ be an $n$-dimensional complete and locally conformally flat hypersurface in the unit sphere $\mathbb{S}^{n+1}$ with constant scalar curvature $n(n-1)$. We show that if the total curvature $\left(\int_{M} |H|^{n} \, dv\right)^{\frac{1}{n}}$ of $M$ is sufficiently small, then $M^n$ is totally geodesic.
Submitted 16 February, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Quotients of Palindromic and Antipalindromic Numbers
Authors:
James Haoyu Bai,
Joseph Meleshko,
Samin Riasat,
Jeffrey Shallit
Abstract:
A natural number N is said to be palindromic if its binary representation reads the same forwards and backwards. In this paper we study the quotients of two palindromic numbers and answer some basic questions about the resulting sets of integers and rational numbers. For example, we show that the following problem is algorithmically decidable: given an integer N, determine if we can write N = A/B for palindromic numbers A and B. Given that N is representable, we find a bound on the size of the numerator of the smallest representation. We prove that the set of unrepresentable integers has positive density in $\mathbb{N}$. We also obtain similar results for quotients of antipalindromic numbers (those for which the first half of the binary representation is the reverse complement of the second half). We also provide examples, numerical data, and a number of intriguing conjectures and open problems.
Submitted 28 February, 2022;
originally announced February 2022.
-
A new insight on augmented Lagrangian method with applications in machine learning
Authors:
Jianchao Bai,
Linyuan Jia,
Zheng Peng
Abstract:
By exploiting double-penalty terms for the primal subproblem, we develop a novel relaxed augmented Lagrangian method for solving a family of convex optimization problems subject to equality or inequality constraints. The method is then extended to solve a general multi-block separable convex optimization problem, and two related primal-dual hybrid gradient algorithms are also discussed. Convergence results about the sublinear and linear convergence rates are established by variational characterizations for both the saddle-point of the problem and the first-order optimality conditions of involved subproblems. A large number of experiments on testing the linear support vector machine problem and the robust principal component analysis problem arising from machine learning indicate that our proposed algorithms perform much better than several state-of-the-art algorithms.
Submitted 13 June, 2025; v1 submitted 25 August, 2021;
originally announced August 2021.
-
Iteration complexity analysis of a partial LQP-based alternating direction method of multipliers
Authors:
Jianchao Bai,
Yuxue Ma,
Hao Sun,
Miao Zhang
Abstract:
In this paper, we consider a prototypical convex optimization problem with multi-block variables and separable structures. By adding the Logarithmic Quadratic Proximal (LQP) regularizer with a suitable proximal parameter to each of the first grouped subproblems, we develop a partial LQP-based Alternating Direction Method of Multipliers (ADMM-LQP). The dual variable is updated twice with stepsizes relatively larger than the classical region $(0,\frac{1+\sqrt{5}}{2})$. Using a prediction-correction approach to analyze properties of the iterates generated by ADMM-LQP, we establish its global convergence and a sublinear convergence rate of $O(1/T)$ in the new ergodic and nonergodic senses, where $T$ denotes the iteration index. We also extend the algorithm to a nonsmooth composite convex optimization problem and establish convergence results similar to those for ADMM-LQP.
Submitted 30 March, 2021;
originally announced March 2021.
-
Convergence on a symmetric accelerated stochastic ADMM with larger stepsizes
Authors:
Jianchao Bai,
Deren Han,
Hao Sun,
Hongchao Zhang
Abstract:
In this paper, we develop a symmetric accelerated stochastic Alternating Direction Method of Multipliers (SAS-ADMM) for solving separable convex optimization problems with linear constraints. The objective function is the sum of a possibly nonsmooth convex function and an average of many smooth convex functions. Our proposed algorithm combines the ideas of ADMM with the techniques of accelerated stochastic gradient methods, possibly with variance reduction, to solve the smooth subproblem. One main feature of SAS-ADMM is that its dual variable is symmetrically updated after each update of the separated primal variable, which allows a more flexible and larger convergence region for the dual variable compared with that of standard deterministic or stochastic ADMM. This new stochastic optimization algorithm is shown to converge ergodically in expectation at an O(1/T) rate, where T is the number of outer iterations. Our preliminary experiments indicate that the proposed algorithm is very effective for solving separable optimization problems from big-data applications. Finally, 3-block extensions of the algorithm and its variant, an accelerated stochastic augmented Lagrangian method, are discussed in the appendix.
Submitted 19 December, 2021; v1 submitted 30 March, 2021;
originally announced March 2021.
-
Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee
Authors:
Jincheng Bai,
Qifan Song,
Guang Cheng
Abstract:
Sparse deep learning aims to address the challenge of huge storage consumption by deep neural networks, and to recover the sparse structure of target functions. Although tremendous empirical successes have been achieved, most sparse deep learning algorithms lack theoretical support. On the other hand, another line of work has proposed theoretical frameworks that are computationally infeasible. In this paper, we train sparse deep neural networks with a fully Bayesian treatment under spike-and-slab priors, and develop a set of computationally efficient variational inferences via continuous relaxation of the Bernoulli distribution. The variational posterior contraction rate is provided, which justifies the consistency of the proposed variational Bayes method. Notably, our empirical results demonstrate that this variational procedure provides uncertainty quantification in terms of the Bayesian predictive distribution and is also capable of accomplishing consistent variable selection by training a sparse multi-layer neural network.
Submitted 14 November, 2020;
originally announced November 2020.
-
Nearly Optimal Variational Inference for High Dimensional Regression with Shrinkage Priors
Authors:
Jincheng Bai,
Qifan Song,
Guang Cheng
Abstract:
We propose a variational Bayesian (VB) procedure for high-dimensional linear model inference with heavy-tailed shrinkage priors, such as the Student's t prior. Theoretically, we establish the consistency of the proposed VB method and prove that, under a proper choice of prior specifications, the contraction rate of the VB posterior is nearly optimal. This justifies the validity of VB inference as an alternative to Markov Chain Monte Carlo (MCMC) sampling. Meanwhile, compared with conventional MCMC methods, the VB procedure achieves much higher computational efficiency, which greatly alleviates the computing burden for modern machine learning applications such as massive data analysis. Through numerical studies, we demonstrate that the proposed VB method leads to shorter computing time, higher estimation accuracy, and lower variable selection error than competitive sparse Bayesian methods.
Submitted 24 October, 2020;
originally announced October 2020.
-
An Inexact Accelerated Stochastic ADMM for Separable Convex Optimization
Authors:
Jianchao Bai,
William W. Hager,
Hongchao Zhang
Abstract:
An inexact accelerated stochastic Alternating Direction Method of Multipliers (AS-ADMM) scheme is developed for solving structured separable convex optimization problems with linear constraints. The objective function is the sum of a possibly nonsmooth convex function and a smooth function which is an average of many component convex functions. Problems having this structure often arise in machine learning and data mining applications. AS-ADMM combines the ideas of both ADMM and the stochastic gradient methods using variance reduction techniques. One of the ADMM subproblems employs a linearization technique while a similar linearization could be introduced for the other subproblem. For a specified choice of the algorithm parameters, it is shown that the objective error and the constraint violation are $\mathcal{O}(1/k)$ relative to the number of outer iterations $k$. Under a strong convexity assumption, the expected iterate error converges to zero linearly. A linearized variant of AS-ADMM and incremental sampling strategies are also discussed. Numerical experiments with both stochastic and deterministic ADMM algorithms show that AS-ADMM can be particularly effective for structured optimization arising in big data applications.
Submitted 23 October, 2020;
originally announced October 2020.
-
Adaptive Variational Bayesian Inference for Sparse Deep Neural Network
Authors:
Jincheng Bai,
Qifan Song,
Guang Cheng
Abstract:
In this work, we focus on variational Bayesian inference on the sparse Deep Neural Network (DNN) modeled under a class of spike-and-slab priors. Given a pre-specified sparse DNN structure, we characterize the corresponding variational posterior contraction rate, which reveals a trade-off between the variational error and the approximation error, both of which are determined by the network structural complexity (i.e., depth, width and sparsity). However, the optimal network structure, which balances this trade-off and yields the best rate, is generally unknown in reality. Therefore, our work further develops an adaptive variational inference procedure that can automatically select a reasonably good (data-dependent) network structure that achieves the best contraction rate, without knowing the optimal network structure. In particular, when the true function is Hölder smooth, the adaptive variational inference is capable of attaining the (near-)optimal rate without knowledge of the smoothness level. The above rate still suffers from the curse of dimensionality, which motivates the teacher-student setup, i.e., the true function is a sparse DNN model, under which the rate depends only logarithmically on the input dimension.
Submitted 2 August, 2020; v1 submitted 9 October, 2019;
originally announced October 2019.
-
DEAM: Adaptive Momentum with Discriminative Weight for Stochastic Optimization
Authors:
Jiyang Bai,
Yuxiang Ren,
Jiawei Zhang
Abstract:
Optimization algorithms with momentum, e.g., ADAM, have been widely used for building deep learning models due to their faster convergence rates compared with stochastic gradient descent (SGD). Momentum helps accelerate SGD in the relevant directions during parameter updating, which can reduce the oscillations of the parameter update route. However, there exist errors in some update steps in optimization algorithms with momentum like ADAM. The fixed momentum weight (e.g., β_1 in ADAM) will propagate errors in momentum computing. In this paper, we introduce a novel optimization algorithm, namely Discriminative wEight on Adaptive Momentum (DEAM). Instead of assigning the momentum term weight with a fixed hyperparameter, DEAM computes the momentum weight automatically based on the discriminative angle. In this way, DEAM involves fewer hyperparameters. DEAM also contains a novel backtrack term, which restricts redundant updates when the correction of the last step is needed. Extensive experiments demonstrate that DEAM can achieve a faster convergence rate than the existing optimization algorithms in training deep learning models in both convex and non-convex situations.
Submitted 22 January, 2020; v1 submitted 25 July, 2019;
originally announced July 2019.
-
A family of multi-parameterized proximal point algorithms
Authors:
Jianchao Bai,
Ke Guo,
Xiaokai Chang
Abstract:
In this paper, a multi-parameterized proximal point algorithm combined with a relaxation step is developed for solving convex minimization problems subject to linear constraints. We show its global convergence and sublinear convergence rate from the perspective of variational inequality. Preliminary numerical experiments on a sparse minimization problem from signal processing indicate that the proposed algorithm performs better than some well-established methods.
Submitted 9 July, 2019;
originally announced July 2019.
-
Accelerated Symmetric ADMM and Its Applications in Signal Processing
Authors:
Jianchao Bai,
Junli Liang,
Ke Guo,
Yang Jing
Abstract:
The alternating direction method of multipliers (ADMM) has been extensively investigated in the past decades for solving separable convex optimization problems. Fewer researchers have explored its convergence properties in the nonconvex case, although it performs surprisingly efficiently there. In this paper, we propose a symmetric ADMM based on different acceleration techniques for a family of potentially nonsmooth nonconvex programming problems with equality constraints, where the dual variables are updated twice with different stepsizes. Under proper assumptions instead of using the so-called Kurdyka-Lojasiewicz inequality, convergence of the proposed algorithm as well as its pointwise iteration-complexity are analyzed in terms of the corresponding augmented Lagrangian function and the primal-dual residuals, respectively. Performance of our algorithm is verified by some preliminary numerical examples on applications in sparse nonconvex/convex regularized minimization signal processing problems.
Submitted 1 July, 2019; v1 submitted 27 June, 2019;
originally announced June 2019.
-
Convergence Revisit on Generalized Symmetric ADMM
Authors:
Jianchao Bai,
Xiaokai Chang,
Jicheng Li,
Fengmin Xu
Abstract:
In this note, we show a sublinear nonergodic convergence rate for the algorithm developed in [Bai, et al. Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 70, 129-170 (2018)], as well as its linear convergence under the assumptions that the sub-differential of each component objective function is piecewise linear and all the constraint sets are polyhedra. These remaining convergence results are established for the stepsize parameters of dual variables belonging to a special isosceles triangle region, and they aim to strengthen our understanding of the convergence of the generalized symmetric ADMM.
Submitted 18 June, 2019;
originally announced June 2019.
-
Proximal extrapolated gradient methods with prediction and correction for monotone variational inequalities
Authors:
Xiaokai Chang,
Sanyang Liu,
Jianchao Bai,
Jun Yang
Abstract:
An efficient proximal-gradient-based method, called the proximal extrapolated gradient method, is designed for solving monotone variational inequalities in Hilbert space. The proposed method extends the acceptable range of parameters to obtain larger step sizes. The step size is predicted based on local information of the operator and corrected by linesearch procedures to satisfy a very weak condition, which is even weaker than the boundedness of the generated sequence and always holds when the operator is the gradient of a convex function. We establish its convergence and ergodic convergence rate in theory under the larger range of parameters. Furthermore, we improve numerical efficiency by employing the proposed method with a non-monotonic step size, and obtain the upper bound of the parameter relating to the step size by an extremely simple example. Related numerical experiments illustrate the improvements in efficiency from the larger step size.
Submitted 4 December, 2019; v1 submitted 12 December, 2018;
originally announced December 2018.
-
On the generalized low rank approximation of the correlation matrices arising in the asset portfolio
Authors:
Xuefeng Duan,
Jianchao Bai,
Maojun Zhang,
Xinjun Zhang
Abstract:
In this paper, we consider the generalized low rank approximation of the correlation matrices problem which arises in the asset portfolio. We first characterize the feasible set by using the Gramian representation together with a special trigonometric function transform, and then transform the generalized low rank approximation of the correlation matrices problem into an unconstrained optimization problem. Finally, we use the conjugate gradient algorithm with the strong Wolfe line search to solve the unconstrained optimization problem. Numerical examples show that our new method is feasible and effective.
Submitted 11 December, 2018;
originally announced December 2018.
-
Generalized Symmetric ADMM for Separable Convex Optimization
Authors:
Jianchao Bai,
Jicheng Li,
Fengmin Xu,
Hongchao Zhang
Abstract:
The Alternating Direction Method of Multipliers (ADMM) has been proved to be effective for solving separable convex optimization subject to linear constraints. In this paper, we propose a Generalized Symmetric ADMM (GS-ADMM), which updates the Lagrange multiplier twice with suitable stepsizes, to solve multi-block separable convex programming problems. This GS-ADMM partitions the data into two group variables so that one group consists of $p$ block variables while the other has $q$ block variables, where $p \ge 1$ and $q \ge 1$ are two integers. The two grouped variables are updated in a Gauss-Seidel scheme, while the variables within each group are updated in a Jacobi scheme, which makes it very attractive for a big data setting. By adding proper proximal terms to the subproblems, we specify the domain of the stepsizes to guarantee that GS-ADMM is globally convergent with a worst-case $O(1/t)$ ergodic convergence rate. It turns out that our convergence domain of the stepsizes is significantly larger than other convergence domains in the literature. Hence, the GS-ADMM is more flexible and attractive in choosing and using larger stepsizes of the dual variable. Besides, two special cases of GS-ADMM, which allow using zero penalty terms, are also discussed and analyzed. Compared with several state-of-the-art methods, preliminary numerical experiments on solving a sparse matrix minimization problem in statistical learning show that our proposed method is effective and promising.
Submitted 10 December, 2018;
originally announced December 2018.
-
General parameterized proximal point algorithm with applications in statistical learning
Authors:
Jianchao Bai,
Jicheng Li,
Pingfan Dai,
Jiaofen Li
Abstract:
In the literature, there has been little research on designing parameters in the Proximal Point Algorithm (PPA), especially for multi-objective convex optimization. Introducing some parameters to PPA can make it more flexible and attractive. Mainly motivated by our recent work (Bai et al., A parameterized proximal point algorithm for separable convex optimization, Optim. Lett. (2017) doi: 10.1007/s11590-017-1195-9), in this paper we develop a general parameterized PPA with a relaxation step for solving multi-block separable structured convex programming problems. By making use of the variational inequality and some mathematical identities, the global convergence and the worst-case $\mathcal{O}(1/t)$ convergence rate of the proposed algorithm are established. Preliminary numerical experiments on solving a sparse matrix minimization problem from statistical learning validate that our algorithm is more efficient than several state-of-the-art algorithms.
Submitted 10 December, 2018;
originally announced December 2018.
-
A parameterized proximal point algorithm for separable convex optimization
Authors:
Jianchao Bai,
Hongchao Zhang,
Jicheng Li
Abstract:
In this paper, we develop a parameterized proximal point algorithm (P-PPA) for solving a class of separable convex programming problems subject to linear and convex constraints. The proposed algorithm is provably globally convergent with a worst-case O(1/t) convergence rate, where t denotes the iteration number. By properly choosing the algorithm parameters, numerical experiments on solving a sparse optimization problem arising from statistical learning show that our P-PPA could perform significantly better than other state-of-the-art methods, such as the alternating direction method of multipliers and the relaxed proximal point algorithm.
Submitted 10 December, 2018;
originally announced December 2018.
-
An improved immersed finite element particle-in-cell method for plasma simulation
Authors:
Jinwei Bai,
Yong Cao,
Yuchuan Chu,
Xu Zhang
Abstract:
The particle-in-cell (PIC) method has been widely used for plasma simulation because of its noise-reduction capability and moderate computational cost. The immersed finite element (IFE) method is efficient for solving interface problems on Cartesian meshes, which is desirable for the PIC method. The combination of these two methods provides an effective tool for plasma simulation with complex interfaces and boundaries. This paper introduces an improved IFE-PIC method that enhances the performance in both the IFE and PIC aspects. For the electric field solver, we adopt the newly developed partially penalized IFE method with enhanced accuracy. For the PIC implementation, we introduce a new interpolation technique to ensure the conservation of charge. Numerical examples are provided to demonstrate the features of the improved IFE-PIC method.
Submitted 9 June, 2017;
originally announced June 2017.
-
Theory and methods of panel data models with interactive effects
Authors:
Jushan Bai,
Kunpeng Li
Abstract:
This paper considers the maximum likelihood estimation of panel data models with interactive effects. Motivated by applications in economics and other social sciences, a notable feature of the model is that the explanatory variables are correlated with the unobserved effects. The usual within-group estimator is inconsistent. Existing methods for consistent estimation are either designed for panel data with short time periods or are less efficient. The maximum likelihood estimator has desirable properties and is easy to implement, as illustrated by the Monte Carlo simulations. This paper develops the inferential theory for the maximum likelihood estimator, including consistency, rate of convergence and the limiting distributions. We further extend the model to include time-invariant regressors and common regressors (cross-section invariant). The regression coefficients for the time-invariant regressors are time-varying, and the coefficients for the common regressors are cross-sectionally varying.
Submitted 26 February, 2014;
originally announced February 2014.
-
Statistical Inferences Using Large Estimated Covariances for Panel Data and Factor Models
Authors:
Jushan Bai,
Yuan Liao
Abstract:
While most of the convergence results in the literature on high dimensional covariance matrices are concerned with the accuracy of estimating the covariance matrix (and precision matrix), relatively little is known about the effect of estimating large covariances on statistical inferences. We study two important models: factor analysis and the panel data model with interactive effects, and focus on the statistical inference and estimation efficiency of structural parameters based on large covariance estimators. For efficient estimation, both models call for weighted principal components (WPC), which rely on a high dimensional weight matrix. This paper derives an efficient and feasible WPC using the covariance matrix estimator of Fan et al. (2013). However, we demonstrate that existing results on large covariance estimation based on absolute convergence are not suitable for statistical inferences of the structural parameters. What is needed is some weighted consistency and the associated rate of convergence, which are obtained in this paper. Finally, the proposed method is applied to the US divorce rate data. We find that the efficient WPC identifies the significant effects of divorce-law reforms on the divorce rate, and it provides more accurate estimation and tighter confidence intervals than existing methods.
Submitted 12 November, 2013; v1 submitted 9 July, 2013;
originally announced July 2013.
-
Statistical analysis of factor models of high dimension
Authors:
Jushan Bai,
Kunpeng Li
Abstract:
This paper considers the maximum likelihood estimation of factor models of high dimension, where the number of variables (N) is comparable with or even greater than the number of observations (T). An inferential theory is developed. We establish not only consistency but also the rate of convergence and the limiting distributions. Five different sets of identification conditions are considered. We show that the distributions of the MLE estimators depend on the identification restrictions. Unlike the principal components approach, the maximum likelihood estimator explicitly allows heteroskedasticities, which are jointly estimated with other parameters. Efficiency of MLE relative to the principal components method is also considered.
Submitted 30 May, 2012;
originally announced May 2012.
-
Panel Cointegration with Global Stochastic Trends
Authors:
Jushan Bai,
Chihwa Kao,
Serena Ng
Abstract:
This paper studies estimation of panel cointegration models with cross-sectional dependence generated by unobserved global stochastic trends. The standard least squares estimator is, in general, inconsistent owing to the spuriousness induced by the unobservable I(1) trends. We propose two iterative procedures that jointly estimate the slope parameters and the stochastic trends. The resulting estimators are referred to respectively as CupBC (continuously-updated and bias-corrected) and the CupFM (continuously-updated and fully-modified) estimators. We establish their consistency and derive their limiting distributions. Both are asymptotically unbiased and asymptotically mixed normal and permit inference to be conducted using standard test statistics. The estimators are also valid when there are mixed stationary and non-stationary factors, as well as when the factors are all stationary.
Submitted 12 May, 2008;
originally announced May 2008.