Search | arXiv e-print repository

A point cloud reconstruction method based on uncertainty feature enhancement for aerodynamic shape optimization

Authors: Junlin Li, Yang Zhang, Bo Pang, Junqiang Bai, Jiakuan Xu

Abstract: The precision of shape representation and the dimensionality of the design space significantly influence the cost and outcomes of aerodynamic optimization. The design space can be represented more compactly by maintaining geometric precision while reducing dimensions, hence enhancing the cost-effectiveness of the optimization process. This research presents a new point cloud autoencoder architecture, called AE-BUFE, designed to attain efficient and precise generalized representations of 3D aircraft through uncertainty analysis of the deformation relationships among surface grid points. The deep learning architecture consists of two components: the uncertainty index-based feature enhancement module and the point cloud autoencoder module. It learns the shape features of the point cloud geometric representation to establish a low-dimensional latent space. To assess and evaluate the efficiency of the method, a comparison was conducted with the prevailing point cloud autoencoder architecture and the proper orthogonal decomposition (POD) linear dimensionality reduction method under conditions of complex shape deformation. The results showed that the new architecture significantly improved the extraction effect of the low-dimensional latent space. Then, we developed the SBO optimization framework based on the AE-BUFE parameterization method and completed a multi-objective aerodynamic optimization design for a wide-speed-range vehicle considering volume and moment constraints. While ensuring the take-off and landing performance, the aerodynamic performance is improved at transonic and hypersonic conditions, which verifies the efficiency and engineering practicability of this method.

Submitted 2 April, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

Comments: There are some problems with the data that can lead to wrong conclusions

arXiv:2412.09769 [pdf, other]

A Novel Methodology in Credit Spread Prediction Based on Ensemble Learning and Feature Selection

Authors: Yu Shao, Jiawen Bai, Yingze Hou, Xia'an Zhou, Zhanhao Pan

Abstract: The credit spread is a key indicator in bond investments, offering valuable insights for fixed-income investors to devise effective trading strategies. This study proposes a novel credit spread forecasting model leveraging ensemble learning techniques. To enhance predictive accuracy, a feature selection method based on mutual information is incorporated. Empirical results demonstrate that the proposed methodology delivers superior accuracy in credit spread predictions. Additionally, we present a forecast of future credit spread trends using current data, providing actionable insights for investment decision-making.

Submitted 12 December, 2024; originally announced December 2024.

Comments: 7 pages, 5 figures

arXiv:2407.05369 [pdf, other]

A Novel Property of Generalized Fibonacci Sequence in Grids

Authors: Zixian Yang, Jianchao Bai

Abstract: Fibonacci sequence, generated by summing the preceding two terms, is a classical sequence renowned for its elegant properties. In this paper, leveraging properties of generalized Fibonacci sequences and formulas for consecutive sums of equidistant subsequences, we investigate the ratio of the sum of numbers along main-diagonal and sub-diagonal of odd-order grids containing generalized Fibonacci sequences. We show that this ratio is solely dependent on the order of the grid, providing a concise and splendid identity.

Submitted 7 July, 2024; originally announced July 2024.
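
The identity described in this abstract can be probed numerically. The sketch below is only an illustration under assumptions not stated in the listing: the grid is filled row by row with consecutive terms, "sub-diagonal" is read as the anti-diagonal, and the function names are hypothetical.

```python
from fractions import Fraction


def generalized_fibonacci(a, b, count):
    """First `count` terms of G with G1 = a, G2 = b, G(k) = G(k-1) + G(k-2)."""
    terms = [a, b]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


def diagonal_ratio(n, a=1, b=1):
    """Main-diagonal sum divided by anti-diagonal sum of an n x n grid.

    Assumption: the grid holds G1..G(n*n) in row-major order.
    """
    terms = generalized_fibonacci(a, b, n * n)
    grid = [terms[r * n:(r + 1) * n] for r in range(n)]
    main = sum(grid[i][i] for i in range(n))
    anti = sum(grid[i][n - 1 - i] for i in range(n))
    return Fraction(main, anti)


# Different seeds, same grid order: the ratio should not change.
for seeds in [(1, 1), (2, 1), (1, 0)]:
    print(seeds, diagonal_ratio(3, *seeds), diagonal_ratio(5, *seeds))
```

Under these assumptions, each of the three seed pairs gives 2 for the 3 x 3 grid and 31/5 for the 5 x 5 grid, which is consistent with a ratio that depends only on the grid order.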

arXiv:2406.11045 [pdf, other]

doi 10.1016/j.cma.2024.117518

Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov Arnold Networks

Authors: Yizheng Wang, Jia Sun, Jinshuai Bai, Cosmin Anitescu, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu

Abstract: AI for partial differential equations (PDEs) has garnered significant attention, particularly with the emergence of Physics-informed neural networks (PINNs). The recent advent of Kolmogorov-Arnold Network (KAN) indicates that there is potential to revisit and enhance the previously MLP-based PINNs. Compared to MLPs, KANs offer interpretability and require fewer parameters. PDEs can be described in various forms, such as strong form, energy form, and inverse form. While mathematically equivalent, these forms are not computationally equivalent, making the exploration of different PDE formulations significant in computational physics. Thus, we propose different PDE forms based on KAN instead of MLP, termed Kolmogorov-Arnold-Informed Neural Network (KINN) for solving forward and inverse problems. We systematically compare MLP and KAN in various numerical examples of PDEs, including multi-scale, singularity, stress concentration, nonlinear hyperelasticity, heterogeneous, and complex geometry problems. Our results demonstrate that KINN significantly outperforms MLP regarding accuracy and convergence speed for numerous PDEs in computational solid mechanics, except for the complex geometry problem. This highlights KINN's potential for more efficient and accurate PDE solutions in AI for PDEs.

Submitted 4 August, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Journal ref: Comput. Methods Appl. Mech. Engrg. 433 (2025) 117518

arXiv:2403.18707 [pdf, other]

On the Reachability of 3-Dimensional Paths with a Prescribed Curvature Bound

Authors: Juho Bae, Ji Hoon Bai, Byung-Yoon Lee, Jun-Yong Lee, Chang-Hun Lee

Abstract: This paper presents the reachability analysis of curves in $\mathbb{R}^3$ with a prescribed curvature bound. Based on the Pontryagin Maximum Principle, we leverage the existing knowledge on the structure of solutions to minimum-time problems, or the Markov-Dubins problem, for reachability considerations. Based on this development, two types of reachability are discussed. First, we prove that any boundary point of the reachability set, with the directional component taken into account as well as geometric coordinates, can be reached via curves of H, CSC, CCC, or their respective subsegments, where H denotes a helicoidal arc, C a circular arc with maximum curvature, and S a straight segment. Second, we show that the reachability set when the directional component is not considered (the position reachability set) is simply a solid of revolution of its two-dimensional counterpart, the Dubins car. These findings extend the developments presented in the literature on the Dubins car to spatial curves in $\mathbb{R}^3$.

Submitted 26 March, 2025; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted for publication in Automatica

arXiv:2403.02015 [pdf, other]

A Unified Inexact Stochastic ADMM for Composite Nonconvex and Nonsmooth Optimization

Authors: Yuxuan Zeng, Jianchao Bai, Shengjia Wang, Zhiguo Wang

Abstract: In this paper, we propose a unified framework of inexact stochastic Alternating Direction Method of Multipliers (ADMM) for solving nonconvex problems subject to linear constraints, whose objective comprises an average of finite-sum smooth functions and a nonsmooth but possibly nonconvex function. The new framework is highly versatile. First, it not only covers several existing algorithms such as SADMM, SVRG-ADMM, and SPIDER-ADMM but also guides us to design a novel accelerated hybrid stochastic ADMM algorithm, which utilizes a new hybrid estimator to trade off variance and bias. Second, it enables us to exploit a more flexible dual stepsize in the convergence analysis. Under some mild conditions, our unified framework preserves $\mathcal{O}(1/T)$ sublinear convergence. Additionally, we establish the linear convergence under error bound conditions. Finally, numerical experiments demonstrate the efficacy of the new algorithm for some nonsmooth and nonconvex problems.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2311.18634 [pdf, other]

Maximal exponent of the Lorentz cones

Authors: Guillaume Aubrun, Jing Bai

Abstract: We show that the maximal exponent (i.e., the minimum number of iterations required for a primitive map to become strictly positive) of the n-dimensional Lorentz cone is equal to n. As a byproduct, we show that the optimal exponent in the quantum Wielandt inequality for qubit channels is equal to 3.

Submitted 12 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 1 figure. v2: simplified the proof, added a section about quantum Wielandt inequality

MSC Class: 52120

arXiv:2306.05899 [pdf, other]

An Accelerated Stochastic ADMM for Nonconvex and Nonsmooth Finite-Sum Optimization

Authors: Yuxuan Zeng, Zhiguo Wang, Jianchao Bai, Xiaojing Shen

Abstract: The nonconvex and nonsmooth finite-sum optimization problem with linear constraint has attracted much attention in the fields of artificial intelligence, computer, and mathematics, due to its wide applications in machine learning and the lack of efficient algorithms with convincing convergence theories. A popular approach to solve it is the stochastic Alternating Direction Method of Multipliers (ADMM), but most stochastic ADMM-type methods focus on convex models. In addition, the variance reduction (VR) and acceleration techniques are useful tools in the development of stochastic methods due to their simplicity and practicability in providing acceleration characteristics of various machine learning models. However, it remains unclear whether accelerated SVRG-ADMM algorithm (ASVRG-ADMM), which extends SVRG-ADMM by incorporating momentum techniques, exhibits a comparable acceleration characteristic or convergence rate in the nonconvex setting. To fill this gap, we consider a general nonconvex nonsmooth optimization problem and study the convergence of ASVRG-ADMM. By utilizing a well-defined potential energy function, we establish its sublinear convergence rate $O(1/T)$, where $T$ denotes the iteration number. Furthermore, under the additional Kurdyka-Lojasiewicz (KL) property which is less stringent than the frequently used conditions for showcasing linear convergence rates, such as strong convexity, we show that the ASVRG-ADMM sequence has a finite length and converges to a stationary solution with a linear convergence rate. Several experiments on solving the graph-guided fused lasso problem and regularized logistic regression problem validate that the proposed ASVRG-ADMM performs better than the state-of-the-art methods.

Submitted 3 July, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: 40 Pages, 8 figures

arXiv:2208.11023 [pdf, ps, other]

Grassmann Tensors and their applications in geometry

Authors: Changqing Xu, Kaijie Xu, Jun Wang, Jingxuan Bai

Abstract: In this paper, we introduce the Grassmann tensor by tensor product of vectors and some basic terminology in tensor theory. Some basic properties of the Grassmann tensors are investigated and the tensor language is used to rewrite some relations and correspondences in multiview geometry. Finally, we show that a polytope in the Euclidean space $\mathbb{R}^{n}$ can also be concisely expressed as the Grassmann tensor generated by its vertices.

Submitted 3 September, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: 17 pages

MSC Class: 53A45; 15A69

arXiv:2205.13156 [pdf, ps, other]

On complete hypersurfaces with constant scalar curvature $n(n-1)$ in the unit sphere

Authors: Jinchuan Bai, Yong Luo

Abstract: Let $M^n$ be an $n$-dimensional complete and locally conformally flat hypersurface in the unit sphere $\mathbb{S}^{n+1}$ with constant scalar curvature $n(n-1)$. We show that if the total curvature $\left(\int_M |H|^n \, dv\right)^{\frac{1}{n}}$ of $M$ is sufficiently small, then $M^n$ is totally geodesic.

Submitted 16 February, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Typos and inaccuracies corrected, accepted by Kodai Mathematical Journal

arXiv:2202.13694 [pdf, ps, other]

Quotients of Palindromic and Antipalindromic Numbers

Authors: James Haoyu Bai, Joseph Meleshko, Samin Riasat, Jeffrey Shallit

Abstract: A natural number N is said to be palindromic if its binary representation reads the same forwards and backwards. In this paper we study the quotients of two palindromic numbers and answer some basic questions about the resulting sets of integers and rational numbers. For example, we show that the following problem is algorithmically decidable: given an integer N, determine if we can write N = A/B for palindromic numbers A and B. Given that N is representable, we find a bound on the size of the numerator of the smallest representation. We prove that the set of unrepresentable integers has positive density in N. We also obtain similar results for quotients of antipalindromic numbers (those for which the first half of the binary representation is the reverse complement of the second half). We also provide examples, numerical data, and a number of intriguing conjectures and open problems.

Submitted 28 February, 2022; originally announced February 2022.
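
The two definitions in this abstract translate directly into code. The following is a minimal sketch: the function names are hypothetical, and the bounded brute-force search is an illustration only, not the decision procedure the paper establishes. A `None` result therefore says nothing about whether N is representable beyond the chosen bound.

```python
def binary_digits(n):
    """Binary representation of n without the '0b' prefix."""
    return bin(n)[2:]


def is_palindromic(n):
    """True if the binary representation reads the same in both directions."""
    s = binary_digits(n)
    return s == s[::-1]


def is_antipalindromic(n):
    """True if the first half is the reverse complement of the second half."""
    s = binary_digits(n)
    if len(s) % 2:
        return False  # an odd-length string has a middle bit with no partner
    return all(s[i] != s[-1 - i] for i in range(len(s) // 2))


def find_quotient(n, max_denominator):
    """Search for palindromic A, B with n = A / B and B <= max_denominator."""
    for b in range(1, max_denominator + 1):
        if is_palindromic(b) and is_palindromic(n * b):
            return n * b, b
    return None


print(find_quotient(11, 100))  # (33, 3): 33 is 100001 and 3 is 11 in binary
print(is_antipalindromic(10))  # True: 10 is 1010 in binary
```

For example, 11 (binary 1011) is not palindromic itself, yet it is the quotient of the palindromic numbers 33 and 3.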

arXiv:2108.11125 [pdf, ps, other]

A new insight on augmented Lagrangian method with applications in machine learning

Authors: Jianchao Bai, Linyuan Jia, Zheng Peng

Abstract: By exploiting double-penalty terms for the primal subproblem, we develop a novel relaxed augmented Lagrangian method for solving a family of convex optimization problems subject to equality or inequality constraints. The method is then extended to solve a general multi-block separable convex optimization problem, and two related primal-dual hybrid gradient algorithms are also discussed. Convergence results about the sublinear and linear convergence rates are established by variational characterizations for both the saddle-point of the problem and the first-order optimality conditions of involved subproblems. A large number of experiments on testing the linear support vector machine problem and the robust principal component analysis problem arising from machine learning indicate that our proposed algorithms perform much better than several state-of-the-art algorithms.

Submitted 13 June, 2025; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: 33 pages

Journal ref: Journal of Scientific Computing, 99, 53, (2024)

arXiv:2103.16752 [pdf, ps, other]

Iteration complexity analysis of a partial LQP-based alternating direction method of multipliers

Authors: Jianchao Bai, Yuxue Ma, Hao Sun, Miao Zhang

Abstract: In this paper, we consider a prototypical convex optimization problem with multi-block variables and separable structures. By adding the Logarithmic Quadratic Proximal (LQP) regularizer with suitable proximal parameter to each of the first grouped subproblems, we develop a partial LQP-based Alternating Direction Method of Multipliers (ADMM-LQP). The dual variable is updated twice with relatively larger stepsizes than the classical region $(0,\frac{1+\sqrt{5}}{2})$. Using a prediction-correction approach to analyze properties of the iterates generated by ADMM-LQP, we establish its global convergence and sublinear convergence rate of $O(1/T)$ in the new ergodic and nonergodic senses, where $T$ denotes the iteration index. We also extend the algorithm to a nonsmooth composite convex optimization problem and establish convergence results similar to those for ADMM-LQP.

Submitted 30 March, 2021; originally announced March 2021.

Comments: 22 pages

arXiv:2103.16154 [pdf, ps, other]

Convergence on a symmetric accelerated stochastic ADMM with larger stepsizes

Authors: Jianchao Bai, Deren Han, Hao Sun, Hongchao Zhang

Abstract: In this paper, we develop a symmetric accelerated stochastic Alternating Direction Method of Multipliers (SAS-ADMM) for solving separable convex optimization problems with linear constraints. The objective function is the sum of a possibly nonsmooth convex function and an average function of many smooth convex functions. Our proposed algorithm combines both ideas of ADMM and the techniques of accelerated stochastic gradient methods possibly with variance reduction to solve the smooth subproblem. One main feature of SAS-ADMM is that its dual variable is symmetrically updated after each update of the separated primal variable, which would allow a more flexible and larger convergence region of the dual variable compared with that of standard deterministic or stochastic ADMM. This new stochastic optimization algorithm is shown to have ergodic convergence in expectation with O(1/T) convergence rate, where T is the number of outer iterations. Our preliminary experiments indicate the proposed algorithm is very effective for solving separable optimization problems from big-data applications. Finally, 3-block extensions of the algorithm and its variant of an accelerated stochastic augmented Lagrangian method are discussed in the appendix.

Submitted 19 December, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: Accepted by CSIAM-AM

arXiv:2011.07439 [pdf, other]

Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee

Authors: Jincheng Bai, Qifan Song, Guang Cheng

Abstract: Sparse deep learning aims to address the challenge of huge storage consumption by deep neural networks, and to recover the sparse structure of target functions. Although tremendous empirical successes have been achieved, most sparse deep learning algorithms lack theoretical support. On the other hand, another line of work has proposed theoretical frameworks that are computationally infeasible. In this paper, we train sparse deep neural networks with a fully Bayesian treatment under spike-and-slab priors, and develop a set of computationally efficient variational inferences via continuous relaxation of the Bernoulli distribution. The variational posterior contraction rate is provided, which justifies the consistency of the proposed variational Bayes method. Notably, our empirical results demonstrate that this variational procedure provides uncertainty quantification in terms of the Bayesian predictive distribution and is also capable of accomplishing consistent variable selection by training a sparse multi-layer neural network.

Submitted 14 November, 2020; originally announced November 2020.

Comments: Accepted to NeurIPS 2020

arXiv:2010.12887 [pdf, ps, other]

Nearly Optimal Variational Inference for High Dimensional Regression with Shrinkage Priors

Authors: Jincheng Bai, Qifan Song, Guang Cheng

Abstract: We propose a variational Bayesian (VB) procedure for high-dimensional linear model inferences with heavy tail shrinkage priors, such as student-t prior. Theoretically, we establish the consistency of the proposed VB method and prove that under the proper choice of prior specifications, the contraction rate of the VB posterior is nearly optimal. It justifies the validity of VB inference as an alternative to Markov Chain Monte Carlo (MCMC) sampling. Meanwhile, compared to conventional MCMC methods, the VB procedure achieves much higher computational efficiency, which greatly alleviates the computing burden for modern machine learning applications such as massive data analysis. Through numerical studies, we demonstrate that the proposed VB method leads to shorter computing time, higher estimation accuracy, and lower variable selection error than competitive sparse Bayesian methods.

Submitted 24 October, 2020; originally announced October 2020.

arXiv:2010.12765 [pdf, ps, other]

An Inexact Accelerated Stochastic ADMM for Separable Convex Optimization

Authors: Jianchao Bai, William W. Hager, Hongchao Zhang

Abstract: An inexact accelerated stochastic Alternating Direction Method of Multipliers (AS-ADMM) scheme is developed for solving structured separable convex optimization problems with linear constraints. The objective function is the sum of a possibly nonsmooth convex function and a smooth function which is an average of many component convex functions. Problems having this structure often arise in machine learning and data mining applications. AS-ADMM combines the ideas of both ADMM and the stochastic gradient methods using variance reduction techniques. One of the ADMM subproblems employs a linearization technique while a similar linearization could be introduced for the other subproblem. For a specified choice of the algorithm parameters, it is shown that the objective error and the constraint violation are $\mathcal{O}(1/k)$ relative to the number of outer iterations $k$. Under a strong convexity assumption, the expected iterate error converges to zero linearly. A linearized variant of AS-ADMM and incremental sampling strategies are also discussed. Numerical experiments with both stochastic and deterministic ADMM algorithms show that AS-ADMM can be particularly effective for structured optimization arising in big data applications.

Submitted 23 October, 2020; originally announced October 2020.
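
For readers unfamiliar with the base scheme that AS-ADMM builds on, the sketch below shows classical deterministic two-block ADMM in scaled form. It is not the stochastic or accelerated method of the paper. The toy problem (minimize 0.5(x - a)^2 + lam|z| subject to x - z = 0), the function names, and the parameter defaults are all assumptions chosen for illustration.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_scalar_lasso(a, lam, rho=1.0, iterations=100):
    """Classical scaled-form ADMM on a one-dimensional toy problem.

    Minimizes 0.5 * (x - a)**2 + lam * |z| subject to x - z = 0.
    """
    x = z = u = 0.0  # primal variables and scaled dual variable
    for _ in range(iterations):
        # x-update: smooth quadratic subproblem, solved in closed form
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: nonsmooth subproblem, solved by soft-thresholding
        z = soft_threshold(x + u, lam / rho)
        # dual update: accumulate the constraint residual x - z
        u += x - z
    return x, z


x, z = admm_scalar_lasso(a=3.0, lam=1.0)
print(x, z)
```

The minimizer of this toy problem is known in closed form as `soft_threshold(a, lam)`, so the iterates can be checked against it. The alternation between a smooth subproblem, a nonsmooth subproblem, and a dual step is the structure that stochastic variants modify.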

arXiv:1910.04355 [pdf, other]

Adaptive Variational Bayesian Inference for Sparse Deep Neural Network

Authors: Jincheng Bai, Qifan Song, Guang Cheng

Abstract: In this work, we focus on variational Bayesian inference on the sparse Deep Neural Network (DNN) modeled under a class of spike-and-slab priors. Given a pre-specified sparse DNN structure, the corresponding variational posterior contraction rate is characterized that reveals a trade-off between the variational error and the approximation error, which are both determined by the network structural complexity (i.e., depth, width and sparsity). However, the optimal network structure, which strikes the balance of the aforementioned trade-off and yields the best rate, is generally unknown in reality. Therefore, our work further develops an adaptive variational inference procedure that can automatically select a reasonably good (data-dependent) network structure that achieves the best contraction rate, without knowing the optimal network structure. In particular, when the true function is Hölder smooth, the adaptive variational inference is capable of attaining the (near-)optimal rate without knowledge of the smoothness level. The above rate still suffers from the curse of dimensionality, and thus motivates the teacher-student setup, i.e., the true function is a sparse DNN model, under which the rate only logarithmically depends on the input dimension.

Submitted 2 August, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

arXiv:1907.11307 [pdf, other]

DEAM: Adaptive Momentum with Discriminative Weight for Stochastic Optimization

Authors: Jiyang Bai, Yuxiang Ren, Jiawei Zhang

Abstract: Optimization algorithms with momentum, e.g., ADAM, have been widely used for building deep learning models due to their faster convergence rates compared with stochastic gradient descent (SGD). Momentum helps accelerate SGD in the relevant directions in parameter updating, which can reduce the oscillations of the parameter update route. However, there exist errors in some update steps in optimization algorithms with momentum like ADAM. The fixed momentum weight (e.g., β_1 in ADAM) will propagate errors in momentum computing. In this paper, we introduce a novel optimization algorithm, namely Discriminative wEight on Adaptive Momentum (DEAM). Instead of assigning the momentum term weight with a fixed hyperparameter, DEAM proposes to compute the momentum weight automatically based on the discriminative angle. In this way, DEAM involves fewer hyperparameters. DEAM also contains a novel backtrack term, which restricts redundant updates when the correction of the last step is needed. Extensive experiments demonstrate that DEAM can achieve a faster convergence rate than the existing optimization algorithms when training deep learning models in both convex and non-convex situations.

Submitted 22 January, 2020; v1 submitted 25 July, 2019; originally announced July 2019.

arXiv:1907.04469 [pdf, ps, other]

A family of multi-parameterized proximal point algorithms

Authors: Jianchao Bai, Ke Guo, Xiaokai Chang

Abstract: In this paper, a multi-parameterized proximal point algorithm combined with a relaxation step is developed for solving convex minimization problems subject to linear constraints. We show its global convergence and sublinear convergence rate from the perspective of variational inequality. Preliminary numerical experiments on a sparse minimization problem from signal processing indicate that the proposed algorithm performs better than some well-established methods.

Submitted 9 July, 2019; originally announced July 2019.

Comments: 7 pages

arXiv:1906.12015 [pdf, ps, other]

Accelerated Symmetric ADMM and Its Applications in Signal Processing

Authors: Jianchao Bai, Junli Liang, Ke Guo, Yang Jing

Abstract: The alternating direction method of multipliers (ADMM) has been extensively investigated in the past decades for solving separable convex optimization problems. Fewer researchers have focused on exploring its convergence properties for the nonconvex case, although it performs surprisingly efficiently there. In this paper, we propose a symmetric ADMM based on different acceleration techniques for a family of potentially nonsmooth nonconvex programming problems with equality constraints, where the dual variables are updated twice with different stepsizes. Under proper assumptions instead of using the so-called Kurdyka-Lojasiewicz inequality, convergence of the proposed algorithm as well as its pointwise iteration-complexity are analyzed in terms of the corresponding augmented Lagrangian function and the primal-dual residuals, respectively. Performance of our algorithm is verified by some preliminary numerical examples on applications in sparse nonconvex/convex regularized minimization signal processing problems.

Submitted 1 July, 2019; v1 submitted 27 June, 2019; originally announced June 2019.

Comments: 20 pages, submitted

arXiv:1906.07888 [pdf, ps, other]

Convergence Revisit on Generalized Symmetric ADMM

Authors: Jianchao Bai, Xiaokai Chang, Jicheng Li, Fengmin Xu

Abstract: In this note, we show a sublinear nonergodic convergence rate for the algorithm developed in [Bai, et al. Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 70, 129-170 (2018)], as well as its linear convergence under the assumptions that the sub-differential of each component objective function is piecewise linear and all the constraint sets are polyhedra. These remaining convergence results are established for the stepsize parameters of dual variables belonging to a special isosceles triangle region, which aims to strengthen our understanding of the convergence of the generalized symmetric ADMM.

Submitted 18 June, 2019; originally announced June 2019.

Comments: 16 pages

arXiv:1812.04876 [pdf, other]

Proximal extrapolated gradient methods with prediction and correction for monotone variational inequalities

Authors: Xiaokai Chang, Sanyang Liu, Jianchao Bai, Jun Yang

Abstract: An efficient proximal-gradient-based method, called the proximal extrapolated gradient method, is designed for solving monotone variational inequalities in Hilbert space. The proposed method extends the acceptable range of parameters to obtain larger step sizes. The step size is predicted based on local information of the operator and corrected by linesearch procedures to satisfy a very weak condition, which is even weaker than the boundedness of the generated sequence and always holds when the operator is the gradient of a convex function. We establish its convergence and ergodic convergence rate in theory under the larger range of parameters. Furthermore, we improve numerical efficiency by employing the proposed method with non-monotonic step size, and obtain the upper bound of the parameter relating to step size by an extremely simple example. Related numerical experiments illustrate the improvements in efficiency from the larger step size.

Submitted 4 December, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

Comments: 22 pages, 10 figures

arXiv:1812.04228 [pdf, ps, other]

doi 10.1016/j.laa.2014.07.026

On the generalized low rank approximation of the correlation matrices arising in the asset portfolio

Authors: Xuefeng Duan, Jianchao Bai, Maojun Zhang, Xinjun Zhang

Abstract: In this paper, we consider the generalized low rank approximation of the correlation matrices problem which arises in the asset portfolio. We first characterize the feasible set by using the Gramian representation together with a special trigonometric function transform, and then transform the generalized low rank approximation of the correlation matrices problem into an unconstrained optimization problem. Finally, we use the conjugate gradient algorithm with the strong Wolfe line search to solve the unconstrained optimization problem. Numerical examples show that our new method is feasible and effective.

Submitted 11 December, 2018; originally announced December 2018.

Journal ref: Linear Algebra and its Applications, 461 (2014) 1-17

arXiv:1812.03769 [pdf, ps, other]

doi 10.1007/s10589-017-9971-0

Generalized Symmetric ADMM for Separable Convex Optimization

Authors: Jianchao Bai, Jicheng Li, Fengmin Xu, Hongchao Zhang

Abstract: The Alternating Direction Method of Multipliers (ADMM) has been proved to be effective for solving separable convex optimization subject to linear constraints. In this paper, we propose a Generalized Symmetric ADMM (GS-ADMM), which updates the Lagrange multiplier twice with suitable stepsizes, to solve the multi-block separable convex programming. This GS-ADMM partitions the data into two group variables so that one group consists of $p$ block variables while the other has $q$ block variables, where $p \ge 1$ and $q \ge 1$ are two integers. The two grouped variables are updated in a Gauss-Seidel scheme, while the variables within each group are updated in a Jacobi scheme, which would make it very attractive for a big data setting. By adding proper proximal terms to the subproblems, we specify the domain of the stepsizes to guarantee that GS-ADMM is globally convergent with a worst-case $O(1/t)$ ergodic convergence rate. It turns out that our convergence domain of the stepsizes is significantly larger than other convergence domains in the literature. Hence, the GS-ADMM is more flexible and attractive in choosing and using larger stepsizes of the dual variable. Besides, two special cases of GS-ADMM, which allow using zero penalty terms, are also discussed and analyzed. Compared with several state-of-the-art methods, preliminary numerical experiments on solving a sparse matrix minimization problem in statistical learning show that our proposed method is effective and promising.

Submitted 10 December, 2018; originally announced December 2018.

Journal ref: Computational Optimization and Applications (2018) 70, 129-170

arXiv:1812.03763 [pdf, ps, other]

doi 10.1080/00207160.2018.1427854

General parameterized proximal point algorithm with applications in statistical learning

Authors: Jianchao Bai, Jicheng Li, Pingfan Dai, Jiaofen Li

Abstract: In the literature, there are a few studies on designing parameters in the Proximal Point Algorithm (PPA), especially for multi-objective convex optimization. Introducing some parameters to PPA can make it more flexible and attractive. Mainly motivated by our recent work (Bai et al., A parameterized proximal point algorithm for separable convex optimization, Optim. Lett. (2017) doi: 10.1007/s11590-017-1195-9), in this paper we develop a general parameterized PPA with a relaxation step for solving the multi-block separable structured convex programming. By making use of the variational inequality and some mathematical identities, the global convergence and the worst-case $\mathcal{O}(1/t)$ convergence rate of the proposed algorithm are established. Preliminary numerical experiments on solving a sparse matrix minimization problem from statistical learning validate that our algorithm is more efficient than several state-of-the-art algorithms.

Submitted 10 December, 2018; originally announced December 2018.

Journal ref: International Journal of Computer Mathematics, 2019, 96(1), 199-215

arXiv:1812.03759 [pdf, ps, other]

doi 10.1007/s11590-017-1195-9

A parameterized proximal point algorithm for separable convex optimization

Authors: Jianchao Bai, Hongchao Zhang, Jicheng Li

Abstract: In this paper, we develop a parameterized proximal point algorithm (P-PPA) for solving a class of separable convex programming problems subject to linear and convex constraints. The proposed algorithm is provably globally convergent with a worst-case O(1/t) convergence rate, where t denotes the iteration number. By properly choosing the algorithm parameters, numerical experiments on solving a sparse optimization problem arising from statistical learning show that our P-PPA could perform significantly better than other state-of-the-art methods, such as the alternating direction method of multipliers and the relaxed proximal point algorithm.

Submitted 10 December, 2018; originally announced December 2018.

Journal ref: Optimization Letters (2018) 12, 1589-1608

arXiv:1706.02827 [pdf, other]

doi 10.1016/j.camwa.2017.08.001

An improved immersed finite element particle-in-cell method for plasma simulation

Authors: Jinwei Bai, Yong Cao, Yuchuan Chu, Xu Zhang

Abstract: The particle-in-cell (PIC) method has been widely used for plasma simulation because of its noise-reduction capability and moderate computational cost. The immersed finite element (IFE) method is efficient for solving interface problems on Cartesian meshes, which is desirable for the PIC method. The combination of these two methods provides an effective tool for plasma simulation with complex interfaces and boundaries. This paper introduces an improved IFE-PIC method that enhances the performance in both IFE and PIC aspects. For the electric field solver, we adopt the newly developed partially penalized IFE method with enhanced accuracy. For PIC implementation, we introduce a new interpolation technique to ensure the conservation of the charge. Numerical examples are provided to demonstrate the features of the improved IFE-PIC method.

Submitted 9 June, 2017; originally announced June 2017.

arXiv:1402.6550 [pdf, ps, other]

doi 10.1214/13-AOS1183

Theory and methods of panel data models with interactive effects

Authors: Jushan Bai, Kunpeng Li

Abstract: This paper considers the maximum likelihood estimation of panel data models with interactive effects. Motivated by applications in economics and other social sciences, a notable feature of the model is that the explanatory variables are correlated with the unobserved effects. The usual within-group estimator is inconsistent. Existing methods for consistent estimation are either designed for panel data with short time periods or are less efficient. The maximum likelihood estimator has desirable properties and is easy to implement, as illustrated by the Monte Carlo simulations. This paper develops the inferential theory for the maximum likelihood estimator, including consistency, rate of convergence and the limiting distributions. We further extend the model to include time-invariant regressors and common regressors (cross-section invariant). The regression coefficients for the time-invariant regressors are time-varying, and the coefficients for the common regressors are cross-sectionally varying.

Submitted 26 February, 2014; originally announced February 2014.

Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/13-AOS1183 by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1183

Journal ref: Annals of Statistics 2014, Vol. 42, No. 1, 142-170

arXiv:1307.2662 [pdf, ps, other]

Statistical Inferences Using Large Estimated Covariances for Panel Data and Factor Models

Authors: Jushan Bai, Yuan Liao

Abstract: While most of the convergence results in the literature on high dimensional covariance matrices are concerned with the accuracy of estimating the covariance matrix (and precision matrix), relatively less is known about the effect of estimating large covariances on statistical inferences. We study two important models: factor analysis and panel data model with interactive effects, and focus on the statistical inference and estimation efficiency of structural parameters based on large covariance estimators. For efficient estimation, both models call for a weighted principal components (WPC) method, which relies on a high dimensional weight matrix. This paper derives an efficient and feasible WPC using the covariance matrix estimator of Fan et al. (2013). However, we demonstrate that existing results on large covariance estimation based on absolute convergence are not suitable for statistical inferences of the structural parameters. What is needed is some weighted consistency and the associated rate of convergence, which are obtained in this paper. Finally, the proposed method is applied to the US divorce rate data. We find that the efficient WPC identifies the significant effects of divorce-law reforms on the divorce rate, and it provides more accurate estimation and tighter confidence intervals than existing methods.

Submitted 12 November, 2013; v1 submitted 9 July, 2013; originally announced July 2013.

arXiv:1205.6617 [pdf, ps, other]

doi 10.1214/11-AOS966

Statistical analysis of factor models of high dimension

Authors: Jushan Bai, Kunpeng Li

Abstract: This paper considers the maximum likelihood estimation of factor models of high dimension, where the number of variables (N) is comparable with or even greater than the number of observations (T). An inferential theory is developed. We establish not only consistency but also the rate of convergence and the limiting distributions. Five different sets of identification conditions are considered. We show that the distributions of the MLE estimators depend on the identification restrictions. Unlike the principal components approach, the maximum likelihood estimator explicitly allows heteroskedasticities, which are jointly estimated with other parameters. Efficiency of MLE relative to the principal components method is also considered.

Submitted 30 May, 2012; originally announced May 2012.

Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/11-AOS966 by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS966

Journal ref: Annals of Statistics 2012, Vol. 40, No. 1, 436-465

arXiv:0805.1768 [pdf, ps, other]

Panel Cointegration with Global Stochastic Trends

Authors: Jushan Bai, Chihwa Kao, Serena Ng

Abstract: This paper studies estimation of panel cointegration models with cross-sectional dependence generated by unobserved global stochastic trends. The standard least squares estimator is, in general, inconsistent owing to the spuriousness induced by the unobservable I(1) trends. We propose two iterative procedures that jointly estimate the slope parameters and the stochastic trends. The resulting estimators are referred to respectively as CupBC (continuously-updated and bias-corrected) and the CupFM (continuously-updated and fully-modified) estimators. We establish their consistency and derive their limiting distributions. Both are asymptotically unbiased and asymptotically mixed normal and permit inference to be conducted using standard test statistics. The estimators are also valid when there are mixed stationary and non-stationary factors, as well as when the factors are all stationary.

Submitted 12 May, 2008; originally announced May 2008.

Showing 1–32 of 32 results for author: Bai, J