-
Connections between convex optimization algorithms and subspace correction methods
Authors:
Boou Jiang,
Jongho Park,
Jinchao Xu
Abstract:
We show that a broad range of convex optimization algorithms, including alternating projection, operator splitting, and multiplier methods, can be systematically derived from the framework of subspace correction methods via convex duality. To formalize this connection, we introduce the notion of dualization, a process that transforms an iterative method for the dual problem into an equivalent method for the primal problem. This concept establishes new connections across these algorithmic classes, encompassing both well-known and new methods. In particular, we show that classical algorithms such as the von Neumann, Dykstra, Peaceman--Rachford, and Douglas--Rachford methods can be interpreted as dualizations of subspace correction methods applied to appropriate dual formulations. Beyond unifying existing methods, our framework enables the systematic development of new algorithms for convex optimization. For instance, we derive parallel variants of alternating projection and operator splitting methods, as dualizations of parallel subspace correction methods, that are well-suited for large-scale problems on modern computing architectures and offer straightforward convergence guarantees. We also propose new alternating direction method of multipliers-type algorithms, derived as dualizations of certain operator splitting methods. These algorithms naturally ensure convergence even in the multi-block setting, where the conventional method does not guarantee convergence when applied to more than two blocks. This unified perspective not only facilitates algorithm design and the transfer of theoretical results but also opens new avenues for research and innovation in convex optimization.
Submitted 14 May, 2025;
originally announced May 2025.
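For readers unfamiliar with the von Neumann method named in the abstract above, the following is a minimal sketch of classical alternating projection between two lines in the plane. It is only an illustration of the textbook method, not the dualization construction of the paper; the two sets (the x-axis and the diagonal y = x) and all function names are assumptions chosen for this example.

```python
def project_onto_x_axis(p):
    """Orthogonal projection of a 2D point onto the line y = 0."""
    x, _ = p
    return (x, 0.0)


def project_onto_diagonal(p):
    """Orthogonal projection of a 2D point onto the line y = x."""
    x, y = p
    m = (x + y) / 2.0
    return (m, m)


def alternating_projection(p, iters=60):
    """Alternate the two projections, starting from the point p.

    The two lines meet only at the origin, so the iterates
    approach (0, 0) as the number of sweeps grows.
    """
    for _ in range(iters):
        p = project_onto_x_axis(p)
        p = project_onto_diagonal(p)
    return p


if __name__ == "__main__":
    print(alternating_projection((1.0, 3.0)))
```

In this toy case each sweep halves the distance to the intersection, which makes the linear convergence of the method easy to see.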
-
A Learning-Based Inexact ADMM for Solving Quadratic Programs
Authors:
Xi Gao,
Jinxin Xiong,
Linxin Yang,
Akang Wang,
Weiwei Xu,
Jiang Xue
Abstract:
Convex quadratic programs (QPs) constitute a fundamental computational primitive across diverse domains including financial optimization, control systems, and machine learning. The alternating direction method of multipliers (ADMM) has emerged as a preferred first-order approach due to its iteration efficiency - exemplified by the state-of-the-art OSQP solver. Machine learning-enhanced optimization algorithms have recently demonstrated significant success in speeding up the solving process. This work introduces a neural-accelerated ADMM variant that replaces exact subproblem solutions with learned approximations through a parameter-efficient Long Short-Term Memory (LSTM) network. We derive convergence guarantees within the inexact ADMM formalism, establishing that our learning-augmented method maintains primal-dual convergence while satisfying residual thresholds. Extensive experimental results demonstrate that our approach achieves superior solution accuracy compared to existing learning-based methods while delivering significant computational speedups of up to $7\times$, $28\times$, and $22\times$ over Gurobi, SCS, and OSQP, respectively. Furthermore, the proposed method outperforms other learning-to-optimize methods in terms of solution quality. Detailed performance analysis confirms near-perfect compliance with the theoretical assumptions, consequently ensuring algorithm convergence.
Submitted 14 May, 2025;
originally announced May 2025.
-
Majorization and Inequalities among Complete Homogeneous Symmetric Functions
Authors:
Jia Xu,
Yong Yao
Abstract:
Inequalities among symmetric functions are fundamental in various branches of mathematics, thus motivating a systematic study of their structure. Majorization has been shown to characterize inequalities among commonly used symmetric functions, except for complete homogeneous symmetric functions (shortened as CHs). In 2011, Cuttler, Greene, and Skandera posed a natural question: Can majorization also characterize inequalities among CHs? Their work demonstrated that majorization characterizes inequalities among CHs up to degree 7 and suggested exploring its validity for higher degrees. In this paper, we show that, for every degree greater than 7, majorization does not characterize inequalities among CHs.
Submitted 14 May, 2025; v1 submitted 12 May, 2025;
originally announced May 2025.
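As background for the abstract above, the majorization order on integer partitions can be checked with a few lines of code. This is a minimal sketch of the standard definition (equal totals, and every partial sum of the sorted first partition at least the matching partial sum of the second); the function name `majorizes` is an assumption for illustration, and the sketch says nothing about the inequalities among CHs themselves.

```python
def majorizes(lam, mu):
    """Return True if the partition lam majorizes the partition mu.

    Both arguments are sequences of non-negative integers. They are
    sorted in decreasing order and padded with zeros to equal length
    before the partial sums are compared.
    """
    a = sorted(lam, reverse=True)
    b = sorted(mu, reverse=True)
    if sum(a) != sum(b):
        return False
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    sum_a = sum_b = 0
    for x, y in zip(a, b):
        sum_a += x
        sum_b += y
        if sum_a < sum_b:
            return False
    return True


if __name__ == "__main__":
    print(majorizes([3, 1], [2, 2]))
    print(majorizes([3, 1, 1, 1], [2, 2, 2]))
```

Majorization is only a partial order: the partitions (3, 1, 1, 1) and (2, 2, 2) of 6 are incomparable, so the function returns False in both directions for that pair.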
-
Assessing Risk Heterogeneity through Heavy-Tailed Frequency and Severity Mixtures
Authors:
Michael R. Powers,
Jiaxin Xu
Abstract:
In operational risk management and actuarial finance, the analysis of risk often begins by dividing a random damage-generation process into its separate frequency and severity components. In the present article, we construct canonical families of mixture distributions for each of these components, based on a Negative Binomial kernel for frequency and a Gamma kernel for severity. The mixtures are employed to assess the heterogeneity of risk factors underlying an empirical distribution through the shape of the implied mixing distribution. From the duality of the Negative Binomial and Gamma distributions, we first derive necessary and sufficient conditions for heavy-tailed (i.e., inverse power-law) canonical mixtures. We then formulate flexible 4-parameter families of mixing distributions for Geometric and Exponential kernels to generate heavy-tailed 4-parameter mixture models, and extend these mixtures to arbitrary Negative Binomial and Gamma kernels, respectively, yielding 5-parameter mixtures for detecting and measuring risk heterogeneity. To check the robustness of such heterogeneity inferences, we show how a fitted 5-parameter model may be re-expressed in terms of alternative Negative Binomial or Gamma kernels whose associated mixing distributions form a "calibrated" family.
Submitted 7 May, 2025;
originally announced May 2025.
-
Integral Representations of Sobolev Spaces via ReLU$^k$ Activation Function and Optimal Error Estimates for Linearized Networks
Authors:
Xinliang Liu,
Tong Mao,
Jinchao Xu
Abstract:
This paper presents two main theoretical results concerning shallow neural networks with ReLU$^k$ activation functions. We establish a novel integral representation for Sobolev spaces, showing that every function in $H^{\frac{d+2k+1}{2}}(Ω)$ can be expressed as an $L^2$-weighted integral of ReLU$^k$ ridge functions over the unit sphere. This result mirrors the known representation of Barron spaces and highlights a fundamental connection between Sobolev regularity and neural network representations. Moreover, we prove that linearized shallow networks -- constructed by fixed inner parameters and optimizing only the linear coefficients -- achieve optimal approximation rates $O(n^{-\frac{1}{2}-\frac{2k+1}{2d}})$ in Sobolev spaces.
Submitted 12 May, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
-
Codimension sequence, grade, and generating degree: an operadic approach
Authors:
Y. -H. Bao,
D. -X. Fu,
J. -N. Xu,
Y. Ye,
J. J. Zhang,
Y. -F. Zhang,
Z. -B. Zhao
Abstract:
We study several classes of operadic ideals of the unital associative algebra operad $\uas$. As an application, we classify quotient operads of $\uas$ of GK-dimension $\leq 6$. This corresponds to a classification of all T-ideals of codimension growth $n^g$ with $g\leq 5$ (or equivalently, varieties of grade $g$ with $g\leq 5$).
Submitted 4 May, 2025; v1 submitted 27 April, 2025;
originally announced April 2025.
-
Linear Quadratic Mean Field Stackelberg Games: Open-loop and Feedback Solutions
Authors:
Bing-Chang Wang,
Juanjuan Xu,
Huanshui Zhang,
Yong Liang
Abstract:
This paper investigates open-loop and feedback solutions of linear quadratic mean field (MF) games with a leader and a large number of followers. The leader first gives its strategy, and then all the followers cooperate to optimize the social cost, defined as the sum of their costs. By variational analysis with MF approximations, we obtain a set of open-loop controls of the players in terms of solutions to MF forward-backward stochastic differential equations (FBSDEs), which is further shown to be an asymptotic Stackelberg equilibrium. By applying the matrix maximum principle, a set of decentralized feedback strategies is constructed for all the players. For the open-loop and feedback solutions, the corresponding optimal costs of all players are explicitly given in terms of the solutions to two Riccati equations, respectively. The performance of the two solutions is compared by numerical simulation.
Submitted 12 April, 2025;
originally announced April 2025.
-
Distributed Solving of Linear Quadratic Optimal Controller with Terminal State Constraint
Authors:
Wenjing Yang,
Juanjuan Xu
Abstract:
This paper is concerned with the linear quadratic (LQ) optimal control of a continuous-time system with a terminal state constraint. In particular, the system contains multiple agents, each of which can access only partial information about the matrix parameters. This makes the classical solution method, which is based on a Riccati equation with global information, difficult to apply. The main contribution is a distributed algorithm for deriving the optimal controller, which consists of distributed iterations for the Riccati equation, a backward differential equation driven by the optimal Lagrange multiplier, and the optimal state. The effectiveness of the proposed algorithm is verified by two numerical examples.
Submitted 28 April, 2025; v1 submitted 7 April, 2025;
originally announced April 2025.
-
Spectrum Assignment of Stochastic Systems with Multiplicative Noise
Authors:
Xiaomin Xue,
Juanjuan Xu,
Huanshui Zhang
Abstract:
This paper studies the spectrum assignment of a class of stochastic systems with multiplicative noise. A novel $α$-spectrum assignment is proposed for discrete-time and continuous-time stochastic systems with multiplicative noise. In particular, $0$-spectrum assignment is equivalent to the pole assignment for the deterministic systems. The main contribution is two-fold: On the one hand, we present the conditions for $α$-spectrum assignment and the design of feedback controllers based on the system parameters. On the other hand, when the system parameters are unknown, we present a stochastic approximation algorithm to learn the feedback gains which guarantee the spectrum of the stochastic systems to achieve the predetermined value. Numerical examples are provided to demonstrate the effectiveness of the proposed algorithms.
Submitted 3 April, 2025;
originally announced April 2025.
-
A point cloud reconstruction method based on uncertainty feature enhancement for aerodynamic shape optimization
Authors:
Junlin Li,
Yang Zhang,
Bo Pang,
Junqiang Bai,
Jiakuan Xu
Abstract:
The precision of shape representation and the dimensionality of the design space significantly influence the cost and outcomes of aerodynamic optimization. The design space can be represented more compactly by maintaining geometric precision while reducing dimensions, hence enhancing the cost-effectiveness of the optimization process. This research presents a new point cloud autoencoder architecture, called AE-BUFE, designed to attain efficient and precise generalized representations of 3D aircraft through uncertainty analysis of the deformation relationships among surface grid points. The deep learning architecture consists of two components: the uncertainty index-based feature enhancement module and the point cloud autoencoder module. It learns the shape features of the point cloud geometric representation to establish a low-dimensional latent space. To assess and evaluate the efficiency of the method, a comparison was conducted with the prevailing point cloud autoencoder architecture and the proper orthogonal decomposition (POD) linear dimensionality reduction method under conditions of complex shape deformation. The results showed that the new architecture significantly improved the extraction effect of the low-dimensional latent space. Then, we developed the SBO optimization framework based on the AE-BUFE parameterization method and completed a multi-objective aerodynamic optimization design for a wide-speed-range vehicle considering volume and moment constraints. While ensuring the take-off and landing performance, the aerodynamic performance is improved at transonic and hypersonic conditions, which verifies the efficiency and engineering practicability of this method.
Submitted 2 April, 2025; v1 submitted 29 March, 2025;
originally announced March 2025.
-
Efficient QR-Based CP Decomposition Acceleration via Dimension Tree and Extrapolation
Authors:
Wenchao Xie,
Jiawei Xu,
Zheng Peng,
Qingsong Wang
Abstract:
The canonical polyadic (CP) decomposition is one of the most widely used tensor decomposition techniques. The conventional CP decomposition algorithm combines alternating least squares (ALS) with the normal equation. However, the normal equation is susceptible to numerical ill-conditioning, which can adversely affect the decomposition results. To mitigate this issue, ALS combined with QR decomposition has been proposed as a more numerically stable alternative. Although this method enhances stability, its iterative process involves tensor-times-matrix (TTM) operations, which typically result in higher computational costs. To reduce this cost, we propose branch reutilization of dimension tree, which increases the reuse of intermediate tensors and reduces the number of TTM operations. This strategy achieves a $33\%$ reduction in computational complexity for third and fourth order tensors. Additionally, we introduce a specialized extrapolation method in CP-ALS-QR algorithm, leveraging the unique structure of the matrix $\mathbf{Q}_0$ to further enhance convergence. By integrating both techniques, we develop a novel CP decomposition algorithm that significantly improves efficiency. Numerical experiments on five real-world datasets show that our proposed algorithm reduces iteration costs and enhances fitting accuracy compared to the CP-ALS-QR algorithm.
Submitted 24 March, 2025;
originally announced March 2025.
-
The broken sample problem revisited: Proof of a conjecture by Bai-Hsing and high-dimensional extensions
Authors:
Simiao Jiao,
Yihong Wu,
Jiaming Xu
Abstract:
We revisit the classical broken sample problem: Two samples of i.i.d. data points $\mathbf{X}=\{X_1,\cdots, X_n\}$ and $\mathbf{Y}=\{Y_1,\cdots,Y_m\}$ are observed without correspondence with $m\leq n$. Under the null hypothesis, $\mathbf{X}$ and $\mathbf{Y}$ are independent. Under the alternative hypothesis, $\mathbf{Y}$ is correlated with a random subsample of $\mathbf{X}$, in the sense that $(X_{π(i)},Y_i)$'s are drawn independently from some bivariate distribution for some latent injection $π:[m] \to [n]$. Originally introduced by DeGroot, Feder, and Goel (1971) to model matching records in census data, this problem has recently gained renewed interest due to its applications in data de-anonymization, data integration, and target tracking. Despite extensive research over the past decades, determining the precise detection threshold has remained an open problem even for equal sample sizes ($m=n$). Assuming $m$ and $n$ grow proportionally, we show that the sharp threshold is given by a spectral and an $L_2$ condition of the likelihood ratio operator, resolving a conjecture of Bai and Hsing (2005) in the positive. These results are extended to high dimensions and settle the sharp detection thresholds for Gaussian and Bernoulli models.
Submitted 18 March, 2025;
originally announced March 2025.
-
"All-Something-Nothing" Phase Transitions in Planted k-Factor Recovery
Authors:
Julia Gaudio,
Colin Sandon,
Jiaming Xu,
Dana Yang
Abstract:
This paper studies the problem of inferring a $k$-factor, specifically a spanning $k$-regular graph, planted within an Erdos-Renyi random graph $G(n,λ/n)$. We uncover an interesting "all-something-nothing" phase transition. Specifically, we show that as the average degree $λ$ surpasses the critical threshold of $1/k$, the inference problem undergoes a transition from almost exact recovery ("all" phase) to partial recovery ("something" phase). Moreover, as $λ$ tends to infinity, the accuracy of recovery diminishes to zero, leading to the onset of the "nothing" phase. This finding complements the recent result by Mossel, Niles-Weed, Sohn, Sun, and Zadik who established that for certain sufficiently dense graphs, the problem undergoes an "all-or-nothing" phase transition, jumping from near-perfect to near-zero recovery. In addition, we characterize the recovery accuracy of a linear-time iterative pruning algorithm and show that it achieves almost exact recovery when $λ< 1/k$. A key component of our analysis is a two-step cycle construction: we first build trees through local neighborhood exploration and then connect them by sprinkling using reserved edges. Interestingly, for proving impossibility of almost exact recovery, we construct $Θ(n)$ many small trees of size $Θ(1)$, whereas for establishing the algorithmic lower bound, a single large tree of size $Θ(\sqrt{n\log n})$ suffices.
Submitted 11 March, 2025;
originally announced March 2025.
-
The Dry Ten Martini Problem for $C^2$ cosine-type quasiperiodic Schrödinger operators
Authors:
Lingrui Ge,
Yiqian Wang,
Jiahao Xu
Abstract:
This paper solves "The Dry Ten Martini Problem" for $C^2$ cosine-type quasiperiodic Schrödinger operators with large coupling constants and Diophantine frequencies, a model originally introduced by Sinai in 1987. This shows that the analyticity assumption on the potential is not essential for obtaining a dry Cantor spectrum and can be replaced by a certain geometric condition in the low regularity case. In addition, we prove the homogeneity of the spectrum and the absolute continuity of the integrated density of states (IDS).
Submitted 10 March, 2025;
originally announced March 2025.
-
A Criterion for Extending Continuous-Mixture Identifiability Results
Authors:
Michael R. Powers,
Jiaxin Xu
Abstract:
For continuous mixtures of random variables, we provide a simple criterion -- generating-function accessibility -- to extend previously known kernel-based identifiability (or unidentifiability) results to new kernel distributions. This criterion, based on functional relationships between the relevant kernels' moment-generating functions or Laplace transforms, may be applied to continuous mixtures of both discrete and continuous random variables. To illustrate the proposed approach, we present results for several specific kernels.
Submitted 5 March, 2025;
originally announced March 2025.
-
The Weighted Grand Herz-Morrey-Lizorkin-Triebel Spaces with Variable Exponents
Authors:
Shengrong Wang,
Pengfei Guo,
Jingshi Xu
Abstract:
Let a vector-valued sublinear operator satisfy the size condition and be bounded on weighted Lebesgue spaces with variable exponent. Then we obtain its boundedness on weighted grand Herz-Morrey spaces with variable exponents. Next we introduce weighted grand Herz-Morrey-Triebel-Lizorkin spaces with variable exponents and provide their equivalent quasi-norms via maximal functions.
Submitted 19 February, 2025;
originally announced February 2025.
-
The Commutators of $n$-dimensional Rough Fractional Hardy Operators on Two Weighted Grand Herz-Morrey Spaces with Variable Exponents
Authors:
Shengrong Wang,
Pengfei Guo,
Jingshi Xu
Abstract:
In this paper, we obtain the boundedness of $m$th order commutators generated by the $n$-dimensional fractional Hardy operator with rough kernel and its adjoint operator with BMO functions on two weighted grand Herz-Morrey spaces with variable exponents. Replacing Lipschitz functions with BMO functions the corresponding result is also given.
Submitted 19 February, 2025;
originally announced February 2025.
-
Dynamic Pricing with Adversarially-Censored Demands
Authors:
Jianyu Xu,
Yining Wang,
Xi Chen,
Yu-Xiang Wang
Abstract:
We study an online dynamic pricing problem where the potential demand at each time period $t=1,2,\ldots, T$ is stochastic and dependent on the price. However, a perishable inventory is imposed at the beginning of each time $t$, censoring the potential demand if it exceeds the inventory level. To address this problem, we introduce a pricing algorithm based on the optimistic estimates of derivatives. We show that our algorithm achieves $\tilde{O}(\sqrt{T})$ optimal regret even with adversarial inventory series. Our findings advance the state-of-the-art in online decision-making problems with censored feedback, offering a theoretically optimal solution against adversarial observations.
Submitted 10 February, 2025;
originally announced February 2025.
-
A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees
Authors:
Yuhao Zhou,
Jintao Xu,
Chenglong Bao,
Chao Ding,
Jun Zhu
Abstract:
We consider the problem of finding an $ε$-stationary point of a nonconvex function with a Lipschitz continuous Hessian and propose a quadratic regularized Newton method incorporating a new class of regularizers constructed from the current and previous gradients. The method leverages a recently developed linear conjugate gradient approach with a negative curvature monitor to solve the regularized Newton equation. Notably, our algorithm is adaptive, requiring no prior knowledge of the Lipschitz constant of the Hessian, and achieves a global complexity of $O(ε^{-\frac{3}{2}}) + \tilde O(1)$ in terms of the second-order oracle calls, and $\tilde O(ε^{-\frac{7}{4}})$ for Hessian-vector products, respectively. Moreover, when the iterates converge to a point where the Hessian is positive definite, the method exhibits quadratic local convergence. Preliminary numerical results illustrate the competitiveness of our algorithm.
Submitted 14 February, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
-
A Proof of The Changepoint Detection Threshold Conjecture in Preferential Attachment Models
Authors:
Hang Du,
Shuyang Gong,
Jiaming Xu
Abstract:
We investigate the problem of detecting and estimating a changepoint in the attachment function of a network evolving according to a preferential attachment model on $n$ vertices, using only a single final snapshot of the network. Bet et al.~\cite{bet2023detecting} show that a simple test based on thresholding the number of vertices with minimum degrees can detect the changepoint when the change occurs at time $n-Ω(\sqrt{n})$. They further make the striking conjecture that detection becomes impossible for any test if the change occurs at time $n-o(\sqrt{n}).$ Kaddouri et al.~\cite{kaddouri2024impossibility} make a step forward by proving the detection is impossible if the change occurs at time $n-o(n^{1/3}).$ In this paper, we resolve the conjecture affirmatively, proving that detection is indeed impossible if the change occurs at time $n-o(\sqrt{n}).$ Furthermore, we establish that estimating the changepoint with an error smaller than $o(\sqrt{n})$ is also impossible, thereby confirming that the estimator proposed in Bhamidi et al.~\cite{bhamidi2018change} is order-optimal.
Submitted 1 February, 2025;
originally announced February 2025.
-
A Bayesian decision-theoretic approach to sparse estimation
Authors:
Aihua Li,
Surya T. Tokdar,
Jason Xu
Abstract:
We extend the work of Hahn and Carvalho (2015) and develop a doubly-regularized sparse regression estimator by synthesizing Bayesian regularization with penalized least squares within a decision-theoretic framework. In contrast to existing Bayesian decision-theoretic formulation chiefly reliant upon the symmetric 0-1 loss, the new method -- which we call Bayesian Decoupling -- employs a family of penalized loss functions indexed by a sparsity-tuning parameter. We propose a class of reweighted l1 penalties, with two specific instances that achieve simultaneous bias reduction and convexity. The design of the penalties incorporates considerations of signal sizes, as enabled by the Bayesian paradigm. The tuning parameter is selected using a posterior benchmarking criterion, which quantifies the drop in predictive power relative to the posterior mean which is the optimal Bayes estimator under the squared error loss. Additionally, in contrast to the widely used median probability model technique which selects variables by thresholding posterior inclusion probabilities at the fixed threshold of 1/2, Bayesian Decoupling enables the use of a data-driven threshold which automatically adapts to estimated signal sizes and offers far better performance in high-dimensional settings with highly correlated predictors. Our numerical results in such settings show that certain combinations of priors and loss functions significantly improve the solution path compared to existing methods, prioritizing true signals early along the path before false signals are selected. Consequently, Bayesian Decoupling produces estimates with better prediction and selection performance. Finally, a real data application illustrates the practical advantages of our approaches which select sparser models with larger coefficient estimates.
Submitted 31 January, 2025;
originally announced February 2025.
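The median probability model rule that the abstract contrasts with Bayesian Decoupling can be sketched in a few lines. This is an illustration only: the function name `select_by_inclusion_probability`, the example probabilities, and the alternative cutoff of 0.3 are hypothetical, and the data-driven threshold of Bayesian Decoupling itself is not reproduced here.

```python
from typing import List, Sequence


def select_by_inclusion_probability(
    pips: Sequence[float], threshold: float = 0.5
) -> List[int]:
    """Return indices of variables whose posterior inclusion probability
    is at least `threshold`.

    With the default threshold of 1/2 this is the median probability
    model rule; any other threshold is a hypothetical alternative.
    """
    return [j for j, p in enumerate(pips) if p >= threshold]


# Hypothetical posterior inclusion probabilities for five predictors.
pips = [0.92, 0.48, 0.51, 0.07, 0.35]

# Fixed cutoff at 1/2 (median probability model).
median_model = select_by_inclusion_probability(pips)

# A lower, made-up cutoff keeps predictors that the fixed rule drops.
lower_cutoff_model = select_by_inclusion_probability(pips, threshold=0.3)
```

The second call only illustrates how sensitive the selected model is to the cutoff, which is the motivation the abstract gives for replacing a fixed threshold with an adaptive one.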
-
Joint Pricing and Resource Allocation: An Optimal Online-Learning Approach
Authors:
Jianyu Xu,
Xuan Wang,
Yu-Xiang Wang,
Jiashuo Jiang
Abstract:
We study an online learning problem on dynamic pricing and resource allocation, where we make joint pricing and inventory decisions to maximize the overall net profit. We consider the stochastic dependence of demands on the price, which complicates the resource allocation process and introduces significant non-convexity and non-smoothness to the problem. To solve this problem, we develop an efficient algorithm that utilizes a "Lower-Confidence Bound (LCB)" meta-strategy over multiple OCO agents. Our algorithm achieves $\tilde{O}(\sqrt{Tmn})$ regret (for $m$ suppliers and $n$ consumers), which is optimal with respect to the time horizon $T$. Our results illustrate an effective integration of statistical learning methodologies with complex operations research problems.
Submitted 29 January, 2025;
originally announced January 2025.
-
Passing through nondegenerate singularities in mean curvature flows
Authors:
Ao Sun,
Zhihan Wang,
Jinxin Xue
Abstract:
In this paper, we study the properties of nondegenerate cylindrical singularities of mean curvature flow. We prove they are isolated in spacetime and provide a complete description of the geometry and topology change of the flow passing through the singularities. In particular, the topology change agrees with the change of level sets near a critical point of a Morse function, which is the same as performing surgery. The proof is based on a new $L^2$-distance monotonicity formula, which allows us to derive a discrete almost monotonicity of the "decay order", a discrete mean curvature flow analog to Almgren's frequency function.
Submitted 27 January, 2025;
originally announced January 2025.
-
Remarks on log pluricanonical representations
Authors:
Osamu Fujino,
Jinsong Xu
Abstract:
We show the finiteness of log pluricanonical representations under the assumption of the existence of a good minimal model.
Submitted 27 January, 2025;
originally announced January 2025.
-
Dynamics and large deviations for fractional stochastic partial differential equations with Lévy noise
Authors:
Jiaohui Xu,
Tomás Caraballo,
José Valero
Abstract:
This paper is mainly concerned with a class of fractional stochastic evolution equations driven by Lévy noise in a bounded domain. We first state the well-posedness of the problem via iterative approximations and energy estimates. Then, the existence and uniqueness of weak pullback mean random attractors for the equations are established by defining a mean random dynamical system. Next, we prove the existence of invariant measures when the problem is autonomous, using the fact that $H^γ(\mathcal{O})$ is compactly embedded in $L^2(\mathcal{O})$ with $γ\in (0,1)$. Moreover, the uniqueness of this invariant measure is established, which ensures the ergodicity of the problem. Finally, a large deviation principle for solutions of SPDEs perturbed by small Lévy noise and Brownian motion is obtained by a variational formula for positive functionals of a Poisson random measure and Brownian motion. Additionally, the results are illustrated by the fractional stochastic Chafee-Infante equations.
Submitted 24 January, 2025;
originally announced January 2025.
-
A decoupled linear, mass-conservative block-centered finite difference method for the Keller-Segel chemotaxis system
Authors:
Jie Xu,
Hongfei Fu
Abstract:
As a class of nonlinear partial differential equations, the Keller-Segel system is widely used to model chemotaxis in biology. In this paper, we present the construction and analysis of a decoupled linear, mass-conservative, block-centered finite difference method for the classical Keller-Segel chemotaxis system. We show that the scheme is mass conservative for the cell density at the discrete level. In addition, second-order temporal and spatial convergence for both the cell density and the chemoattractant concentration are rigorously discussed, using the mathematical induction method, the discrete energy method and detailed analysis of the truncation errors. Our scheme is proposed and analyzed on non-uniform spatial grids, which leads to more accurate and efficient modeling results for the chemotaxis system with rapid blow-up phenomenon. Furthermore, the existence and uniqueness of solutions to the Keller-Segel chemotaxis system are also discussed. Numerical experiments are presented to verify the theoretical results and to show the robustness and accuracy of the scheme.
Submitted 23 January, 2025;
originally announced January 2025.
-
Cyclicity of Cowen-Douglas tuples
Authors:
Jing Xu,
Shanshan Ji,
Yufang Xie,
Kui Ji
Abstract:
The study of Cowen-Douglas operators involves not only operator-theoretic tools but also complex geometry on holomorphic vector bundles. By leveraging the properties of holomorphic vector bundles, this paper investigates the cyclicity of Cowen-Douglas tuples and demonstrates conclusively that every such tuple is cyclic.
Submitted 20 January, 2025;
originally announced January 2025.
-
On understanding and overcoming spectral biases of deep neural network learning methods for solving PDEs
Authors:
Zhi-Qin John Xu,
Lulu Zhang,
Wei Cai
Abstract:
In this review, we survey the latest approaches and techniques developed to overcome the spectral bias towards low frequency of deep neural network learning methods in learning multiple-frequency solutions of partial differential equations. Open problems and future research directions are also discussed.
Submitted 17 January, 2025;
originally announced January 2025.
-
CeViT: Copula-Enhanced Vision Transformer in multi-task learning and bi-group image covariates with an application to myopia screening
Authors:
Chong Zhong,
Yang Li,
Jinfeng Xu,
Xiang Fu,
Yunhao Liu,
Qiuyi Huang,
Danjuan Yang,
Meiyan Li,
Aiyi Liu,
Alan H. Welsh,
Xingtao Zhou,
Bo Fu,
Catherine C. Liu
Abstract:
We aim to assist image-based myopia screening by resolving two longstanding problems, "how to integrate the information of ocular images of a pair of eyes" and "how to incorporate the inherent dependence among high-myopia status and axial length for both eyes." The classification-regression task is modeled as a novel 4-dimensional multi-response regression, where discrete responses are allowed, that relates to two dependent 3rd-order tensors (3D ultrawide-field fundus images). We present a Vision Transformer-based bi-channel architecture, named CeViT, where the common features of a pair of eyes are extracted via a shared Transformer encoder, and the interocular asymmetries are modeled through separate multilayer perceptron heads. Statistically, we model the conditional dependence among the mixture of discrete-continuous responses given the image covariates by a so-called copula loss. We establish a new theoretical framework for fine-tuning on CeViT based on latent representations, making the black-box fine-tuning procedure interpretable and guaranteeing higher relative efficiency of fine-tuning weight estimation in the asymptotic setting. We apply CeViT to an annotated ultrawide-field fundus image dataset collected by Shanghai Eye & ENT Hospital, demonstrating that CeViT improves on the baseline model in both the accuracy of classifying high myopia and the prediction of axial length (AL) for both eyes.
Submitted 11 January, 2025;
originally announced January 2025.
-
Dynamics and Wong-Zakai approximations of stochastic nonlocal PDEs with long time memory
Authors:
Jiaohui Xu,
Tomás Caraballo,
José Valero
Abstract:
In this paper, a combination of Galerkin's method and Dafermos' transformation is first used to prove the existence and uniqueness of solutions for a class of stochastic nonlocal PDEs with long time memory driven by additive noise. Next, the existence of tempered random attractors for such equations is established in an appropriate space for the analysis of problems with delay and memory. Finally, the convergence of solutions of Wong-Zakai approximations and the upper semicontinuity of random attractors of the approximate random system, as the step sizes of the approximations approach zero, are analyzed in detail.
Submitted 9 January, 2025;
originally announced January 2025.
-
Precompactness in bivariate metric semigroup-valued bounded variation spaces
Authors:
Jingshi Xu,
Yinglian Niu
Abstract:
In this paper, we show that if a set in bivariate metric semigroup-valued bounded variation spaces is pointwise totally bounded and joint equivariated, then it is precompact. These spaces include bounded Jordan variation spaces, bounded Wiener variation spaces, bounded Waterman variation spaces, bounded Riesz variation spaces and bounded Korenblum variation spaces. To do so, we introduce the concept of an equimetric set.
Submitted 5 January, 2025;
originally announced January 2025.
-
Wheel-like bricks and minimal matching covered graphs
Authors:
Xiaoling He,
Fuliang Lu,
Jinxin Xue
Abstract:
A connected graph G with at least two vertices is matching covered if each of its edges lies in a perfect matching. We say that an edge e in a matching covered graph G is removable if G-e is matching covered. A pair {e, f} of edges of a matching covered graph G is a removable doubleton if G-e-f is matching covered, but neither G-e nor G-f is. Removable edges and removable doubletons are called removable classes, introduced by Lovász and Plummer in connection with ear decompositions of matching covered graphs. A 3-connected graph is a brick if, after the removal of any two distinct vertices, the remaining graph has a perfect matching. A brick G is wheel-like if G has a vertex h such that every removable class of G has an edge incident with h. Lucchesi and Murty proposed the problem of characterizing wheel-like bricks. We show that every wheel-like brick may be obtained by splicing graphs whose underlying simple graphs are odd wheels in a certain manner. A matching covered graph is minimal if the removal of any edge leaves a graph that is not matching covered. Lovász and Plummer proved in 1977, using ear decompositions, that the minimum degree of a minimal matching covered bipartite graph different from K2 is 2. Using the properties of wheel-like bricks, we prove that the minimum degree of a minimal matching covered graph other than K2 is 2 or 3.
Submitted 20 December, 2024;
originally announced December 2024.
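The definitions of "matching covered" and "removable edge" in the abstract can be checked by brute force on small graphs. The sketch below is only an illustration of those definitions, not the paper's method: the function names are hypothetical, graphs are assumed simple with edges given as vertex pairs, and the enumeration is exponential, so it is only practical for a handful of vertices.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, List, Tuple

Edge = Tuple[int, int]


def perfect_matchings(vertices: List[int], edges: List[Edge]) -> Iterator[FrozenSet[Edge]]:
    """Yield every perfect matching once by always matching the first free vertex."""
    if not vertices:
        yield frozenset()
        return
    v = vertices[0]
    for e in edges:
        if v in e:
            u = e[0] if e[1] == v else e[1]
            rest_vertices = [w for w in vertices if w not in (u, v)]
            rest_edges = [f for f in edges if u not in f and v not in f]
            for m in perfect_matchings(rest_vertices, rest_edges):
                yield m | {e}


def is_connected(vertices: List[int], edges: List[Edge]) -> bool:
    """Depth-first search over the edge list."""
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        x = stack.pop()
        for a, b in edges:
            if x in (a, b):
                y = b if a == x else a
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return len(seen) == len(vertices)


def is_matching_covered(vertices: List[int], edges: List[Edge]) -> bool:
    """Connected, at least two vertices, and every edge lies in a perfect matching."""
    if len(vertices) < 2 or not is_connected(vertices, edges):
        return False
    covered = set()
    for m in perfect_matchings(vertices, edges):
        covered |= m
    return covered == set(edges)


def removable_edges(vertices: List[int], edges: List[Edge]) -> List[Edge]:
    """Edges e such that G - e is still matching covered."""
    return [e for e in edges
            if is_matching_covered(vertices, [f for f in edges if f != e])]


k4_vertices = list(range(4))
k4_edges = list(combinations(range(4), 2))   # complete graph K4
path_edges = [(0, 1), (1, 2), (2, 3)]        # path on four vertices

k4_is_covered = is_matching_covered(k4_vertices, k4_edges)
path_is_covered = is_matching_covered(k4_vertices, path_edges)
k4_removable = removable_edges(k4_vertices, k4_edges)
```

In this small example K4 is matching covered, while the path is not, because its middle edge lies in no perfect matching.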
-
Construction of Toda flow via Sato-Segal-Wilson theory
Authors:
Shuo Zhang,
Shinichi Kotani,
Jiahao Xu
Abstract:
A Toda flow is constructed on a space of bounded initial data through Sato-Segal-Wilson theory. The flow is described by the Weyl functions of the underlying Jacobi operators. This is a continuation of the previous work on the KdV flow.
Submitted 18 December, 2024;
originally announced December 2024.
-
Approximation Schemes for Age of Information Minimization in UAV Grid Patrols
Authors:
Weiqi Wang,
Jin Xu
Abstract:
Motivated by the critical need for unmanned aerial vehicles (UAVs) to patrol grid systems in hazardous and dynamically changing environments, this study addresses a routing problem aimed at minimizing the time-average Age of Information (AoI) for edges in general graphs. We establish a lower bound for all feasible patrol policies and demonstrate that this bound is tight when the graph contains an Eulerian cycle. For graphs without Eulerian cycles, it becomes challenging to identify the optimal patrol strategy due to the extensive range of feasible options. Our analysis shows that restricting the strategy to periodic sequences still results in an exponentially large number of possible strategies. To address this complexity, we introduce two polynomial-time approximation schemes, each involving a two-step process: constructing multigraphs first and then embedding Eulerian cycles within these multigraphs. We prove that both schemes achieve an approximation ratio of 2. Further, both analytical and numerical results suggest that evenly and sparsely distributing edge visits within a periodic route significantly reduces the average AoI compared to strategies that merely minimize the route travel distance. Building on this insight, we propose a heuristic method that not only maintains the approximation ratio of 2 but also ensures robust performance across varying random graphs.
Submitted 18 December, 2024;
originally announced December 2024.
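Whether a graph contains an Eulerian cycle, the condition under which the abstract says the lower bound is tight, can be tested with the classical criterion: the graph is connected (ignoring isolated vertices) and every vertex has even degree. The sketch below illustrates only that criterion. The function name and example graphs are hypothetical, and the edge-doubling step is the textbook way to obtain an Eulerian multigraph, not necessarily the construction used in the paper's approximation schemes.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def has_eulerian_cycle(edges: List[Edge]) -> bool:
    """True if the (multi)graph given by `edges` has a closed walk using every edge once."""
    if not edges:
        return False
    adj: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Every vertex must have even degree.
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return False
    # All vertices that carry an edge must be in one connected component.
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(adj)


square = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-cycle: all degrees are 2
path = [(0, 1), (1, 2)]                    # endpoints have odd degree
doubled_path = path + path                 # duplicating each edge makes all degrees even

square_ok = has_eulerian_cycle(square)
path_ok = has_eulerian_cycle(path)
doubled_ok = has_eulerian_cycle(doubled_path)
```

Duplicating every edge of a connected graph always yields an Eulerian multigraph, at the price of traversing each edge twice per period.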
-
A Codimension Two Approach to the $\mathbb{S}^1$-Stability Conjecture
Authors:
Steven Rosenberg,
Jie Xu
Abstract:
J. Rosenberg's $\mathbb{S}^1$-stability conjecture states that a closed oriented manifold $X$ admits a positive scalar curvature metric iff $X\times \mathbb{S}^1$ admits a positive scalar curvature metric $h$. As pointed out by J. Rosenberg and others, there are known counterexamples in dimension four. We prove this conjecture whenever $h$ satisfies a geometric bound, depending only on the dimension of $ X $, which measures the discrepancy between $\partial_θ\in T\mathbb{S}^1$ and the normal vector field to $X\times \{P\}$, for a fixed $P\in \mathbb{S}^1.$
Submitted 16 April, 2025; v1 submitted 16 December, 2024;
originally announced December 2024.
-
Distributed Bilevel Optimization via Adaptive Penalization with Time-Scale Separation
Authors:
Youcheng Niu,
Jinming Xu,
Ying Sun,
Li Chai,
Jiming Chen
Abstract:
This paper studies a class of distributed bilevel optimization (DBO) problems with a coupled inner-level subproblem. Existing approaches typically rely on hypergradient estimations involving computationally expensive Hessian information. To address this, we propose an equivalent constrained reformulation by treating the inner-level subproblem as an inequality constraint, and introduce an adaptive penalty function to properly penalize both inequality and consensus constraints based on subproblem properties. Moreover, we propose a loopless distributed algorithm, \ALGNAME, that employs multiple-timescale updates to solve each subproblem asymptotically without requiring Hessian information. Theoretically, we establish convergence rates of $\mathcal{O}(\frac{κ^4}{(1-ρ)^2 K^{1/3}})$ for nonconvex-strongly-convex cases and $\mathcal{O}(\frac{κ^2}{(1-ρ)^2 K^{2/3}})$ for distributed min-max problems. Our analysis shows the clear dependence of convergence performance on bilevel heterogeneity, the adaptive penalty parameter, and network connectivity, with a weaker assumption on heterogeneity requiring only bounded first-order heterogeneity at the optimum. Numerical experiments validate our theoretical findings.
Submitted 15 December, 2024;
originally announced December 2024.
-
Robust Contraction Decomposition for Minor-Free Graphs and its Applications
Authors:
Sayan Bandyapadhyay,
William Lochet,
Daniel Lokshtanov,
Dániel Marx,
Pranabendu Misra,
Daniel Neuen,
Saket Saurabh,
Prafullkumar Tale,
Jie Xue
Abstract:
We prove a robust contraction decomposition theorem for $H$-minor-free graphs, which states that given an $H$-minor-free graph $G$ and an integer $p$, one can partition in polynomial time the vertices of $G$ into $p$ sets $Z_1,\dots,Z_p$ such that $\operatorname{tw}(G/(Z_i \setminus Z')) = O(p + |Z'|)$ for all $i \in [p]$ and $Z' \subseteq Z_i$. Here, $\operatorname{tw}(\cdot)$ denotes the treewidth of a graph and $G/(Z_i \setminus Z')$ denotes the graph obtained from $G$ by contracting all edges with both endpoints in $Z_i \setminus Z'$.
Our result generalizes earlier results by Klein [SICOMP 2008] and Demaine et al. [STOC 2011] based on partitioning $E(G)$, and some recent theorems for planar graphs by Marx et al. [SODA 2022], for bounded-genus graphs (more generally, almost-embeddable graphs) by Bandyapadhyay et al. [SODA 2022], and for unit-disk graphs by Bandyapadhyay et al. [SoCG 2022].
The robust contraction decomposition theorem directly yields parameterized algorithms with running time $2^{\widetilde{O}(\sqrt{k})} \cdot n^{O(1)}$ or $n^{O(\sqrt{k})}$ for every vertex/edge deletion problem on $H$-minor-free graphs that can be formulated as Permutation CSP Deletion or 2-Conn Permutation CSP Deletion. Consequently, we obtain the first subexponential-time parameterized algorithms for Subset Feedback Vertex Set, Subset Odd Cycle Transversal, Subset Group Feedback Vertex Set, and 2-Conn Component Order Connectivity on $H$-minor-free graphs. For other problems which already have subexponential-time parameterized algorithms on $H$-minor-free graphs (e.g., Odd Cycle Transversal, Vertex Multiway Cut, Vertex Multicut, etc.), our theorem gives much simpler algorithms with the same running time.
Submitted 5 December, 2024;
originally announced December 2024.
-
Dynamical Persistent Homology via Wasserstein Gradient Flow
Authors:
Minghua Wang,
Jinhui Xu
Abstract:
In this study, we introduce novel methodologies designed to adapt original data in response to the dynamics of persistence diagrams along Wasserstein gradient flows. Our research focuses on the development of algorithms that translate variations in persistence diagrams back into the data space. This advancement enables direct manipulation of the data, guided by observed changes in persistence diagrams, offering a powerful tool for data analysis and interpretation in the context of topological data analysis.
Submitted 4 December, 2024;
originally announced December 2024.
-
Fibration method with multiple fibers and strong approximation
Authors:
Dasheng Wei,
Jie Xu,
Yi Zhu
Abstract:
We develop the fibration method to produce rational (or integral) points on the total space with few multiple fibers over the projective line over number fields. As its application, we prove strong approximation without off any place and arithmetic purity for two classes of open rationally connected varieties: the smooth locus of singular del Pezzo surfaces of degree $\geq 4$ and the smooth locus of complete normal toric varieties. We also study strong approximation for the intersection of two affine quadrics. As its application, we get an unconditional result of fibration method for rank 4.
Submitted 3 December, 2024; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Stable gradient-adjusted root mean square propagation on least squares problem
Authors:
Runze Li,
Jintao Xu,
Wenxun Xing
Abstract:
Root mean square propagation (abbreviated as RMSProp) is a first-order stochastic algorithm widely used in machine learning. In this paper, a stable gradient-adjusted RMSProp (abbreviated as SGA-RMSProp) with mini-batch stochastic gradient is proposed for the linear least squares problem. R-linear convergence of the algorithm is established on the consistent linear least squares problem. The algorithm is also proved to converge R-linearly to a neighborhood of the minimizer in the inconsistent case, with the size of the neighborhood controlled by the batch size. Furthermore, numerical experiments are conducted to compare the performance of SGA-RMSProp and stochastic gradient descent (abbreviated as SGD) with different batch sizes. A faster initial convergence rate of SGA-RMSProp is observed in the numerical experiments, and an adaptive strategy for switching from SGA-RMSProp to SGD is proposed, which combines the benefits of the two algorithms.
Submitted 24 November, 2024;
originally announced November 2024.
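For orientation, the baseline that the abstract builds on can be sketched as follows: standard RMSProp with mini-batch gradients on a small consistent linear least squares problem. This is not the SGA-RMSProp variant proposed in the paper, whose gradient adjustment is not described in the abstract. The function names, the 4-by-2 system, and all hyperparameters (`lr`, `decay`, `eps`, `batch_size`, `steps`) are assumptions chosen for illustration.

```python
import random
from typing import List


def residual_sq(A: List[List[float]], b: List[float], x: List[float]) -> float:
    """Sum of squared residuals ||Ax - b||^2."""
    return sum((sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i) ** 2
               for row, b_i in zip(A, b))


def rmsprop_least_squares(A: List[List[float]], b: List[float],
                          lr: float = 0.01, decay: float = 0.9,
                          eps: float = 1e-8, batch_size: int = 2,
                          steps: int = 2000, seed: int = 0) -> List[float]:
    """Standard RMSProp with mini-batch gradients of 0.5 * (a_i . x - b_i)^2."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    v = [0.0] * n  # running average of squared gradients, per coordinate
    for _ in range(steps):
        batch = rng.sample(range(m), batch_size)
        g = [0.0] * n
        for i in batch:
            r = sum(A[i][j] * x[j] for j in range(n)) - b[i]
            for j in range(n):
                g[j] += r * A[i][j] / batch_size
        for j in range(n):
            v[j] = decay * v[j] + (1.0 - decay) * g[j] * g[j]
            x[j] -= lr * g[j] / (v[j] ** 0.5 + eps)
    return x


# Consistent system: b = A x_true with x_true = [1, -2] (made-up example).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [1.0, -2.0, -1.0, 3.0]

initial_residual = residual_sq(A, b, [0.0, 0.0])
x_hat = rmsprop_least_squares(A, b)
final_residual = residual_sq(A, b, x_hat)
```

With a constant step size, plain RMSProp typically settles into a small neighborhood of the solution rather than converging exactly, which is one reason stabilized variants and switching to SGD are of interest.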
-
Edge universality of $β$-additions through Dunkl operators
Authors:
David Keating,
Jiaming Xu
Abstract:
It is well known that the edge limit of Gaussian/Laguerre Beta ensembles is given by the Airy($β$) point process. We prove a universality result showing that this also holds for a general class of additions of Gaussian and Laguerre ensembles. In order to make sense of $β$-addition, we introduce the type A Bessel function as the characteristic function of our matrix ensemble, following the line of Gorin-Marcus and Benaych-Georges-Cuenca-Gorin. Then we extract its moment information through the action of Dunkl operators, a class of differential operators originating from special function theory. We carry out the action explicitly on the Bessel generating functions of our additions, and after the asymptotic analysis, we obtain a certain limiting functional in terms of conditional Brownian bridges of the Laplace transform of Airy($β$), which is universal up to proper rescaling among all our additions.
Submitted 14 December, 2024; v1 submitted 18 November, 2024;
originally announced November 2024.
-
Lagrangian dual with zero duality gap that admits decomposition
Authors:
Diego Cifuentes,
Santanu S. Dey,
Jingye Xu
Abstract:
For mixed integer programs (MIPs) with block structures and coupling constraints, on dualizing the coupling constraints the resulting Lagrangian relaxation becomes decomposable into blocks, which allows for the use of parallel computing. However, the resulting Lagrangian dual can have non-zero duality gap due to the inherent non-convexity of MIPs. In this paper, we propose two reformulations of such MIPs by adding redundant constraints, such that the Lagrangian dual obtained by dualizing the coupling constraints and the redundant constraints has zero duality gap while still remaining decomposable. One of these reformulations is similar to, although not the same as, the RLT hierarchy. In this case, we present multiplicative bounds on the quality of the dual bound at each level of the hierarchy for packing and covering MIPs. We show our results are applicable to general sparse MIPs, where decomposability is revealed via the tree-decomposition of the intersection graph of the constraint matrix. In preliminary experiments, we observe that the proposed Lagrangian duals give better dual bounds than the classical Lagrangian dual and Gurobi in equal time, where Gurobi is not exploiting decomposability.
Submitted 18 November, 2024;
originally announced November 2024.
-
MultiBalance: Multi-Objective Gradient Balancing in Industrial-Scale Multi-Task Recommendation System
Authors:
Yun He,
Xuxing Chen,
Jiayi Xu,
Renqin Cai,
Yiling You,
Jennifer Cao,
Minhui Huang,
Liu Yang,
Yiqun Liu,
Xiaoyi Liu,
Rong Jin,
Sem Park,
Bo Long,
Xue Feng
Abstract:
In industrial recommendation systems, multi-task learning (learning multiple tasks simultaneously on a single model) is a predominant approach to save training/serving resources and improve recommendation performance via knowledge transfer between the jointly learned tasks. However, multi-task learning often suffers from negative transfer: one or several tasks are less well optimized than when trained separately. To carefully balance the optimization, we propose a gradient balancing approach called MultiBalance, which is suitable for industrial-scale multi-task recommendation systems. It balances the per-task gradients to alleviate negative transfer, while saving the huge cost of grid search or manual exploration for appropriate task weights. Moreover, compared with prior work that normally balances the per-task gradients of shared parameters, MultiBalance is more efficient since it only requires access to per-task gradients with respect to the shared feature representations. We conduct experiments on Meta's large-scale ads and feeds multi-task recommendation system, and observe that MultiBalance achieves significant gains (e.g., 0.738% improvement for normalized entropy (NE)) with neutral training cost in Queries Per Second (QPS), which is significantly more efficient than prior methods that balance per-task gradients of shared parameters with 70~80% QPS degradation.
Submitted 3 November, 2024;
originally announced November 2024.
-
Airy$_β$ line ensemble and its Laplace transform
Authors:
Vadim Gorin,
Jiaming Xu,
Lingfu Zhang
Abstract:
The Airy$_β$ line ensemble is a random collection of continuous curves, which should serve as a universal edge scaling limit in problems related to eigenvalues of random matrices and models of 2d statistical mechanics. This line ensemble unifies many existing universal objects, including Tracy-Widom distributions, eigenvalues of the Stochastic Airy Operator, and the Airy$_2$ process from KPZ theory. Here $β>0$ is a real parameter governing the strength of the repulsion between the curves.
We introduce and characterize the Airy$_β$ line ensemble in terms of the Laplace transform, by producing integral formulas for its joint multi-time moments. We prove two asymptotic theorems for each $β>0$: the trajectories of the largest eigenvalues in the Dyson Brownian Motion converge to the Airy$_β$ line ensemble; the extreme particles in the G$β$E corners process converge to the same limit.
The proofs are based on the convergence of random walk expansions for the multi-time moments of prelimit objects towards their Brownian counterparts. The expansions are produced through Dunkl differential-difference operators acting on multivariate Bessel generating functions.
Submitted 16 November, 2024;
originally announced November 2024.
-
Stochastic Optimal Linear Quadratic Regulation Control of Discrete-time Systems with Delay and Quadratic Constraints
Authors:
Dawei Liu,
Juanjuan Xu,
Huanshui Zhang
Abstract:
This article explores the discrete-time stochastic optimal LQR control with delay and quadratic constraints. The inclusion of delay, compared to delay-free optimal LQR control with quadratic constraints, significantly increases the complexity of the problem. Using Lagrangian duality, the optimal control is obtained by solving the Riccati-ZXL equation in conjunction with a gradient ascent algorithm. Specifically, the parameterized optimal controller and cost function are derived by solving the Riccati-ZXL equation, with a gradient ascent algorithm determining the optimal parameter. The primary contribution of this work is presenting the optimal control as a feedback mechanism based on the state's conditional expectation, wherein the gain is determined using the Riccati-ZXL equation and the gradient ascent algorithm. Numerical examples demonstrate the effectiveness of the obtained results.
Submitted 16 November, 2024;
originally announced November 2024.
-
Revealing Floating-Point Accumulation Orders in Software/Hardware Implementations
Authors:
Peichen Xie,
Yanjie Gao,
Yang Wang,
Jilong Xue
Abstract:
Accumulation-based operations, such as summation and matrix multiplication, are fundamental to numerous computational domains. However, their accumulation orders are often undocumented in existing software and hardware implementations, making it difficult for developers to ensure consistent results across systems. To address this issue, we introduce FPRev, a diagnostic tool designed to reveal the accumulation order in software and hardware implementations through numerical testing. With FPRev, developers can identify and compare accumulation orders, enabling them to create reproducible software and verify implementation equivalence.
FPRev is a testing-based tool that non-intrusively reveals the accumulation order by analyzing the outputs of the tested implementation for distinct specially designed inputs. Employing FPRev, we showcase the accumulation orders of popular libraries (such as NumPy and PyTorch) on CPUs and GPUs (including GPUs with specialized matrix accelerators such as Tensor Cores). We also validate the efficiency of FPRev through extensive experiments. FPRev exhibits a lower time complexity compared to the basic solution. FPRev will be open-sourced on GitHub.
Submitted 27 April, 2025; v1 submitted 1 November, 2024;
originally announced November 2024.
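As a brief illustration of why accumulation order matters (this is not FPRev's method, only a minimal sketch with hypothetical helper names and hand-picked inputs), the same four double-precision values can produce different sums under left-to-right and pairwise accumulation:

```python
# Illustrative only: floating-point addition is not associative, so the
# order in which terms are accumulated can change the result.
# The helper names and the input values are assumptions for this sketch.

def sum_left_to_right(xs):
    """Sequential accumulation: ((x0 + x1) + x2) + x3 ..."""
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_pairwise(xs):
    """Tree-shaped accumulation: split in half, sum each half, then add."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])

# Near 1e16 the spacing between adjacent doubles is 2, so adding 1.0
# to +/-1e16 is rounded away, while 0.0 + 1.0 is exact.
values = [1e16, 1.0, -1e16, 1.0]

left = sum_left_to_right(values)
tree = sum_pairwise(values)
print(left, tree)
```

Here the sequential order loses only the first 1.0, while the pairwise order loses both, so the two orders disagree even though the inputs are identical.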
-
Planar wheel-like bricks
Authors:
Fuliang Lu,
Jinxin Xue
Abstract:
An edge e in a matching covered graph G is removable if G-e is matching covered; a pair {e; f} of edges of G is a removable doubleton if G-e-f is matching covered, but neither G-e nor G-f is. Removable edges and removable doubletons are called removable classes, which were introduced by Lovász and Plummer in connection with ear decompositions of matching covered graphs. A brick is a nonbipartite matching covered graph without nontrivial tight cuts. A brick G is wheel-like if G has a vertex h such that every removable class of G has an edge incident with h. Lucchesi and Murty conjectured that every planar wheel-like brick is an odd wheel. We present a proof of this conjecture in this paper.
Submitted 27 October, 2024;
originally announced October 2024.
-
Experimental Designs for Optimizing Last-Mile Delivery
Authors:
Nicholas Rios,
Jie Xu
Abstract:
Companies like Amazon and UPS are heavily invested in last-mile delivery problems. Optimizing last-mile delivery operations not only creates tremendous cost savings for these companies but also generates broader societal and environmental benefits in terms of better delivery service and reduced air pollutants and greenhouse gas emissions. Last-mile delivery is readily formulated as the Travelling Salesman Problem (TSP), where a salesperson must visit several cities and return to the origin with the least cost. A solution to this problem is a Hamiltonian circuit in an undirected graph. Many methods exist for solving the TSP, but they often assume the travel costs are fixed. In practice, travel costs between delivery zones are random quantities, as they are subject to variation from traffic, weather, and other factors. Innovations such as truck-drone last-mile delivery create even more uncertainties due to scarce data. A Bayesian D-optimal experimental design in conjunction with a regression model is proposed to estimate these unknown travel costs and subsequently search for a highly efficient solution to the TSP. This framework can naturally be extended to incorporate the use of drones and any other emerging technology that has use in last-mile delivery.
Submitted 22 October, 2024;
originally announced October 2024.
-
IPM-LSTM: A Learning-Based Interior Point Method for Solving Nonlinear Programs
Authors:
Xi Gao,
Jinxin Xiong,
Akang Wang,
Qihong Duan,
Jiang Xue,
Qingjiang Shi
Abstract:
Solving constrained nonlinear programs (NLPs) is of great importance in various domains such as power systems, robotics, and wireless communication networks. One widely used approach for addressing NLPs is the interior point method (IPM). The most computationally expensive procedure in IPMs is to solve systems of linear equations via matrix factorization. Recently, machine learning techniques have been adopted to expedite classic optimization algorithms. In this work, we propose using Long Short-Term Memory (LSTM) neural networks to approximate the solution of linear systems and integrate this approximating step into an IPM. The resulting approximate NLP solution is then utilized to warm-start an interior point solver. Experiments on various types of NLPs, including Quadratic Programs and Quadratically Constrained Quadratic Programs, show that our approach can significantly accelerate NLP solving, reducing iterations by up to 60% and solution time by up to 70% compared to the default solver.
Submitted 21 October, 2024;
originally announced October 2024.
-
A Local Method for Compact and Non-compact Yamabe Problems
Authors:
Jie Xu
Abstract:
Let $ (M, g) $ be a compact manifold or a complete non-compact manifold without boundary, $ \dim M \geqslant 4 $, and not locally conformally flat. In this article, we introduce a new local method to resolve the Yamabe problem on compact manifolds of dimension at least $ 4 $, and the Yamabe problem on non-compact complete manifolds without boundary that are pointwise conformal to subsets of some compact manifolds. In particular, the new local method applies to the hard cases, in which the Yamabe constants are positive. Our local method also generalizes Brezis and Nirenberg's nonlinear eigenvalue problem to subsets of manifolds.
Submitted 22 November, 2024; v1 submitted 17 October, 2024;
originally announced October 2024.