-
Arbitrarily Small Execution-Time Certificate: What was Missed in Analog Optimization
Authors:
Liang Wu,
Ambrose Adegbege,
Yongduan Song,
Richard D. Braatz
Abstract:
Numerical optimization (solving optimization problems using digital computers) currently dominates, but has three major drawbacks: high energy consumption, poor scalability, and lack of an execution time certificate. To address these challenges, this article explores the recent resurgence of analog computers, proposing a novel paradigm of arbitrarily small execution-time-certified analog optimization (solving optimization problems via analog computers). To achieve ultra-low energy consumption, this paradigm transforms optimization problems into ordinary differential equations (ODEs) and leverages the ability of analog computers to naturally solve ODEs (no need for time-discretization) in physically real time. However, this transformation can fail if the optimization problem, such as the general convex nonlinear programs (NLPs) considered in this article, has no feasible solution. To avoid transformation failure and enable infeasibility detection, this paradigm introduces the homogeneous monotone complementarity problem formulation for convex NLPs. To achieve scalability and execution time certificate, this paradigm introduces the Newton-based fixed-time-stable scheme for the transformed ODE, whose equilibrium time $T_p$ can be prescribed by choosing the ODE's time coefficient as $k=\fracπ{2T_p}$. This equation certifies that the equilibrium time (execution time) is independent of the dimension of optimization problems and can be arbitrarily small if the analog computer allows.
Submitted 15 May, 2025;
originally announced May 2025.
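The relation $k=\fracπ{2T_p}$ quoted in the abstract above can be evaluated directly. The Python sketch below only evaluates that one formula; the function name `time_coefficient` and the sample values of $T_p$ are illustrative assumptions, and nothing here models the ODE itself or any analog hardware.

```python
import math


def time_coefficient(t_p: float) -> float:
    """Return k = pi / (2 * T_p) for a prescribed equilibrium time T_p > 0."""
    if t_p <= 0:
        raise ValueError("the prescribed equilibrium time T_p must be positive")
    return math.pi / (2.0 * t_p)


# Halving the prescribed time doubles the required coefficient; the problem
# dimension does not appear anywhere in the formula.
for t_p in (1.0, 0.5, 1e-3):
    print(f"T_p = {t_p:g} -> k = {time_coefficient(t_p):.6g}")
```

Whether a given $k$ is physically realizable depends on the analog computer, which is the caveat the abstract itself states.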
-
Note on a sum involving the divisor function
Authors:
Liuying Wu
Abstract:
Let $d(n)$ be the divisor function and denote by $[t]$ the integral part of the real number $t$. In this paper, we prove that $$\sum_{n\leq x^{1/c}}d\left(\left[\frac{x}{n^c}\right]\right)=d_cx^{1/c}+\mathcal{O}_{\varepsilon,c} \left(x^{\max\{(2c+2)/(2c^2+5c+2),5/(5c+6)\}+\varepsilon}\right),$$ where $d_c=\sum_{k\geq1}d(k)\left(\frac{1}{k^{1/c}}-\frac{1}{(k+1)^{1/c}}\right)$ is a constant. This result constitutes an improvement upon that of Feng.
Submitted 2 May, 2025;
originally announced May 2025.
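For small inputs, the left-hand side of the asymptotic formula in the abstract above, $\sum_{n\leq x^{1/c}}d\left(\left[\frac{x}{n^c}\right]\right)$, can be computed by brute force. The Python sketch below assumes that $x$ is a positive integer and that $c$ is an integer with $c\geq 1$ (the abstract does not state the admissible range of $c$); the function names are illustrative and not taken from the paper.

```python
def divisor_counts(limit: int) -> list[int]:
    """Sieve the divisor function: d[m] is the number of divisors of m (d[0] is unused)."""
    d = [0] * (limit + 1)
    for i in range(1, limit + 1):
        for multiple in range(i, limit + 1, i):
            d[multiple] += 1
    return d


def floor_divisor_sum(x: int, c: int) -> int:
    """Sum d(floor(x / n**c)) over integers n >= 1 with n**c <= x."""
    d = divisor_counts(x)
    total = 0
    n = 1
    # For integers, n**c <= x is equivalent to n <= x**(1/c), without float rounding.
    while n ** c <= x:
        total += d[x // n ** c]
        n += 1
    return total


print(floor_divisor_sum(10, 1))  # 17
print(floor_divisor_sum(10, 2))  # 7
```

The sieve costs roughly $x\log x$ operations, so this is only suitable for checking the main term numerically at modest $x$.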
-
Endpoint boundedness of singular integrals: CMO space associated to Schrödinger operators
Authors:
Xueting Han,
Ji Li,
Liangchuan Wu
Abstract:
Let $ \mathcal{L} = -Δ+ V $ be a Schrödinger operator acting on $ L^2(\mathbb{R}^n) $, where the nonnegative potential $ V $ belongs to the reverse Hölder class $ RH_q $ for some $ q \geq n/2 $. This article is primarily concerned with the study of endpoint boundedness for classical singular integral operators in the context of the space $ \mathrm{CMO}_{\mathcal{L}}(\mathbb{R}^n) $, consisting of functions of vanishing mean oscillation associated with $ \mathcal{L} $.
We establish the following main results: (i) the standard Hardy--Littlewood maximal operator is bounded on $\mathrm{CMO}_{\mathcal{L}}(\mathbb{R}^n) $; (ii) for each $ j = 1, \ldots, n$, the adjoint of the Riesz transform $ \partial_j \mathcal{L}^{-1/2} $ is bounded from $ C_0(\mathbb{R}^n) $ into $ \mathrm{CMO}_{\mathcal{L}}(\mathbb{R}^n) $; and (iii) the approximation to the identity generated by the Poisson and heat semigroups associated with $ \mathcal{L} $ characterizes $ \mathrm{CMO}_{\mathcal{L}}(\mathbb{R}^n) $ appropriately.
These results recover the classical analogues corresponding to the Laplacian as a special case. However, the presence of the potential $ V $ introduces substantial analytical challenges, necessitating tools beyond the scope of classical Calderón--Zygmund theory. Our approach leverages precise heat kernel estimates and the structural properties of $ \mathrm{CMO}_{\mathcal{L}}(\mathbb{R}^n) $ established by Song and the third author in [J. Geom. Anal. 32 (2022), no. 4, Paper No. 130, 37 pp].
Submitted 23 April, 2025;
originally announced April 2025.
-
Quasi-stationarity of the Dyson Brownian Motion With Collisions
Authors:
Arnaud Guillin,
Boris Nectoux,
Liming Wu
Abstract:
In this work, we investigate the ergodic behavior of a system of particles, subject to collisions, before it exits a fixed subdomain of its state space. This system is composed of several one-dimensional ordered Brownian particles in interaction with electrostatic repulsions, which is usually referred to as the (generalized) Dyson Brownian motion. The starting points of our analysis are the work [E. Cépa and D. Lépingle, 1997 Probab. Theory Relat. Fields], which provides existence and uniqueness of such a system subject to collisions via the theory of multivalued SDEs, and a Krein-Rutman type theorem derived in [A. Guillin, B. Nectoux, L. Wu, 2020 J. Eur. Math. Soc.].
Submitted 11 April, 2025;
originally announced April 2025.
-
The fibered rotation number for ergodic symplectic cocycles and its applications: I. Gap Labelling Theorem
Authors:
Xianzhe Li,
Li Wu
Abstract:
Let $ (Θ,T,μ) $ be an ergodic topological dynamical system. The fibered rotation number for cocycles in $ Θ\times \mathrm{SL}(2,\mathbb{R}) $, acting on $ Θ\times \mathbb{R}\mathbb{P}^1 $ is well-defined and has wide applications in the study of the spectral theory of Schrödinger operators. In this paper, we will provide its natural generalization for higher dimensional cocycles in $ Θ\times\mathrm{SP}(2m,\mathbb{R}) $ or $ Θ\times \mathrm{HSP}(2m,\mathbb{C}) $, where $ \mathrm{SP}(2m,\mathbb{R}) $ and $ \mathrm{HSP}(2m,\mathbb{C}) $ respectively refer to the $ 2m $-dimensional symplectic or Hermitian-symplectic matrices. As a corollary, we establish the equivalence between the integrated density of states for generalized Schrödinger operators and the fibered rotation number; and the Gap Labelling Theorem via the Schwartzman group, as expected from the one dimensional case [AS1983, JM1982].
Submitted 25 March, 2025;
originally announced March 2025.
-
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Authors:
Jinbo Wang,
Mingze Wang,
Zhanpeng Zhou,
Junchi Yan,
Weinan E,
Lei Wu
Abstract:
Transformers consist of diverse building blocks, such as embedding layers, normalization layers, self-attention mechanisms, and point-wise feedforward networks. Thus, understanding the differences and interactions among these blocks is important. In this paper, we uncover a clear Sharpness Disparity across these blocks, which emerges early in training and intriguingly persists throughout the training process. Motivated by this finding, we propose Blockwise Learning Rate (LR), a strategy that tailors the LR to each block's sharpness, accelerating large language model (LLM) pre-training. By integrating Blockwise LR into AdamW, we consistently achieve lower terminal loss and nearly $2\times$ speedup compared to vanilla AdamW. We demonstrate this acceleration across GPT-2 and LLaMA, with model sizes ranging from 0.12B to 1.1B and datasets of OpenWebText and MiniPile. Finally, we incorporate Blockwise LR into Adam-mini (Zhang et al., 2024), a recently proposed memory-efficient variant of Adam, achieving a combined $2\times$ speedup and $2\times$ memory saving. These results underscore the potential of exploiting the sharpness disparity to improve LLM training.
Submitted 26 February, 2025;
originally announced February 2025.
-
The fractional Riesz transform and their commutator in Dunkl setting
Authors:
Yanping Chen,
Xueting Han,
Liangchuan Wu
Abstract:
In this paper, we study the boundedness of the fractional Riesz transforms in the Dunkl setting. Moreover, we establish the necessary and sufficient conditions for the boundedness of their commutator with respect to the central BMO space associated with Euclidean metric and the BMO space associated with Dunkl metric, respectively. Based on this, we further characterize the compactness of the commutator in terms of the corresponding types of VMO spaces.
Submitted 25 February, 2025;
originally announced February 2025.
-
EIQP: Execution-time-certified and Infeasibility-detecting QP Solver
Authors:
Liang Wu,
Wei Xiao,
Richard D. Braatz
Abstract:
Solving real-time quadratic programming (QP) is a ubiquitous task in control engineering, such as in model predictive control and control barrier function-based QP. In such real-time scenarios, certifying that the employed QP algorithm can either return a solution within a predefined level of optimality or detect QP infeasibility before the predefined sampling time is a pressing requirement. This article considers convex QP (including linear programming) and adopts its homogeneous formulation to achieve infeasibility detection. Exploiting this homogeneous formulation, this article proposes a novel infeasible interior-point method (IPM) algorithm with the best theoretical $O(\sqrt{n})$ iteration complexity that feasible IPM algorithms enjoy. The iteration complexity is proved to be \textit{exact} (rather than an upper bound), \textit{simple to calculate}, and \textit{data independent}, with the value $\left\lceil\frac{\log(\frac{n+1}ε)}{-\log(1-\frac{0.414213}{\sqrt{n+1}})}\right\rceil$ (where $n$ and $ε$ denote the number of constraints and the predefined optimality level, respectively), making it appealing to certify the execution time of online time-varying convex QPs. The proposed algorithm is simple to implement without requiring a line search procedure (uses the full Newton step), and its C-code implementation (offering MATLAB, Julia, and Python interfaces) and numerical examples are publicly available at https://github.com/liangwu2019/EIQP.
Submitted 14 February, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
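The iteration count stated in the EIQP abstract above, $\left\lceil\frac{\log(\frac{n+1}ε)}{-\log(1-\frac{0.414213}{\sqrt{n+1}})}\right\rceil$, depends only on the number of constraints $n$ and the optimality level $ε$, so it can be tabulated before any QP data is known. The Python sketch below evaluates only that expression; the function name `eiqp_iteration_count` and the sample values are illustrative assumptions, and this is not the interface of the EIQP package.

```python
import math


def eiqp_iteration_count(n: int, eps: float) -> int:
    """Evaluate ceil(log((n + 1) / eps) / -log(1 - 0.414213 / sqrt(n + 1)))."""
    if n < 1:
        raise ValueError("n (number of constraints) must be at least 1")
    if eps <= 0:
        raise ValueError("eps (optimality level) must be positive")
    numerator = math.log((n + 1) / eps)
    denominator = -math.log(1.0 - 0.414213 / math.sqrt(n + 1))
    return math.ceil(numerator / denominator)


# The count grows roughly like sqrt(n) * log(n / eps), independent of the QP data.
for n in (10, 100, 1000):
    print(n, eiqp_iteration_count(n, 1e-6))
```

Multiplying this count by a measured worst-case cost per iteration on the target hardware would give an execution-time bound; that per-iteration cost is not part of the formula and has to be obtained separately.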
-
The uniform quantitive weighted boundedness of fractional Marcinkiewicz integral and its commutator
Authors:
Huoxiong Wu,
Lin Wu
Abstract:
Suppose that $Ω\in L^{\infty}(\mathbb{S} ^{n-1})$ is homogeneous of degree zero with mean value zero. Then we consider a fractional type Marcinkiewicz integral operator $$μ_{Ω,β}f(x) = \left ( \int_{0}^{\infty } \left | \int_{\left | x-y \right |\le t }^{} \frac{Ω(x-y)}{\left | x-y \right |^{n-1-β} } f(y)dy \right | ^{2}\frac{dt}{t^3} \right )^{\frac{1}{2} },\quad 0<β<n.$$ Our main contribution is that the quantitative weighted result for the classical Marcinkiewicz integral $μ_Ω$ proved by Hu and Qu [Math. Ineq. Appl., 22 (2019), 885-899] can be recovered from the quantitative weighted estimates of $μ_{Ω,β}$ in this paper when $β\to 0^+$. As a consequence, we also give the uniform quantitative weighted bounds for the corresponding fractional commutators of $μ_{Ω,β}$ when $β\rightarrow 0^+$.
Submitted 10 February, 2025;
originally announced February 2025.
-
Refined regularity for nonlocal elliptic equations and applications
Authors:
Wenxiong Chen,
Congming Li,
Leyun Wu,
Zhouping Xin
Abstract:
In this paper, we establish refined regularity estimates for nonnegative solutions to the fractional Poisson equation $$ (-Δ)^s u(x) =f(x),\,\, x\in B_1(0). $$ Specifically, we derive Hölder, Schauder, and Ln-Lipschitz regularity estimates for any nonnegative solution $u$, provided that only the local $L^\infty$ norm of $u$ is bounded. These estimates stand in sharp contrast to the existing results, where the global $L^\infty$ norm of $u$ is required. Our findings indicate that the local values of the solution $u$ and $f$ are sufficient to control the local values of higher order derivatives of $u$. Notably, this makes it possible to establish a priori estimates in unbounded domains by using a blowing-up and re-scaling argument.
As applications, we derive singularity and decay estimates for solutions to some super-linear nonlocal problems in unbounded domains, and in particular, we obtain a priori estimates for a family of fractional Lane-Emden type equations in $\mathbb{R}^n.$ This is achieved by adopting a different method using auxiliary functions, which is applicable to both local and nonlocal problems.
Submitted 7 February, 2025;
originally announced February 2025.
-
Point Cloud Neural Operator for Parametric PDEs on Complex and Variable Geometries
Authors:
Chenyu Zeng,
Yanshu Zhang,
Jiayi Zhou,
Yuhan Wang,
Zilin Wang,
Yuhao Liu,
Lei Wu,
Daniel Zhengyu Huang
Abstract:
Surrogate models are critical for accelerating computationally expensive simulations in science and engineering, particularly for solving parametric partial differential equations (PDEs). Developing practical surrogate models poses significant challenges, particularly in handling geometrically complex and variable domains, which are often discretized as point clouds. In this work, we systematically investigate the formulation of neural operators -- maps between infinite-dimensional function spaces -- on point clouds to better handle complex and variable geometries while mitigating discretization effects. We introduce the Point Cloud Neural Operator (PCNO), designed to efficiently approximate solution maps of parametric PDEs on such domains. We evaluate the performance of PCNO on a range of pedagogical PDE problems, focusing on aspects such as boundary layers, adaptively meshed point clouds, and variable domains with topological variations. Its practicality is further demonstrated through three-dimensional applications, such as predicting pressure loads on various vehicle types and simulating the inflation process of intricate parachute structures.
Submitted 15 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
A real-time battle situation intelligent awareness system based on Meta-learning & RNN
Authors:
Yuchun Li,
Zihan Lin,
Xize Wang,
Chunyang Liu,
Liaoyuan Wu,
Fang Zhang
Abstract:
In modern warfare, real-time and accurate battle situation analysis is crucial for making strategic and tactical decisions. The proposed real-time battle situation intelligent awareness system (BSIAS) is based on meta-learning analysis and stepwise RNN (recurrent neural network) modeling: the former carries out the basic processing and analysis of battlefield data, which includes multiple steps such as data cleansing, data fusion, data mining, and continuous updating, and the latter optimizes the battlefield modeling by capturing the temporal dependencies of the data set step by step. Taking a simulated battle as an example, BSIAS can predict the possible movements and attack routes of either side, and can serve as an intelligent support platform for commanders to make scientific decisions during wartime. This work demonstrates the potential application of the integrated BSIAS in the field of battlefield command and analysis engineering.
Submitted 23 January, 2025;
originally announced January 2025.
-
A formula of local Maslov index and applications
Authors:
Li Wu,
Chaofeng Zhu
Abstract:
In this paper, we explicitly express the local Maslov index by a Maslov index in the finite-dimensional case without symplectic reduction. Then we calculate the Maslov index for a path of pairs of Lagrangian subspaces in triangular form. In particular, we get the Maslov-type index of a given symplectic path in triangular form. As applications, we calculate the splitting numbers of a symplectic matrix in triangular form, the dependence of iteration theory on triangular frames, and the mod 2 Maslov-type index for a real symplectic path. As technical preparation, we study the continuity of families of bounded linear relations and families of bounded linear operators acting on closed linear subspaces.
Submitted 26 January, 2025; v1 submitted 19 January, 2025;
originally announced January 2025.
-
Poincaré polynomials of moduli spaces of one-dimensional sheaves on the projective plane
Authors:
Shuai Guo,
Longting Wu,
with an appendix by Miguel Moreira
Abstract:
Let $M_β$ denote the moduli space of stable one-dimensional sheaves on a del Pezzo surface $S$, supported on curves of class $β$ with Euler characteristic one. We show that the divisibility property of the Poincaré polynomial of $M_β$, proposed by Choi-van Garrel-Katz-Takahashi follows from Bousseau's conjectural refined sheaves/Gromov-Witten correspondence. Since this correspondence is known for $S=\mathbb{P}^2$, our result proves Choi-van Garrel-Katz-Takahashi's conjecture in this case.
For $S=\mathbb{P}^2$, our proof also introduces a novel approach to computing the Poincaré polynomials using Gromov-Witten invariants of local $\mathbb{P}^2$ and a local elliptic curve. Specifically, we compute the Poincaré polynomials of $M_{d}$ with degrees $d\leq 16$ and derive a closed formula for the leading Betti numbers $b_i(M_d)$ with $d\geq 6$ and $i\leq 4d-22$. We also propose a conjectural formula for the leading Betti numbers $b_i(M_d)$ with $d\geq 4$ and $i\leq 6d-20$. In the Appendix (by M. Moreira), a more general conjecture concerning the higher range Betti numbers of $M_{d}$ is presented, along with another conjecture that involves refinements from the perverse/Chern filtration.
Submitted 9 March, 2025; v1 submitted 9 January, 2025;
originally announced January 2025.
-
On Coordinated Drone-Courier Logistics for Intra-city Express Services
Authors:
Shuiwang Chen,
Kai Wang,
Lingxiao Wu,
Wei Qi
Abstract:
Problem definition: Drones, despite being acknowledged as a transformative force in the city logistics sector, are unable to execute the \textit{last-meter delivery} (unloading goods directly to customers' doorsteps) due to airspace restrictions and safety concerns. To leverage advancements and overcome the limitations of drones in providing intra-city express services, we introduce a coordinated drone-courier logistics system where drones operate within a closed network among vertiports, while couriers connect customers to the drone delivery system. This paper aims to shed light on this coordinated system in terms of system feasibility, network interactivity, and long-term sustainability. Methodology/Results: We develop an integrated optimization model to optimize the network planning of the coordinated logistics system. The interplay between network planning and tactical operations is mirrored by a queueing network model, resulting in the nonlinear and nonconvex (partially convex and partially concave) feasible region of the optimization model. An iterative exact algorithm that tightens lower and upper bounds by adaptively refining the linear approximations of nonlinear constraints is developed to provide optimality-guaranteed solutions with finite convergence. The computational experiments demonstrate the scalability and robustness of our algorithm across various network configurations and scenarios. Managerial implications: The case study, based on a real-world dataset from SF Express, a logistics giant in China, validates that the coordinated logistics system efficiently attains cost and time savings by leveraging the effective turnover of drones and the coordination between drones and couriers. The optimal network design features a concentrated structure, streamlining demand consolidation and reducing deadhead repositioning.
Submitted 9 January, 2025;
originally announced January 2025.
-
Weighted norm estimates of noncommutative Calderón-Zygmund operators
Authors:
Wenfei Fan,
Yong Jiao,
Lian Wu,
Dejian Zhou
Abstract:
This paper is devoted to studying weighted endpoint estimates of operator-valued singular integrals. Our main results include a weighted weak-type $(1,1)$ estimate for noncommutative maximal Calderón-Zygmund operators, the corresponding version for square functions, and a weighted $H_1- L_1$ type inequality. All these results are obtained under the condition that the weight belongs to the Muckenhoupt $A_1$ class, together with certain regularity assumptions on the kernels that are weaker than the Lipschitz condition.
Submitted 8 January, 2025;
originally announced January 2025.
-
The $L_q$ Minkowski problem for $\mathbf{p}$-harmonic measure
Authors:
Hai Li,
Longyu Wu,
Baocheng Zhu
Abstract:
In this paper, we consider an extremal problem associated with the solution to a boundary value problem. Our main focus is on establishing a variational formula for a functional related to the $\mathbf{p}$-harmonic measure, from which a new measure is derived. This further motivates us to study the Minkowski problem for this new measure. As a main result, we prove the existence of solutions to the $L_q$ Minkowski problem associated with the $\mathbf{p}$-harmonic measure for $0<q<1$ and $1<\mathbf{p}\ne n+1$.
Submitted 10 December, 2024;
originally announced December 2024.
-
Large deviations of the empirical measures of a strong-Feller Markov process inside a subset and quasi-ergodic distribution
Authors:
Arnaud Guillin,
Boris Nectoux,
Liming Wu
Abstract:
In this work, we establish, for a strong Feller process, the large deviation principle for the occupation measure conditioned not to exit a given subregion. The rate function vanishes only at a unique measure, which is the so-called quasi-ergodic distribution of the process in this subregion. In addition, we show that the rate function is the Dirichlet form in the particular case when the process is reversible. We apply our results to several stochastic processes such as the solutions of elliptic stochastic differential equations driven by a rotationally invariant $α$-stable process, the kinetic Langevin process, and the overdamped Langevin process driven by a Brownian motion.
Submitted 26 November, 2024;
originally announced November 2024.
-
A Note on a Recent Attempt to Prove the Irrationality of $ζ(5)$
Authors:
Keyu Chen,
Wei He,
Yixin He,
Yuxiang Huang,
Yanyang Li,
Quanyu Tang,
Lei Wu,
Shenhao Xu,
Shuo Yang,
Zijun Yu
Abstract:
Recently Shekhar Suman [arXiv: 2407.07121v6 [math.GM] 3 Aug 2024] made an attempt to prove the irrationality of $ζ(5)$. But unfortunately the proof is not correct. In this note, we discuss the fallacy in the proof.
Submitted 9 January, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
Long time behavior of killed Feynman-Kac semigroups with singular Schrödinger potentials
Authors:
Arnaud Guillin,
D I Lu,
Boris Nectoux,
Liming Wu
Abstract:
In this work, we investigate the compactness and the long time behavior of killed Feynman-Kac semigroups of various processes arising from statistical physics with very general singular Schrödinger potentials. The processes we consider cover a large class of processes used in statistical physics, with strong links with quantum mechanics and (local or not) Schrödinger operators (including e.g. fractional Laplacians). For instance we consider solutions to elliptic differential equations, Lévy processes, the kinetic Langevin process with locally Lipschitz gradient fields, and systems of interacting Lévy particles. Our analysis relies on a Perron-Frobenius type theorem derived in a previous work [A. Guillin, B. Nectoux, L. Wu, 2020 J. Eur. Math. Soc.] for Feller kernels and on the tools introduced in [L. Wu, 2004, Probab. Theory Relat. Fields] to compute bounds on the essential spectral radius of a bounded nonnegative kernel.
Submitted 20 November, 2024;
originally announced November 2024.
-
How Transformers Get Rich: Approximation and Dynamics Analysis
Authors:
Mingze Wang,
Ruoxi Yu,
Weinan E,
Lei Wu
Abstract:
Transformers have demonstrated exceptional in-context learning capabilities, yet the theoretical understanding of the underlying mechanisms remains limited. A recent work (Elhage et al., 2021) identified a ``rich'' in-context mechanism known as induction head, contrasting with ``lazy'' $n$-gram models that overlook long-range dependencies. In this work, we provide both approximation and dynamics analyses of how transformers implement induction heads. In the {\em approximation} analysis, we formalize both standard and generalized induction head mechanisms, and examine how transformers can efficiently implement them, with an emphasis on the distinct role of each transformer submodule. For the {\em dynamics} analysis, we study the training dynamics on a synthetic mixed target, composed of a 4-gram and an in-context 2-gram component. This controlled setting allows us to precisely characterize the entire training process and uncover an {\em abrupt transition} from lazy (4-gram) to rich (induction head) mechanisms as training progresses.
Submitted 29 January, 2025; v1 submitted 15 October, 2024;
originally announced October 2024.
-
Smart energy management: process structure-based hybrid neural networks for optimal scheduling and economic predictive control in integrated systems
Authors:
Long Wu,
Xunyuan Yin,
Lei Pan,
Jinfeng Liu
Abstract:
Integrated energy systems (IESs) are complex systems consisting of diverse operating units spanning multiple domains. To address their operational challenges, we propose a physics-informed hybrid time-series neural network (NN) surrogate to predict the dynamic performance of IESs across multiple time scales. This neural network-based modeling approach develops time-series multi-layer perceptrons (MLPs) for the operating units and integrates them with prior process knowledge about system structure and fundamental dynamics. This integration forms three hybrid NNs (long-term, slow, and fast MLPs) that predict the entire system dynamics across multiple time scales. Leveraging these MLPs, we design an NN-based scheduler and an NN-based economic model predictive control (NEMPC) framework to meet global operational requirements: rapid electrical power responsiveness to operators' requests, adequate cooling supply to customers, and increased system profitability, while addressing the dynamic time-scale multiplicity present in IESs. The proposed day-ahead scheduler is formulated using the ReLU network-based MLP, which effectively represents IES performance under a broad range of conditions from a long-term perspective. The scheduler is then exactly recast into a mixed-integer linear programming problem for efficient evaluation. The real-time NEMPC, based on slow and fast MLPs, comprises two sequential distributed control agents: a slow NEMPC for the cooling-dominant subsystem with slower transient responses and a fast NEMPC for the power-dominant subsystem with faster responses. Extensive simulations demonstrate that the developed scheduler and NEMPC schemes outperform their respective benchmark scheduler and controller by about 25% and 40%. Together, they enhance overall system performance by over 70% compared to benchmark approaches.
Submitted 7 October, 2024;
originally announced October 2024.
-
Hölder regularity and Liouville Theorem for the Schrödinger equation with certain critical potentials, and applications to Dirichlet problems
Authors:
Bo Li,
Ji Li,
Liangchuan Wu
Abstract:
Let $(X,d,μ)$ be a metric measure space satisfying a doubling property with the upper/lower dimension $Q\ge n>1$, and admitting an $L^2$-Poincaré inequality. In this article, we establish the Hölder continuity and a Liouville-type theorem for the (elliptic-type) Schrödinger equation $$\mathbb L u(x,t)=-\partial^2_{t}u(x,t)+\mathcal L u(x,t)+V(x)u(x,t)=0,\quad x\in X,\, t\in\mathbb R, $$ where $\mathcal L$ is a non-negative operator generated by a Dirichlet form on $X$, and the non-negative potential $V$ is a Muckenhoupt weight belonging to the reverse Hölder class ${RH}_q(X)$ for some $q>\max\{Q/2,1\}$. Note that $Q/2$ is critical for the regularity theory of $-Δ+V$ on $\mathbb{R}^Q$ ($Q\ge3$) by Shen's work in 1995, which hints that the critical index of $V$ for the regularity results above on $X\times \mathbb R$ may be $(Q+1)/2$. Our results show that this critical index is in fact $\max\{Q/2,1\}$. Our approach primarily relies on the controllable growth of $V$ and the elliptic theory for the operator $\mathbb L$/$-\partial^2_{t}+\mathcal{L}$ on $X\times \mathbb R$, rather than the analogs for $\mathcal L+V$/$\mathcal{L}$ on $X$, under the critical index setting. As applications, we further obtain some characterizations for solutions to the Schrödinger equation $-\partial^2_{t}u+\mathcal L u+Vu=0$ in $X\times \mathbb R_+$ with boundary values in BMO/CMO/Morrey spaces related to $V$, improving previous results to the critical index $q>\max\{Q/2,1\}$.
Submitted 4 October, 2024;
originally announced October 2024.
-
On maximal functions generated by Hörmander-type spectral multipliers
Authors:
Peng Chen,
Xixi Lin,
Liangchuan Wu,
Lixin Yan
Abstract:
Let $(X,d,μ)$ be a metric space with doubling measure and $L$ be a nonnegative self-adjoint operator on $L^2(X)$ whose heat kernel satisfies the Gaussian upper bound. We assume that there exists an $L$-harmonic function $h$ such that the semigroup $\exp(-tL)$, after applying the Doob transform related to $h$, satisfies the upper and lower Gaussian estimates. In this paper we apply the Doob transform and some techniques as in Grafakos-Honzík-Seeger \cite{GHS2006} to obtain an optimal $\sqrt{\log(1+N)}$ bound in $L^p$ for the maximal function $\sup_{1\leq i\leq N}|m_i(L)f|$ for multipliers $m_i,1\leq i\leq N,$ with uniform estimates. Based on this, we establish sufficient conditions on the bounded Borel function $m$ such that the maximal function $M_{m,L}f(x) = \sup_{t>0} |m(tL)f(x)|$ is bounded on $L^p(X)$.
The applications include Schrödinger operators with inverse square potential, scattering operators, Bessel operators and Laplace-Beltrami operators.
Submitted 1 October, 2024;
originally announced October 2024.
-
Dynamical Sampling in Shift-Invariant Spaces Associated with multi-dimensional Special Affine Fourier Transform
Authors:
Meng Ning,
Li-Ping Wu,
Qing-yue Zhang,
Bei Liu
Abstract:
The Special Affine Fourier Transformation (SAFT), which generalizes several well-known unitary transformations, has been demonstrated as a valuable tool in signal processing and optics. In this paper, we explore the multivariate dynamical sampling problem in shift-invariant spaces associated with the multi-dimensional SAFT. Specifically, we derive a necessary and sufficient condition under which a function in a shift-invariant space can be stably recovered from its dynamical sampling measurements associated with the multi-dimensional SAFT. We also present a straightforward example to elucidate our main result.
Submitted 12 September, 2024;
originally announced September 2024.
-
Bernstein-Sato ideals
Authors:
Nero Budur,
Robin van der Veer,
Lei Wu,
Peng Zhou
Abstract:
In this paper, we review several results on the zero loci of Bernstein-Sato ideals related to singularities of hypersurfaces. This is an exposition for the Frontiers of Science Awards in Mathematics presenting results from one of our articles, with history, motivation, and further developments.
Submitted 24 August, 2024;
originally announced August 2024.
-
Periodic Trading Activities in Financial Markets: Mean-field Liquidation Game with Major-Minor Players
Authors:
Yufan Chen,
Lan Wu,
Renyuan Xu,
Ruixun Zhang
Abstract:
Motivated by recent empirical findings on the periodic phenomenon of aggregated market volumes in equity markets, we aim to understand the causes and consequences of periodic trading activities through a game-theoretic perspective, examining market interactions among different types of participants. Specifically, we introduce a new mean-field liquidation game involving major and minor traders, where the major trader evaluates her strategy against a periodic targeting strategy while a continuum of minor players trade against her. We establish the existence and uniqueness of an open-loop Nash equilibrium. In addition, we prove an $O(1/\sqrt{N})$ approximation rate of the mean-field solution to the Nash equilibrium in a major-minor game with $N$ minor players. In equilibrium, minor traders exhibit front-running behaviors in both the periodic and trend components of their strategies, reducing the major trader's profit. Such strategic interactions diminish the strength of periodicity in both overall trading volume and asset prices. Our model rationalizes observed periodic trading activities in the market and offers new insights into market dynamics.
Submitted 18 August, 2024;
originally announced August 2024.
-
A complementary result on a singular mean field equation with a sign-changing potential function
Authors:
Lina Wu
Abstract:
In this note, we study the singular mean field equation defined on a Riemann surface with a sign-changing potential function. We prove that if some singular sources happen to be placed on the zero-level curve of the potential function, an a priori estimate can still be obtained. As a consequence of this estimate, existence and multiplicity results can still be obtained based on the topology of the manifold.
Submitted 31 July, 2024;
originally announced August 2024.
-
Global Well-Posedness of Contact Lines: 2D Navier-Stokes Flow
Authors:
Yan Guo,
Ian Tice,
Lei Wu,
Yunrui Zheng
Abstract:
Based on the global a priori estimates in [Guo-Tice, J. Eur. Math. Soc. (2024)], we establish the well-posedness of a viscous fluid model satisfying the dynamic law for the contact line \begin{equation*} \mathscr{W}(\p_tζ(\pm\ell,t))=[\![γ]\!]\mpσ\frac{\p_1ζ}{(1+|\p_1ζ|^2)^{1/2}}(\pm\ell,t) \end{equation*} in a 2D domain, where $ζ(x_1,t)$ is a free surface with two contact points $ζ(\pm\ell,t)$, $[\![γ]\!]$ and $σ$ are constants characterizing the solid-fluid-gas free energy, and the increasing function $\mathscr{W}$ is the contact point velocity response function. Motivated by the energy-dissipation structure, our argument relies on the construction of a pressureless weak solution for the coupled velocity and free interface for the linearized problems, via a Galerkin approximation with a time-dependent basis and an artificial regularization for the capillary operator.
Submitted 30 July, 2024; v1 submitted 25 July, 2024;
originally announced July 2024.
-
Global dynamics for the generalized chemotaxis-Navier-Stokes system in $\mathbb{R}^3$
Authors:
Qingyou He,
Ling-Yun Shou,
Leyun Wu
Abstract:
We consider the chemotaxis-Navier-Stokes system with generalized fluid dissipation in $\mathbb{R}^3$:
\begin{eqnarray*}
\begin{cases} \partial_t n+u\cdot \nabla n=Δn- \nabla \cdot (χ(c)n \nabla c),\\ \partial_t c+u \cdot \nabla c=Δc-nf(c),\\ \partial_t u +u \cdot \nabla u+\nabla P=-(-Δ)^αu-n\nabla φ,\\ \nabla \cdot u=0,
\end{cases} \end{eqnarray*} which describes the motion of swimming bacteria or Bacillus subtilis suspended in water flows. First, we prove some blow-up criteria of strong solutions to the Cauchy problem, including the Prodi-Serrin type criterion ($α>\frac{3}{4}$) and the Beirão da Veiga type criterion ($α>\frac{1}{2}$). Then, we verify the global existence and uniqueness of strong solutions for arbitrarily large initial fluid velocity and bacteria density for $α\geq \frac{5}{4}$. Furthermore, in the scenario of $\frac{3}{4}<α<\frac{5}{4}$, we establish uniform regularity estimates and optimal time-decay rates of global solutions if the $L^2$-norm of initial data is small. To our knowledge, this is the first result concerning the global existence and large-time behavior of strong solutions for the chemotaxis-Navier-Stokes equations with possibly large oscillations.
Submitted 7 August, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Improving Generalization and Convergence by Enhancing Implicit Regularization
Authors:
Mingze Wang,
Jinbo Wang,
Haotian He,
Zilin Wang,
Guanhua Huang,
Feiyu Xiong,
Zhiyu Li,
Weinan E,
Lei Wu
Abstract:
In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with {\em generic base optimizers} without introducing significant computational overload. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ {\em speed-up} compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).
Submitted 31 October, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Global-in-time maximal regularity for the Cauchy problem of the heat equation in BMO and applications
Authors:
Xuan Thinh Duong,
Ji Li,
Liangchuan Wu,
Lixin Yan
Abstract:
In this article, we establish global-in-time maximal regularity for the Cauchy problem of the classical heat equation $\partial_t u(x,t)-Δu(x,t)=f(x,t)$ with $u(x,0)=0$ in a certain $\rm BMO$ setting, which improves the local-in-time result initially proposed by Ogawa and Shimizu in \cite{OS, OS2}. In further developing our method originally formulated for the heat equation, we obtain analogous global ${\rm BMO}$-maximal regularity associated to the Schrödinger operator $\mathcal L=-Δ+V$, where the nonnegative potential $V$ belongs to the reverse Hölder class ${\rm RH}_q$ for some $q> n/2$. This extension includes several inhomogeneous estimates as ingredients, such as Carleson-type estimates for the external forces.
Our new methodology is to exploit elaborate heat kernel estimates, along with matched space-time decomposition on the involving integral-type structure of maximal operators, as well as some global techniques such as those from de Simon's work and Schur's lemma. One crucial trick is to utilize the mean oscillation therein to contribute a higher and necessary decay order for global-in-time estimates.
Submitted 2 May, 2024;
originally announced May 2024.
-
The modified Korteweg--de Vries limit of the Ablowitz--Ladik system
Authors:
Rowan Killip,
Zhimeng Ouyang,
Monica Visan,
Lei Wu
Abstract:
For slowly-varying initial data, solutions to the Ablowitz-Ladik system have been proven to converge to solutions of the cubic Schrödinger equation. In this paper we show that in the continuum limit, solutions to the Ablowitz-Ladik system with $H^1$ initial data may also converge to solutions of the modified Korteweg--de Vries equation. To exhibit this new limiting behavior, it suffices that the initial data is supported near the inflection points of the dispersion relation associated with the Ablowitz-Ladik system.
Our arguments employ harmonic analysis tools, Strichartz estimates, and the conservation of mass and energy. Correspondingly, they are applicable beyond the completely integrable models of greatest interest to us.
Submitted 2 April, 2024;
originally announced April 2024.
-
Communication Efficient Distributed Training with Distributed Lion
Authors:
Bo Liu,
Lemeng Wu,
Lizhang Chen,
Kaizhao Liang,
Jiaxu Zhu,
Chen Liang,
Raghuraman Krishnamoorthi,
Qiang Liu
Abstract:
The Lion optimizer has been a promising competitor to AdamW for training large AI models, with advantages in memory, computation, and sample efficiency. In this paper, we introduce Distributed Lion, an innovative adaptation of Lion for distributed training environments. Leveraging the sign operator in Lion, our Distributed Lion only requires communicating binary or lower-precision vectors between the workers and the central server, significantly reducing the communication cost. Our theoretical analysis confirms Distributed Lion's convergence properties. Empirical results demonstrate its robustness across a range of tasks, worker counts, and batch sizes, on both vision and language problems. Notably, Distributed Lion attains comparable performance to standard Lion or AdamW optimizers applied on aggregated gradients, but with significantly reduced communication bandwidth. This feature is particularly advantageous for training large models. In addition, we demonstrate that Distributed Lion presents a more favorable performance-bandwidth balance compared to existing efficient distributed methods such as deep gradient compression and ternary gradients.
Submitted 30 March, 2024;
originally announced April 2024.
-
Schatten classes and commutators in the two weight setting, II. Riesz transforms
Authors:
Michael Lacey,
Ji Li,
Brett D. Wick,
Liangchuan Wu
Abstract:
We characterize the Schatten class $S^p$ of the commutator of Riesz transforms $[b,R_j]$ in $\mathbb R^n$ ($j=1,\ldots, n$) in the two weight setting for $n< p<\infty$, by introducing the condition that the symbol $b$ is in Besov spaces associated with the given two weights. At the critical index $p=n$, the commutator $[b,R_j]$ belongs to the Schatten class $S^{n}$ if and only if $b$ is a constant, and to the weak Schatten class $S^{n,\infty}$ if and only if $b$ is in an oscillation sequence space associated with the given two weights. As a direct application, we have the Schatten class estimate for A. Connes' quantised derivative in the two weight setting.
Submitted 13 December, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
A Parallel Vector-form $LDL^\top$ Decomposition for Accelerating Execution-time-certified $\ell_1$-penalty Soft-constrained MPC
Authors:
Liang Wu,
Liwei Zhou,
Richard D. Braatz
Abstract:
Handling possible infeasibility and providing an execution time certificate are two pressing requirements of real-time Model Predictive Control (MPC). To meet these two requirements simultaneously, this paper proposes an $\ell_1$-penalty soft-constrained MPC formulation that is globally feasible and solvable with an execution time certificate using our proposed algorithm. This paper proves for the first time that $\ell_1$-penalty soft-constrained MPC problems can be equivalently transformed into a box-constrained quadratic program (Box-QP), after which our previous execution-time-certified algorithm \cite{wu2023direct} (limited to Box-QP) can be applied. However, our previous Box-QP algorithm \cite{wu2023direct}, which provides a theoretical execution-time certificate, is conservative in its iteration analysis, thus sacrificing computational efficiency. To this end, this paper proposes a novel $LDL^\top$ decomposition to accelerate the computation of the Newton step at each iteration. The speedup of our $LDL^\top$ decomposition is two-fold: \textit{i)} exploitation of the fact that the number of inequality constraints is generally larger than the number of variables in condensed MPC formulations, and \textit{ii)} a vectorized and parallel implementation based on its vector-wise operations, instead of the element-wise operations of previous decomposition methods. Numerical experiments demonstrate great speedups of the proposed $LDL^\top$ decomposition (even up to 1000-fold, compared to the standard Cholesky method), which thus helps our solver achieve comparable computation performance to the state-of-the-art solvers such as IPOPT and OSQP. Code is available at \url{https://github.com/liangwu2019/L1-penalty-QP}.
Submitted 8 August, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Generalized Langevin and Nosé-Hoover processes absorbed at the boundary of a metastable domain
Authors:
Arnaud Guillin,
D I Lu,
Boris Nectoux,
Liming Wu
Abstract:
In this paper, we prove in a very weak regularity setting existence and uniqueness of quasi-stationary distributions as well as exponential convergence towards the quasi-stationary distribution for the generalized Langevin and the Nosé-Hoover processes, two processes which are widely used in molecular dynamics. The case of singular potentials is considered. With the techniques used in this work, we are also able to greatly improve existing results on quasi-stationary distributions for the kinetic Langevin process to a weak regularity setting.
Submitted 26 March, 2024;
originally announced March 2024.
-
Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent
Authors:
Liu Ziyin,
Mingze Wang,
Hongchao Li,
Lei Wu
Abstract:
Symmetries are prevalent in deep learning and can significantly influence the learning dynamics of neural networks. In this paper, we examine how exponential symmetries -- a broad subclass of continuous symmetries present in the model architecture or loss function -- interplay with stochastic gradient descent (SGD). We first prove that gradient noise creates a systematic motion (a ``Noether flow'') of the parameters $θ$ along the degenerate direction to a unique initialization-independent fixed point $θ^*$. These points are referred to as the {\it noise equilibria} because, at these points, noise contributions from different directions are balanced and aligned. Then, we show that the balance and alignment of gradient noise can serve as a novel alternative mechanism for explaining important phenomena such as progressive sharpening/flattening and representation formation within neural networks and have practical implications for understanding techniques like representation normalization and warmup.
Submitted 6 November, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
Time-certified Input-constrained NMPC via Koopman Operator
Authors:
Liang Wu,
Krystian Ganko,
Richard D. Braatz
Abstract:
Determining solving-time certificates of nonlinear model predictive control (NMPC) implementations is a pressing requirement when deploying NMPC in production environments. Such a certificate guarantees that the NMPC controller returns a solution before the next sampling time. However, NMPC formulations produce nonlinear programs (NLPs) for which it is very difficult to derive their solving-time certificates. Our previous work, Wu and Braatz (2023), challenged this limitation with a proposed input-constrained MPC algorithm having exact iteration complexity but was restricted to linear MPC formulations. This work extends the algorithm to solve input-constrained NMPC problems, by using the Koopman operator and a condensing MPC technique. We illustrate the algorithm performance on a high-dimensional, nonlinear partial differential equation (PDE) control case study, in which we theoretically and numerically certify the solving time to be less than the sampling time.
Submitted 26 February, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
The Local Landscape of Phase Retrieval Under Limited Samples
Authors:
Kaizhao Liu,
Zihao Wang,
Lei Wu
Abstract:
In this paper, we present a fine-grained analysis of the local landscape of phase retrieval under the regime of limited samples. Specifically, we aim to ascertain the minimal sample size required to guarantee a benign local landscape surrounding global minima in high dimensions. Let $n$ and $d$ denote the sample size and input dimension, respectively. We first explore the local convexity and establish that when $n=o(d\log d)$, for almost every fixed point in the local ball, the Hessian matrix has negative eigenvalues, provided $d$ is sufficiently large. Consequently, the local landscape is highly non-convex. We next consider the one-point convexity and show that, as long as $n=ω(d)$, with high probability, the landscape is one-point strongly convex in the local annulus: $\{w\in\mathbb{R}^d: o_d(1)\leqslant \|w-w^*\|\leqslant c\}$, where $w^*$ is the ground truth and $c$ is an absolute constant. This implies that gradient descent, initialized from any point in this domain, can converge to an $o_d(1)$-loss solution exponentially fast. Furthermore, we show that when $n=o(d\log d)$, there is a radius of $\widetildeΘ\left(\sqrt{1/d}\right)$ such that one-point convexity breaks down in the corresponding smaller local ball. This indicates an impossibility of establishing a convergence to the exact $w^*$ for gradient descent under limited samples by relying solely on one-point convexity.
Submitted 11 October, 2024; v1 submitted 26 November, 2023;
originally announced November 2023.
-
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Authors:
Mingze Wang,
Zeping Min,
Lei Wu
Abstract:
In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms in classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an {\em exponential rate}. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow {\em polynomial rate}. Specifically, we identify mild conditions on data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) {\em provably fail} in maximizing the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.
Submitted 25 December, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
PFA and the definability of the nonstationary ideal
Authors:
Stefan Hoffelner,
Paul Larson,
Ralf Schindler,
Liuzhen Wu
Abstract:
We produce, relative to a ${\sf ZFC}$ model with a supercompact cardinal, a ${\sf ZFC}$ model of the Proper Forcing Axiom in which the nonstationary ideal on $ω_1$ is $Π_1$-definable in a parameter from $H_{\aleph_2}$.
Submitted 20 October, 2023;
originally announced October 2023.
-
A new class of partial orders
Authors:
Huihui Zhu,
Liyun Wu
Abstract:
Let $R$ be a unital $*$-ring. For any $a,w,b\in R$, we apply the defined $w$-core inverse to define a new class of partial orders in $R$, called the $w$-core partial order. Suppose $a,b\in R$ are $w$-core invertible. We say that $a$ is below $b$ under the $w$-core partial order, denoted by $a\overset{\tiny{\textcircled{\#}}}\leq_w b$, if $a_w^{\tiny{\textcircled{\#}}} a=a_w^{\tiny{\textcircled{\#}}} b$ and $awa_w^{\tiny{\textcircled{\#}}} =bwa_w^{\tiny{\textcircled{\#}}}$, where $a_w^{\tiny{\textcircled{\#}}}$ denotes the $w$-core inverse of $a$. Characterizations of the $w$-core partial order are given. Also, the relationships with several types of partial orders are considered. In particular, we show that the core partial order coincides with the $a$-core partial order, and the star partial order coincides with the $a^*$-core partial order.
Submitted 24 September, 2023;
originally announced September 2023.
-
Learning Risk Preferences in Markov Decision Processes: an Application to the Fourth Down Decision in the National Football League
Authors:
Nathan Sandholtz,
Lucas Wu,
Martin Puterman,
Timothy C. Y. Chan
Abstract:
For decades, National Football League (NFL) coaches' observed fourth down decisions have been largely inconsistent with prescriptions based on statistical models. In this paper, we develop a framework to explain this discrepancy using an inverse optimization approach. We model the fourth down decision and the subsequent sequence of plays in a game as a Markov decision process (MDP), the dynamics of which we estimate from NFL play-by-play data from the 2014 through 2022 seasons. We assume that coaches' observed decisions are optimal but that the risk preferences governing their decisions are unknown. This yields an inverse decision problem for which the optimality criterion, or risk measure, of the MDP is the estimand. Using the quantile function to parameterize risk, we estimate which quantile-optimal policy yields the coaches' observed decisions as minimally suboptimal. In general, we find that coaches' fourth-down behavior is consistent with optimizing low quantiles of the next-state value distribution, which corresponds to conservative risk preferences. We also find that coaches exhibit higher risk tolerances when making decisions in the opponent's half of the field as opposed to their own half, and that league average fourth down risk tolerances have increased over time.
Submitted 15 August, 2024; v1 submitted 1 September, 2023;
originally announced September 2023.
-
Finding the spectral radius of a nonnegative irreducible symmetric tensor via DC programming
Authors:
Xueli Bai,
Dong-Hui Li,
Lei Wu,
Jiefeng Xu
Abstract:
The Perron-Frobenius theorem says that the spectral radius of an irreducible nonnegative tensor is the unique positive eigenvalue corresponding to a positive eigenvector. With this in mind, the purpose of this paper is to find the spectral radius and its corresponding positive eigenvector of an irreducible nonnegative symmetric tensor. By transforming the eigenvalue problem into an equivalent problem of minimizing a concave function on a closed convex set, which is a typical DC (difference of convex functions) program, we derive a simpler and cheaper iterative method. The proposed method is well-defined. Furthermore, we show that the sequences of eigenvalue estimates and eigenvector estimates generated by the method converge $Q$-linearly to the spectral radius and its corresponding eigenvector, respectively. To accelerate the method, we introduce a line search technique. The improved method retains the same convergence property as the original version. Preliminary numerical results show that the improved method performs quite well.
Submitted 25 July, 2023;
originally announced July 2023.
-
A direct approach to sharp Li-Yau Estimates on closed manifolds with negative Ricci lower bound
Authors:
Xingyu Song,
Ling Wu,
Meng Zhu
Abstract:
Recently, Qi S. Zhang [26] has derived a sharp Li-Yau estimate for positive solutions of the heat equation on closed Riemannian manifolds with the Ricci curvature bounded below by a negative constant. The proof is based on an integral iteration argument which utilizes Hamilton's gradient estimate, heat kernel Gaussian bounds, and the parabolic Harnack inequality.
In this paper, we show that the sharp Li-Yau estimate can actually be obtained directly following the classical maximum principle argument, which simplifies the proof in [26]. In addition, we apply the same idea to the heat and conjugate heat equations under the Ricci flow and prove some Li-Yau type estimates with optimal coefficients.
Submitted 24 August, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
A direct optimization algorithm for input-constrained MPC
Authors:
Liang Wu,
Richard D. Braatz
Abstract:
Providing an execution time certificate is a pressing requirement when deploying Model Predictive Control (MPC) in real-time embedded systems such as microcontrollers. Real-time MPC requires that its worst-case (maximum) execution time be theoretically guaranteed to be smaller than the sampling time in closed loop. This technical note considers input-constrained MPC problems and exploits the structure of the resulting box-constrained QPs. Then, we propose a \textit{cost-free} and \textit{data-independent} initialization strategy, which enables us, for the first time, to remove the initialization assumption of feasible full-Newton interior-point algorithms. We prove that the number of iterations of our proposed algorithm is \textit{only dimension-dependent} (\textit{data-independent}), \textit{simple to calculate}, and \textit{exact} (not \textit{worst-case}) with the value $\left\lceil\frac{\log(\frac{2n}{ε})}{-2\log(\frac{\sqrt{2n}}{\sqrt{2n}+\sqrt{2}-1})}\right\rceil \!+ 1$, where $n$ denotes the problem dimension and $ε$ denotes the constant stopping tolerance. These features enable our algorithm to trivially certify the execution time of nonlinear MPC (via online linearized schemes) or adaptive MPC problems. The execution-time-certified capability of our algorithm is theoretically and numerically validated through an open-loop unstable AFTI-16 example.
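As a rough illustration of why this iteration count is "simple to calculate", the formula above can be evaluated directly from the problem dimension $n$ and the stopping tolerance $ε$. The sketch below only evaluates the closed-form expression quoted in the abstract; the function name `certified_iterations` and its argument names are hypothetical, and nothing here implements the interior-point algorithm itself.

```python
import math


def certified_iterations(n: int, eps: float) -> int:
    """Evaluate the iteration count quoted in the abstract.

    n   : problem dimension (assumed to be a positive integer)
    eps : stopping tolerance (assumed to be positive)
    """
    if n < 1 or eps <= 0.0:
        raise ValueError("n must be >= 1 and eps must be > 0")
    root = math.sqrt(2 * n)
    # Numerator: log(2n / eps)
    numerator = math.log(2 * n / eps)
    # Denominator: -2 * log(sqrt(2n) / (sqrt(2n) + sqrt(2) - 1)), which is positive
    denominator = -2.0 * math.log(root / (root + math.sqrt(2) - 1.0))
    return math.ceil(numerator / denominator) + 1


if __name__ == "__main__":
    # The count depends only on n and eps, not on the problem data.
    for n in (1, 10, 100):
        print(n, certified_iterations(n, 1e-6))
```

Because the count depends only on $n$ and $ε$, multiplying it by a measured per-iteration cost on the target hardware would give an execution-time estimate that can be compared against the sampling time; that per-iteration cost is platform-specific and is not given in the abstract.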
Submitted 30 March, 2024; v1 submitted 26 June, 2023;
originally announced June 2023.
-
Heat kernel estimate for the Laplace-Beltrami operator under Bakry-Émery Ricci curvature condition and applications
Authors:
Xingyu Song,
Ling Wu,
Meng Zhu
Abstract:
We establish a Gaussian upper bound of the heat kernel for the Laplace-Beltrami operator on complete Riemannian manifolds with Bakry-Émery Ricci curvature bounded below. As applications, we first prove an $L^1$-Liouville property for non-negative subharmonic functions when the potential function of the Bakry-Émery Ricci curvature tensor is of at most quadratic growth. Then we derive lower bounds of the eigenvalues of the Laplace-Beltrami operator on closed manifolds. An upper bound of the bottom spectrum is also obtained.
Submitted 26 June, 2023; v1 submitted 22 June, 2023;
originally announced June 2023.
-
Learning Unnormalized Statistical Models via Compositional Optimization
Authors:
Wei Jiang,
Jiayu Qin,
Lingyu Wu,
Changyou Chen,
Tianbao Yang,
Lijun Zhang
Abstract:
Learning unnormalized statistical models (e.g., energy-based models) is computationally challenging due to the complexity of handling the partition function. To eschew this complexity, noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise. However, as found in previous works, NCE may perform poorly in many tasks due to its flat loss landscape and slow convergence. In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models from the perspective of compositional optimization. To tackle the partition function, a noise distribution is introduced such that the log partition function can be written as a compositional function whose inner function can be estimated with stochastic samples. Hence, the objective can be optimized by stochastic compositional optimization algorithms. Despite being a simple method, we demonstrate that it is more favorable than NCE by (1) establishing a fast convergence rate and quantifying its dependence on the noise distribution through the variance of stochastic estimators; (2) developing better results for one-dimensional Gaussian mean estimation by showing our objective has a much more favorable loss landscape and hence our method enjoys faster convergence; (3) demonstrating better performance on multiple applications, including density estimation, out-of-distribution detection, and real image generation.
Submitted 12 June, 2023;
originally announced June 2023.
-
The $L^\infty$ Learnability of Reproducing Kernel Hilbert Spaces
Authors:
Hongrui Chen,
Jihao Long,
Lei Wu
Abstract:
In this work, we analyze the learnability of reproducing kernel Hilbert spaces (RKHS) under the $L^\infty$ norm, which is critical for understanding the performance of kernel methods and random feature models in safety- and security-critical applications. Specifically, we relate the $L^\infty$ learnability of an RKHS to the spectrum decay of the associated kernel, and we establish both lower and upper bounds on the sample complexity. In particular, for dot-product kernels on the sphere, we identify conditions under which $L^\infty$ learning can be achieved with polynomially many samples. Let $d$ denote the input dimension and assume the kernel spectrum roughly decays as $λ_k\sim k^{-1-β}$ with $β>0$. We prove that if $β$ is independent of the input dimension $d$, then functions in the RKHS can be learned efficiently under the $L^\infty$ norm, i.e., the sample complexity depends polynomially on $d$. In contrast, if $β=1/\mathrm{poly}(d)$, then $L^\infty$ learning requires exponentially many samples.
Submitted 5 June, 2023;
originally announced June 2023.