-
Spectra of high-dimensional sparse random geometric graphs
Authors:
Yifan Cao,
Yizhe Zhu
Abstract:
We analyze the spectral properties of the high-dimensional random geometric graph $G(n, d, p)$, formed by sampling $n$ i.i.d. vectors $\{v_i\}_{i=1}^{n}$ uniformly on a $d$-dimensional unit sphere and connecting each pair $\{i,j\}$ whenever $\langle v_i, v_j \rangle \geq τ$ so that $p=\mathbb P(\langle v_i,v_j\rangle \geq τ)$. This model defines a nonlinear random matrix ensemble with dependent entries. We show that if $d =ω( np\log^{2}(1/p))$ and $np\to\infty$, the limiting spectral distribution of the normalized adjacency matrix $\frac{A}{\sqrt{np(1-p)}}$ is the semicircle law. To our knowledge, this is the first such result for $G(n, d, p)$ in the sparse regime. In the constant sparsity case $p=α/n$, we further show that if $d=ω(\log^2(n))$ the limiting spectral distribution of $A$ in $G(n,d, α/n)$ coincides with that of the Erdős-Rényi graph $G(n,α/n)$. Our approach combines the classical moment method in random matrix theory with a novel recursive decomposition of closed walk graphs, leveraging block cut trees and ear decompositions, to control $\mathbb E \mathrm{tr}(A^k)$. A refined high trace analysis further yields a near-optimal bound on the second eigenvalue when $np=Ω(\log^4 (n))$, removing technical conditions previously imposed in (Liu et al. 2023).
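As an illustration of the construction described in this abstract, the following is a minimal sketch, not the authors' code. It samples points on a unit sphere, connects pairs whose inner product is at least $τ$, and counts closed walks $\mathrm{tr}(A^k)$, the quantity the moment method controls. The function names, the choice to place the vectors in $\mathbb{R}^d$, and the small parameter values are assumptions made for the example only.

```python
import math
import random


def sample_sphere(n, d, rng):
    """Sample n points uniformly on the unit sphere in R^d (normalized Gaussians)."""
    points = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        points.append([x / norm for x in g])
    return points


def geometric_adjacency(points, tau):
    """Adjacency matrix with i ~ j iff <v_i, v_j> >= tau (no self-loops)."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(points[i], points[j]))
            if inner >= tau:
                adj[i][j] = adj[j][i] = 1
    return adj


def closed_walk_count(adj, k):
    """Return tr(A^k), the number of closed walks of length k, by repeated multiplication."""
    n = len(adj)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = [
            [sum(power[i][m] * adj[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return sum(power[i][i] for i in range(n))


# Hypothetical small parameters, chosen only so the example runs quickly.
rng = random.Random(0)
adj = geometric_adjacency(sample_sphere(40, 20, rng), tau=0.2)
edges = sum(map(sum, adj)) // 2
```

Here `closed_walk_count(adj, 2)` equals twice the number of edges, since every closed walk of length 2 traverses one edge back and forth. Higher values of `k` pick up the triangles and longer cycles whose counts distinguish the geometric model from an Erdős-Rényi graph at low dimension.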
Submitted 9 July, 2025;
originally announced July 2025.
-
Quantum Learning and Estimation for Distribution Networks and Energy Communities Coordination
Authors:
Yingrui Zhuang,
Lin Cheng,
Yuji Cao,
Tongxin Li,
Ning Qi,
Yan Xu,
Yue Chen
Abstract:
Price signals from distribution networks (DNs) guide energy communities (ECs) to adjust energy usage, enabling effective coordination for reliable power system operation. However, this coordination faces significant challenges due to the limited availability of information (i.e., only the aggregated energy usage of ECs is available to DNs), and the high computational burden of accounting for uncertainties and the associated risks through numerous scenarios. To address these challenges, we propose a quantum learning and estimation approach to enhance coordination between DNs and ECs. Specifically, leveraging advanced quantum properties such as quantum superposition and entanglement, we develop a hybrid quantum temporal convolutional network-long short-term memory (Q-TCN-LSTM) model to establish an end-to-end mapping between ECs' responses and the price incentives from DNs. Moreover, we develop a quantum estimation method based on quantum amplitude estimation (QAE) and two phase-rotation circuits to significantly accelerate the optimization process under numerous uncertainty scenarios. Numerical experiments demonstrate that, compared to classical neural networks, the proposed Q-TCN-LSTM model improves the mapping accuracy by 69.2% while reducing the model size by 99.75% and the computation time by 93.9%. Compared to classical Monte Carlo simulation, QAE achieves comparable accuracy with a dramatic reduction in computational time (up to 99.99%) and requires significantly fewer computational resources.
Submitted 13 June, 2025;
originally announced June 2025.
-
Online Learning Control Strategies for Industrial Processes with Application for Loosening and Conditioning
Authors:
Yue Wu,
Jianfu Cao,
Ye Cao
Abstract:
This paper proposes a novel adaptive Koopman Model Predictive Control (MPC) framework, termed HPC-AK-MPC, designed to address the dual challenges of time-varying dynamics and safe operation in complex industrial processes. The framework integrates two core strategies: online learning and historically-informed safety constraints. To contend with process time-variance, a Recursive Extended Dynamic Mode Decomposition (rEDMDc) technique is employed to construct an adaptive Koopman model capable of updating its parameters from real-time data, endowing the controller with the ability to continuously learn and track dynamic changes. To tackle the critical issue of safe operation under model uncertainty, we introduce a novel Historical Process Constraint (HPC) mechanism. This mechanism mines successful operational experiences from a historical database and, by coupling them with the confidence level of the online model, generates a dynamic "safety corridor" for the MPC optimization problem. This approach transforms implicit expert knowledge into explicit, adaptive constraints, establishing a dynamic balance between pursuing optimal performance and ensuring robust safety. The proposed HPC-AK-MPC method is applied to a real-world tobacco loosening and conditioning process and systematically validated using an "advisor mode" simulation framework with industrial data. Experimental results demonstrate that, compared to historical operations, the proposed method significantly improves the Process Capability Index (Cpk) for key quality variables across all tested batches, proving its substantial potential in enhancing control performance while guaranteeing operational safety.
Submitted 10 June, 2025;
originally announced June 2025.
-
Flux Globalization Based Well-Balanced Path-Conservative Central-Upwind Schemes for Shallow Water Linearized Moment Equations
Authors:
Yangyang Cao,
Qian Huang,
Julian Koellermeier,
Alexander Kurganov,
Yongle Liu
Abstract:
We develop second-order path-conservative central-upwind (PCCU) schemes for the hyperbolic shallow water linearized moment equations (HSWLME), which are an extension of standard depth-averaged models for free-surface flows. The proposed PCCU schemes are constructed via flux globalization strategies adapted to the nonconservative form via a path-conservative finite-volume method. The resulting scheme is well-balanced (WB) in the sense that it is capable of exactly preserving physically relevant steady states including moving-water ones. We validate the proposed scheme on several benchmarks, including smooth solutions, small perturbation of steady states, and dam-break scenarios. These results demonstrate that our flux globalization based WB PCCU schemes provide a reliable framework for computing solutions of shallow water moment models with nonlinear and nonconservative features.
Submitted 29 May, 2025;
originally announced May 2025.
-
High-Dimensional Importance-Weighted Information Criteria: Theory and Optimality
Authors:
Yong-Syun Cao,
Shinpei Imori,
Ching-Kang Ing
Abstract:
Imori and Ing (2025) proposed the importance-weighted orthogonal greedy algorithm (IWOGA) for model selection in high-dimensional misspecified regression models under covariate shift. To determine the number of IWOGA iterations, they introduced the high-dimensional importance-weighted information criterion (HDIWIC). They argued that the combined use of IWOGA and HDIWIC, IWOGA + HDIWIC, achieves an optimal trade-off between variance and squared bias, leading to optimal convergence rates in terms of conditional mean squared prediction error. In this article, we provide a theoretical justification for this claim by establishing the optimality of IWOGA + HDIWIC under a set of reasonable assumptions.
Submitted 10 May, 2025;
originally announced May 2025.
-
Optimal Control of Stochastic Partial Differential Equations with Partial Observations: Stochastic Maximum Principles and Numerical Approximation
Authors:
Yanzhao Cao,
Hongjiang Qian,
George Yin
Abstract:
This work establishes a general stochastic maximum principle for partially observed optimal control of semi-linear stochastic partial differential equations in a nonconvex control domain. The state evolves in a Hilbert space driven by a cylindrical Wiener process and finitely many Brownian motions, while observations are in a Euclidean space with correlated noise. For a convex control domain and control-independent diffusion coefficients in the state, numerical algorithms are developed to solve the partially observed optimal control problems using a stochastic gradient descent algorithm combined with finite element approximations and the branching filtering algorithm. Numerical experiments are conducted for demonstration.
Submitted 19 April, 2025;
originally announced April 2025.
-
Computing the Tropical Abel--Jacobi Transform and Tropical Distances for Metric Graphs
Authors:
Yueqi Cao,
Anthea Monod
Abstract:
Metric graphs are important models for capturing the structure of complex data across various domains. While much effort has been devoted to extracting geometric and topological features from graph data, computational aspects of metric graphs as abstract tropical curves remain unexplored. In this paper, we present the first computational and machine learning-driven study of metric graphs from the perspective of tropical algebraic geometry. Specifically, we study the tropical Abel--Jacobi transform, a vectorization of points on a metric graph via the tropical Abel--Jacobi map into its associated flat torus, the tropical Jacobian. We develop algorithms to compute this transform and investigate how the resulting embeddings depend on different combinatorial models of the same metric graph.
Once embedded, we compute pairwise distances between points in the tropical Jacobian under two natural metrics: the tropical polarization distance and the Foster--Zhang distance. Computing these distances is generally NP-hard, as they turn out to be linked to classical lattice problems in computational complexity; however, we identify a class of metric graphs where fast and explicit computations are feasible. For the general case, we propose practical algorithms for both exact and approximate distance matrix computations using lattice basis reduction and mixed-integer programming solvers. Our work lays the groundwork for future applications of tropical geometry and the tropical Abel--Jacobi transform in machine learning and data analysis.
Submitted 17 April, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.
-
Local-in-time well-posedness for the regular solution to the 2D full compressible Navier-Stokes equations with degenerate viscosities and heat conductivity
Authors:
Yue Cao,
Xun Jiang
Abstract:
This paper considers the two-dimensional Cauchy problem of the full compressible Navier-Stokes equations with far-field vacuum in $\mathbb{R}^2$, where the viscosity and heat-conductivity coefficients depend on the absolute temperature $θ$ in the form of $θ^ν$ with $ν>0$. Due to the appearance of the vacuum, the momentum equation is degenerate in both the time evolution and the spatial dissipation, which makes the study of well-posedness challenging. By deriving some new singular-weighted (negative powers of the density $ρ$) estimates of the solution, we establish the local-in-time well-posedness of the regular solution with far-field vacuum in terms of $ρ$, the velocity $u$ and the entropy $S$.
Submitted 14 April, 2025;
originally announced April 2025.
-
Optional intervals event, sequential operation and their applications in physics, computer science and applied mathematics
Authors:
Zhongyuan. Li,
Yanlei. Gong,
Lei. Yu,
Yue. Cao,
Bo. Yin
Abstract:
In this paper, we introduce algebraic theories such as set theory and group theory into the analysis of event execution order. We propose concepts like "optional intervals event" and "sequential operation", summarize their algebraic properties and draw Cayley tables. Based on these efforts, we offer new interpretations for certain physical phenomena and computer application scenarios. Finally, we present other issues derived from this paradigm. These concepts can deepen our understanding of motion and find applications in areas such as event arrangement, physical simulation, and computer modeling.
Submitted 13 April, 2025;
originally announced April 2025.
-
Numerical approximations for partially observed optimal control of stochastic partial differential equations
Authors:
Feng Bao,
Yanzhao Cao,
Hongjiang Qian
Abstract:
In this paper, we study numerical approximations for optimal control of a class of stochastic partial differential equations with partial observations. The system state evolves in a Hilbert space, whereas observations are given in the finite-dimensional space $\mathbb{R}^d$. We begin by establishing stochastic maximum principles (SMPs) for such problems, where the system state is driven by a cylindrical Wiener process. The corresponding adjoint equations are characterized by backward stochastic partial differential equations. We then develop numerical algorithms to solve the partially observed optimal control problem. Our approach combines the stochastic gradient descent method, guided by the SMP, with a particle filtering algorithm to estimate the conditional distributions of the state of the system. Finally, we demonstrate the effectiveness of our proposed algorithm through numerical experiments.
Submitted 31 March, 2025;
originally announced April 2025.
-
Weak Convergence Analysis for the Finite Element Approximation to Stochastic Allen-Cahn Equation Driven by Multiplicative White Noise
Authors:
Minxing Zhang,
Yongkui Zou,
Ran Zhang,
Yanzhao Cao
Abstract:
In this paper, we aim to study the optimal weak convergence order for the finite element approximation to a stochastic Allen-Cahn equation driven by multiplicative white noise. We first construct an auxiliary equation based on the splitting-up technique, derive a priori estimates for the corresponding Kolmogorov equation, and obtain a strong convergence order of 1 in time between the auxiliary and exact solutions. Then, we prove the optimal weak convergence order of the finite element approximation to the stochastic Allen-Cahn equation by deriving the weak convergence order between the finite element approximation and the auxiliary solution via the theory of Kolmogorov equations and Malliavin calculus. Finally, we present a numerical experiment to illustrate the theoretical analysis.
Submitted 23 March, 2025;
originally announced March 2025.
-
$K$-theoretic pullbacks for Lagrangians on derived critical loci
Authors:
Yalong Cao,
Yukinobu Toda,
Gufang Zhao
Abstract:
Given a regular function $φ$ on a smooth stack, and a $(-1)$-shifted Lagrangian $M$ on the derived critical locus of $φ$, under fairly general hypotheses, we construct a pullback map from the Grothendieck group of coherent matrix factorizations of $φ$ to that of coherent sheaves on $M$. This map satisfies a functoriality property with respect to the composition of Lagrangian correspondences, as well as the usual bivariance and base-change properties.
We provide three applications of the construction: one in the definition of quantum $K$-theory of critical loci (Landau-Ginzburg models), paving the way to generalize works of the Okounkov school from Nakajima quiver varieties to quivers with potentials; one in establishing a degeneration formula for $K$-theoretic Donaldson-Thomas theory of local Calabi-Yau 4-folds; and one in confirming a $K$-theoretic version of the Joyce-Safronov conjecture.
Submitted 7 March, 2025;
originally announced March 2025.
-
Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity
Authors:
Daniel Yiming Cao,
August Y. Chen,
Karthik Sridharan,
Benjamin Tang
Abstract:
We study the optimization of non-convex functions that are not necessarily smooth (gradient and/or Hessian are Lipschitz) using first order methods. Smoothness is a restrictive assumption in machine learning in both theory and practice, motivating significant recent work on finding first order stationary points of functions satisfying generalizations of smoothness with first order methods. We develop a novel framework that lets us systematically study the convergence of a large class of first-order optimization algorithms (which we call decrease procedures) under generalizations of smoothness. We instantiate our framework to analyze the convergence of first order optimization algorithms to first and \textit{second} order stationary points under generalizations of smoothness. As a consequence, we establish the first convergence guarantees for first order methods to second order stationary points under generalizations of smoothness. We demonstrate that several canonical examples fall under our framework, and highlight practical implications.
Submitted 26 June, 2025; v1 submitted 6 March, 2025;
originally announced March 2025.
-
Learning Hamiltonian Systems with Pseudo-symplectic Neural Network
Authors:
Xupeng Cheng,
Lijin Wang,
Yanzhao Cao,
Chen Chen
Abstract:
In this paper, we introduce a Pseudo-Symplectic Neural Network (PSNN) for learning general Hamiltonian systems (both separable and non-separable) from data. To address the limitations of existing structure-preserving methods (e.g., implicit symplectic integrators restricted to separable systems or explicit approximations requiring high computational costs), PSNN integrates an explicit pseudo-symplectic integrator as its dynamical core, achieving nearly exact symplecticity with minimal structural error. Additionally, we propose learnable Padé-type activation functions based on Padé approximation theory, which empirically outperform classical ReLU, Taylor-based activations, and PAU. By combining these innovations, PSNN demonstrates superior performance in learning and forecasting diverse Hamiltonian systems (e.g., modified pendulum, galactic dynamics), surpassing state-of-the-art models in accuracy, long-term stability, and energy preservation, while requiring shorter training time, fewer samples, and fewer parameters. This framework bridges the gap between computational efficiency and geometric structure preservation in Hamiltonian system modeling.
Submitted 6 March, 2025; v1 submitted 27 February, 2025;
originally announced February 2025.
-
Splitting finite element approximations for quasi-static electroporoelasticity equations
Authors:
Xuan Liu,
Yongkui Zou,
Ran Zhang,
Yanzhao Cao,
Amnon J. Meir
Abstract:
The electroporoelasticity model, which couples Maxwell's equations with Biot's equations, plays a critical role in applications such as water conservancy exploration, earthquake early warning, and various other fields. This work focuses on investigating its well-posedness and analyzing error estimates for a splitting backward Euler finite element method. We first define a weak solution consistent with the finite element framework. Then, we prove the uniqueness and existence of such a solution using the Galerkin method and derive a priori estimates for high-order regularity. Using a splitting technique, we define an approximate splitting solution and analyze its convergence order. Next, we apply Nedelec's curl-conforming finite elements, Lagrange elements, and the backward Euler method to construct a fully discretized scheme. We demonstrate the stability of the splitting numerical solution and provide error estimates for its convergence order in both temporal and spatial variables. Finally, we present numerical experiments to validate the theoretical results, showing that our method significantly reduces computational complexity compared to the classical finite element method.
Submitted 23 February, 2025;
originally announced February 2025.
-
Learning Spectral Methods by Transformers
Authors:
Yihan He,
Yuan Cao,
Hong-Yu Chen,
Dennis Wu,
Jianqing Fan,
Han Liu
Abstract:
Transformers demonstrate significant advantages as the building block of modern LLMs. In this work, we study the capacities of Transformers in performing unsupervised learning. We show that multi-layered Transformers, given a sufficiently large set of pre-training instances, are able to learn the algorithms themselves and perform statistical estimation tasks given new instances. This learning paradigm is distinct from the in-context learning setup and is similar to the learning procedure of human brains, where skills are learned through past experience. Theoretically, we prove that pre-trained Transformers can learn the spectral methods, using the classification of a bi-class Gaussian mixture model as an example. Our proof is constructive and uses algorithmic design techniques. Our results build upon the similarities between the multi-layered Transformer architecture and the iterative recovery algorithms used in practice. Empirically, we verify the strong capacity of the multi-layered (pre-trained) Transformer on unsupervised learning through the lens of both PCA and clustering tasks performed on synthetic and real-world datasets.
Submitted 12 January, 2025; v1 submitted 2 January, 2025;
originally announced January 2025.
-
Towards Simple and Provable Parameter-Free Adaptive Gradient Methods
Authors:
Yuanzhe Tao,
Huizhuo Yuan,
Xun Zhou,
Yuan Cao,
Quanquan Gu
Abstract:
Optimization algorithms such as AdaGrad and Adam have significantly advanced the training of deep models by dynamically adjusting the learning rate during the optimization process. However, ad hoc tuning of learning rates poses a challenge, leading to inefficiencies in practice. To address this issue, recent research has focused on developing "learning-rate-free" or "parameter-free" algorithms that operate effectively without the need for learning rate tuning. Despite these efforts, existing parameter-free variants of AdaGrad and Adam tend to be overly complex and/or lack formal convergence guarantees. In this paper, we present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees. We prove that AdaGrad++ achieves comparable convergence rates to AdaGrad in convex optimization without predefined learning rate assumptions. Similarly, Adam++ matches the convergence rate of Adam without relying on any conditions on the learning rates. Experimental results across various deep learning tasks validate the competitive performance of AdaGrad++ and Adam++.
Submitted 26 December, 2024;
originally announced December 2024.
-
Grams: Gradient Descent with Adaptive Momentum Scaling
Authors:
Yang Cao,
Xiaoyu Li,
Zhao Song
Abstract:
We introduce $\mathbf{G}$radient Descent with $\mathbf{A}$daptive $\mathbf{M}$omentum $\mathbf{S}$caling ($\mathbf{Grams}$), a novel optimization algorithm that decouples the direction and magnitude of parameter updates in deep learning. Unlike traditional optimizers that directly integrate momentum into updates, Grams separates the update direction, derived from current gradients, from momentum, which is used solely for adaptive magnitude scaling. This approach enables Grams to achieve improved loss descent compared to state-of-the-art cautious and momentum-based optimizers. We theoretically demonstrate that Grams descends faster than other state-of-the-art optimizers and establish a global convergence guarantee for Grams. We also validate its effectiveness through extensive empirical evaluations. The results demonstrate Grams' superior performance, including faster convergence and better generalization, compared to widely used optimizers such as Adam, Lion, and their cautious variants. Our results highlight Grams' potential as a transformative approach for efficiently training and fine-tuning large language models. Code is available at https://github.com/Gunale0926/Grams.
Submitted 5 March, 2025; v1 submitted 22 December, 2024;
originally announced December 2024.
-
VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction
Authors:
Yadi Cao,
Yuxuan Liu,
Liu Yang,
Rose Yu,
Hayden Schaeffer,
Stanley Osher
Abstract:
In-Context Operator Networks (ICONs) have demonstrated the ability to learn operators across diverse partial differential equations using few-shot, in-context learning. However, existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose Vision In-Context Operator Networks (VICON), which integrates vision transformer architectures to efficiently process 2D data through patch-wise operations while preserving ICON's adaptability to multiphysics systems and varying timesteps. Evaluated across three fluid dynamics benchmarks, VICON significantly outperforms state-of-the-art baselines: DPOT and MPP, reducing the averaged last-step rollout error by 37.9% compared to DPOT and 44.7% compared to MPP, while requiring only 72.5% and 34.8% of their respective inference times. VICON naturally supports flexible rollout strategies with varying timestep strides, enabling immediate deployment in imperfect measurement systems where sampling frequencies may differ or frames might be dropped - common challenges in real-world settings - without requiring retraining or interpolation. In these realistic scenarios, VICON exhibits remarkable robustness, experiencing only 24.41% relative performance degradation compared to 71.37%-74.49% degradation in baseline methods, demonstrating its versatility for deploying in realistic applications. Our scripts for processing datasets and code are publicly available at https://github.com/Eydcao/VICON.
Submitted 19 May, 2025; v1 submitted 24 November, 2024;
originally announced November 2024.
-
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context
Authors:
Zihao Li,
Yuan Cao,
Cheng Gao,
Yihan He,
Han Liu,
Jason M. Klusowski,
Jianqing Fan,
Mengdi Wang
Abstract:
Transformers have achieved great success in recent years. Interestingly, transformers have shown particularly strong in-context learning capability -- even without fine-tuning, they are still able to solve unseen tasks well purely based on task-specific prompts. In this paper, we study the capability of one-layer transformers in learning one of the most classical nonparametric estimators, the one-nearest neighbor prediction rule. Under a theoretical framework where the prompt contains a sequence of labeled training data and unlabeled test data, we show that, although the loss function is nonconvex when trained with gradient descent, a single softmax attention layer can successfully learn to behave like a one-nearest neighbor classifier. Our result gives a concrete example of how transformers can be trained to implement nonparametric machine learning algorithms, and sheds light on the role of softmax attention in transformer models.
Submitted 16 November, 2024;
originally announced November 2024.
-
Stable Similarity Comparison of Persistent Homology Groups
Authors:
Jiaxing He,
Bingzhe Hou,
Tieru Wu,
Yang Cao
Abstract:
Classification in the sense of similarity is an important issue. In this paper, we study similarity classification in Topological Data Analysis. We define a pseudometric $d_{S}^{(p)}$ to measure the distance between barcodes generated by persistent homology groups of topological spaces, and we prove that our pseudometric $d_{S}^{(2)}$ is a similarity invariant. Thereby, we establish a connection between Operator Theory and Topological Data Analysis. We give the calculation formula of the pseudometric $d_{S}^{(2)}$ $(d_{S}^{(1)})$ by arranging all eigenvalues of matrices determined by barcodes in descending order to get the infimum over all matchings. Since conformal linear transformation is one representative type of similarity transformation, we construct comparative experiments on both synthetic datasets and waves from an online platform to demonstrate that our pseudometric $d_{S}^{(2)}$ $(d_{S}^{(1)})$ is stable under conformal linear transformations, whereas the bottleneck and Wasserstein distances are not. In particular, our pseudometric on waves depends only on the waveform and is independent of the frequency and amplitude. Furthermore, the computation time for $d_{S}^{(2)}$ $(d_{S}^{(1)})$ is significantly less than the computation time for bottleneck distance and is comparable to the computation time for accelerated Wasserstein distance between barcodes.
Submitted 15 November, 2024;
originally announced November 2024.
-
Global Convergence in Training Large-Scale Transformers
Authors:
Cheng Gao,
Yuan Cao,
Zihao Li,
Yihan He,
Mengdi Wang,
Han Liu,
Jason Matthew Klusowski,
Jianqing Fan
Abstract:
Despite the widespread success of Transformers across various domains, their optimization guarantees in large-scale model settings are not well-understood. This paper rigorously analyzes the convergence properties of gradient flow in training Transformers with weight decay regularization. First, we construct the mean-field limit of large-scale Transformers, showing that as the model width and depth go to infinity, gradient flow converges to the Wasserstein gradient flow, which is represented by a partial differential equation. Then, we demonstrate that the gradient flow reaches a global minimum consistent with the PDE solution when the weight decay regularization parameter is sufficiently small. Our analysis is based on a series of novel mean-field techniques that adapt to Transformers. Compared with existing tools for deep networks (Lu et al., 2020) that demand homogeneity and global Lipschitz smoothness, we utilize a refined analysis assuming only $\textit{partial homogeneity}$ and $\textit{local Lipschitz smoothness}$. These new techniques may be of independent interest.
Submitted 30 October, 2024;
originally announced October 2024.
-
A Differentially Private Energy Trading Mechanism Approaching Social Optimum
Authors:
Yuji Cao,
Yue Chen
Abstract:
This paper proposes a differentially private energy trading mechanism for prosumers in peer-to-peer (P2P) markets, offering provable privacy guarantees while approaching the Nash equilibrium with nearly socially optimal efficiency. We first model the P2P energy trading as a (generalized) Nash game and prove the vulnerability of traditional distributed algorithms to privacy attacks through an adversarial inference model. To address this challenge, we develop a privacy-preserving Nash equilibrium seeking algorithm incorporating carefully calibrated Laplacian noise. We prove that the proposed algorithm achieves $ε$-differential privacy while converging in expectation to the Nash equilibrium with a suitable stepsize. Numerical experiments are conducted to evaluate the algorithm's robustness against privacy attacks, convergence behavior, and optimality compared to the non-private solution. Results demonstrate that our mechanism effectively protects prosumers' sensitive information while maintaining near-optimal market outcomes, offering a practical approach for privacy-preserving coordination in P2P markets.
Submitted 21 October, 2024; v1 submitted 7 October, 2024;
originally announced October 2024.
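
As a rough illustration of the "carefully calibrated Laplacian noise" mentioned in the abstract above, the following is a minimal sketch of the textbook Laplace mechanism: a released value is perturbed with Laplace noise of scale sensitivity/ε. This is not the paper's Nash equilibrium seeking algorithm; the function name `laplace_mechanism` and the parameters `sensitivity` and `epsilon` are hypothetical names chosen for this example.

```python
import random


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Return value plus Laplace(0, sensitivity / epsilon) noise.

    Generic Laplace mechanism (an illustrative assumption), not the
    algorithm from the paper. The difference of two i.i.d. exponential
    variables with mean `scale` is Laplace-distributed with that scale.
    """
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


rng = random.Random(0)          # fixed seed so the run is repeatable
true_value = 10.0               # e.g. one quantity a prosumer would share
samples = [laplace_mechanism(true_value, sensitivity=1.0, epsilon=0.5, rng=rng)
           for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

Smaller values of `epsilon` give a larger noise scale, so stronger privacy is bought with noisier released values; the noise is zero-mean, so averages over many releases stay close to the true value.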
-
Characterization of Circular-arc Graphs: III. Chordal Graphs
Authors:
Yixin Cao,
Tomasz Krawczyk
Abstract:
We identify all minimal chordal graphs that are not circular-arc graphs, thereby resolving one of "the main open problems" concerning the structures of circular-arc graphs as posed by Durán, Grippo, and Safe in 2011. The problem had been attempted even earlier, and previous efforts have yielded partial results, particularly for claw-free graphs and graphs with an independence number of at most four. The answers turn out to have very simple structures: all the nontrivial ones belong to a single family. Our findings are based on a structural study of McConnell's flipping, which transforms circular-arc graphs into interval graphs with certain representation patterns.
Submitted 22 February, 2025; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Characterization of Circular-arc Graphs: II. McConnell Flipping
Authors:
Yixin Cao,
Tomasz Krawczyk
Abstract:
McConnell [FOCS 2001] presented a flipping transformation from circular-arc graphs to interval graphs with certain patterns of representations. Beyond its algorithmic implications, this transformation is instrumental in identifying all minimal graphs that are not circular-arc graphs. We conduct a structural study of this transformation, and for $C_{4}$-free graphs, we achieve a complete characterization of these patterns. This characterization allows us, among other things, to identify all minimal chordal graphs that are not circular-arc graphs in a companion paper.
Submitted 29 April, 2025; v1 submitted 20 August, 2024;
originally announced August 2024.
-
Strong convergence of an explicit full-discrete scheme for stochastic Burgers-Huxley equation
Authors:
Yibo Wang,
Wanrong Cao,
Yanzhao Cao
Abstract:
The strong convergence of an explicit full-discrete scheme is investigated for the stochastic Burgers-Huxley equation driven by additive space-time white noise, which possesses both Burgers-type and cubic nonlinearities. To discretize the continuous problem in space, we utilize a spectral Galerkin method. Subsequently, we introduce a nonlinear-tamed exponential integrator scheme, resulting in a fully discrete scheme. Within the framework of semigroup theory, this study provides precise estimations of the Sobolev regularity, $L^\infty$ regularity in space, and Hölder continuity in time for the mild solution, as well as for its semi-discrete and full-discrete approximations. Building upon these results, we establish moment boundedness for the numerical solution and obtain strong convergence rates in both spatial and temporal dimensions. A numerical example is presented to validate the theoretical findings.
Submitted 1 August, 2024;
originally announced August 2024.
-
Homomorphic data compression for real time photon correlation analysis
Authors:
Sebastian Strempfer,
Zichao Wendy Di,
Kazutomo Yoshii,
Yue Cao,
Qingteng Zhang,
Eric M. Dufresne,
Mathew Cherukara,
Suresh Narayanan,
Martin V. Holt,
Antonino Miceli,
Tao Zhou
Abstract:
The construction of highly coherent x-ray sources has enabled new research opportunities across the scientific landscape. The maximum raw data rate per beamline now exceeds 40 GB/s, posing unprecedented challenges for the online processing and offline storage of the big data. Such a challenge is particularly prominent for x-ray photon correlation spectroscopy (XPCS), where real time analyses require simultaneous calculation on all the previously acquired data in the time series. We present a homomorphic compression scheme to effectively reduce the computational time and memory space required for XPCS analysis. Leveraging similarities in the mathematical expression between a matrix-based compression algorithm and the correlation calculation, our approach allows direct operation on the compressed data without decompression. The lossy compression reduces the computational time by a factor of 10,000, enabling real time calculation of the correlation functions at kHz framerate. Our demonstration of a homomorphic compression of scientific data provides an effective solution to the big data challenge at coherent light sources. Beyond the example shown in this work, the framework can be extended to facilitate real-time operations directly on a compressed data stream for other techniques.
Submitted 29 July, 2024;
originally announced July 2024.
-
On well (edge) dominated and equimatchable strong product graphs
Authors:
Yixin Cao,
Guiqiang Mou,
Jianxin Wang
Abstract:
A graph is well-(edge-)dominated if every minimal (edge) dominating set is minimum. A graph is equimatchable if every maximal matching is maximum. We study these concepts on strong product graphs. We fully characterize well-edge-dominated and equimatchable strong product graphs of nontrivial graphs, and identify a large family of graphs whose strong products with any well-dominated graph are well-dominated.
Submitted 1 July, 2024;
originally announced July 2024.
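
To make the definition of equimatchable in the abstract above concrete, here is a small brute-force sketch that enumerates every maximal matching of a tiny graph and checks that they all have the same size. It is only a definitional check for small examples, not a method from the paper; the function names are hypothetical.

```python
from itertools import combinations


def is_matching(edges):
    """True if no two edges in `edges` share an endpoint."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def maximal_matching_sizes(edge_list):
    """Sizes of all maximal matchings, found by exhaustive enumeration."""
    sizes = set()
    for k in range(len(edge_list) + 1):
        for subset in combinations(edge_list, k):
            if not is_matching(subset):
                continue
            covered = {x for e in subset for x in e}
            # Maximal: every edge of the graph touches a covered vertex,
            # so no further edge can be added to the matching.
            if all(u in covered or v in covered for u, v in edge_list):
                sizes.add(k)
    return sizes


def is_equimatchable(edge_list):
    """Every maximal matching is maximum iff all maximal sizes coincide."""
    return len(maximal_matching_sizes(edge_list)) == 1


path_p4 = [(0, 1), (1, 2), (2, 3)]                          # path on 4 vertices
complete_k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
p4_result = is_equimatchable(path_p4)
k4_result = is_equimatchable(complete_k4)
```

The path on four vertices fails the test because the middle edge alone is a maximal matching of size 1 while the two end edges form a matching of size 2; in the complete graph on four vertices every maximal matching is perfect.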
-
Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes
Authors:
Si Yi Meng,
Antonio Orvieto,
Daniel Yiming Cao,
Christopher De Sa
Abstract:
We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex -- a sequence of period-doubling bifurcations begins at the critical step size $2/λ$, where $λ$ is the largest eigenvalue of the Hessian at the solution. Using a smaller-than-critical step size guarantees convergence if initialized nearby the solution: but does this suffice globally? In one dimension, we show that a step size less than $1/λ$ suffices for global convergence. However, for all step sizes between $1/λ$ and the critical step size $2/λ$, one can construct a dataset such that GD converges to a stable cycle. In higher dimensions, this is actually possible even for step sizes less than $1/λ$. Our results show that although local convergence is guaranteed for all step sizes less than the critical step size, global convergence is not, and GD may instead converge to a cycle depending on the initialization.
Submitted 4 November, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
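
The step-size thresholds in the abstract above can be made tangible with a toy run. The sketch below is an assumption-laden illustration, not the paper's construction: it uses a hypothetical two-point, one-dimensional, non-separable dataset whose minimizer is $w = 0$ by symmetry, computes $λ$ as the Hessian at that minimizer, and runs GD with a step size of $0.9/λ$, below the $1/λ$ threshold.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad(w, xs, ys):
    """Derivative of the mean logistic loss log(1 + exp(-y * x * w))."""
    return sum(-y * x * sigmoid(-y * x * w) for x, y in zip(xs, ys)) / len(xs)


def hessian(w, xs, ys):
    """Second derivative of the mean logistic loss (a scalar in 1D)."""
    return sum(x * x * sigmoid(y * x * w) * sigmoid(-y * x * w)
               for x, y in zip(xs, ys)) / len(xs)


def run_gd(w0, step, xs, ys, iters=500):
    """Plain gradient descent with a constant step size."""
    w = w0
    for _ in range(iters):
        w -= step * grad(w, xs, ys)
    return w


# Toy non-separable data (illustrative assumption): the same feature
# appears with both labels, so the minimizer is w = 0 by symmetry.
xs = [1.0, 1.0]
ys = [1.0, -1.0]
lam = hessian(0.0, xs, ys)          # curvature at the minimizer
w_final = run_gd(w0=5.0, step=0.9 / lam, xs=xs, ys=ys)
```

On this dataset the iterate shrinks toward zero from a distant start. The example does not exhibit the cycles described in the abstract, which require specially constructed datasets.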
-
Estimation of Out-of-Sample Sharpe Ratio for High Dimensional Portfolio Optimization
Authors:
Xuran Meng,
Yuan Cao,
Weichen Wang
Abstract:
Portfolio optimization aims at constructing a realistic portfolio with significant out-of-sample performance, which is typically measured by the out-of-sample Sharpe ratio. However, due to in-sample optimism, it is inappropriate to use the in-sample estimated covariance to evaluate the out-of-sample Sharpe, especially in high dimensional settings. In this paper, we propose a novel method to estimate the out-of-sample Sharpe ratio using only in-sample data, based on random matrix theory. Furthermore, portfolio managers can use the estimated out-of-sample Sharpe as a criterion to decide the best tuning for constructing their portfolios. Specifically, we consider the classical framework of Markowitz mean-variance portfolio optimization under the high dimensional regime of $p/n \to c \in (0,\infty)$, where $p$ is the portfolio dimension and $n$ is the number of samples or time points. We propose to correct the sample covariance by a regularization matrix and provide a consistent estimator of its Sharpe ratio. The new estimator works well under any of the following conditions: (1) bounded covariance spectrum, (2) arbitrary number of diverging spikes when $c < 1$, and (3) fixed number of diverging spikes with weak requirement on their diverging speed when $c \ge 1$. We can also extend the results to construct the global minimum variance portfolio and correct the out-of-sample efficient frontier. We demonstrate the effectiveness of our approach through comprehensive simulations and real data experiments. Our results highlight the potential of this methodology as a useful tool for portfolio optimization in high dimensional settings.
Submitted 9 July, 2025; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Tropical Expressivity of Neural Networks
Authors:
Paul Lezeau,
Thomas Walker,
Yueqi Cao,
Shiv Bhatia,
Anthea Monod
Abstract:
We propose an algebraic geometric framework to study the expressivity of linear activation neural networks. A particular quantity of neural networks that has been actively studied is the number of linear regions, which gives a quantification of the information capacity of the architecture. To study and evaluate information capacity and expressivity, we work in the setting of tropical geometry - a combinatorial and polyhedral variant of algebraic geometry - where there are known connections between tropical rational maps and feedforward neural networks. Our work builds on and expands this connection to capitalize on the rich theory of tropical geometry to characterize and study various architectural aspects of neural networks. Our contributions are threefold: we provide a novel tropical geometric approach to selecting sampling domains among linear regions; an algebraic result allowing for a guided restriction of the sampling domain for network architectures with symmetries; and a new open source OSCAR library to analyze neural networks symbolically using their tropical representations, where we present a new algorithm that computes the exact number of their linear regions. We provide a comprehensive set of proof-of-concept numerical experiments demonstrating the breadth of neural network architectures to which tropical geometric theory can be applied to reveal insights on expressivity characteristics of a network. Our work provides the foundations for the adaptation of both theory and existing software from computational tropical geometry and symbolic computation to neural networks and deep learning.
Submitted 8 October, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation
Authors:
Zhiqiang Cai,
Yu Cao,
Yuanfei Huang,
Xiang Zhou
Abstract:
Sampling invariant distributions from an Itô diffusion process presents a significant challenge in stochastic simulation. Traditional numerical solvers for stochastic differential equations require both a fine step size and a lengthy simulation period, resulting in biased and correlated samples. Current deep learning-based methods solve the stationary Fokker--Planck equation to determine the invariant probability density function in the form of deep neural networks, but they generally do not directly address the problem of sampling from the computed density function. In this work, we introduce a framework that employs a weak generative sampler (WGS) to directly generate independent and identically distributed (iid) samples induced by a transformation map derived from the stationary Fokker--Planck equation. Our proposed loss function is based on the weak form of the Fokker--Planck equation, integrating normalizing flows to characterize the invariant distribution and facilitate sample generation from a base distribution. Our randomized test function circumvents the need for min-max optimization in the traditional weak formulation. Our method necessitates neither the computationally intensive calculation of the Jacobian determinant nor the invertibility of the transformation map. A crucial component of our framework is the adaptively chosen family of test functions in the form of Gaussian kernel functions with centers related to the generated data samples. Experimental results on several benchmark examples demonstrate the effectiveness and scalability of our method, which offers both low computational costs and excellent capability in exploring multiple metastable states.
Submitted 5 June, 2025; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Self-organizing Multiagent Target Enclosing under Limited Information and Safety Guarantees
Authors:
Praveen Kumar Ranjan,
Abhinav Sinha,
Yongcan Cao
Abstract:
This paper introduces an approach to address the target enclosing problem using non-holonomic multiagent systems, where agents self-organize on the enclosing shape around a fixed target. In our approach, agents independently move toward the desired enclosing geometry when apart and activate the collision avoidance mechanism when a collision is imminent, thereby guaranteeing inter-agent safety. Our approach combines global enclosing behavior and local collision avoidance mechanisms by devising a special potential function and sliding manifold. We rigorously show that an agent does not need to ensure safety with every other agent and put forth a concept of the nearest colliding agent (for any arbitrary agent) with whom ensuring safety is sufficient to avoid collisions in the entire swarm. The proposed control eliminates the need for a fixed or pre-established agent arrangement around the target and requires only relative information between an agent and the target. This makes our design particularly appealing for scenarios with limited global information, hence significantly reducing communication requirements. We finally present simulation results to vindicate the efficacy of the proposed method.
Submitted 15 August, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Conditional Pseudo-Reversible Normalizing Flow for Surrogate Modeling in Quantifying Uncertainty Propagation
Authors:
Minglei Yang,
Pengjun Wang,
Ming Fan,
Dan Lu,
Yanzhao Cao,
Guannan Zhang
Abstract:
We introduce a conditional pseudo-reversible normalizing flow for constructing surrogate models of a physical model polluted by additive noise to efficiently quantify forward and inverse uncertainty propagation. Existing surrogate modeling approaches usually focus on approximating the deterministic component of the physical model. However, this strategy necessitates knowledge of noise and resorts to auxiliary sampling methods for quantifying inverse uncertainty propagation. In this work, we develop the conditional pseudo-reversible normalizing flow model to directly learn and efficiently generate samples from the conditional probability density functions. The training process utilizes a dataset consisting of input-output pairs without requiring prior knowledge about the noise and the function. Our model, once trained, can generate samples from any conditional probability density function whose high probability regions are covered by the training set. Moreover, the pseudo-reversibility feature allows for the use of fully-connected neural network architectures, which simplifies the implementation and enables theoretical analysis. We provide a rigorous convergence analysis of the conditional pseudo-reversible normalizing flow model, showing its ability to converge to the target conditional probability density function using the Kullback-Leibler divergence. To demonstrate the effectiveness of our method, we apply it to several benchmark tests and a real-world geologic carbon storage problem.
Submitted 30 March, 2024;
originally announced April 2024.
-
Switching Classes: Characterization and Computation
Authors:
Dhanyamol Antony,
Yixin Cao,
Sagartanu Pal,
R. B. Sandeep
Abstract:
In a graph, the switching operation reverses adjacencies between a subset of vertices and the others. For a hereditary graph class $\mathcal{G}$, we are concerned with the maximum subclass and the minimum superclass of $\mathcal{G}$ that are closed under switching. We characterize the maximum subclass for many important classes $\mathcal{G}$, and prove that it is finite when $\mathcal{G}$ is minor-closed and omits at least one graph. For several graph classes, we develop polynomial-time algorithms to recognize the minimum superclass. We also show that the recognition of the superclass is NP-complete for $H$-free graphs when $H$ is a sufficiently long path or cycle, and it cannot be solved in subexponential time assuming the Exponential Time Hypothesis.
Submitted 14 August, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
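
The switching operation defined in the abstract above is easy to state in code. The sketch below is a plain illustration of the definition: for a vertex subset S, every pair with exactly one endpoint in S has its adjacency flipped, and all other pairs are left unchanged. The function name `switch` and the edge representation (sets of `frozenset` pairs) are choices made for this example.

```python
def switch(vertices, edges, subset):
    """Return the edge set obtained by switching `subset` in the graph.

    Pairs with exactly one endpoint in `subset` have adjacency reversed;
    pairs with both or neither endpoint in `subset` are unchanged.
    """
    s = set(subset)
    edge_set = {frozenset(e) for e in edges}
    vs = list(vertices)
    result = set()
    for i, u in enumerate(vs):
        for v in vs[i + 1:]:
            pair = frozenset((u, v))
            crosses = (u in s) != (v in s)      # exactly one endpoint in S
            present = pair in edge_set
            if present != crosses:              # keep if non-crossing, flip if crossing
                result.add(pair)
    return result


vertices = [0, 1, 2]
path_edges = [(0, 1), (1, 2)]                   # path 0 - 1 - 2
switched_middle = switch(vertices, path_edges, {1})
switched_end = switch(vertices, path_edges, {0})
restored = switch(vertices, switched_end, {0})
```

Switching the middle vertex of the path removes both edges, while switching an end vertex trades the edge 0-1 for the edge 0-2. Switching the same subset twice returns the original graph, which is why classes closed under switching are a natural object of study.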
-
Characterization of Chordal Circular-arc Graphs: I. Split Graphs
Authors:
Yixin Cao,
Jan Derbisz,
Tomasz Krawczyk
Abstract:
The most elusive problem around the class of circular-arc graphs is identifying all minimal graphs that are not in this class. The main obstacle is the lack of a systematic way of enumerating these minimal graphs. McConnell [FOCS 2001] presented a transformation from circular-arc graphs to interval graphs with certain patterns of representations. We fully characterize these interval patterns for circular-arc graphs that are split graphs, thereby building a connection between minimal split graphs that are not circular-arc graphs and minimal non-interval graphs. This connection enables us to identify all minimal split graphs that are not circular-arc graphs. As a byproduct, we develop a linear-time certifying recognition algorithm for circular-arc graphs when the input is a split graph.
Submitted 15 March, 2025; v1 submitted 4 March, 2024;
originally announced March 2024.
-
A degeneration formula of Donaldson-Thomas theory on Calabi-Yau 4-folds
Authors:
Yalong Cao,
Gufang Zhao,
Zijun Zhou
Abstract:
We prove a degeneration formula for Donaldson-Thomas theory on Calabi-Yau 4-folds, and apply it to compute zero dimensional invariants on $\mathbb{C}^4$ and on any local curve.
Submitted 25 February, 2024;
originally announced February 2024.
-
Edge states in super honeycomb structures with PT-symmetric deformations
Authors:
Ying Cao,
Yi Zhu
Abstract:
The existence of edge states is one of the most vital properties of topological insulators. Although tremendous success has been accomplished in describing and explaining edge states associated with PT symmetry breaking, little work has been done on PT symmetry preserving cases. Two-dimensional Schroedinger operators with super honeycomb lattice potentials always have double Dirac cones at the Gamma point - the zero momentum point on their energy bands due to C6 symmetry, PT symmetry, and the "folding" symmetry - caused by an additional translation symmetry. There are two topologically different ways to deform such a system by PT symmetry preserving but folding symmetry breaking perturbations. Interestingly, there exist two gapped edge states on the interface between such two kinds of perturbed materials. In this paper, we illustrate the existence of such PT preserving edge states rigorously for the first time. We use a domain wall modulated Schroedinger operator to model the phenomenon under small perturbations and rigorously prove the existence of two gapped edge states. We also provide a brief interpretation from the point of view of "topology" by the parities of degenerate bulk modes. Our work thoroughly explains the existence of "helical" like edge states in super honeycomb configurations and lays a foundation for the descriptions of topologies of such systems.
Submitted 8 March, 2025; v1 submitted 23 February, 2024;
originally announced February 2024.
-
$C^1$ Pesin (un)stable manifold without domination
Authors:
Yongluo Cao,
Zeya Mi,
Rui Zou
Abstract:
For $C^1$ diffeomorphisms with continuous invariant splitting without domination, we prove the existence of Pesin (un)stable manifold under the hyperbolicity of invariant measures.
Submitted 17 February, 2024;
originally announced February 2024.
-
Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks
Authors:
Yongchang Hao,
Yanshuai Cao,
Lili Mou
Abstract:
Second-order optimization approaches like the generalized Gauss-Newton method are considered more powerful as they utilize the curvature information of the objective function with preconditioning matrices. Albeit offering tempting theoretical benefits, they are not easily applicable to modern deep learning. The main reason is the quadratic memory and cubic time complexity of computing the inverse of the matrix. These requirements are infeasible even with state-of-the-art hardware. In this work, we propose Ginger, an eigendecomposition for the inverse of the generalized Gauss-Newton matrix. Our method enjoys efficient linear memory and time complexity for each iteration. Instead of approximating the conditioning matrix, we directly maintain its inverse to make the approximation more accurate. We provide the convergence result of Ginger for non-convex objectives. Our experiments on different tasks with different model architectures verify the effectiveness of our method. Our code is publicly available.
Submitted 5 February, 2024;
originally announced February 2024.
-
On blow-up to the one-dimensional Navier-Stokes equations with degenerate viscosity and vacuum
Authors:
Yue Cao,
Yachun Li,
Shaojun Yu
Abstract:
In this paper, we consider the Cauchy problem of the isentropic compressible Navier-Stokes equations with degenerate viscosity and vacuum in $\mathbb{R}$, where the viscosity depends on the density in a super-linear power law (i.e., $μ(ρ)=ρ^δ, δ>1$). We first obtain the local existence of the regular solution, then show that the regular solution will blow up in finite time if the initial data has an isolated mass group, no matter how small and smooth the initial data are. It is worth mentioning that, based on the transport structure of some intrinsic variables, we obtain the $L^\infty$ bound of the density, which helps to remove the restriction $δ\leq γ$ in Li-Pan-Zhu [21] and Huang-Wang-Zhu [13].
Submitted 16 March, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Renyi Differential Privacy in the Shuffle Model: Enhanced Amplification Bounds
Authors:
E Chen,
Yang Cao,
Yifei Ge
Abstract:
The shuffle model of Differential Privacy (DP) has gained significant attention in privacy-preserving data analysis due to its remarkable tradeoff between privacy and utility. It is characterized by adding a shuffling procedure after each user's locally differentially private perturbation, which leads to a privacy amplification effect, meaning that the privacy guarantee of a small level of noise, say $ε_0$, can be enhanced to $O(ε_0/\sqrt{n})$ (the smaller, the more private) after shuffling all $n$ users' perturbed data. Most studies of shuffle DP focus on proving tighter privacy amplification guarantees. However, the current results assume that the local privacy budget $ε_0$ is within a limited range. In addition, there remains a gap between the tightest lower bound and the known upper bound of the privacy amplification. In this work, we push forward the state of the art by making the following contributions. Firstly, we present the first asymptotically optimal analysis of Renyi Differential Privacy (RDP) in the shuffle model without constraints on $ε_0$. Secondly, we introduce hypothesis testing for privacy amplification through shuffling, offering a distinct analysis technique and a tighter upper bound. Furthermore, we propose a DP-SGD algorithm based on RDP. Experiments demonstrate that our approach outperforms existing methods significantly at the same privacy level.
Submitted 8 January, 2024;
originally announced January 2024.
-
Topologically mildly mixing of higher orders along generalized polynomials
Authors:
Yang Cao,
Jianjie Zhao
Abstract:
This paper is devoted to studying the multiple recurrence property of topologically mildly mixing systems along generalized polynomials. We show that if a minimal system is topologically mildly mixing, then it is mildly mixing of higher orders along generalized polynomials. Precisely, suppose that $(X, T)$ is a topologically mildly mixing minimal system, $d\in \mathbb{N}$, and $p_1, \dots, p_d$ are integer-valued generalized polynomials with $(p_1, \dots, p_d)$ non-degenerate. Then for all non-empty open subsets $U , V_1, \dots, V_d $ of $X$, $$\{n\in \mathbb{Z}: U\cap T^{-p_1(n) }V_1 \cap \dots \cap T^{-p_d(n) }V_d \neq \emptyset \}$$ is an IP$^*$-set.
Submitted 8 January, 2024;
originally announced January 2024.
-
Unconditionally positivity-preserving explicit Euler-type schemes for a generalized Ait-Sahalia model
Authors:
Ruishu Liu,
Yulin Cao,
Xiaojie Wang
Abstract:
The present work is devoted to strong approximations of a generalized Aït-Sahalia model arising from mathematical finance. The numerical study of the considered model faces essential difficulties caused by a drift that blows up at the origin, highly nonlinear drift and diffusion coefficients, and a positivity-preserving requirement. In this paper, a novel explicit Euler-type scheme is proposed, which is easily implementable and able to preserve positivity of the original model unconditionally, i.e., for any time step-size $h >0$. A mean-square convergence rate of order $0.5$ is also obtained for the proposed scheme in both non-critical and general critical cases. Our work is motivated by the need to justify the multi-level Monte Carlo (MLMC) simulations for the underlying model, where the rate of mean-square convergence is required and the preservation of positivity is desirable, particularly for large discretization time steps. Numerical experiments are finally provided to confirm the theoretical findings.
Submitted 25 March, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
-
A Generalized Shuffle Framework for Privacy Amplification: Strengthening Privacy Guarantees and Enhancing Utility
Authors:
E Chen,
Yang Cao,
Yifei Ge
Abstract:
The shuffle model of local differential privacy is an advanced method of privacy amplification designed to enhance privacy protection with high utility. It achieves this by randomly shuffling sensitive data, making linking individual data points to specific individuals more challenging. However, most existing studies have focused on the shuffle model based on $(ε_0,0)$-Locally Differentially Private (LDP) randomizers, with limited consideration for complex scenarios such as $(ε_0,δ_0)$-LDP or personalized LDP (PLDP). This hinders a comprehensive understanding of the shuffle model's potential and limits its application in various settings. To bridge this research gap, we propose a generalized shuffle framework that can be applied to any $(ε_i,δ_i)$-PLDP setting with personalized privacy parameters. This generalization allows for a broader exploration of the privacy-utility trade-off and facilitates the design of privacy-preserving analyses in diverse contexts. We prove that the shuffled $(ε_i,δ_i)$-PLDP process approximately preserves $μ$-Gaussian Differential Privacy with $μ= \sqrt{\frac{2}{\sum_{i=1}^{n} \frac{1-δ_i}{1+e^{ε_i}}-\max_{i}{\frac{1-δ_{i}}{1+e^{ε_{i}}}}}}$. This approach allows us to avoid the limitations and potential inaccuracies associated with inequality estimations. To strengthen the privacy guarantee, we improve the lower bound by utilizing hypothesis testing instead of relying on rough estimations like the Chernoff bound or Hoeffding's inequality. Furthermore, extensive comparative evaluations clearly show that our approach outperforms existing methods in achieving strong central privacy guarantees while preserving the utility of the global model. We have also carefully designed corresponding algorithms for average function, frequency estimation, and stochastic gradient descent.
Submitted 1 March, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
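As an illustration only, the closed-form expression for $μ$ quoted in the abstract of the generalized shuffle framework entry can be evaluated numerically. The sketch below simply transcribes that formula; the function name `shuffle_gdp_mu`, the argument layout, and the input checks are assumptions made for this example and are not taken from the paper.

```python
import math


def shuffle_gdp_mu(epsilons, deltas):
    """Evaluate mu = sqrt(2 / (sum_i t_i - max_i t_i)),
    where t_i = (1 - delta_i) / (1 + exp(epsilon_i)).

    Hypothetical helper: it only transcribes the formula in the abstract.
    """
    if len(epsilons) != len(deltas):
        raise ValueError("epsilons and deltas must have the same length")
    # Per-user terms t_i from the formula.
    terms = [(1.0 - d) / (1.0 + math.exp(e)) for e, d in zip(epsilons, deltas)]
    denom = sum(terms) - max(terms)
    if denom <= 0.0:
        # Happens, e.g., with a single user: the formula is undefined there.
        raise ValueError("formula requires a positive denominator")
    return math.sqrt(2.0 / denom)


# Five users with epsilon_i = 0 and delta_i = 0: each term is 1/2,
# so the denominator is 5/2 - 1/2 = 2 and mu = 1.
mu_small = shuffle_gdp_mu([0.0] * 5, [0.0] * 5)
# More users with the same parameters give a smaller mu.
mu_large = shuffle_gdp_mu([0.0] * 50, [0.0] * 50)
```

Under this formula, adding users with the same personalized parameters enlarges the denominator, so $μ$ decreases as the population grows.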
-
A spin analog of the plethystic Murnaghan-Nakayama rule
Authors:
Yue Cao,
Naihuan Jing,
Ning Liu
Abstract:
As a spin analog of the plethystic Murnaghan-Nakayama rule for Schur functions, the plethystic Murnaghan-Nakayama rule for Schur $Q$-functions is established with the help of the vertex operator realization. This generalizes both the Murnaghan-Nakayama rule and the Pieri rule for Schur $Q$-functions. A plethystic Murnaghan-Nakayama rule for Hall-Littlewood functions is also investigated.
Submitted 21 December, 2023;
originally announced December 2023.
-
Self-complementary (Pseudo-)Split Graphs
Authors:
Yixin Cao,
Haowei Chen,
Shenghua Wang
Abstract:
We are concerned with split graphs and pseudo-split graphs whose complements are isomorphic to themselves. These special subclasses of self-complementary graphs are actually the core of self-complementary graphs. Indeed, we show that all self-complementary graphs with forcibly self-complementary degree sequences are pseudo-split graphs. We also give formulas to calculate the number of self-complementary (pseudo-)split graphs of a given order, and show that Trotignon's conjecture holds for all self-complementary split graphs.
Submitted 16 December, 2023;
originally announced December 2023.
-
Sharp Poincaré--Sobolev Inequalities of Choquet--Lorentz Integrals with Respect to Hausdorff Contents on Bounded John Domains
Authors:
Long Huang,
Yuanshou Cao,
Dachun Yang,
Ciqiang Zhuo
Abstract:
Let $Ω$ be a bounded John domain in $\mathbb R^n$ with $n\ge 2$, and let $\mathcal{H}_{\infty }^δ$ denote the Hausdorff content of dimension $δ\in (0,n]$. In this article, the authors prove the Poincaré and the Poincaré--Sobolev inequalities, with sharp ranges of indices, on Choquet--Lorentz integrals with respect to $\mathcal{H}_{\infty }^δ$ for all continuously differentiable functions on $Ω$. These results not only extend the recent Poincaré and Poincaré--Sobolev inequalities to the Choquet--Lorentz integrals, but also provide some endpoint estimates (weak type) in the critical case. One of the main novelties is that, to achieve these goals, the authors develop some new tools associated with Choquet--Lorentz integrals on $\mathcal{H}_{\infty }^δ$, such as the fractional Hardy--Littlewood maximal inequality and the Hedberg-type pointwise estimate on the Riesz potential. As an application, the authors obtain the sharp boundedness of the Riesz potential on Choquet--Lorentz integrals. Moreover, even for classical Lorentz integrals, these Poincaré and Poincaré--Sobolev inequalities are new.
Submitted 18 December, 2024; v1 submitted 26 November, 2023;
originally announced November 2023.
-
A GPU-Accelerated Moving-Horizon Algorithm for Training Deep Classification Trees on Large Datasets
Authors:
Jiayang Ren,
Valentín Osuna-Enciso,
Morimasa Okamoto,
Qiangqiang Mao,
Chaojie Ji,
Liang Cao,
Kaixun Hua,
Yankai Cao
Abstract:
Decision trees are essential yet NP-complete to train, prompting the widespread use of heuristic methods such as CART, which suffers from sub-optimal performance due to its greedy nature. Recently, breakthroughs in finding optimal decision trees have emerged; however, these methods still face significant computational costs and struggle with continuous features in large-scale datasets and deep trees. To address these limitations, we introduce a moving-horizon differential evolution algorithm for classification trees with continuous features (MH-DEOCT). Our approach consists of a discrete tree decoding method that eliminates duplicated searches between adjacent samples, a GPU-accelerated implementation that significantly reduces running time, and a moving-horizon strategy that iteratively trains shallow subtrees at each node to balance the vision and optimizer capability. Comprehensive studies on 68 UCI datasets demonstrate that our approach outperforms the heuristic method CART on training and testing accuracy by an average of 3.44% and 1.71%, respectively. Moreover, these numerical studies empirically demonstrate that MH-DEOCT achieves near-optimal performance (only 0.38% and 0.06% worse than the global optimal method on training and testing, respectively), while it offers remarkable scalability for deep trees (e.g., depth=8) and large-scale datasets (e.g., ten million samples).
Submitted 12 November, 2023;
originally announced November 2023.
-
Well-posedness and regularity of the Darcy-Boussinesq system in layered porous media
Authors:
Yining Cao,
Weisheng Niu,
Xiaoming Wang
Abstract:
We establish the existence of global weak solutions in 2D and 3D, as well as the uniqueness of weak solutions in 2D, for the Darcy-Boussinesq model for convection in layered porous media with square integrable initial data. We also derive tangential regularity in the 2D case. In addition, we obtain the existence and uniqueness of regular solutions in a novel piecewise $H^2$ space in both 2D and 3D under a uniform porosity assumption and $H^1$ initial data. This is the first rigorous result for this model in the physically important layered setting.
Submitted 18 November, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.