-
New Understandings and Computation on Augmented Lagrangian Methods for Low-Rank Semidefinite Programming
Authors:
Lijun Ding,
Haihao Lu,
Jinwen Yang
Abstract:
The augmented Lagrangian method (ALM) combined with Burer-Monteiro (BM) factorization, dubbed ALM-BM, offers a powerful approach for solving large-scale low-rank semidefinite programs (SDPs). Despite its empirical success, theoretical understanding of the resulting non-convex ALM-BM subproblems, particularly of their structural properties and their efficient solvability by first-order methods, remains limited. This work addresses these gaps with a rigorous theoretical analysis. We demonstrate that, under an appropriate regularity condition on the original SDP, termed primal simplicity, ALM subproblems inherit crucial properties such as low-rankness and strict complementarity when the dual variable is localized. Furthermore, ALM subproblems are shown to satisfy a quadratic growth condition, building on which we prove that the non-convex ALM-BM subproblems can be solved to global optimality by gradient descent, with linear convergence under local initialization and dual variable proximity. Through illustrative examples, we further establish the necessity of these local assumptions, revealing them as inherent characteristics of the problem structure. Motivated by these theoretical insights, we propose ALORA, a rank-adaptive augmented Lagrangian method built on the ALM-BM framework, which dynamically adjusts the rank using spectral information and explores negative curvature directions to navigate the nonconvex landscape. Exploiting modern GPU computing architectures, ALORA exhibits strong numerical performance, solving SDPs with tens of millions of dimensions in hundreds of seconds.
Submitted 21 May, 2025;
originally announced May 2025.
-
Partition-wise Graph Filtering: A Unified Perspective Through the Lens of Graph Coarsening
Authors:
Guoming Li,
Jian Yang,
Yifan Chen
Abstract:
Filtering-based graph neural networks (GNNs) constitute a distinct class of GNNs that employ graph filters to handle graph-structured data, achieving notable success in various graph-related tasks. Conventional methods adopt a graph-wise filtering paradigm, imposing a uniform filter across all nodes, yet recent findings suggest that this rigid paradigm struggles with heterophilic graphs. To overcome this, recent works have introduced node-wise filtering, which assigns distinct filters to individual nodes, offering enhanced adaptability. However, a fundamental gap remains: a comprehensive framework unifying these two strategies is still absent, limiting theoretical insights into the filtering paradigms. Moreover, through the lens of the Contextual Stochastic Block Model, we reveal that a synthesis of graph-wise and node-wise filtering provides a sufficient solution for classification on graphs exhibiting both homophily and heterophily, suggesting a risk of excessive parameterization and potential overfitting with node-wise filtering. To address these limitations, this paper introduces Coarsening-guided Partition-wise Filtering (CPF). CPF innovates by performing filtering on node partitions. The method begins with structure-aware partition-wise filtering, which filters node partitions obtained via graph coarsening algorithms, and then performs feature-aware partition-wise filtering, refining node embeddings via filtering on clusters produced by $k$-means clustering over features. In-depth analysis is conducted for each phase of CPF, showing its superiority over other paradigms. Finally, benchmark node classification experiments, along with a real-world graph anomaly detection application, validate CPF's efficacy and practical utility.
Submitted 22 May, 2025; v1 submitted 20 May, 2025;
originally announced May 2025.
-
Stability and convergence of multi-product expansion splitting methods with negative weights for semilinear parabolic equations
Authors:
Xianglong Duan,
Chaoyu Quan,
Jiang Yang,
Zijing Zhu
Abstract:
The operator splitting method has been widely used to solve differential equations by splitting the equation into more manageable parts. In this work, we resolve a long-standing problem -- how to establish the stability of multi-product expansion (MPE) splitting methods with negative weights. The difficulty arises because negative weights in high-order MPE methods make the sum of the absolute values of the weights larger than one, so standard stability proofs fail. In particular, we take the semilinear parabolic equation as a typical model and establish the stability of arbitrarily high-order MPE splitting methods with positive time steps but possibly negative weights. A rigorous convergence analysis is subsequently obtained from the stability result. Extensive numerical experiments validate the stability and accuracy of various high-order MPE splitting methods, highlighting their efficiency and robustness.
Submitted 18 May, 2025;
originally announced May 2025.
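To make the role of negative weights concrete, here is a minimal sketch of one classical MPE construction, assuming (this is not taken from the paper) that the base method is Strang splitting and that the expansion is its Richardson extrapolation with weights 4/3 and -1/3. The absolute values of these weights sum to 5/3, which is larger than one. The scalar test problem u' = -LAM*u + (u - u^3), the constant `LAM`, and all function names are hypothetical stand-ins for a semilinear parabolic equation.

```python
import math

LAM = 0.5  # hypothetical decay rate of the linear part (assumption)


def flow_linear(u, h):
    """Exact flow of the linear part u' = -LAM * u over time h."""
    return math.exp(-LAM * h) * u


def flow_nonlinear(u, h):
    """Exact flow of the nonlinear part u' = u - u**3 over time h."""
    return u / math.sqrt(u * u + (1.0 - u * u) * math.exp(-2.0 * h))


def strang(u, h):
    """Second-order Strang step: half linear, full nonlinear, half linear."""
    u = flow_linear(u, 0.5 * h)
    u = flow_nonlinear(u, h)
    return flow_linear(u, 0.5 * h)


def mpe4(u, h):
    """MPE step (4/3) * S(h/2)^2 - (1/3) * S(h); note the negative weight."""
    two_half_steps = strang(strang(u, 0.5 * h), 0.5 * h)
    return (4.0 / 3.0) * two_half_steps - (1.0 / 3.0) * strang(u, h)


def exact(u0, t):
    """Closed-form solution of u' = a*u - u**3 with a = 1 - LAM > 0."""
    a = 1.0 - LAM
    return u0 * math.sqrt(a) / math.sqrt(u0 * u0 + (a - u0 * u0) * math.exp(-2.0 * a * t))


def integrate(step, u0, h, n_steps):
    """Apply a one-step method n_steps times with step size h."""
    u = u0
    for _ in range(n_steps):
        u = step(u, h)
    return u


u0, T, n = 0.5, 1.0, 10
h = T / n
err_strang = abs(integrate(strang, u0, h, n) - exact(u0, T))
err_mpe = abs(integrate(mpe4, u0, h, n) - exact(u0, T))
print(f"Strang error: {err_strang:.3e}, MPE error: {err_mpe:.3e}")
```

On this toy problem the extrapolated scheme should be noticeably more accurate than plain Strang splitting at the same step size. The sketch says nothing about stability for PDEs, which is the question the paper addresses.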
-
A Convergent Inexact Abedin-Kitagawa Iteration Method for Monge-Ampère Eigenvalue Problems
Authors:
Liang Chen,
Youyicun Lin,
Junqi Yang,
Wenfan Yi
Abstract:
In this paper, we propose an inexact Aleksandrov solution based Abedin-Kitagawa iteration (AKI) method for solving (real) Monge-Ampère eigenvalue problems. The proposed approach utilizes the convergent Rayleigh inverse iterative formulation introduced by Abedin and Kitagawa as the prototype. More importantly, it employs an error tolerance criterion of inexact Aleksandrov solutions to approximately solve the subproblems without spoiling the convergence, which becomes the most crucial issue for the efficient implementation of the iterative method. For the two-dimensional case, by properly taking advantage of the flexibility rendered by the proposed inexact approach and a convergent fixed-point-based approach to solve the subproblems, considerable advancements in computational efficiency can be achieved by the inexact AKI method with its convergence under the $\mathcal{C}^{2,\alpha}$ boundary condition being rigorously established. Numerical experiments are conducted to demonstrate the efficiency of the proposed inexact AKI method. The numerical results suggest that the inexact AKI method can be more than eight times faster than the original AKI method, at least for all the tested problems.
Submitted 9 May, 2025;
originally announced May 2025.
-
Causal Discovery in Symmetric Dynamic Systems with Convergent Cross Mapping
Authors:
Yiting Duan,
Yi Guo,
Jack Yang,
Ming Yin
Abstract:
This paper systematically discusses how the inherent properties of chaotic attractors influence the results of discovering causality from time series using convergent cross mapping, particularly how convergent cross mapping misidentifies bidirectional causality as unidirectional when the chaotic attractor exhibits symmetry. We propose a novel method based on k-means clustering to address the challenges that arise when the chaotic attractor exhibits two-fold rotation symmetry. This method is demonstrated to recover the symmetry of the latent chaotic attractor and discover the correct causality between time series without introducing information from other variables. We validate the accuracy of this method using time series derived from low-dimensional and high-dimensional chaotic symmetric attractors for which convergent cross mapping may yield erroneous results.
Submitted 7 May, 2025;
originally announced May 2025.
-
Numerical Reconstruction and Analysis of Backward Semilinear Subdiffusion Problems
Authors:
Xu Wu,
Jiang Yang,
Zhi Zhou
Abstract:
This paper aims to develop and analyze a numerical scheme for solving the backward problem of semilinear subdiffusion equations. We establish the existence, uniqueness, and conditional stability of the solution to the inverse problem by applying the smoothing and asymptotic properties of solution operators and constructing a fixed-point iteration. This derived conditional stability further inspires a numerical reconstruction scheme. To address the mildly ill-posed nature of the problem, we employ the quasi-boundary value method for regularization. A fully discrete scheme is proposed, utilizing the finite element method for spatial discretization and convolution quadrature for temporal discretization. A thorough error analysis of the resulting discrete system is provided for both smooth and nonsmooth data. This analysis relies on the smoothing properties of discrete solution operators, some nonstandard error estimates optimal with respect to data regularity in the direct problem, and the arguments used in stability analysis. The derived a priori error estimate offers guidance for selecting the regularization parameter and discretization parameters based on the noise level. Moreover, we propose an easy-to-implement iterative algorithm for solving the fully discrete scheme and prove its linear convergence. Numerical examples are provided to illustrate the theoretical estimates and demonstrate the necessity of the assumption required in the analysis.
Submitted 6 May, 2025;
originally announced May 2025.
-
Maximum likelihood estimation for the $λ$-exponential family
Authors:
Xiwei Tian,
Ting-Kam Leonard Wong,
Jiaowen Yang,
Jun Zhang
Abstract:
The $λ$-exponential family generalizes the standard exponential family via a generalized convex duality motivated by optimal transport. It is the constant-curvature analogue of the exponential family from the information-geometric point of view, but the development of computational methodologies is still in an early stage. In this paper, we propose a fixed point iteration for maximum likelihood estimation under i.i.d. sampling, and prove using the duality that the likelihood is monotone along the iterations. We illustrate the algorithm with the $q$-Gaussian distribution and the Dirichlet perturbation.
Submitted 6 May, 2025;
originally announced May 2025.
-
Efficient Curvature-Aware Hypergradient Approximation for Bilevel Optimization
Authors:
Youran Dong,
Junfeng Yang,
Wei Yao,
Jin Zhang
Abstract:
Bilevel optimization is a powerful tool for many machine learning problems, such as hyperparameter optimization and meta-learning. Estimating hypergradients (also known as implicit gradients) is crucial for developing gradient-based methods for bilevel optimization. In this work, we propose a computationally efficient technique for incorporating curvature information into the approximation of hypergradients and present a novel algorithmic framework based on the resulting enhanced hypergradient computation. We provide convergence rate guarantees for the proposed framework in both deterministic and stochastic scenarios, particularly showing improved computational complexity over popular gradient-based methods in the deterministic setting. This improvement in complexity arises from a careful exploitation of the hypergradient structure and the inexact Newton method. In addition to the theoretical speedup, numerical experiments demonstrate the significant practical performance benefits of incorporating curvature information.
Submitted 4 May, 2025;
originally announced May 2025.
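For readers unfamiliar with hypergradients, the following is a minimal sketch of the standard implicit-function formula, hypergradient = grad_x f - (d/dx grad_y g) [Hess_yy g]^{-1} grad_y f, with the linear system solved by conjugate gradient. It is not the authors' curvature-aware algorithm. The quadratic lower-level problem, the matrices `A` and `B`, the target `t`, and the weight `lam` are all hypothetical choices made so that the result can be checked against finite differences.

```python
import numpy as np


def conjugate_gradient(A, b, max_iter=50, tol=1e-12):
    """Approximately solve A v = b for a symmetric positive definite A."""
    v = np.zeros_like(b)
    r = b - A @ v
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        v += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return v


rng = np.random.default_rng(0)
dx, dy = 3, 5
M = rng.standard_normal((dy, dy))
A = M @ M.T + dy * np.eye(dy)  # SPD Hessian of the lower-level problem (assumption)
B = rng.standard_normal((dy, dx))
t = rng.standard_normal(dy)
lam = 0.1


def lower_solution(x):
    """y*(x) = argmin_y 0.5 * y^T A y - y^T B x, i.e. A y = B x."""
    return np.linalg.solve(A, B @ x)


def upper_objective(x):
    """F(x) = f(x, y*(x)) with f(x, y) = 0.5*||y - t||^2 + 0.5*lam*||x||^2."""
    y = lower_solution(x)
    return 0.5 * np.sum((y - t) ** 2) + 0.5 * lam * np.sum(x ** 2)


def hypergradient(x):
    """Implicit-function hypergradient; here d/dx grad_y g = -B^T."""
    y = lower_solution(x)
    grad_y_f = y - t
    v = conjugate_gradient(A, grad_y_f)  # v approximates A^{-1} grad_y f
    return lam * x + B.T @ v


x = rng.standard_normal(dx)
g = hypergradient(x)

# Central finite differences as an independent check of the formula.
eps = 1e-6
g_fd = np.array([
    (upper_objective(x + eps * e) - upper_objective(x - eps * e)) / (2 * eps)
    for e in np.eye(dx)
])
print(np.max(np.abs(g - g_fd)))
```

The expensive part is the linear solve with the lower-level Hessian; methods in this area differ mainly in how accurately and how cheaply they approximate that solve.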
-
EW D-optimal Designs for Experiments with Mixed Factors
Authors:
Siting Lin,
Yifei Huang,
Jie Yang
Abstract:
We characterize EW D-optimal designs as robust designs against unknown parameter values for experiments under a general parametric model with discrete and continuous factors. When a pilot study is available, we recommend sample-based EW D-optimal designs for subsequent experiments. Otherwise, we recommend EW D-optimal designs under a prior distribution for model parameters. We propose an EW ForLion algorithm for finding EW D-optimal designs with mixed factors, and justify that the designs found by our algorithm are EW D-optimal. To assist practitioners, we also develop a rounding algorithm that converts an approximate design with mixed factors to exact designs with prespecified grid points and a prespecified number of experimental units. By applying our algorithms to real experiments under multinomial logistic models or generalized linear models, we show that our designs are highly efficient with respect to locally D-optimal designs and more robust against parameter value misspecifications.
Submitted 22 May, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
-
Entrywise Approximate Matrix Inversion
Authors:
Mehrdad Ghadiri,
Junzhao Yang
Abstract:
We study the bit complexity of inverting diagonally dominant matrices, which are associated with random walk quantities such as hitting times and escape probabilities. Such quantities can be exponentially small, even on undirected unit-weighted graphs. However, their nonnegativity suggests that they can be approximated entrywise, leading to a stronger notion of approximation than vector norm-based error.
Under this notion of error, existing Laplacian solvers and fast matrix multiplication approaches have bit complexities of $mn^2$ and $n^{ω+1}$, respectively, where $m$ is the number of nonzero entries in the matrix, $n$ is its size, and $ω$ is the matrix multiplication exponent.
We present algorithms that compute entrywise $\exp(ε)$-approximate inverses of row diagonally dominant $L$-matrices (RDDL) in two settings: (1) when the matrix entries are given in floating-point representation; (2) when they are given in fixed-point representation.
For floating-point inputs, we present a cubic-time algorithm and show that it has an optimal running time under the all-pairs shortest paths (APSP) conjecture.
For fixed-point inputs, we present several algorithms for solving linear systems and inverting RDDL and SDDM matrices, all with high probability.
Omitting logarithmic factors:
(1) For SDDM matrices, we provide an algorithm for solving a linear system with entrywise approximation guarantees using $\tilde{O}(m\sqrt{n})$ bit operations, and another for computing an entrywise approximate inverse using $\tilde{O}(mn)$ bit operations.
(2) For RDDL matrices, we present an algorithm for solving a linear system using $\tilde{O}(mn^{1+o(1)})$ bit operations, and two algorithms for computing an entrywise approximate inverse: one using $\tilde{O}(n^{ω+0.5})$ bit operations, and the other using $\tilde{O}(mn^{1.5+o(1)})$ bit operations.
Submitted 26 April, 2025;
originally announced April 2025.
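The difference between entrywise and norm-based error can be illustrated with a small sketch. It assumes that an entrywise $\exp(ε)$-approximation of a nonnegative quantity $a$ is a value $b$ with $e^{-ε} a \le b \le e^{ε} a$; this reading, the helper `is_entrywise_approx`, and the 2-by-2 matrix below (a hypothetical stand-in for an inverse with one very small entry) are assumptions, not material from the paper.

```python
import numpy as np


def is_entrywise_approx(approx, exact, eps):
    """True if exp(-eps)*exact <= approx <= exp(eps)*exact holds in every entry.

    Assumes all entries of `exact` are nonnegative.
    """
    lower = np.exp(-eps) * exact
    upper = np.exp(eps) * exact
    return bool(np.all((lower <= approx) & (approx <= upper)))


# Hypothetical "exact" matrix with a tiny off-diagonal entry.
exact = np.array([[1.0, 1e-12],
                  [1e-12, 1.0]])

# A uniform absolute perturbation: negligible in norm, large relative to 1e-12.
approx_norm_only = exact + 1e-10
rel_norm_err = np.linalg.norm(approx_norm_only - exact) / np.linalg.norm(exact)

# A uniform relative perturbation of 5 percent: fine entrywise for eps = 0.1.
approx_entrywise = 1.05 * exact

print(rel_norm_err)
print(is_entrywise_approx(approx_norm_only, exact, 0.1))
print(is_entrywise_approx(approx_entrywise, exact, 0.1))
```

The first approximation has a tiny relative error in the Frobenius norm yet misses the small entry by two orders of magnitude, which is why the entrywise guarantee is the stronger one when the quantities of interest can be exponentially small.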
-
Learning Operators by Regularized Stochastic Gradient Descent with Operator-valued Kernels
Authors:
Jia-Qi Yang,
Lei Shi
Abstract:
This paper investigates regularized stochastic gradient descent (SGD) algorithms for estimating nonlinear operators from a Polish space to a separable Hilbert space. We assume that the regression operator lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel. Two significant settings are considered: an online setting with polynomially decaying step sizes and regularization parameters, and a finite-horizon setting with constant step sizes and regularization parameters. We introduce regularity conditions on the structure and smoothness of the target operator and the input random variables. Under these conditions, we provide a dimension-free convergence analysis for the prediction and estimation errors, deriving both expectation and high-probability error bounds. Our analysis demonstrates that these convergence rates are nearly optimal. Furthermore, we present a new technique for deriving bounds with high probability for general SGD schemes, which also ensures almost-sure convergence. Finally, we discuss potential extensions to more general operator-valued kernels and the encoder-decoder framework.
Submitted 25 April, 2025;
originally announced April 2025.
-
Adaptive sieving with semismooth Newton proximal augmented Lagrangian algorithm for multi-task Lasso problems
Authors:
Lanyu Lin,
Yong-Jin Liu,
Bo Wang,
Junfeng Yang
Abstract:
Multi-task learning enhances model generalization by jointly learning from related tasks. This paper focuses on the $\ell_{1,\infty}$-norm constrained multi-task learning problem, which promotes a shared feature representation while inducing sparsity in task-specific parameters. We propose an adaptive sieving (AS) strategy to efficiently generate a solution path for multi-task Lasso problems. Each subproblem along the path is solved via an inexact semismooth Newton proximal augmented Lagrangian (SSNPAL) algorithm, achieving an asymptotically superlinear convergence rate. By exploiting the Karush-Kuhn-Tucker (KKT) conditions and the inherent sparsity of multi-task Lasso solutions, the SSNPAL algorithm solves a sequence of reduced subproblems with small dimensions. This approach enables our method to scale effectively to large problems. Numerical experiments on synthetic and real-world datasets demonstrate the superior efficiency and robustness of our algorithm compared to state-of-the-art solvers.
Submitted 21 April, 2025;
originally announced April 2025.
-
Enhanced Data-driven Topology Design Methodology with Multi-level Mesh and Correlation-based Mutation for Stress-related Multi-objective Optimization
Authors:
Jun Yang,
Shintaro Yamasaki
Abstract:
Topology optimization (TO) is a widely applied structural design approach for tackling various engineering problems. Nevertheless, sensitivity-based TO methods usually struggle with strongly nonlinear optimization problems. By leveraging the high capacity of deep generative models, an influential machine learning technique, the sensitivity-free data-driven topology design (DDTD) methodology is regarded as an effective means of overcoming these issues. The DDTD methodology depends on an initial dataset with a certain regularity, making its results highly sensitive to the quality of that dataset. This limits its effectiveness and generalizability, especially for optimization problems without prior information. In this research, we propose a multi-level mesh DDTD-based method with a correlation-based mutation module to remove the dependence of the results on the quality of the initial dataset and to enhance computational efficiency. The core idea is to employ a correlation-based mutation module to assign new geometric features with physical meaning to the generated data, while utilizing a multi-level mesh strategy to progressively refine the structural representation, thus avoiding the need to maintain a high degree-of-freedom (DOF) representation throughout the iterative process. The proposed multi-level mesh DDTD-based method can be driven by a low-quality initial dataset without the time-consuming construction of a specific dataset, thus significantly increasing generality and reducing application difficulty, while further lowering the computational cost of the DDTD methodology. Various comparison experiments with traditional sensitivity-based TO methods on stress-related strongly nonlinear problems demonstrate the generality and effectiveness of the proposed method.
Submitted 20 April, 2025;
originally announced April 2025.
-
Logarithmic Crystalline Representations
Authors:
Zhenmou Liu,
Jinbang Yang,
Kang Zuo
Abstract:
In 1989, Faltings proved the comparison theorem between étale cohomology and crystalline cohomology by studying Fontaine-Faltings modules and crystalline representations. In his paper, he mentioned that these modules and representations can be extended to the logarithmic context, but gave no details. This note aims to explicitly present the construction of logarithmic Fontaine-Faltings modules and logarithmic crystalline representations.
Submitted 19 April, 2025;
originally announced April 2025.
-
Gaussian Mean Testing under Truncation
Authors:
Clément L. Canonne,
Themis Gouleakis,
Yuhao Wang,
Joy Qiping Yang
Abstract:
We consider the task of Gaussian mean testing, that is, of testing whether a high-dimensional vector perturbed by white noise has large magnitude, or is the zero vector. This question, originating from the signal processing community, has recently seen a surge of interest from the machine learning and theoretical computer science communities, and is by now fairly well understood. What is much less understood, and the focus of our work, is how to perform this task under truncation: that is, when the observations (i.i.d. samples from the underlying high-dimensional Gaussian) are only observed when they fall in a given subset of the domain $\mathbb{R}^d$. This truncation model, previously studied in the context of learning (instead of testing) the mean vector, has a range of applications, in particular in Economics and Social Sciences. As our work shows, sample truncations affect the complexity of the testing task in a rather subtle and surprising way.
Submitted 6 April, 2025;
originally announced April 2025.
-
Surgeries between lens spaces of type $L(n,1)$ and the Heegaard Floer $d$-invariant
Authors:
Zhongtao Wu,
Jingling Yang
Abstract:
We establish a $d$-invariant surgery formula for $L$-space knots that provides an effective tool for studying surgeries between lens spaces. Using this formula, we classify distance one surgeries between lens spaces of the form $L(n,1)$. This classification has direct applications to band surgeries between torus links $T(2,n)$, with connections to DNA topology. In particular, we show that chirally cosmetic banding of torus links can possibly occur only when $n=1,5,9$ or $10$.
Submitted 3 April, 2025;
originally announced April 2025.
-
An Improved Climenhaga-Thompson Criterion for Locally Maximal Sets
Authors:
Maria Jose Pacifico,
Fan Yang,
Jiagang Yang,
Gongran Yao
Abstract:
We study the existence and uniqueness of equilibrium states for continuous flows on a compact, locally maximal invariant set under weak, non-uniform versions of specification, expansivity, and the Bowen property, further improving the Climenhaga-Thompson Criterion.
Submitted 29 March, 2025;
originally announced March 2025.
-
An Improved Upper Bound on the Threshold Bias of the Oriented-cycle game
Authors:
Anita Liebenau,
Abdallah Saffidine,
Jeffrey Yang
Abstract:
We study the $b$-biased Oriented-cycle game where two players, OMaker and OBreaker, take turns directing the edges of $K_n$ (the complete graph on $n$ vertices). In each round, OMaker directs one previously undirected edge followed by OBreaker directing between one and $b$ previously undirected edges. The game ends once all edges have been directed, and OMaker wins if and only if the resulting tournament contains a directed cycle. Bollobás and Szabó asked the following question: what is the largest value of the bias $b$ for which OMaker has a winning strategy? Ben-Eliezer, Krivelevich and Sudakov proved that OMaker has a winning strategy for $b \leq n/2 - 2$. In the other direction, Clemens and Liebenau proved that OBreaker has a winning strategy for $b \geq 5n/6+2$. Inspired by their approach, we propose a significantly stronger strategy for OBreaker which we prove to be winning for $b \geq 0.7845n + O(1)$.
Submitted 20 March, 2025;
originally announced March 2025.
-
The Atiyah-Schmid formula for reductive groups
Authors:
Jun Yang
Abstract:
We give the generalized Atiyah-Schmid formula for projective tempered representations. Then we prove the Atiyah-Schmid formula for arithmetic subgroups of real reductive groups.
Submitted 10 April, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
Normal and non-normal Cayley digraphs on cyclic and dihedral groups
Authors:
Jun-Feng Yang,
Yan-Quan Feng,
Fu-Gang Yin,
Jin-Xin Zhou
Abstract:
A Cayley digraph on a group $G$ is called NNN if the Cayley digraph is normal and its automorphism group contains a non-normal regular subgroup isomorphic to $G$. A group is called an NNND-group or an NNN-group if there is an NNN Cayley digraph or graph on the group, respectively. In this paper, it is shown that there is no cyclic NNND-group, and hence no cyclic NNN-group. Furthermore, a dihedral group of order $2n$ is an NNND-group or an NNN-group if and only if $n\ge 6$ is even and $n\not=8$.
Submitted 13 March, 2025;
originally announced March 2025.
-
Numerical study on hyper parameter settings for neural network approximation to partial differential equations
Authors:
Hee Jun Yang,
Alexander Heinlein,
Hyea Hyun Kim
Abstract:
Approximate solutions of partial differential equations (PDEs) obtained by neural networks are highly affected by hyper parameter settings. For instance, the model training strongly depends on loss function design, including the choice of weight factors for different terms in the loss function, and the sampling set related to numerical integration; other hyper parameters, like the network architecture and the optimizer settings, also impact the model performance. On the other hand, suitable hyper parameter settings are known to be different for different model problems and currently no universal rule for the choice of hyper parameters is known.
In this paper, for second order elliptic model problems, various hyper parameter settings are tested numerically to provide a practical guide for efficient and accurate neural network approximation. While a full study of all possible hyper parameter settings is not possible, we focus on the formulation of the PDE loss and the incorporation of the boundary conditions, the choice of collocation points associated with numerical integration schemes, and various approaches for dealing with loss imbalances, all of which are studied extensively on various model problems; in addition to various Poisson model problems, a nonlinear problem and an eigenvalue problem are also considered.
Submitted 12 March, 2025;
originally announced March 2025.
-
Limit cycles appearing from the perturbation of a cubic isochronous center
Authors:
Jihua Yang,
Qipeng Zhang
Abstract:
For a polynomial differential system $$\dot{x}=-y+\sum\limits_{i+j=3}α_{i,j}x^iy^j,\quad \dot{y}=x+\sum\limits_{i+j=3}β_{i,j}x^iy^j,$$
Pleshkan (Differ. Equations, 1969) proved that the origin is an isochronous center of this system iff it can be brought to one of $S^*_1$, $S^*_2$, $S^*_3$ or $S^*_4$. The bifurcation of limit cycles for these four types of isochronous differential systems has not yet been studied, except for $S^*_1$. This paper is devoted to studying the limit cycle problem of $S^*_2$ when it is perturbed by an arbitrary polynomial vector field. An upper bound on the number of limit cycles is obtained using the Abelian integral.
Submitted 11 March, 2025;
originally announced March 2025.
-
Bifurcation of limit cycles from a cubic reversible isochrone
Authors:
Jihua Yang,
Qipeng Zhang
Abstract:
This paper is devoted to studying the limit cycle problem of a cubic reversible system with an isochronous center, when it is perturbed inside a class of polynomials. An upper bound on the number of limit cycles is obtained using the Abelian integral. The algebraic structure of the Abelian integral is acquired thanks to some iterative formulas, which differs in many aspects from other methods. Some numerical simulations verify the existence of limit cycles.
Submitted 11 March, 2025;
originally announced March 2025.
-
Persistence of hyperbolic solutions of ODE's under functional perturbations: Applications to the motion of relativistic charged particles
Authors:
Joan Gimeno,
Rafael de la Llave,
Jiaqi Yang
Abstract:
We rigorously construct a variety of orbits for certain delay differential equations, including the electrodynamic equations formulated by Wheeler and Feynman in 1949. These equations involve delays and advances that depend on the trajectory itself, making it unclear how to formulate them as evolution equations in a conventional phase space. Despite their fundamental significance in physics, their mathematical treatment remains limited.
Our method applies broadly to various functional differential equations that have appeared in the literature, including advanced/delayed equations, neutral or state-dependent delay equations, and nested delay equations, under appropriate regularity assumptions.
Rather than addressing the notoriously difficult problem of proving the existence of solutions for all the initial conditions in a set, we focus on the direct construction of a diverse collection of solutions. This approach is often sufficient to describe physical phenomena. For instance, in certain models, we establish the existence of families of solutions exhibiting symbolic dynamics.
Our method is based on the assumption that the system is, in a weak sense, close to an ordinary differential equation (ODE) with "hyperbolic" solutions as defined in dynamical systems. We then derive functional equations to obtain space-time corrections.
As a byproduct of the method, we obtain that the solutions constructed depend very smoothly on parameters of the model. Also, we show that many formal approximations currently used in physics are valid with explicit error terms. Several of the relations between different orbits of the ODE persist qualitatively in the full problem.
Submitted 6 March, 2025;
originally announced March 2025.
-
Boundary determination for the Schrödinger equation with unknown embedded obstacles by local data
Authors:
Chengyu Wu,
Jiaqing Yang
Abstract:
In this paper, we consider the inverse boundary value problem of the elliptic operator $Δ+q$ in a fixed region $Ω\subset\mathbb{R}^3$ with unknown embedded obstacles $D$. In particular, we give a new and simple proof to uniquely determine $q$ and all of its derivatives at the boundary from the knowledge of the local Dirichlet-to-Neumann map on $\partialΩ$, disregarding the unknown obstacle, where in fact only the local Cauchy data of the fundamental solution is used. Our proof mainly depends on the rigorous singularity analysis on certain singular solutions and the volume potentials of fundamental solution, which is easy to extend to many other cases.
Submitted 3 March, 2025;
originally announced March 2025.
-
A Conservative Partially Hyperbolic Dichotomy: Hyperbolicity versus Nonhyperbolic Measures
Authors:
Lorenzo J. Díaz,
Jiagang Yang,
Jinhua Zhang
Abstract:
In a conservative and partially hyperbolic three-dimensional setting, we study three representative classes of diffeomorphisms: those homotopic to Anosov (or Derived from Anosov diffeomorphisms), diffeomorphisms in neighborhoods of the time-one map of the geodesic flow on a surface of negative curvature, and accessible and dynamically coherent skew products with circle fibers. In any of these classes, we establish the following dichotomy: either the diffeomorphism is Anosov, or it possesses nonhyperbolic ergodic measures. Our approach is perturbation-free and combines recent advances in the study of stably ergodic diffeomorphisms with a variation of the periodic approximation method to obtain ergodic measures.
A key result in our construction, independent of conservative hypotheses, is the construction of nonhyperbolic ergodic measures for sets with a minimal strong unstable foliation that satisfy the mostly expanding property. This approach enables us to obtain nonhyperbolic ergodic measures in other contexts, including some subclasses of the so-called anomalous partially hyperbolic diffeomorphisms that are not dynamically coherent.
Submitted 17 April, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Domain Overlapping Algorithm with Nonlinear Mapping for Collocation-Based Solutions of Eigenvalue Problems
Authors:
Jinwei Yang,
Vinod Srinivasan
Abstract:
This paper presents four novel domain decomposition algorithms integrated with nonlinear mapping techniques to address collocation-based solutions of eigenvalue problems involving sharp interfaces or steep gradients. The proposed methods leverage the spectral accuracy of Chebyshev polynomials while overcoming limitations of existing tools like Chebfun, particularly in preserving higher-order derivative continuity and enabling flexible node clustering near discontinuities. Key findings on algorithm performance include the following. The one-point overlap method demonstrated significant improvements over global mapping approaches, reducing required grid points by orders of magnitude while maintaining spectral convergence. The two-point overlap methods further generalized the approach, allowing arbitrary node distributions and nonlinear mappings. These achieved exponential error reduction for the Burgers equation by combining Taylor expansions with Chebyshev derivatives in overlap regions. While Chebfun's splitting strategy automates domain decomposition, it enforces only C0 continuity, leading to discontinuous higher derivatives. In contrast, the proposed algorithms preserved smoothness up to CN continuity, critical for eigenvalue problems in hydrodynamic stability and nonlinear BVPs. Validation on 3D channel flow with viscosity stratification and the Burgers equation highlighted the methods' robustness. For instance, eigenvalue calculations for miscible core-annular flows matched prior results while resolving sharp viscosity gradients with fewer nodes.
Submitted 13 February, 2025;
originally announced February 2025.
-
ICE-closed subcategories and epibricks over recollements
Authors:
Jinrui Yang,
Yongyun Qin
Abstract:
Let $( \mathcal{A^{'}},\mathcal{A},\mathcal{A^{''}},i^\ast,i_\ast,i_!,j_!,j^\ast,j_\ast)$ be a recollement of abelian categories. We prove that every ICE-closed subcategory (resp. epibrick, monobrick) in $\mathcal{A^{'}}$ or $\mathcal{A^{''}}$ can be extended to an ICE-closed subcategory (resp. epibrick, monobrick) in $\mathcal{A}$, and the assignment $\mathcal{C}\mapsto j^*(\mathcal{C})$ defines a bijection between certain ICE-closed subcategories in $\mathcal{A}$ and those in $\mathcal{A}''$. Moreover, the ICE-closed subcategory $\mathcal{C}$ of $\mathcal{A}$ containing $i_\ast(\mathcal{A^{'}})$ admits a new recollement relative to the ICE-closed subcategories $\mathcal{A^{'}}$ and $j^\ast(\mathcal{C})$, which is induced from the original recollement when $j_!{j^\ast(\mathcal{C})}\subset\mathcal{C}$.
Submitted 6 February, 2025;
originally announced February 2025.
-
Randomized Kaczmarz Methods with Beyond-Krylov Convergence
Authors:
Michał Dereziński,
Deanna Needell,
Elizaveta Rebrova,
Jiaming Yang
Abstract:
Randomized Kaczmarz methods form a family of linear system solvers which converge by repeatedly projecting their iterates onto randomly sampled equations. While effective in some contexts, such as highly over-determined least squares, Kaczmarz methods are traditionally deemed secondary to Krylov subspace methods, since this latter family of solvers can exploit outliers in the input's singular value distribution to attain fast convergence on ill-conditioned systems.
In this paper, we introduce Kaczmarz++, an accelerated randomized block Kaczmarz algorithm that exploits outlying singular values in the input to attain a fast Krylov-style convergence. Moreover, we show that Kaczmarz++ captures large outlying singular values provably faster than popular Krylov methods, for both over- and under-determined systems. We also develop an optimized variant for positive semidefinite systems, called CD++, demonstrating empirically that it is competitive in arithmetic operations with both CG and GMRES on a collection of benchmark problems. To attain these results, we introduce several novel algorithmic improvements to the Kaczmarz framework, including adaptive momentum acceleration, Tikhonov-regularized projections, and a memoization scheme for reusing information from previously sampled equation blocks.
Submitted 20 January, 2025;
originally announced January 2025.
-
A second-order dynamical low-rank mass-lumped finite element method for the Allen-Cahn equation
Authors:
Jun Yang,
Nianyu Yi,
Peimeng Yin
Abstract:
In this paper, we propose a novel second-order dynamical low-rank mass-lumped finite element method for solving the Allen-Cahn (AC) equation, a semilinear parabolic partial differential equation. The matrix differential equation of the semi-discrete mass-lumped finite element scheme is decomposed into linear and nonlinear components using the second-order Strang splitting method. The linear component is solved analytically within a low-rank manifold, while the nonlinear component is discretized using a second-order augmented basis update & Galerkin (BUG) integrator, in which the $S$-step matrix equation is solved by the explicit 2-stage strong stability-preserving Runge-Kutta method. The algorithm has lower computational complexity than the full-rank mass-lumped finite element method. The dynamical low-rank finite element solution is shown to conserve mass up to a truncation tolerance for the conservative Allen-Cahn equation. Meanwhile, the modified energy is dissipative up to a high-order error and is hence stable. Numerical experiments validate the theoretical results. Symmetry-preserving tests highlight the robustness of the proposed method for long-time simulations and demonstrate its superior performance compared to existing methods.
Submitted 10 January, 2025;
originally announced January 2025.
-
The Terwilliger algebras of the group association schemes of non-abelian finite groups admitting an abelian subgroup of index 2
Authors:
Jing Yang,
Qinghong Guo,
Weijun Liu,
Lihua Feng
Abstract:
In this paper, we determine the dimension of the Terwilliger algebras of non-abelian finite groups admitting an abelian subgroup of index 2 by showing that they are triply transitive. Moreover, we give a complete characterization of the Wedderburn components of the Terwilliger algebras of these groups.
Submitted 6 January, 2025;
originally announced January 2025.
-
ERGNN: Spectral Graph Neural Network With Explicitly-Optimized Rational Graph Filters
Authors:
Guoming Li,
Jian Yang,
Shangsong Liang
Abstract:
Approximation-based spectral graph neural networks, which construct graph filters with function approximation, have shown substantial performance in graph learning tasks. Despite their great success, existing works primarily employ polynomial approximation to construct the filters, whereas another superior option, namely rational approximation, remains underexplored. Although a handful of prior works have attempted to deploy rational approximation, their implementations often involve intensive computational demands or still resort to polynomial approximations, hindering the full potential of rational graph filters. To address these issues, this paper introduces ERGNN, a novel spectral GNN with an explicitly-optimized rational filter. ERGNN adopts a unique two-step framework that sequentially applies the numerator filter and the denominator filter to the input signals, thus streamlining the model paradigm while enabling explicit optimization of both the numerator and the denominator of the rational filter. Extensive experiments validate the superiority of ERGNN over state-of-the-art methods, establishing it as a practical solution for deploying rational-based GNNs.
Submitted 20 May, 2025; v1 submitted 26 December, 2024;
originally announced December 2024.
-
$N=1$ super Virasoro tensor categories
Authors:
Thomas Creutzig,
Robert McRae,
Florencia Orosz Hunziker,
Jinwei Yang
Abstract:
We show that the category of $C_1$-cofinite modules for the universal $N=1$ super Virasoro vertex operator superalgebra $\mathcal{S}(c,0)$ at any central charge $c$ is locally finite and admits the vertex algebraic braided tensor category structure of Huang-Lepowsky-Zhang. For central charges $c^{\mathfrak{ns}}(t)=\frac{15}{2}-3(t+t^{-1})$ with $t\notin\mathbb{Q}$, we show that this tensor category is semisimple, rigid, and slightly degenerate, and we determine its fusion rules. For central charge $c^{\mathfrak{ns}}(1)=\frac{3}{2}$, we show that this tensor category is rigid and that its simple modules have the same fusion rules as $\mathrm{Rep}\,\mathfrak{osp}(1\vert 2)$, in agreement with earlier fusion rule calculations of Milas. Finally, for the remaining central charges $c^{\mathfrak{ns}}(t)$ with $t\in\mathbb{Q}^\times$, we show that the simple $\mathcal{S}(c^{\mathfrak{ns}}(t),0)$-module $\mathcal{S}_{2,2}$ of lowest conformal weight $h^{\mathfrak{ns}}_{2,2}(t)=\frac{3(t-1)^2}{8t}$ is rigid and self-dual, except possibly when $t^{\pm 1}$ is a negative integer or when $c^{\mathfrak{ns}}(t)$ is the central charge of a rational $N=1$ superconformal minimal model.
As $\mathcal{S}_{2,2}$ is expected to generate the category of $C_1$-cofinite $\mathcal{S}(c^{\mathfrak{ns}}(t),0)$-modules under fusion, rigidity of $\mathcal{S}_{2,2}$ is the first key step to proving rigidity of this category for general $t\in\mathbb{Q}^\times$.
Submitted 19 January, 2025; v1 submitted 23 December, 2024;
originally announced December 2024.
-
On uniqueness of inverse conductive scattering problem with unknown embedded obstacles
Authors:
Chengyu Wu,
Jiaqing Yang
Abstract:
This paper concerns the inverse conductive scattering of acoustic waves by a bounded inhomogeneous object with possibly embedded obstacles inside. A new uniqueness theorem is proved: the conductive object is uniquely determined by the fixed-frequency far-field measurements, regardless of its contents. Meanwhile, the boundary information of several related physical coefficients is also uniquely determined. The proof is mainly based on a detailed singularity analysis of solutions near the interface associated with a family of point sources or hypersingular point sources, which is deduced by potential theory. Moreover, the other key ingredient in the proof is the well-posedness of the interior transmission problem with the conductivity boundary condition in the $L^2$ sense, for which several sufficient conditions depending on the domain and physical coefficients are provided.
Submitted 16 December, 2024;
originally announced December 2024.
-
Volume-surface systems with sub-quadratic intermediate sum on the surface: Global existence and boundedness
Authors:
Juan Yang,
Bao Quoc Tang
Abstract:
The global existence and boundedness of solutions to volume-surface reaction-diffusion systems with a mass control condition are investigated. Such systems typically arise in, e.g., cell biology, ecology or fluid mechanics, when some concentrations or densities are inside a domain and some others are on its boundary. Compared to previous works, the difficulty of the systems under consideration here is that the nonlinearities on the surface can have sub-quadratic growth rates in all dimensions. To overcome this, we first use the Moser iteration to get some uniform bounds on the time integration of the solutions. Then, by combining these bounds with an $L^p$-energy method and a duality argument, we obtain the global existence of solutions. Moreover, under mass dissipation conditions, the solution is shown to be bounded uniformly in time.
Submitted 17 December, 2024; v1 submitted 15 December, 2024;
originally announced December 2024.
-
DisCo-DSO: Coupling Discrete and Continuous Optimization for Efficient Generative Design in Hybrid Spaces
Authors:
Jacob F. Pettit,
Chak Shing Lee,
Jiachen Yang,
Alex Ho,
Daniel Faissol,
Brenden Petersen,
Mikel Landajuela
Abstract:
We consider the challenge of black-box optimization within hybrid discrete-continuous and variable-length spaces, a problem that arises in various applications, such as decision tree learning and symbolic regression. We propose DisCo-DSO (Discrete-Continuous Deep Symbolic Optimization), a novel approach that uses a generative model to learn a joint distribution over discrete and continuous design variables to sample new hybrid designs. In contrast to standard decoupled approaches, in which the discrete and continuous variables are optimized separately, our joint optimization approach uses fewer objective function evaluations, is robust against non-differentiable objectives, and learns from prior samples to guide the search, leading to significant improvement in performance and sample efficiency. Our experiments on a diverse set of optimization tasks demonstrate that the advantages of DisCo-DSO become increasingly evident as the complexity of the problem increases. In particular, we illustrate DisCo-DSO's superiority over the state-of-the-art methods for interpretable reinforcement learning with decision trees.
Submitted 14 December, 2024;
originally announced December 2024.
-
MPAX: Mathematical Programming in JAX
Authors:
Haihao Lu,
Zedong Peng,
Jinwen Yang
Abstract:
This paper presents MPAX (Mathematical Programming in JAX), a versatile and efficient toolbox for integrating linear programming (LP) into machine learning workflows. MPAX implements the state-of-the-art first-order methods, restarted average primal-dual hybrid gradient and reflected restarted Halpern primal-dual hybrid gradient, to solve LPs in JAX. This provides native support for hardware acceleration along with features like batch solving, auto-differentiation, and device parallelism. Extensive numerical experiments demonstrate the advantages of MPAX over existing solvers. The solver is available at https://github.com/MIT-Lu-Lab/MPAX.
Submitted 6 February, 2025; v1 submitted 12 December, 2024;
originally announced December 2024.
-
Effective Rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics
Authors:
Jiang Yang,
Yuxiang Zhao,
Quanhui Zhu
Abstract:
In recent years, deep learning, powered by neural networks, has achieved widespread success in solving high-dimensional problems, particularly those with low-dimensional feature structures. This success stems from their ability to identify and learn low dimensional features tailored to the problems. Understanding how neural networks extract such features during training dynamics remains a fundamental question in deep learning theory. In this work, we propose a novel perspective by interpreting the neurons in the last hidden layer of a neural network as basis functions that represent essential features. To explore the linear independence of these basis functions throughout the deep learning dynamics, we introduce the concept of 'effective rank'. Our extensive numerical experiments reveal a notable phenomenon: the effective rank increases progressively during the learning process, exhibiting a staircase-like pattern, while the loss function concurrently decreases as the effective rank rises. We refer to this observation as the 'staircase phenomenon'. Specifically, for deep neural networks, we rigorously prove the negative correlation between the loss function and effective rank, demonstrating that the lower bound of the loss function decreases with increasing effective rank. Therefore, to achieve a rapid descent of the loss function, it is critical to promote the swift growth of effective rank. Ultimately, we evaluate existing advanced learning methodologies and find that these approaches can quickly achieve a higher effective rank, thereby avoiding redundant staircase processes and accelerating the rapid decline of the loss function.
Submitted 9 January, 2025; v1 submitted 6 December, 2024;
originally announced December 2024.
-
On global existence and large-time behaviour of weak solutions to the compressible barotropic Navier--Stokes Equations on $\mathbb{T}^2$ with density-dependent bulk viscosity: beyond the Vaĭgant--Kazhikhov regime
Authors:
Siran Li,
Jianing Yang
Abstract:
We are concerned with the compressible barotropic Navier--Stokes equations for a $γ$-law gas with density-dependent bulk viscosity coefficient $λ=λ(ρ)=ρ^β$ on the two-dimensional periodic domain $\mathbb{T}^2$. The global existence of weak solutions with initial density bounded away from zero and infinity for $β>3$, $γ>1$ has been established by Vaĭgant--Kazhikhov [Sib. Math. J. 36 (1995), 1283--1316]. When $γ=β>3$, the large-time behaviour of the weak solutions and, in particular, the absence of formation of vacuum and concentration of density as $t \to \infty$, has been proved by Perepelitsa [SIAM J. Math. Anal. 39 (2007/08), 1344--1365]. Huang--Li [J. Math. Pures Appl. 106 (2016), 123--154] extended these results by establishing the global existence of weak solutions and large-time behaviour under the assumptions $β>3/2$, $1< γ<4β-3$, and that the initial density stays away from infinity (but may contain vacuum).
Improving upon the works listed above, we prove that in the regime of parameters as in Huang--Li, namely that $β>3/2$ and $1< γ<4β-3$, if the density has no vacuum or concentration at $t=0$, then it stays uniformly away from zero and infinity at all later times $t \in ]0,\infty]$. Moreover, under the mere assumption that $β>1$ and $γ>1$, we establish the global existence of weak solutions, thus pushing the global existence theory of the barotropic Navier--Stokes equations on $\mathbb{T}^2$ to the most general setting to date. One of the key ingredients of our proof is a novel application -- motivated by the recent work due to Danchin--Mucha [Comm. Pure Appl. Math. 76 (2023), 3437--3492] -- of Desjardins' logarithmic interpolation inequality.
Submitted 2 December, 2024;
originally announced December 2024.
-
Contrasting the optimal resource allocation to cybersecurity and cyber insurance using prospect theory versus expected utility theory
Authors:
Chaitanya Joshi,
Jinming Yang,
Sergeja Slapnicar,
Ryan K L Ko
Abstract:
Protecting against cyber-threats is vital for every organization and can be done by investing in cybersecurity controls and purchasing cyber insurance. However, these are interlinked since insurance premiums could be reduced by investing more in cybersecurity controls. The expected utility theory and the prospect theory are two alternative theories explaining decision-making under risk and uncertainty, which can inform strategies for optimizing resource allocation. While the former is considered a rational approach, research has shown that most people make decisions consistent with the latter, including on insurance uptake. We compare and contrast these two approaches to provide insights into how they could lead to different optimal allocations, resulting in differing risk exposure as well as financial costs. We introduce the concept of a risk curve and show that identifying the nature of the risk curve is a key step in deriving the optimal resource allocation.
Submitted 27 November, 2024;
originally announced November 2024.
-
A weighted scalar auxiliary variable method for solving gradient flows: bridging the nonlinear energy-based and Lagrange multiplier approaches
Authors:
Qiong-Ao Huang,
Wei Jiang,
Jerry Zhijian Yang,
Cheng Yuan
Abstract:
Two primary scalar auxiliary variable (SAV) approaches are widely applied for simulating gradient flow systems, i.e., the nonlinear energy-based approach and the Lagrange multiplier approach. The former guarantees unconditional energy stability through a modified energy formulation, whereas the latter preserves original energy stability but requires small time steps for numerical solutions. In this paper, we introduce a novel weighted SAV method which integrates these two approaches for the first time. Our method leverages the advantages of both approaches: (i) it ensures the existence of numerical solutions for any time step size with a sufficiently large weight coefficient; (ii) by using a weight coefficient smaller than one, it achieves a discrete energy closer to the original, potentially ensuring stability under mild conditions; and (iii) it maintains consistency in computational cost by utilizing the same time/spatial discretization formulas. We present several theorems and numerical experiments to validate the accuracy, energy stability and superiority of our proposed method.
Submitted 26 November, 2024;
originally announced November 2024.
-
A Priori Bounds for Hénon-like Renormalization
Authors:
Sylvain Crovisier,
Mikhail Lyubich,
Enrique Pujals,
Jonguk Yang
Abstract:
We formulate and prove $\textit{a priori}$ bounds for the renormalization of Hénon-like maps (under certain regularity assumptions). This provides a certain uniform control on the small-scale geometry of the dynamics, and ensures pre-compactness of the renormalization sequence. In a sequel to this paper, a priori bounds are used in the proof of the main results, including renormalization convergence, finite-time checkability of the required regularity conditions and regular unicriticality of the dynamics.
Submitted 20 November, 2024;
originally announced November 2024.
-
Nonlinear Assimilation via Score-based Sequential Langevin Sampling
Authors:
Zhao Ding,
Chenguang Duan,
Yuling Jiao,
Jerry Zhijian Yang,
Cheng Yuan,
Pingwen Zhang
Abstract:
This paper presents score-based sequential Langevin sampling (SSLS), a novel approach to nonlinear data assimilation within a recursive Bayesian framework. The proposed method decomposes the assimilation process into alternating prediction and update steps, leveraging dynamic models for state prediction while incorporating observational data through score-based Langevin Monte Carlo during updates. To address challenges in posterior sampling, we introduce an annealing strategy within the update mechanism. We provide theoretical guarantees for SSLS convergence in total variation (TV) distance under certain conditions, providing insights into error behavior with respect to key hyper-parameters. Our numerical experiments across challenging scenarios -- including high-dimensional systems, strong nonlinearity, and sparse observations -- demonstrate the robust performance of the proposed method. Furthermore, SSLS effectively quantifies the uncertainty associated with the estimated states, making it particularly valuable for error calibration.
Submitted 23 February, 2025; v1 submitted 20 November, 2024;
originally announced November 2024.
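The alternating prediction and update steps described in the SSLS abstract above can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation: it assumes a direct Gaussian observation model, replaces the learned score with a Gaussian fit to the predicted particles, and omits the annealing strategy. All names (`langevin_update`, `assimilate`, `proc_std`, `obs_var`) are hypothetical.

```python
import numpy as np

def langevin_update(particles, y, obs_var, step=1e-2, n_steps=500, rng=None):
    """Unadjusted Langevin update targeting p(x | y) ∝ prior(x) * N(y; x, obs_var).

    Assumption: the prior score is approximated by a Gaussian fit to the
    predicted particles, standing in for a learned score model.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu, var = particles.mean(), particles.var()
    x = particles.copy()
    for _ in range(n_steps):
        prior_score = -(x - mu) / var      # gradient of the log Gaussian prior
        lik_score = (y - x) / obs_var      # gradient of the log-likelihood
        noise = rng.standard_normal(x.shape)
        x = x + step * (prior_score + lik_score) + np.sqrt(2.0 * step) * noise
    return x

def assimilate(particles, observations, dynamics, proc_std, obs_var, rng):
    """Alternate a prediction step and a Langevin update step per observation."""
    for y in observations:
        # Prediction: push every particle through the (noisy) dynamic model.
        particles = dynamics(particles) + proc_std * rng.standard_normal(particles.shape)
        # Update: move the particles toward the posterior given the new datum.
        particles = langevin_update(particles, y, obs_var, rng=rng)
    return particles
```

A call such as `assimilate(x0, [1.0, 1.0], lambda x: 0.9 * x, 0.1, 1.0, rng)` returns a particle cloud whose spread gives a crude uncertainty estimate for the state. In a nonlinear or high-dimensional setting the Gaussian fit would no longer be adequate, which is where a learned score would take its place.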
-
Quantitative Estimates on Invariant Manifolds for Surface Diffeomorphisms
Authors:
Sylvain Crovisier,
Mikhail Lyubich,
Enrique Pujals,
Jonguk Yang
Abstract:
We carry out a detailed quantitative analysis on the geometry of invariant manifolds for smooth dissipative systems in dimension two. We begin by quantifying the regularity of any orbit (finite or infinite) in the phase space with a set of explicit inequalities. Then we relate this directly to the quasi-linearization of the local dynamics on regular neighborhoods of this orbit. The parameters of regularity explicitly determine the sizes of the regular neighborhoods and the smooth norms of the corresponding regular charts. As a corollary, we establish the existence of smooth stable and center manifolds with uniformly bounded geometries for regular orbits independently of any pre-existing invariant measure. This provides us with the technical background for the renormalization theory of Hénon-like maps developed in the sequel papers.
Submitted 20 November, 2024;
originally announced November 2024.
-
Ribbon categories of weight modules for affine $\mathfrak{sl}_2$ at admissible levels
Authors:
Thomas Creutzig,
Robert McRae,
Jinwei Yang
Abstract:
We show that the braided tensor category of finitely-generated weight modules for the simple affine vertex operator algebra $L_k(\mathfrak{sl}_2)$ of $\mathfrak{sl}_2$ at any admissible level $k$ is rigid and hence a braided ribbon category. The proof uses a recent result of the first two authors with Shimizu and Yadav on embedding a braided Grothendieck-Verdier category $\mathcal{C}$ into the Drinfeld center of the category of modules for a suitable commutative algebra $A$ in $\mathcal{C}$, in situations where the braided tensor category of local $A$-modules is rigid. Here, the commutative algebra $A$ is Adamović's inverse quantum Hamiltonian reduction of $L_k(\mathfrak{sl}_2)$, which is the simple rational Virasoro vertex operator algebra at central charge $1-\frac{6(k+1)^2}{k+2}$ tensored with a half-lattice conformal vertex algebra. As a corollary, we also show that the category of finitely-generated weight modules for the $N = 2$ super Virasoro vertex operator superalgebra at central charge $-6\ell-3$ is rigid for $\ell$ such that $(\ell+1)(k+2) = 1$.
Submitted 18 November, 2024;
originally announced November 2024.
-
Dynamic Programming: From Local Optimality to Global Optimality
Authors:
John Stachurski,
Jingni Yang,
Ziyue Yang
Abstract:
In the theory of dynamic programming, an optimal policy is a policy whose lifetime value dominates that of all other policies from every possible initial condition in the state space. This raises a natural question: when does optimality from a single state imply optimality from every state? Working in a general setting, we provide sufficient conditions for this property that relate to reachability and irreducibility. Our results have significant implications for modern policy-based algorithms used to solve large-scale dynamic programs. We illustrate our findings by applying them to an optimal savings problem via an algorithm that implements gradient ascent in a policy space constructed from neural networks.
Submitted 11 May, 2025; v1 submitted 17 November, 2024;
originally announced November 2024.
-
A note on the development of singularities on solutions to the Navier-Stokes equations under super critical forcing terms
Authors:
Hugo Beirão da Veiga,
Jiaqi Yang
Abstract:
Recently, Qi S. Zhang provided examples of solutions to the Navier-Stokes equations which, under suitable hypotheses, blow up in finite time. He considers axially symmetric solutions in a cylinder $D\,$ under appropriate boundary conditions and under the effect of super critical external forces $f\,.$ The loss of boundedness for the velocity field, as $t\rightarrow T\,,$ is the basic case of blow up. However, a more general situation is considered below, as explained in the preamble.
In his main result Zhang exhibits, for each $q<\infty\,,$ a blow up solution with an external force $f\in L^q(0,T;L^1(D))\,.$ Following Zhang, we construct blow up solutions with forcing terms in $L^q(0,T;L^p(D))\,,$ for suitable pairs $(q,p)\,.$ In particular our results contain Zhang's result. A significant particular case is the existence of external forces $f \in L^1(0,T;L^p(D))\,,$ for every $p<2$, for which the velocity blows up in a finite time. The significant case $p=2$ remains open.
Submitted 16 November, 2024;
originally announced November 2024.
-
On Regular Hénon-like Renormalization
Authors:
Jonguk Yang
Abstract:
We develop a renormalization theory of non-perturbative dissipative Hénon-like maps with combinatorics of bounded type. The main novelty of our approach is the incorporation of Pesin-theoretic ideas into the renormalization method, which enables us to control the small-scale geometry of dynamics in the higher-dimensional setting. In a prequel to this paper, it is shown that, under certain regularity conditions on the return maps, renormalizations of Hénon-like maps have $\textit{a priori}$ bounds. The current paper is devoted to the applications of this critical estimate. First, we prove that Hénon-like maps converge under renormalization to the same renormalization attractor as for 1D unimodal maps. Second, we show that the necessary and sufficient conditions for renormalization convergence are finite-time checkable. Lastly, we show that every infinitely renormalizable Hénon-like map is $\textit{regularly unicritical}$: there exists a unique orbit of tangencies between strong-stable and center manifolds, and outside a slow-exponentially shrinking neighborhood of this orbit, the dynamics behaves as a uniformly partially hyperbolic system.
Submitted 12 November, 2024;
originally announced November 2024.
-
Generic properties of vector fields identical on a compact set and codimension one partially hyperbolic dynamics
Authors:
Shaobo Gan,
Ruibin Xi,
Jiagang Yang,
Rusong Zheng
Abstract:
Let $\mathscr{X}^r(M)$ be the set of $C^r$ vector fields on a boundaryless compact Riemannian manifold $M$. Given a vector field $X_0\in\mathscr{X}^r(M)$ and a compact invariant set $Γ$ of $X_0$, we consider the closed subset $\mathscr{X}^r(M,Γ)$ of $\mathscr{X}^r(M)$, consisting of all $C^r$ vector fields which coincide with $X_0$ on $Γ$. Study of such a set naturally arises when one needs to perturb a system while keeping part of the dynamics untouched.
A vector field $X\in\mathscr{X}^r(M,Γ)$ is called $Γ$-avoiding Kupka-Smale, if the dynamics away from $Γ$ is Kupka-Smale. We show that a generic vector field in $\mathscr{X}^r(M,Γ)$ is $Γ$-avoiding Kupka-Smale. In the $C^1$ topology, we obtain more generic properties for $\mathscr{X}^1(M,Γ)$. With these results, we further study codimension one partially hyperbolic dynamics for generic vector fields in $\mathscr{X}^1(M,Γ)$, giving a dichotomy of hyperbolicity and Newhouse phenomenon.
As an application, we obtain that $C^1$ generically in $\mathscr{X}^1(M)$, a non-trivial Lyapunov stable chain recurrence class of a singularity which admits a codimension 2 partially hyperbolic splitting with respect to the tangent flow is a homoclinic class.
Submitted 5 November, 2024;
originally announced November 2024.
-
Level of Regions for Deformed Braid Arrangements
Authors:
Yanru Chen,
Houshan Fu,
Suijie Wang,
Jinxing Yang
Abstract:
This paper primarily investigates a specific type of deformation of the braid arrangement $\mathcal{B}_n$ in $\mathbb{R}^n$, denoted by $\mathcal{B}_n^A$ and defined in (1.2). Let $r_l(\mathcal{B}_n^A)$ be the number of regions of level $l$ in $\mathcal{B}_n^A$ with the corresponding exponential generating function $R_l(A;x)$. Using the weighted digraph model introduced by Hetyei [11], we establish a bijection between regions of level $l$ in $\mathcal{B}_n^A$ and valid $m$-acyclic weighted digraphs on the vertex set $[n]$ with exactly $l$ strong components. Based on this bijection, we obtain a property analogous to a polynomial sequence of binomial type, that is, $R_l(A;x)$ satisfies the relation
\[
R_l(A;x)=\big(R_1(A;x)\big)^l=R_k(A;x)R_{l-k}(A;x).
\]
Furthermore, the values $r_l(\mathcal{B}_n^A)$ yield a combinatorial interpretation for the coefficients in the expansion of the characteristic polynomial $χ_{\mathcal{B}_n^A}(t)$ in the basis elements $\binom{t}{l}$, that is, \[χ_{\mathcal{B}_n^A}(t)=\sum_{l=0}^n(-1)^{n-l}r_l(\mathcal{B}_n^A)\binom{t}{l}.\] If $n$, $a$ and $b$ are non-negative integers with $n\ge 2$ and $b-a\ge n-1$, for the deformation $\mathcal{B}_n^{[-a,b]}$ defined in (1.3), its characteristic polynomial has a single real root $0$ of multiplicity one when $n$ is odd, and has one more real root $\frac{n(a+b+1)}{2}$ of multiplicity one when $n$ is even.
Submitted 18 November, 2024; v1 submitted 5 November, 2024;
originally announced November 2024.