-
Bialgebras induced by special left Alia algebras
Authors:
Tianshui Ma,
Chan Zhao,
Huihui Zheng
Abstract:
Special left Alia algebras were introduced by Dzhumadil'daev in [J. Math. Sci. (N.Y.) 161(2009), 11-30] when studying the classification of algebras with a skew-symmetric identity of degree 3. A special left Alia algebra (resp. coalgebra) $(A, [,]_{(f,g)})$ (resp. $(A, Δ_{(F,G)})$) is constructed from a commutative associative algebra (resp. cocommutative coassociative coalgebra) $(A, \cdot)$ (resp. $(A, δ)$) together with two linear maps $f, g: A\longrightarrow A$ (resp. $F, G: A\longrightarrow A$). We find that if $((A, \cdot), f)$ (resp. $((A, δ), F)$) is a Nijenhuis associative algebra (resp. coassociative coalgebra) such that $f\circ g=g\circ f$ (resp. $F\circ G=G\circ F$), then $((A, [,]_{(f,g)}), f)$ (resp. $((A, Δ_{(F,G)}), F)$) is a Nijenhuis left Alia algebra (resp. coalgebra). A bialgebraic structure for $((A, \cdot), f)$ and $((A, δ), F)$, named Nijenhuis associative D-bialgebra and denoted by $((A, \cdot, δ), f, F)$, was presented in [J. Algebra 639(2024), 150-186]. In this paper, we investigate the bialgebraic structure, named Nijenhuis left Alia bialgebra and denoted by $((A, [,], Δ), N, S)$, for a Nijenhuis left Alia algebra $((A, [,]), N)$ and a Nijenhuis left Alia coalgebra $((A, Δ), S)$, such that the Nijenhuis special left Alia bialgebra $((A, [,]_{(f,g)}, Δ_{(F,G)}), f, F)$ can be induced by the Nijenhuis commutative cocommutative associative D-bialgebra $((A, \cdot, δ), f, F)$. We also provide a method to construct Nijenhuis operators on a left Alia algebra (resp. coalgebra).
Submitted 8 June, 2025;
originally announced June 2025.
-
Integral Representations of Sobolev Spaces via ReLU$^k$ Activation Function and Optimal Error Estimates for Linearized Networks
Authors:
Xinliang Liu,
Tong Mao,
Jinchao Xu
Abstract:
This paper presents two main theoretical results concerning shallow neural networks with ReLU$^k$ activation functions. We establish a novel integral representation for Sobolev spaces, showing that every function in $H^{\frac{d+2k+1}{2}}(Ω)$ can be expressed as an $L^2$-weighted integral of ReLU$^k$ ridge functions over the unit sphere. This result mirrors the known representation of Barron spaces and highlights a fundamental connection between Sobolev regularity and neural network representations. Moreover, we prove that linearized shallow networks, constructed by fixing the inner parameters and optimizing only the linear coefficients, achieve optimal approximation rates $O(n^{-\frac{1}{2}-\frac{2k+1}{2d}})$ in Sobolev spaces.
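To make the "linearized" construction concrete, here is a minimal Python sketch for $d=1$, where the unit sphere is $\{-1,+1\}$. The inner parameters $(w_i, b_i)$ of the ridge functions $\max(0, w_i x + b_i)^k$ are fixed by hand (a hypothetical dictionary, not the paper's choice), and only the outer coefficients are fitted, by least squares through the normal equations. It illustrates the model class only, not the approximation-rate analysis; all function names are ours.

```python
def relu_k(t, k):
    """ReLU^k activation: max(0, t)**k."""
    return max(0.0, t) ** k


def features(x, inner, k):
    """Ridge features relu_k(w * x + b) for fixed inner parameters (w, b)."""
    return [relu_k(w * x + b, k) for (w, b) in inner]


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def fit_linearized(xs, ys, inner, k):
    """Least-squares fit of the outer coefficients only; the inner parameters stay fixed."""
    phi = [features(x, inner, k) for x in xs]
    p = len(inner)
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(p)]
    return solve(gram, rhs)


def predict(x, coeffs, inner, k):
    return sum(a * f for a, f in zip(coeffs, features(x, inner, k)))


# Hypothetical fixed dictionary: w on the unit sphere {-1, +1}, all kinks at distinct points.
inner = [(1, 0.5), (1, 0.0), (1, -0.5), (-1, -0.25), (-1, 0.25), (-1, 0.75)]
k = 1


def target(x):
    # Lies in the span of the dictionary (uses the 1st, 3rd and 5th features).
    return 0.5 * relu_k(x + 0.5, k) + 2 * relu_k(x - 0.5, k) - relu_k(-x + 0.25, k)


xs = [-1 + i / 40 for i in range(81)]
coeffs = fit_linearized(xs, [target(x) for x in xs], inner, k)
max_err = max(abs(predict(x, coeffs, inner, k) - target(x)) for x in xs)
```

Because the target lies in the span of the fixed dictionary, `max_err` should be at round-off level. For larger dictionaries a QR-based solver is preferable to the normal equations, which square the condition number.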
Submitted 12 May, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
-
Deep learning with missing data
Authors:
Tianyi Ma,
Tengyao Wang,
Richard J. Samworth
Abstract:
In the context of multivariate nonparametric regression with missing covariates, we propose Pattern Embedded Neural Networks (PENNs), which can be applied in conjunction with any existing imputation technique. In addition to a neural network trained on the imputed data, PENNs pass the vectors of observation indicators through a second neural network to provide a compact representation. The outputs are then combined in a third neural network to produce final predictions. Our main theoretical result exploits an assumption that the observation patterns can be partitioned into cells on which the Bayes regression function behaves similarly, and belongs to a compositional Hölder class. It provides a finite-sample excess risk bound that holds for an arbitrary missingness mechanism, and in combination with a complementary minimax lower bound, demonstrates that our PENN estimator attains in typical cases the minimax rate of convergence as if the cells of the partition were known in advance, up to a poly-logarithmic factor in the sample size. Numerical experiments on simulated, semi-synthetic and real data confirm that the PENN estimator consistently improves, often dramatically, on standard neural networks without pattern embedding. Code to reproduce our experiments, as well as a tutorial on how to apply our method, is publicly available.
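The three-network data flow described above can be sketched as a forward pass in plain Python. This is a minimal illustration only: the layer sizes, tanh activations, random initialisation and mean imputation are our assumptions, no training is performed, and `PENNSketch`, `mean_impute` and the other names are hypothetical, not taken from the authors' released code.

```python
import math
import random


def mean_impute(rows):
    """Replace None by column means of observed values; also return 0/1 observation indicators."""
    d = len(rows[0])
    means = []
    for j in range(d):
        obs = [r[j] for r in rows if r[j] is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    imputed = [[means[j] if r[j] is None else r[j] for j in range(d)] for r in rows]
    masks = [[0 if r[j] is None else 1 for j in range(d)] for r in rows]
    return imputed, masks


def dense(n_in, n_out, rng):
    """Randomly initialised fully connected layer (weights, biases)."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layers, x, linear_output=True):
    """Tanh MLP; the last layer is linear unless linear_output is False."""
    for i, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
        last = i == len(layers) - 1
        x = z if (last and linear_output) else [math.tanh(v) for v in z]
    return x


class PENNSketch:
    def __init__(self, d, hidden=8, embed_dim=2, seed=0):
        rng = random.Random(seed)
        self.data_net = [dense(d, hidden, rng), dense(hidden, hidden, rng)]      # imputed covariates
        self.mask_net = [dense(d, hidden, rng), dense(hidden, embed_dim, rng)]   # pattern -> compact embedding
        self.head_net = [dense(hidden + embed_dim, hidden, rng), dense(hidden, 1, rng)]

    def predict(self, x_imputed, mask):
        h = forward(self.data_net, x_imputed, linear_output=False)
        e = forward(self.mask_net, [float(m) for m in mask])
        return forward(self.head_net, h + e)[0]  # third network combines both outputs


rows = [[0.5, None, -1.0], [1.5, 2.0, None], [0.0, 1.0, 1.0]]
imputed, masks = mean_impute(rows)
model = PENNSketch(d=3)
preds = [model.predict(x, m) for x, m in zip(imputed, masks)]
```

The same imputed vector fed with two different observation patterns generally yields two different predictions, which is the extra information the pattern branch contributes.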
Submitted 29 April, 2025; v1 submitted 21 April, 2025;
originally announced April 2025.
-
Some new functionals related to free boundary minimal submanifolds
Authors:
Tianyu Ma,
Vladimir Medvedev
Abstract:
The metrics induced on free boundary minimal surfaces in geodesic balls in the upper unit hemisphere and hyperbolic space can be characterized as critical metrics for the functionals $Θ_{r,i}$ and $Ω_{r,i}$, introduced recently by Lima, Menezes and the second author. In this paper, we generalize this characterization to free boundary minimal submanifolds of higher dimension in the same spaces. We also introduce some functionals of a form different from $Θ_{r,i}$ and show that their critical metrics are the metrics induced by free boundary minimal immersions into a geodesic ball in the upper unit hemisphere. In the case of surfaces, these functionals are bounded from above but not from below. Moreover, the canonical metric on a geodesic disk in a 3-ball in the upper unit hemisphere is maximal for this functional on the set of all Riemannian metrics on the topological disk.
Submitted 7 April, 2025;
originally announced April 2025.
-
Spectral condition for $k$-factor-criticality in $t$-connected graphs
Authors:
Tingyan Ma,
Edwin R. van Dam,
Ligong Wang
Abstract:
A graph $G$ is called $k$-factor-critical if $G-S$ has a perfect matching for every $S\subseteq V(G)$ with $|S|=k$. A connected graph $G$ is called $t$-connected if it has more than $t$ vertices and remains connected whenever fewer than $t$ vertices are removed. We give a condition on the number of edges and a condition on the spectral radius for $k$-factor-criticality in $t$-connected graphs.
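For small graphs the definition can be checked directly by brute force. The Python sketch below (exponential time, for illustration only; the function names are ours, and graphs are assumed simple with vertices $0,\dots,n-1$) deletes every $k$-subset $S$ of vertices and searches for a perfect matching of $G-S$.

```python
from itertools import combinations


def has_perfect_matching(vertices, adj):
    """Recursive search: match the smallest remaining vertex with each available neighbour."""
    if not vertices:
        return True
    if len(vertices) % 2:
        return False
    v = min(vertices)
    for u in adj[v]:
        if u in vertices and has_perfect_matching(vertices - {u, v}, adj):
            return True
    return False


def is_k_factor_critical(n, edges, k):
    """True if G - S has a perfect matching for every S subset of V(G) with |S| = k."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    everything = frozenset(range(n))
    return all(has_perfect_matching(everything - set(S), adj)
               for S in combinations(range(n), k))


K4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

Here `is_k_factor_critical(4, K4, 2)` is `True`, whereas `is_k_factor_critical(4, C4, 2)` is `False`: deleting two opposite vertices of the 4-cycle leaves two isolated vertices.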
Submitted 2 July, 2025; v1 submitted 28 March, 2025;
originally announced March 2025.
-
Optimal mixed fleet and charging infrastructure planning to electrify demand responsive feeder services with target CO2 emission constraints
Authors:
Haruko Nakao,
Tai-Yu Ma,
Richard D. Connors,
Francesco Viti
Abstract:
Electrifying demand-responsive transport systems requires careful planning of the charging infrastructure, considering the trade-offs between charging efficiency and charging infrastructure costs. Earlier studies assume a fully electrified fleet and overlook the planning issue in the transition period. This study addresses the joint fleet size and charging infrastructure planning for a demand-responsive feeder service under stochastic demand, given a user-defined target CO2 emission reduction policy. We propose a bi-level optimization model where the upper level determines the charging station configuration given stochastic demand patterns, whereas the lower level solves a mixed fleet dial-a-ride routing problem under the CO2 emission and capacitated charging station constraints. An efficient deterministic annealing metaheuristic is proposed to solve the CO2-constrained mixed fleet routing problem. The performance of the algorithm is validated on a series of numerical test instances with up to 500 requests. We apply the model to a real-world case study in Bettembourg, Luxembourg, with different demand levels and customised CO2 reduction targets. The results show that the proposed method provides a flexible tool for joint charging infrastructure and fleet size planning under different levels of demand and CO2 emission reduction targets.
Submitted 17 March, 2025;
originally announced March 2025.
-
On the coefficients of Tutte polynomials with one variable at 1
Authors:
Tianlong Ma,
Xiaxia Guan,
Xian'an Jin
Abstract:
Denote the Tutte polynomial of a graph $G$ and a matroid $M$ by $T_G(x,y)$ and $T_M(x,y)$ respectively. $T_G(x,1)$ and $T_G(1,y)$ were generalized to hypergraphs and further extended to integer polymatroids by Kálmán \cite{Kalman} in 2013, where they are called the interior and exterior polynomials, respectively. Let $G$ be a $(k+1)$-edge-connected graph of order $n$ and size $m$, and let $g=m-n+1$. Guan et al. (2023) \cite{Guan} obtained the coefficients of $T_G(1,y)$: \[[y^j]T_G(1,y)=\binom{m-j-1}{n-2} \text{ for } g-k\leq j\leq g,\] which was deduced from the coefficients of the exterior polynomial of polymatroids. Recently, Chen and Guo (2025) \cite{Chen} further obtained \[[y^j]T_G(1,y)=\binom{m-j-1}{n-2}-\sum_{i=k+1}^{g-j}\binom{m-j-i-1}{n-2}|\mathcal{EC}_i(G)|\] for $g-3(k+1)/2< j\leq g$, where $\mathcal{EC}_i(G)$ denotes the set of all minimal edge cuts with $i$ edges. In this paper, for any matroid $M=(X,rk)$ we first obtain \[[y^j]T_M(1,y)=\sum_{t=j}^{|X|-r}(-1)^{t-j}\binom{t}{j}σ_{r+t}(M),\] where $σ_{r+t}(M)$ denotes the number of spanning sets with $r+t$ elements in $M$ and $r=rk(M)$. Moreover, the expression of $[x^i]T_M(x,1)$ is obtained immediately from the duality of the Tutte polynomial. As applications of our results, we generalize the two aforementioned results on graphs to the setting of matroids. This not only resolves two open problems posed by Chen and Guo in \cite{Chen} but also provides a purely combinatorial proof that is significantly simpler than their original proofs.
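The matroid formula can be read off from the corank-nullity expansion $T_M(x,y)=\sum_{A\subseteq X}(x-1)^{r-rk(A)}(y-1)^{|A|-rk(A)}$: at $x=1$ only spanning sets ($rk(A)=r$) survive, so $T_M(1,y)=\sum_{A\ \mathrm{spanning}}(y-1)^{|A|-r}$, and expanding $(y-1)^t$ binomially gives the stated coefficients. A brute-force Python check for graphic matroids (function names are ours; exponential in the number of edges):

```python
from itertools import combinations
from math import comb


def graphic_rank(n, edges):
    """Rank of an edge set in the cycle matroid: number of successful union-find merges."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    rank = 0
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            rank += 1
    return rank


def tutte_at_x1_coeffs(n, edges):
    """Coefficients [y^0], ..., [y^(m-r)] of T_M(1, y) from spanning-set counts sigma_{r+t}."""
    m, r = len(edges), graphic_rank(n, edges)
    sigma = [0] * (m + 1)
    for size in range(r, m + 1):
        sigma[size] = sum(1 for A in combinations(edges, size) if graphic_rank(n, A) == r)
    return [sum((-1) ** (t - j) * comb(t, j) * sigma[r + t] for t in range(j, m - r + 1))
            for j in range(m - r + 1)]


K4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
coeffs = tutte_at_x1_coeffs(4, K4)  # [6, 6, 3, 1], i.e. T_{K4}(1, y) = 6 + 6y + 3y^2 + y^3
```

As a consistency check, $K_4$ is 3-edge-connected, so with $k=2$, $n=4$, $m=6$ and $g=3$ the values $6, 3, 1$ for $j=1,2,3$ agree with $\binom{m-j-1}{n-2}$ from Guan et al.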
Submitted 8 March, 2025;
originally announced March 2025.
-
Classical Yang-Baxter equations and Nijenhuis operators for Lie algebras
Authors:
Haiying Li,
Tianshui Ma
Abstract:
In this paper, we investigate the conditions under which a Lie algebra is Nijenhuis. Furthermore, all the Nijenhuis operators on $\mathfrak{sl}_2$ under the standard Cartan-Weyl basis are given. In addition, the notion of classical $P$-Nijenhuis Yang-Baxter equations is introduced by means of the bialgebraic theory for Nijenhuis Lie algebras.
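A linear map $N$ on a Lie algebra is a Nijenhuis operator when its torsion $T_N(x,y)=[Nx,Ny]-N([Nx,y]+[x,Ny])+N^2[x,y]$ vanishes. Since $T_N$ is bilinear and skew-symmetric, it suffices to check pairs of distinct basis vectors. The Python sketch below does this for $\mathfrak{sl}_2$ in the basis $(e,f,h)$ with $[e,f]=h$, $[h,e]=2e$, $[h,f]=-2f$; the matrix convention (column $j$ is the image of the $j$-th basis vector) and all names are ours.

```python
from itertools import combinations

# Structure constants of sl_2 in the ordered basis (e, f, h); vectors are coordinate lists.
BRACKET = {
    (0, 1): (0, 0, 1), (1, 0): (0, 0, -1),   # [e, f] = h
    (2, 0): (2, 0, 0), (0, 2): (-2, 0, 0),   # [h, e] = 2e
    (2, 1): (0, -2, 0), (1, 2): (0, 2, 0),   # [h, f] = -2f
}


def bracket(x, y):
    out = [0, 0, 0]
    for i in range(3):
        for j in range(3):
            if (i, j) in BRACKET and x[i] and y[j]:
                for c in range(3):
                    out[c] += x[i] * y[j] * BRACKET[(i, j)][c]
    return out


def apply(N, x):
    return [sum(N[i][j] * x[j] for j in range(3)) for i in range(3)]


def add(*vs):
    return [sum(c) for c in zip(*vs)]


def torsion(N, x, y):
    """T_N(x, y) = [Nx, Ny] - N([Nx, y] + [x, Ny]) + N^2 [x, y]."""
    Nx, Ny = apply(N, x), apply(N, y)
    middle = apply(N, add(bracket(Nx, y), bracket(x, Ny)))
    return add(bracket(Nx, Ny), [-c for c in middle], apply(N, apply(N, bracket(x, y))))


def is_nijenhuis(N):
    basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    return all(torsion(N, basis[i], basis[j]) == [0, 0, 0] for i, j in combinations(range(3), 2))


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
PROJ_E = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]   # e -> e, f -> 0, h -> 0
H_TO_E = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]   # h -> e, e -> 0, f -> 0
```

`IDENTITY` and `PROJ_E` pass the check, while `H_TO_E` fails: there $T_N(f,h)=-N[f,e]=e\neq 0$. Integer coordinates keep the comparison with zero exact.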
Submitted 26 June, 2025; v1 submitted 25 February, 2025;
originally announced February 2025.
-
Stability conditions on the canonical line bundle of $\mathbb{P}^3$
Authors:
Tianle Mao
Abstract:
We study the space of stability conditions on the total space of the canonical line bundle over the three-dimensional projective space. We construct a family of geometric stability conditions and a subset of their boundary, which are algebraic. We also use spherical twists to construct some other stability conditions.
Submitted 25 January, 2025;
originally announced January 2025.
-
Nijenhuis operators and mock-Lie bialgebras
Authors:
Tianshui Ma,
Sami Mabrouk,
Abdenacer Makhlouf,
Feiyan Song
Abstract:
A Nijenhuis mock-Lie algebra is a mock-Lie algebra equipped with a Nijenhuis operator. The purpose of this paper is to extend the well-known results about Nijenhuis mock-Lie algebras to the realm of mock-Lie bialgebras. It aims to characterize Nijenhuis mock-Lie bialgebras by generalizing the concepts of matched pairs and Manin triples of mock-Lie algebras to the context of Nijenhuis mock-Lie algebras. Moreover, we discuss formal deformation theory and explore infinitesimal formal deformations of Nijenhuis mock-Lie algebras, demonstrating that the associated cohomology corresponds to a deformation cohomology. In addition, we define abelian extensions of Nijenhuis mock-Lie algebras and show that equivalence classes of such extensions are linked to cohomology groups. The coboundary case leads to the introduction of an admissible mock-Lie-Yang-Baxter equation (mLYBe) in Nijenhuis mock-Lie algebras, for which the antisymmetric solutions give rise to Nijenhuis mock-Lie bialgebras. Furthermore, the notion of $\mathcal O$-operator on Nijenhuis mock-Lie algebras is introduced and connected to the mock-Lie-Yang-Baxter equation.
Submitted 19 January, 2025;
originally announced January 2025.
-
Coordinated vehicle dispatching and charging scheduling for an electric ride-hailing fleet under charging congestion and dynamic prices
Authors:
Tai-Yu Ma,
Richard D. Connors,
Francesco Viti
Abstract:
Effective utilization of charging station capacity plays an important role in enhancing the profitability of ride-hailing systems using electric vehicles. Existing studies assume constant energy prices and uncapacitated charging stations or do not explicitly consider vehicle queueing at charging stations, resulting in over-optimistic charging infrastructure utilization. In this study, we develop a dynamic charging scheduling method (named CongestionAware) that anticipates vehicles' energy needs and coordinates their charging operations with real-time energy prices to avoid long waiting times at charging stations and increase the total profit of the system. A sequential mixed integer linear programming model is proposed to devise vehicles' day-ahead charging plans based on their experienced charging waiting times and energy consumption. The obtained charging plans are adapted within the day in response to vehicles' energy needs and charging station congestion. The developed charging policy is tested using NYC yellow taxi data in a Manhattan-like study area with a fleet size of 100 vehicles under scenarios of 3000 and 4000 customers per day. The computational results show that our CongestionAware policy outperforms several benchmark policies, with up to +15.06% profit and +19.16% service rate for 4000 customers per day. Sensitivity analysis is conducted with different system parameters and managerial insights are discussed.
Submitted 10 April, 2025; v1 submitted 13 December, 2024;
originally announced December 2024.
-
Estimation of the Adjusted Standard-deviatile for Extreme Risks
Authors:
Haoyu Chen,
Tiantian Mao,
Fan Yang
Abstract:
In this paper, we modify the Bayes risk for the expectile, the so-called variantile risk measure, to better capture extreme risks. The modified risk measure is called the adjusted standard-deviatile. First, we derive the asymptotic expansions of the adjusted standard-deviatile. Next, based on the first-order asymptotic expansion, we propose two efficient estimation methods for the adjusted standard-deviatile at intermediate and extreme levels. By using techniques from extreme value theory, the asymptotic normality is proved for both estimators. Simulations and real data applications are conducted to examine the performance of the proposed estimators.
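For readers unfamiliar with the expectile, here is a minimal Python sketch of its sample version. The $τ$-expectile $e_τ$ minimises the asymmetric squared loss $\mathbb{E}\,|τ-\mathbf 1\{X\le e\}|\,(X-e)^2$, equivalently solves $τ\,\mathbb E(X-e)_+=(1-τ)\,\mathbb E(e-X)_+$, and the minimal value of that loss is the Bayes risk referred to above. The sketch computes both by bisection; it does not implement the adjusted standard-deviatile or the extreme-level estimators of the paper, and the function names are ours.

```python
def expectile(xs, tau, iters=200):
    """Sample tau-expectile by bisection on the first-order condition (0 < tau < 1)."""
    def g(e):  # decreasing in e; its root is the expectile
        return tau * sum(max(x - e, 0.0) for x in xs) - (1 - tau) * sum(max(e - x, 0.0) for x in xs)

    lo, hi = min(xs), max(xs)  # invariant: g(lo) >= 0 >= g(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def asymmetric_sq_risk(xs, tau, e):
    """Empirical expected loss |tau - 1{x <= e}| * (x - e)^2."""
    return sum(abs(tau - (1.0 if x <= e else 0.0)) * (x - e) ** 2 for x in xs) / len(xs)


data = [1.0, 2.0, 3.0, 4.0]
e_half = expectile(data, 0.5)                      # the sample mean, 2.5
risk_half = asymmetric_sq_risk(data, 0.5, e_half)  # half the (biased) sample variance, 0.625
```

At $τ=1/2$ the expectile reduces to the mean; larger $τ$ moves it into the upper tail, which is the regime where extreme value theory becomes relevant.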
Submitted 11 November, 2024;
originally announced November 2024.
-
Estimation beyond Missing (Completely) at Random
Authors:
Tianyi Ma,
Kabir A. Verchand,
Thomas B. Berrett,
Tengyao Wang,
Richard J. Samworth
Abstract:
We study the effects of missingness on the estimation of population parameters. Moving beyond restrictive missing completely at random (MCAR) assumptions, we first formulate a missing data analogue of Huber's arbitrary $ε$-contamination model. For mean estimation with respect to squared Euclidean error loss, we show that the minimax quantiles decompose as a sum of the corresponding minimax quantiles under a heterogeneous, MCAR assumption, and a robust error term, depending on $ε$, that reflects the additional error incurred by departure from MCAR.
We next introduce natural classes of realisable $ε$-contamination models, where an MCAR version of a base distribution $P$ is contaminated by an arbitrary missing not at random (MNAR) version of $P$. These classes are rich enough to capture various notions of biased sampling and sensitivity conditions, yet we show that they enjoy improved minimax performance relative to our earlier arbitrary contamination classes for both parametric and nonparametric classes of base distributions. For instance, with a univariate Gaussian base distribution, consistent mean estimation over realisable $ε$-contamination classes is possible even when $ε$ and the proportion of missingness converge (slowly) to 1. Finally, we extend our results to the setting of departures from missing at random (MAR) in normal linear regression with a realisable missing response.
Submitted 14 October, 2024;
originally announced October 2024.
-
Admissible Yang-Baxter equation for Nijenhuis perm algebras
Authors:
Tianshui Ma,
Feiyan Song
Abstract:
In this paper, on the one hand, based on the classical perm Yang-Baxter equation, we investigate under what conditions a perm algebra must be a Nijenhuis perm algebra. On the other hand, we derive the compatibility conditions between the classical perm Yang-Baxter equation and the Nijenhuis operator via a class of Nijenhuis perm bialgebras.
Submitted 27 October, 2024; v1 submitted 9 October, 2024;
originally announced October 2024.
-
Representations of non-finitely graded Heisenberg-Virasoro type Lie algebras
Authors:
Chunguang Xia,
Tianyu Ma,
Wei Wang,
Mingjing Zhang
Abstract:
We construct and study non-finitely graded Lie algebras $\mathcal{HV}(a,b;ε)$ related to Heisenberg-Virasoro type Lie algebras, where $a,b$ are complex numbers, and $ε= \pm 1$. Using combinatorial techniques, we completely classify the free $\mathcal{U}(\mathfrak h)$-modules of rank one over $\mathcal{HV}(a,b;ε)$. It turns out that these modules are more varied and complex than those over non-finitely graded Virasoro algebras, and in particular admit infinitely many free parameters if $b=1$ and $ε=-1$. Meanwhile, we also determine the simplicity and isomorphism classes of these modules.
Submitted 9 October, 2024;
originally announced October 2024.
-
Maximum Ideal Likelihood Estimator: A New Estimation and Inference Framework for Latent Variable Models
Authors:
Yizhou Cai,
Ting Fung Ma
Abstract:
In this paper, a new estimation framework, Maximum Ideal Likelihood Estimator (MILE), is proposed for general parametric models with latent variables and missing values. Instead of focusing on the marginal likelihood of the observed data as in many traditional approaches, the MILE directly considers the joint distribution of the complete dataset by treating the latent variables as parameters (the ideal likelihood). Through different optimization techniques and algorithms, the MILE framework remains valid even when traditional methods are not applicable, e.g., when the conditional expectation of the marginal likelihood function is not finite. The statistical properties of the MILE, such as its asymptotic equivalence to the Maximum Likelihood Estimation (MLE), are proved under some mild conditions, which facilitate statistical inference and prediction. Simulation studies illustrate that MILE outperforms traditional approaches, with computational feasibility and scalability using existing and our proposed algorithms.
Submitted 1 October, 2024;
originally announced October 2024.
-
Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform
Authors:
Tong Mao,
Jonathan W. Siegel,
Jinchao Xu
Abstract:
Let $Ω\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(Ω))$ with error measured in the $L_q(Ω)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.
Submitted 20 August, 2024;
originally announced August 2024.
-
High-probability minimax lower bounds
Authors:
Tianyi Ma,
Kabir A. Verchand,
Richard J. Samworth
Abstract:
The minimax risk is often considered as a gold standard against which we can compare specific statistical procedures. Nevertheless, as has been observed recently in robust and heavy-tailed estimation problems, the inherent reduction of the (random) loss to its expectation may entail a significant loss of information regarding its tail behaviour. In an attempt to avoid such a loss, we introduce the notion of a minimax quantile, and seek to articulate its dependence on the quantile level. To this end, we develop high-probability variants of the classical Le Cam and Fano methods, as well as a technique to convert local minimax risk lower bounds to lower bounds on minimax quantiles. To illustrate the power of our framework, we deploy our techniques on several examples, recovering recent results in robust mean estimation and stochastic convex optimisation, as well as obtaining several new results in covariance matrix estimation, sparse linear regression, nonparametric density estimation and isotonic regression. Our overall goal is to argue that minimax quantiles can provide a finer-grained understanding of the difficulty of statistical problems, and that, in wide generality, lower bounds on these quantities can be obtained via user-friendly tools.
Submitted 4 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Strong convergence rates for full-discrete approximations of the stochastic Allen-Cahn equations on 2D torus
Authors:
Ting Ma,
Lifei Wang,
Huanyu Yang
Abstract:
In this paper we construct space-time full discretizations of stochastic Allen-Cahn equations driven by space-time white noise on the 2D torus. The approximations are implemented by a tamed exponential Euler discretization in time and a spectral Galerkin method in space. We finally obtain convergence rates of spatial order $α-δ$ and temporal order $α/{6}-δ$ in $\mathcal C^{-α}$ for $α\in(0,1/3)$ and arbitrarily small $δ>0$.
Submitted 5 June, 2024;
originally announced June 2024.
-
Representations of non-finitely graded Lie algebras related to Virasoro algebra
Authors:
Chunguang Xia,
Tianyu Ma,
Xiao Dong,
Mingjing Zhang
Abstract:
In this paper, we study representations of non-finitely graded Lie algebras $\mathcal{W}(ε)$ related to the Virasoro algebra, where $ε= \pm 1$. More precisely, we completely classify the free $\mathcal{U}(\mathfrak h)$-modules of rank one over $\mathcal{W}(ε)$, and find that these module structures are rather different from those of other graded Lie algebras. We also determine the simplicity and isomorphism classes of these modules.
Submitted 3 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
On eigenvalues and eigenfunctions of the operators defining multidimensional scaling on some symmetric spaces
Authors:
Tianyu Ma,
Eugene Stepanov
Abstract:
We study asymptotics of the eigenvalues and eigenfunctions of the operators used for constructing multidimensional scaling (MDS) on compact connected Riemannian manifolds, in particular on closed connected symmetric spaces. They are the limits of eigenvalues and eigenvectors of squared distance matrices of an increasing sequence of finite subsets covering the space densely in the limit. We show that for products of spheres and real projective spaces, the numbers of positive and negative eigenvalues of these operators are both infinite. We also find a class of spaces (namely $\mathbb{RP}^n$ with odd $n>1$) whose MDS defining operators are not trace class, and for which the original distances cannot be reconstructed from the eigenvalues and eigenfunctions of these operators.
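The finite-dimensional objects behind this statement are easy to compute. Below is a minimal Python sketch, assuming classical (Torgerson) MDS, i.e. the double-centred matrix $B=-\tfrac12 JD^{(2)}J$ with $J=I-\tfrac1n\mathbf 1\mathbf 1^\top$ and $D^{(2)}$ the matrix of squared distances, and using a plain cyclic Jacobi eigenvalue routine (names are ours). For $8$ equally spaced points on the circle $S^1$ with its geodesic metric, a toy stand-in for one finite subset of the sequence, $B$ has negative as well as positive eigenvalues, whereas for points of the Euclidean plane it is positive semidefinite.

```python
import math


def double_centre(d2):
    """B = -1/2 J D2 J for a symmetric matrix of squared distances."""
    n = len(d2)
    row = [sum(r) / n for r in d2]
    total = sum(row) / n
    return [[-0.5 * (d2[i][j] - row[i] - row[j] + total) for j in range(n)] for i in range(n)]


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [r[:] for r in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def eigen_signs(points, dist, eps=1e-8):
    """(number of positive, number of negative) eigenvalues of the MDS matrix B."""
    d2 = [[dist(p, q) ** 2 for q in points] for p in points]
    ev = jacobi_eigenvalues(double_centre(d2))
    return sum(v > eps for v in ev), sum(v < -eps for v in ev)


def circle_geodesic(s, t):
    d = abs(s - t) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


circle = [2 * math.pi * i / 8 for i in range(8)]
plane = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.3, 0.7)]
circle_signs = eigen_signs(circle, circle_geodesic)  # (4, 3)
plane_signs = eigen_signs(plane, math.dist)          # (2, 0)
```

For the circle the squared-distance matrix is circulant, so the signs can also be read off its discrete Fourier coefficients: odd frequencies give positive and even nonzero frequencies give negative eigenvalues of $B$.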
Submitted 21 January, 2024;
originally announced January 2024.
-
Optimized electrified meeting-point-based feeder bus services with capacitated charging stations and partial recharges
Authors:
Tai-Yu Ma,
Yumeng Fang,
Richard D. Connors,
Francesco Viti,
Haruko Nakao
Abstract:
Meeting-point-based feeder services using EVs have good potential to achieve an efficient and clean on-demand mobility service. However, customer-to-meeting-point assignment, vehicle routing, and charging scheduling need to be jointly optimized to achieve the best system performance. To this aim, we assess the effect of different system parameters and configure them based on our previously developed hybrid metaheuristic algorithm. A set of test instances based on morning peak hour commuting scenarios between the cities of Arlon and Luxembourg is used to evaluate the impact of the set parameters on the optimal solutions. The experimental results suggest that higher meeting point availability can achieve better system performance. By jointly configuring different system parameters, the overall system performance can be significantly improved (10.8% fewer total kilometers traveled by vehicles compared to the benchmark) while serving all requests. Our experimental results show that, compared to a traditional door-to-door dial-a-ride system, the meeting-point-based system can reduce the fleet size, the in-vehicle travel time and the kilometers traveled by up to 70.2%, 6.4% and 49.4%, respectively.
Submitted 9 January, 2024;
originally announced January 2024.
-
A hybrid metaheuristic to optimize electric first-mile feeder services with charging synchronization constraints and customer rejections
Authors:
Tai-Yu Ma,
Yumeng Fang,
Richard D. Connors,
Francesco Viti,
Haruko Nakao
Abstract:
This paper addresses the on-demand meeting-point-based feeder electric bus routing and charging scheduling problem under charging synchronization constraints. The problem considered exhibits the structure of the location routing problem, which is more difficult to solve than many electric vehicle routing problems with capacitated charging stations. We propose to model the problem using a mixed-integer linear programming approach based on a layered graph structure. An efficient hybrid metaheuristic solution algorithm is proposed. A mixture of random and greedy partial charging scheduling strategies is used to find feasible charging schedules under the synchronization constraints. The algorithm is tested on instances with up to 100 customers and 49 bus stops/meeting points. The results show that the proposed algorithm provides near-optimal solutions in less than one minute on average, compared with the best solutions found by a mixed-integer linear programming solver with a 4-hour computation time limit. A case study on a larger instance with 1000 customers and 111 meeting points shows that the proposed method is applicable to real-world situations.
Submitted 8 February, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Expressivity and Approximation Properties of Deep Neural Networks with ReLU$^k$ Activation
Authors:
Juncai He,
Tong Mao,
Jinchao Xu
Abstract:
In this paper, we investigate the expressivity and approximation properties of deep neural networks employing the ReLU$^k$ activation function for $k \geq 2$. Although deep ReLU networks can approximate polynomials effectively, deep ReLU$^k$ networks have the capability to represent higher-degree polynomials precisely. Our initial contribution is a comprehensive, constructive proof for polynomial representation using deep ReLU$^k$ networks. This allows us to establish an upper bound on both the size and count of network parameters. Consequently, we are able to demonstrate a suboptimal approximation rate for functions from Sobolev spaces as well as for analytic functions. Additionally, through an exploration of the representation power of deep ReLU$^k$ networks for shallow networks, we reveal that deep ReLU$^k$ networks can approximate functions from a range of variation spaces, extending beyond those generated solely by the ReLU$^k$ activation function. This finding demonstrates the adaptability of deep ReLU$^k$ networks in approximating functions within various variation spaces.
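The gap between approximating and exactly representing polynomials can be seen with $k=2$. A minimal sketch, using the standard identities $x^2=\sigma(x)^2+\sigma(-x)^2$ (with $\sigma(x)=\max(x,0)$) and the polarization identity $xy=\tfrac14\big((x+y)^2-(x-y)^2\big)$; this is a well-known building block, not necessarily the paper's construction, and the helper names are hypothetical.

```python
def relu_k(x, k=2):
    """ReLU^k activation: max(x, 0) ** k."""
    return max(x, 0.0) ** k

def square(x):
    """x^2 represented exactly by two ReLU^2 units: relu(x)^2 + relu(-x)^2."""
    return relu_k(x) + relu_k(-x)

def product(x, y):
    """x*y via polarization, built from two exact squaring sub-networks."""
    return (square(x + y) - square(x - y)) / 4

print(square(-2.5), product(3.0, -4.0))  # 6.25 -12.0
```

Composing such squaring and product units layer by layer yields exact monomials of higher degree, whereas a plain ReLU network can only approximate $x^2$ on a bounded interval.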
Submitted 10 January, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
Distance spectral conditions for $ID$-factor-critical and fractional $[a, b]$-factor of graphs
Authors:
Tingyan Ma,
Ligong Wang
Abstract:
Let $G=(V(G), E(G))$ be a graph with vertex set $V(G)$ and edge set $E(G)$. A graph is $ID$-factor-critical if for every independent set $I$ of $G$ whose size has the same parity as $|V(G)|$, $G-I$ has a perfect matching. For two positive integers $a$ and $b$ with $a\leq b$, let $h: E(G)\rightarrow [0, 1]$ be a function on $E(G)$ satisfying $a\leq\sum_{e\in E_{G}(v_{i})}h(e)\leq b$ for any vertex $v_{i}\in V(G)$, where $E_{G}(v_{i})=\{e\in E(G)\mid e \text{ is incident with } v_{i} \text{ in } G\}$. Then the spanning subgraph with edge set $E_{h}=\{e\in E(G)\mid h(e)>0\}$, denoted by $G[E_{h}]$, is called a fractional $[a, b]$-factor of $G$ with indicator function $h$. A graph is called a fractional $[a, b]$-deleted graph if for any $e\in E(G)$, $G-e$ contains a fractional $[a, b]$-factor. For any integer $k\geq 1$, a graph has a $k$-factor if it contains a $k$-regular spanning subgraph. In this paper, we first give a distance spectral radius condition on $G$ that guarantees that $G$ is $ID$-factor-critical. Furthermore, we provide sufficient conditions in terms of the distance spectral radius and the distance signless Laplacian spectral radius for a graph to contain a fractional $[a, b]$-factor, to be a fractional $[a, b]$-deleted graph, and to contain a $k$-factor.
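The definition of a fractional $[a, b]$-factor is a finite check on the indicator function $h$. A minimal sketch, assuming edges are given as vertex pairs and $h$ as a dictionary (both hypothetical input conventions); exact rational arithmetic avoids rounding at the bounds $a$ and $b$.

```python
from fractions import Fraction

def is_fractional_ab_factor(vertices, edges, h, a, b):
    """Check whether h: E(G) -> [0, 1] is the indicator function of a fractional
    [a, b]-factor, i.e. a <= sum of h(e) over E_G(v) <= b for every vertex v.
    Returns (ok, E_h), where E_h = {e : h(e) > 0} is the edge set of G[E_h]."""
    if any(not (0 <= h[e] <= 1) for e in edges):
        return False, []
    for v in vertices:
        total = sum(h[e] for e in edges if v in e)  # sum over E_G(v)
        if not (a <= total <= b):
            return False, []
    return True, [e for e in edges if h[e] > 0]

# 4-cycle with h = 1/2 on every edge: each vertex receives exactly 1.
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
half = {e: Fraction(1, 2) for e in c4}
print(is_fractional_ab_factor(range(4), c4, half, 1, 1))
```

With $h\equiv 1/2$ the 4-cycle carries a fractional $[1,1]$-factor; lowering $h$ to $1/4$ breaks the lower bound at every vertex.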
Submitted 30 October, 2023;
originally announced October 2023.
-
When Leibniz algebras are Nijenhuis?
Authors:
Haiying Li,
Tianshui Ma,
Shuanhong Wang
Abstract:
Leibniz algebras can be seen as a ``non-commutative'' analogue of Lie algebras. Nijenhuis operators on Leibniz algebras, introduced by Cariñena, Grabowski, and Marmo in [J. Phys. A: Math. Gen. 37(2004)], are (1, 1)-tensors with vanishing Nijenhuis torsion. Recently, triangular Leibniz bialgebras were introduced by Tang and Sheng in [J. Noncommut. Geom. 16(2022)] via the twisting theory of twilled Leibniz algebras. In this paper, we find that Leibniz algebras are closely related to Nijenhuis operators, and prove that a triangular symplectic Leibniz bialgebra together with a dual triangular structure must possess Nijenhuis operators, which makes it possible to study the applications of Nijenhuis operators from the perspective of Leibniz algebras. At the same time, we recover the classical Leibniz Yang-Baxter equation by using the tensor form of classical $r$-matrices. Finally, we give the classification of triangular Leibniz bialgebras of low dimensions.
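Vanishing Nijenhuis torsion can be tested directly from a bracket. A minimal sketch, assuming the standard torsion $T_N(x,y)=[Nx,Ny]-N\big([Nx,y]+[x,Ny]-N[x,y]\big)$ and a hypothetical 2-dimensional Leibniz algebra with $[e_2,e_2]=e_1$ and all other brackets zero (it satisfies the Leibniz identity because every bracket lands in $e_1$, which brackets to zero, yet it is not a Lie algebra). All function names are illustrative.

```python
def bracket(x, y):
    """Hypothetical Leibniz algebra on R^2: [e2, e2] = e1, all other brackets 0."""
    return (x[1] * y[1], 0.0)

def mat_vec(N, x):
    """Apply a 2x2 matrix N (list of rows) to a vector x."""
    return tuple(sum(N[i][j] * x[j] for j in range(2)) for i in range(2))

def add(*vs):
    return tuple(sum(c) for c in zip(*vs))

def scale(s, v):
    return tuple(s * c for c in v)

def torsion(N, x, y):
    """Nijenhuis torsion T_N(x, y) = [Nx, Ny] - N([Nx, y] + [x, Ny] - N[x, y])."""
    Nx, Ny = mat_vec(N, x), mat_vec(N, y)
    inner = add(bracket(Nx, y), bracket(x, Ny), scale(-1, mat_vec(N, bracket(x, y))))
    return add(bracket(Nx, Ny), scale(-1, mat_vec(N, inner)))

def is_nijenhuis(N):
    """The torsion is bilinear, so checking all pairs of basis vectors suffices."""
    basis = [(1.0, 0.0), (0.0, 1.0)]
    return all(max(abs(c) for c in torsion(N, u, v)) < 1e-12 for u in basis for v in basis)

print(is_nijenhuis([[3, 0], [0, 3]]), is_nijenhuis([[1, 0], [0, 2]]))  # True False
```

For a diagonal $N=\mathrm{diag}(a,b)$ on this algebra, a short hand computation gives $T_N(e_2,e_2)=(a-b)^2e_1$, so among diagonal operators only scalar multiples of the identity are Nijenhuis here.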
Submitted 17 June, 2024; v1 submitted 22 October, 2023;
originally announced October 2023.
-
Infinitesimal (BiHom-)bialgebras of any weight (II): Representations
Authors:
Tianshui Ma,
Abdenacer Makhlouf
Abstract:
The aim of this paper is to investigate the representation theory of infinitesimal (BiHom-)bialgebras of any weight $λ$ (abbr. $λ$-inf(BH)-bialgebras). Firstly, inspired by Majid-Radford's well-known bosonization theory in Hopf algebra theory, we present a class of $λ$-inf(BH)-bialgebras, named $λ$-inf(BH)-biproduct bialgebras, consisting of an inf(BH)-product algebra structure and an inf(BH)-coproduct coalgebra structure, which induces a structure of a $λ$-inf(BH)-Hopf bimodule over a $λ$-inf(BH)-bialgebra. Secondly, we explore relationships among $λ$-inf(BH)-Hopf bimodules, $λ$-Rota-Baxter (BiHom-)bimodules, (BiHom-)dendriform bimodules and (BiHom-)pre-Lie bimodules. Finally, we provide two kinds of general Gelfand-Dorfman theorems related to BiHom-Novikov algebras.
Submitted 15 October, 2023;
originally announced October 2023.
-
Fundamentals of thermoelasticity for curved beams
Authors:
Marcio A. Jorge Silva,
To Fu Ma
Abstract:
The purpose of this paper is twofold. Firstly, we conduct an in-depth analysis of mathematical modeling concerning thermal-mechanical curved beams, by taking into consideration three primary forces widely accepted in the literature: axial load, shear force, and bending moment. Additionally, we examine their appropriate thermal couplings, shedding light on the intricate interplay between stress-strain relationships and temperature variations. This analysis is situated within the well-recognized context of the Bresse governing model for arched beams. Secondly, drawing upon distinguished constitutive laws for heat flux of conduction, we compile a comprehensive list of thermoelastic curved beam systems in various scenarios. We introduce new categories of problems that exhibit specific features from the thermal point of view.
Submitted 11 October, 2023;
originally announced October 2023.
-
Convolution formulas for multivariate arithmetic Tutte polynomials
Authors:
Tianlong Ma,
Xian'an Jin,
Weiling Yang
Abstract:
The multivariate arithmetic Tutte polynomial of arithmetic matroids is a generalization of the multivariate Tutte polynomial of matroids. In this note, we give the convolution formulas for the multivariate arithmetic Tutte polynomial of the product of two arithmetic matroids. In particular, the convolution formulas for the multivariate arithmetic Tutte polynomial of an arithmetic matroid are obtained. Applying our results, we give purely combinatorial proofs of several known convolution formulas, including [5, Theorem 10.9 and Corollary 10.10] and [1, Theorems 1 and 4]. The proofs presented here are significantly shorter than the previous ones. In addition, we obtain a convolution formula for the characteristic polynomial of an arithmetic matroid.
Submitted 6 October, 2023;
originally announced October 2023.
-
On the maximum local mean order of sub-k-trees of a k-tree
Authors:
Zhuo Li,
Tianlong Ma,
Fengming Dong,
Xian'an Jin
Abstract:
For a k-tree T, a generalization of a tree, the local mean order of sub-k-trees of T is the average order of sub-k-trees of T containing a given k-clique. The question of whether the largest local mean order of a tree (i.e., a 1-tree) at a vertex is always attained at a leaf was asked by Jamison in 1984 and answered by Wagner and Wang in 2016. In 2018, Stephens and Oellermann posed a similar question: for any k-tree T, does the maximum local mean order of sub-k-trees containing a given k-clique occur at a k-clique that is not a major k-clique of T? In this paper, we answer this question affirmatively.
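For $k=1$ (ordinary trees) the local mean order can be computed by brute force on small examples. This sketch only covers trees, not general k-trees, is exponential in the number of vertices, and uses hypothetical helper names. On the star $K_{1,3}$ it illustrates Jamison's question: the value at a leaf, 13/5, exceeds the value at the center, 5/2.

```python
from fractions import Fraction
from itertools import combinations

def is_connected(sub, adj):
    """True if the vertex set `sub` induces a connected subgraph."""
    sub = set(sub)
    start = next(iter(sub))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in sub and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == sub

def local_mean_order(adj, v):
    """Average order of the subtrees of a tree that contain vertex v (brute force)."""
    others = [u for u in adj if u != v]
    orders = []
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            sub = (v,) + extra
            if is_connected(sub, adj):  # connected vertex subsets of a tree are subtrees
                orders.append(len(sub))
    return Fraction(sum(orders), len(orders))

star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
print(local_mean_order(star, "x"), local_mean_order(star, "c"))  # 13/5 5/2
```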
Submitted 21 September, 2023;
originally announced September 2023.
-
Infinitesimal (BiHom-)bialgebras of any weight (I): Basic definitions and properties
Authors:
Tianshui Ma,
Abdenacer Makhlouf
Abstract:
The purpose of this paper is to introduce and study $λ$-infinitesimal BiHom-bialgebras (abbr. $λ$-infBH-bialgebras) and some related structures. They can be seen as an extension of the $λ$-infinitesimal bialgebras considered by Ebrahimi-Fard, which include Joni and Rota's infinitesimal bialgebras as well as Loday and Ronco's infinitesimal bialgebras, and also of the infinitesimal BiHom-bialgebras introduced by Liu, Makhlouf, Menini and Panaite. In this paper, we provide various relevant constructions and new concepts. Two ways are provided for a unitary (resp. counitary) algebra (resp. coalgebra) to be a $λ$-infBH-bialgebra, and the notion of $λ$-infBH-Hopf module is introduced and discussed. It is proved, in connection with the nonhomogeneous (co)associative BiHom-Yang-Baxter equation, that every (left BiHom-)module (resp. comodule) over an (anti-)quasitriangular (resp. (anti-)coquasitriangular) $λ$-infBH-bialgebra carries a structure of $λ$-infBH-Hopf module. Moreover, two approaches to constructing BiHom-pre-Lie (co)algebras from $λ$-infBH-bialgebras are presented.
Submitted 4 September, 2023;
originally announced September 2023.
-
Tractability of approximation by general shallow networks
Authors:
Hrushikesh Mhaskar,
Tong Mao
Abstract:
In this paper, we present a sharper version of the results in the paper "Dimension independent bounds for general shallow networks", Neural Networks 123 (2020), 142-152. Let $\mathbb{X}$ and $\mathbb{Y}$ be compact metric spaces. We consider approximation of functions of the form $x\mapsto\int_{\mathbb{Y}} G(x, y)\,dτ(y)$, $x\in\mathbb{X}$, by $G$-networks of the form $x\mapsto \sum_{k=1}^n a_kG(x, y_k)$, $y_1,\cdots, y_n\in\mathbb{Y}$, $a_1,\cdots, a_n\in\mathbb{R}$. Defining the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ in terms of covering numbers, we obtain dimension-independent bounds on the degree of approximation in terms of $n$, where the constants involved also depend at most polynomially on the dimensions. Applications include approximation by power rectified linear unit networks, zonal function networks, certain radial basis function networks, as well as the important problem of function extension to higher-dimensional spaces.
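To make the objects concrete, here is a toy instance of a target $x\mapsto\int_{\mathbb{Y}}G(x,y)\,dτ(y)$ and an $n$-term $G$-network. Everything is an assumption for illustration: $\mathbb{Y}=[0,1]$ with Lebesgue measure, $\mathbb{X}=[0,2]$, a Gaussian kernel, and fixed midpoint centers with weights $a_k=1/n$ (plain quadrature). The paper's bounds concern what well-chosen $n$-term networks achieve in general spaces, which this sketch does not reproduce.

```python
import math

def G(x, y):
    """Hypothetical kernel: Gaussian G(x, y) = exp(-(x - y)^2)."""
    return math.exp(-(x - y) ** 2)

def target(x):
    """f(x) = integral of G(x, y) dy over Y = [0, 1], in closed form via erf."""
    return math.sqrt(math.pi) / 2 * (math.erf(1 - x) + math.erf(x))

def g_network(x, n):
    """n-term G-network: centers y_k at midpoints of [0, 1], weights a_k = 1/n."""
    return sum(G(x, (k + 0.5) / n) / n for k in range(n))

def sup_error(n, grid=201):
    """Maximum error over an evaluation grid on X = [0, 2]."""
    xs = [2 * i / (grid - 1) for i in range(grid)]
    return max(abs(g_network(x, n) - target(x)) for x in xs)

for n in (5, 20, 80):
    print(n, sup_error(n))
```

The midpoint rule gives an error of order $1/n^2$ here because the kernel is smooth in one dimension; the interest of the paper lies in rates whose constants stay controlled as the dimensions grow.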
Submitted 10 December, 2023; v1 submitted 6 August, 2023;
originally announced August 2023.
-
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Authors:
Kaiyue Wen,
Zhiyuan Li,
Tengyu Ma
Abstract:
Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures and sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of over-parameterized neural networks.
Submitted 22 July, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Extreme coefficients of multiplicity Tutte polynomials
Authors:
Xian'an Jin,
Tianlong Ma,
Weiling Yang
Abstract:
The multiplicity Tutte polynomial, which includes the arithmetic Tutte polynomial, is a generalization of the classical Tutte polynomial of matroids. In this paper, we obtain an expression of the general coefficient and the expressions of six extreme coefficients of multiplicity Tutte polynomials. In particular, an expression of the general coefficient and the expressions of the corresponding extreme coefficients of the classical Tutte polynomial of matroids are deduced.
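Coefficients of such polynomials can be read off from the subset expansion $\sum_{A\subseteq E} m(A)(x-1)^{r(E)-r(A)}(y-1)^{|A|-r(A)}$, which for $m\equiv 1$ is the classical Tutte polynomial. The sketch below assumes a graphic matroid and treats the multiplicity function $m$ as a user-supplied placeholder (the axioms a genuine multiplicity function must satisfy are not checked); it is a brute-force expansion, not the paper's coefficient formulas, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import comb

def graphic_rank(n_vertices, edge_subset):
    """Rank in the graphic matroid: size of a spanning forest of (V, A), via union-find."""
    parent = list(range(n_vertices))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    rank = 0
    for u, v in edge_subset:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            rank += 1
    return rank

def tutte_coefficients(n_vertices, edges, multiplicity=lambda A: 1):
    """Coefficients {(i, j): c} of sum_A m(A) (x-1)^(r(E)-r(A)) (y-1)^(|A|-r(A))."""
    r_E = graphic_rank(n_vertices, edges)
    coeffs = defaultdict(int)
    for size in range(len(edges) + 1):
        for A in combinations(edges, size):
            r_A = graphic_rank(n_vertices, A)
            p, q, m = r_E - r_A, size - r_A, multiplicity(A)
            # Expand (x-1)^p (y-1)^q with the binomial theorem.
            for i in range(p + 1):
                for j in range(q + 1):
                    coeffs[(i, j)] += m * comb(p, i) * comb(q, j) * (-1) ** (p - i + q - j)
    return {k: c for k, c in coeffs.items() if c != 0}

triangle = [(0, 1), (1, 2), (0, 2)]
print(tutte_coefficients(3, triangle))  # represents x^2 + x + y
```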
Submitted 5 February, 2024; v1 submitted 10 June, 2023;
originally announced June 2023.
-
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Authors:
Hong Liu,
Zhiyuan Li,
David Hall,
Percy Liang,
Tengyu Ma
Abstract:
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the pre-conditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid change of Hessian along the trajectory. Sophia only estimates the diagonal Hessian once every handful of iterations, which has negligible average per-step time and memory overhead. On language modeling with GPT models of sizes ranging from 125M to 1.5B, Sophia achieves a 2x speed-up compared to Adam in the number of steps, total compute, and wall-clock time, achieving the same perplexity with 50% fewer steps, less total compute, and reduced wall-clock time. Theoretically, we show that Sophia, in a much simplified setting, adapts to the heterogeneous curvatures in different parameter dimensions, and thus has a run-time bound that does not depend on the condition number of the loss.
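The update described above (moving-average gradient divided by moving-average diagonal Hessian estimate, then element-wise clipping) can be sketched on plain Python lists. This is a simplified reading of the abstract, not the reference implementation: the hyperparameter defaults, the `eps` guard for non-positive curvature, and the clipping threshold `rho` are assumptions, and the Hessian estimator itself is replaced by exact curvature on a toy quadratic.

```python
def sophia_step(theta, grad, m, h, hess_est=None, lr=0.5,
                beta1=0.965, beta2=0.99, eps=1e-12, rho=1.0):
    """One Sophia-style update (simplified sketch; hyperparameters are assumptions).

    m: moving average of gradients; h: moving average of diagonal-Hessian estimates.
    hess_est is supplied only every few iterations; otherwise h is left unchanged.
    Returns the new (theta, m, h).
    """
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    if hess_est is not None:
        h = [beta2 * hi + (1 - beta2) * ei for hi, ei in zip(h, hess_est)]
    new_theta = []
    for ti, mi, hi in zip(theta, m, h):
        ratio = mi / max(hi, eps)          # pre-conditioned step; eps guards h <= 0
        step = max(-rho, min(rho, ratio))  # element-wise clipping to [-rho, rho]
        new_theta.append(ti - lr * step)
    return new_theta, m, h

# Toy quadratic 0.5 * sum(c_i * theta_i^2) with heterogeneous curvature c.
c = [1.0, 10.0]
theta, m, h = [5.0, -3.0], [0.0, 0.0], [0.0, 0.0]
for t in range(100):
    grad = [ci * ti for ci, ti in zip(c, theta)]
    est = c if t % 10 == 0 else None  # refresh the Hessian estimate every 10 steps
    theta, m, h = sophia_step(theta, grad, m, h, est, beta1=0.0, beta2=0.0)
print(theta)
```

With the averaging switched off, each coordinate moves by at most `lr * rho` while far from the minimum and then contracts geometrically, independently of its curvature $c_i$, which is the behavior the clipping and pre-conditioning are meant to produce.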
Submitted 5 March, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Rota-Baxter operators on cocommutative Hopf algebras and Hopf braces
Authors:
Huihui Zheng,
Li Guo,
Tianshui Ma,
Liangyun Zhang
Abstract:
This paper studies the relationship of Rota-Baxter operators on cocommutative Hopf algebras with Hopf braces and the Yang-Baxter equation, with emphasis on the embedding of cocommutative Hopf braces into Rota-Baxter Hopf algebras. Through Hopf braces, we establish a connection between relative Rota-Baxter operators on cocommutative Hopf algebras and bijective 1-cocycles. Finally, we introduce the notion of symmetric Hopf braces, and establish the relationship between symmetric Hopf braces and Rota-Baxter Hopf algebras.
Submitted 25 June, 2024; v1 submitted 15 April, 2023;
originally announced April 2023.
-
The sufficient conditions for $k$-leaf-connected graphs in terms of several topological indices
Authors:
Tingyan Ma,
Ligong Wang,
Yang Hu
Abstract:
Let $G=(V(G), E(G))$ be a graph with vertex set $V(G)$ and edge set $E(G)$. For $k\geq2$, a graph $G$ of order $|V(G)|\geq k+1$ is $k$-leaf-connected if, for any subset $S\subseteq V(G)$ with $|S|=k$, $G$ has a spanning tree $T$ such that $S$ is precisely the set of leaves of $T$. A graph $G$ is called Hamilton-connected if any two vertices of $G$ are connected by a Hamilton path. By the definitions of $k$-leaf-connected and Hamilton-connected graphs, a graph is $2$-leaf-connected if and only if it is Hamilton-connected. During the past decades, there have been many results on sufficient conditions for Hamilton-connectedness in terms of topological indices. In this paper, we present sufficient conditions for a graph $G$ to be $k$-leaf-connected in terms of the Zagreb index, the reciprocal degree distance or the hyper-Zagreb index. Furthermore, we use the first Zagreb index and hyper-Zagreb index of the complement graph $\overline{G}$ to give sufficient conditions for a graph $G$ to be $k$-leaf-connected.
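The indices mentioned above are straightforward to compute. A minimal sketch, assuming the usual definitions $M_1(G)=\sum_v d(v)^2$, $HM(G)=\sum_{uv\in E}(d(u)+d(v))^2$ and $RDD(G)=\sum_{\{u,v\},u\neq v}(d(u)+d(v))/\mathrm{dist}(u,v)$ for connected $G$ (the paper may use an equivalent normalization); vertices are labeled $0,\dots,n-1$ and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def degrees(n, edges):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def first_zagreb(n, edges):
    """M1(G) = sum over vertices of d(v)^2."""
    return sum(d * d for d in degrees(n, edges))

def hyper_zagreb(n, edges):
    """HM(G) = sum over edges uv of (d(u) + d(v))^2."""
    deg = degrees(n, edges)
    return sum((deg[u] + deg[v]) ** 2 for u, v in edges)

def reciprocal_degree_distance(n, edges):
    """RDD(G) = sum over unordered pairs {u, v} of (d(u) + d(v)) / dist(u, v).
    Assumes G is connected; distances come from breadth-first search."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = degrees(n, edges)
    total = 0.0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum((deg[s] + deg[t]) / dist[t] for t in range(s + 1, n))
    return total

def complement(n, edges):
    """Edge list of the complement graph."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2) if frozenset((u, v)) not in present]

p3 = [(0, 1), (1, 2)]
print(first_zagreb(3, p3), hyper_zagreb(3, p3), reciprocal_degree_distance(3, p3))  # 6 18 7.0
```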
Submitted 7 April, 2024; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Extremal trees, unicyclic and bicyclic graphs with respect to $p$-Sombor spectral radii
Authors:
Ruiling Zheng,
Tianlong Ma,
Xian'an Jin
Abstract:
For a graph $G=(V,E)$ and $v_{i}\in V$, denote by $d_{v_{i}}$ (or $d_{i}$ for short) the degree of vertex $v_{i}$. The $p$-Sombor matrix $\textbf{S}_{\textbf{p}}(G)$ ($p\neq0$) of a graph $G$ is a square matrix, where the $(i,j)$-entry is equal to $\displaystyle (d_{i}^{p}+d_{j}^{p})^{\frac{1}{p}}$ if the vertices $v_{i}$ and $v_{j}$ are adjacent, and 0 otherwise. The $p$-Sombor spectral radius of $G$, denoted by $\displaystyle ρ(\textbf{S}_{\textbf{p}}(G))$, is the largest eigenvalue of the $p$-Sombor matrix $\textbf{S}_{\textbf{p}}(G)$. In this paper, we consider the extremal trees, unicyclic and bicyclic graphs with respect to the $p$-Sombor spectral radii. We completely characterize the extremal graphs with the first three maximum Sombor spectral radii, which partially answers a problem posed by Liu et al. in [MATCH Commun. Math. Comput. Chem. 87 (2022) 59-87].
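A numerical check of $ρ(\textbf{S}_{\textbf{p}}(G))$ for small graphs can follow the definition directly. The sketch below builds the matrix and estimates its largest eigenvalue by power iteration on $\textbf{S}_{\textbf{p}}(G)+I$; power iteration is a generic numerical method chosen here as an assumption, not the paper's technique, and the shift by $I$ avoids the $\pm ρ$ oscillation that plain power iteration shows on bipartite graphs such as trees. For the star $K_{1,3}$ every edge entry equals $(3^p+1)^{1/p}$, so $ρ=\sqrt{3}\,(3^p+1)^{1/p}$, e.g. $4\sqrt3$ for $p=1$ and $\sqrt{30}$ for $p=2$.

```python
import math

def p_sombor_matrix(n, edges, p):
    """S_p(G): entry (d_i^p + d_j^p)^(1/p) for adjacent v_i, v_j, and 0 otherwise."""
    if p == 0:
        raise ValueError("p must be nonzero")
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    S = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        S[u][v] = S[v][u] = (deg[u] ** p + deg[v] ** p) ** (1 / p)
    return S

def spectral_radius(S, iters=2000, tol=1e-13):
    """Largest eigenvalue of a symmetric nonnegative matrix (connected graph assumed),
    via power iteration on S + I and the Rayleigh quotient x^T S x of the unit iterate."""
    n = len(S)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(S[i][j] * x[j] for j in range(n)) + x[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        Sx = [sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]
        new_lam = sum(a * b for a, b in zip(x, Sx))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam

star = [(0, 1), (0, 2), (0, 3)]
print(spectral_radius(p_sombor_matrix(4, star, 1)), 4 * math.sqrt(3))
```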
Submitted 5 April, 2023;
originally announced April 2023.
-
Canonical curves and Kropina metrics in Lagrangian contact geometry
Authors:
T. Ma,
K. J. Flood,
V. S. Matveev,
V. Žádník
Abstract:
We present a Fefferman-type construction from Lagrangian contact to conformal structures and examine several related topics. In particular, we concentrate on describing the canonical curves and their correspondence. We show that chains and null-chains of an integrable Lagrangian contact structure are the projections of null-geodesics of the Fefferman space. Employing the Fermat principle, we realize chains as geodesics of Kropina (pseudo-Finsler) metrics. Using recent rigidity results, we show that ``sufficiently many'' chains determine the Lagrangian contact structure. Separately, we comment on Lagrangian contact structures induced by projective structures and the special case of dimension three.
Submitted 16 June, 2023; v1 submitted 24 January, 2023;
originally announced January 2023.
-
On Generalization and Regularization via Wasserstein Distributionally Robust Optimization
Authors:
Qinyu Wu,
Jonathan Yu-Meng Li,
Tiantian Mao
Abstract:
Wasserstein distributionally robust optimization (DRO) has gained prominence in operations research and machine learning as a powerful method for achieving solutions with favorable out-of-sample performance. Two compelling explanations for its success are the generalization bounds derived from Wasserstein DRO and its equivalence to regularization schemes commonly used in machine learning. However, existing results on generalization bounds and regularization equivalence are largely limited to settings where the Wasserstein ball is of a specific type, and the decision criterion takes certain forms of expected functions. In this paper, we show that generalization bounds and regularization equivalence can be obtained in a significantly broader setting, where the Wasserstein ball is of a general type and the decision criterion accommodates any form, including general risk measures. This not only addresses important machine learning and operations management applications but also expands to general decision-theoretical frameworks previously unaddressed by Wasserstein DRO. Our results are strong in that the generalization bounds do not suffer from the curse of dimensionality and the equivalence to regularization is exact. As a by-product, we show that Wasserstein DRO coincides with the recent max-sliced Wasserstein DRO for any decision criterion under affine decision rules, so that both can be solved efficiently as convex programs via our general regularization results. These general assurances provide a strong foundation for expanding the application of Wasserstein DRO across diverse domains of data-driven decision problems.
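A classical special case makes the regularization equivalence concrete: for a linear loss $\ell(\xi)=\langle\theta,\xi\rangle$ and a type-1 Wasserstein ball of radius $\varepsilon$ with Euclidean cost on unbounded support $\mathbb{R}^d$, the worst-case expected loss equals the empirical mean plus $\varepsilon\|\theta\|_2$. The sketch below is only an illustration under those assumptions, not the paper's general result: it computes the regularized value and checks that translating every sample by $\varepsilon\,\theta/\|\theta\|_2$ (a distribution at transport cost $\varepsilon$, hence inside the ball) attains it. It does not prove the matching upper bound, and the function names are hypothetical.

```python
import math

def regularized_value(samples, theta, eps):
    """Empirical mean of <theta, xi> plus eps * ||theta||_2 (dual of the Euclidean cost)."""
    mean = sum(sum(t * x for t, x in zip(theta, xi)) for xi in samples) / len(samples)
    return mean + eps * math.sqrt(sum(t * t for t in theta))

def shifted_worst_case(samples, theta, eps):
    """Translate every sample by eps * theta / ||theta||_2; the translated empirical
    distribution is at type-1 Wasserstein distance eps from the original one."""
    norm = math.sqrt(sum(t * t for t in theta))
    direction = [t / norm for t in theta]
    shifted = [[x + eps * d for x, d in zip(xi, direction)] for xi in samples]
    value = sum(sum(t * x for t, x in zip(theta, xi)) for xi in shifted) / len(shifted)
    return shifted, value

samples = [[1.0, 2.0], [-1.0, 0.5], [0.0, -3.0]]
theta, eps = [3.0, 4.0], 0.2
print(regularized_value(samples, theta, eps), shifted_worst_case(samples, theta, eps)[1])
```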
Submitted 19 December, 2024; v1 submitted 12 December, 2022;
originally announced December 2022.
-
How Does Sharpness-Aware Minimization Minimize Sharpness?
Authors:
Kaiyue Wen,
Tengyu Ma,
Zhiyuan Li
Abstract:
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks in various settings. However, the inner workings of SAM remain elusive because of various intriguing approximations in the theoretical characterizations. SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees. The subtle differences in these notions of sharpness can indeed lead to significantly different empirical results. This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. We also show that the two steps of approximations in the original motivation of SAM individually lead to inaccurate local conclusions, but their combination accidentally reveals the correct effect when full-batch gradients are applied. Furthermore, we prove that the stochastic version of SAM in fact regularizes the third notion of sharpness mentioned above, which is most likely to be the preferred notion for practical performance. The key mechanism behind this intriguing phenomenon is the alignment between the gradient and the top eigenvector of the Hessian when SAM is applied.
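For reference, the computationally efficient SAM update ascends along the normalized gradient to $w+\rho\,g/\|g\|$ and then descends using the gradient evaluated there. A minimal full-batch sketch of that standard update on a toy quadratic; the step size, $\rho$ and the loss are hypothetical choices, and the sketch does not reproduce the paper's analysis of which sharpness notion is regularized.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: ascend to w + rho * g / ||g||, then descend with the gradient there."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # stationary point: no ascent direction
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

c = [1.0, 4.0]  # hypothetical quadratic loss 0.5 * sum(c_i * w_i^2)

def loss(w):
    return 0.5 * sum(ci * wi * wi for ci, wi in zip(c, w))

def grad(w):
    return [ci * wi for ci, wi in zip(c, w)]

w = [3.0, -2.0]
for _ in range(200):
    w = sam_step(w, grad)
print(loss(w))
```

On this quadratic the iterates settle into a neighborhood of the minimizer of radius about $\rho$ per coordinate rather than converging exactly, since the perturbation has norm $\rho$ at every step.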
Submitted 5 January, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
An improvement of sufficient condition for $k$-leaf-connected graphs
Authors:
Tingyan Ma,
Guoyan Ao,
Ruifang Liu,
Ligong Wang,
Yang Hu
Abstract:
For integer $k\geq2,$ a graph $G$ is called $k$-leaf-connected if $|V(G)|\geq k+1$ and, for any subset $S\subseteq V(G)$ with $|S|=k,$ $G$ has a spanning tree $T$ such that $S$ is precisely the set of leaves of $T.$ Thus a graph is $2$-leaf-connected if and only if it is Hamilton-connected. In this paper, we present a best-possible condition on the size (number of edges) that guarantees a graph is $k$-leaf-connected, which not only improves the results of Gurgel and Wakabayashi [On $k$-leaf-connected graphs, J. Combin. Theory Ser. B 41 (1986) 1-16] and Ao, Liu, Yuan and Li [Improved sufficient conditions for $k$-leaf-connected graphs, Discrete Appl. Math. 314 (2022) 17-30], but also extends the result of Xu, Zhai and Wang [An improvement of spectral conditions for Hamilton-connected graphs, Linear Multilinear Algebra, 2021]. Our key step is to show that an $(n+k-1)$-closed non-$k$-leaf-connected graph must contain a large clique if its size is large enough. As applications, sufficient conditions for a graph to be $k$-leaf-connected in terms of the (signless Laplacian) spectral radius of $G$ or its complement are also presented.
Submitted 9 November, 2022;
originally announced November 2022.
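The definition of $k$-leaf-connectedness can be checked directly on very small graphs. The sketch below is a brute-force illustration only (exponential in the number of edges, and not the paper's method, which is a size-based sufficient condition): it enumerates all spanning trees, collects their leaf sets, and tests every $k$-subset. The function names are hypothetical. The examples use $K_4$, which is Hamilton-connected, and the cycle $C_5$, where no Hamilton path joins two vertices at distance 2.

```python
from itertools import combinations

def _find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def spanning_tree_leaf_sets(n, edges):
    """Leaf sets of all spanning trees of a graph on vertices 0..n-1 (brute force)."""
    leaf_sets = set()
    for tree in combinations(edges, n - 1):
        parent = list(range(n))
        acyclic = True
        for u, v in tree:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru == rv:  # this edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if not acyclic:
            continue
        # n-1 edges with no cycle on n vertices form a spanning tree.
        deg = [0] * n
        for u, v in tree:
            deg[u] += 1
            deg[v] += 1
        leaf_sets.add(frozenset(x for x in range(n) if deg[x] == 1))
    return leaf_sets

def is_k_leaf_connected(n, edges, k):
    """Check the definition: |V| >= k+1 and every k-subset is the leaf set of a spanning tree."""
    if n < k + 1:
        return False
    leaf_sets = spanning_tree_leaf_sets(n, edges)
    return all(frozenset(s) in leaf_sets for s in combinations(range(n), k))

K4 = list(combinations(range(4), 2))
C5 = [(i, (i + 1) % 5) for i in range(5)]
print(is_k_leaf_connected(4, K4, 2))  # True: K4 is Hamilton-connected
print(is_k_leaf_connected(5, C5, 2))  # False: no Hamilton path joins vertices 0 and 2
```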
-
Dynamic charging management for electric vehicle demand responsive transport
Authors:
Tai-Yu Ma
Abstract:
Facing the challenges of climate change, transport network companies have started to electrify their fleets to reduce CO2 emissions. However, such an ecological transition brings new research challenges for dynamic electric fleet charging management under uncertainty. In this study, we address the dynamic charging scheduling management of shared ride-hailing services with public charging stations. A two-stage charging scheduling optimization approach under a rolling horizon framework is proposed to minimize the overall charging operational costs of the fleet, including vehicles' access times, charging times, and waiting times, by anticipating future public charging station availability. The charging station occupancy prediction is based on a hybrid LSTM (long short-term memory) network approach and is integrated into the proposed online vehicle-charger assignment. The proposed methodology is applied to a realistic simulation study in the city of Dundee, UK. The numerical studies show that, compared to a need-based charging reference policy, the proposed approach can reduce the total charging waiting times of the fleet by 48.3% and the total amount of energy charged by the fleet by 35.3%.
Submitted 22 September, 2022;
originally announced September 2022.
-
Favorite Downcrossing Sites of One-Dimensional Simple Random Walk
Authors:
Chen-Xu Hao,
Ze-Chun Hu,
Ting Ma,
Renming Song
Abstract:
Random walk is a very important Markov process with applications in many fields. For a one-dimensional simple symmetric random walk $(S_n)$, a site $x$ is called a favorite downcrossing site at time $n$ if its downcrossing local time at time $n$ achieves the maximum among all sites. In this paper, we study the cardinality of the favorite downcrossing site set and show that, with probability 1, there are only finitely many times at which there are at least four favorite downcrossing sites, and that three favorite downcrossing sites occur infinitely often. Some related open questions are also posed.
Submitted 29 November, 2022; v1 submitted 21 July, 2022;
originally announced July 2022.
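A small simulation sketch of the quantities in this abstract. It assumes the common convention that site $x$ is downcrossed at step $k$ when $S_{k-1}=x$ and $S_k=x-1$; the paper's exact convention may differ, and the function names are hypothetical. The favorite downcrossing sites at time $n$ are then the sites whose count over the first $n$ steps is maximal.

```python
import random
from collections import Counter

def downcrossing_local_times(path):
    """For each site x, count the steps k with S_{k-1} = x and S_k = x - 1 (assumed convention)."""
    return Counter(a for a, b in zip(path, path[1:]) if b == a - 1)

def favorite_downcrossing_sites(path):
    """Sites whose downcrossing local time equals the maximum over all sites at the end of the path."""
    d = downcrossing_local_times(path)
    if not d:
        return set()
    top = max(d.values())
    return {x for x, c in d.items() if c == top}

def simple_random_walk(n, seed=None):
    """Path S_0 = 0, S_1, ..., S_n of a simple symmetric random walk."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(n):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

print(favorite_downcrossing_sites([0, 1, 0, 1, 0, -1, 0]))  # {1}
walk = simple_random_walk(10_000, seed=1)
print(len(favorite_downcrossing_sites(walk)))
```

Tracking how often the favorite set has three or more elements along a long simulated walk gives an informal feel for the result, though a simulation cannot of course distinguish "finitely often" from "infinitely often".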
-
A General Wasserstein Framework for Data-driven Distributionally Robust Optimization: Tractability and Applications
Authors:
Jonathan Yu-Meng Li,
Tiantian Mao
Abstract:
Data-driven distributionally robust optimization is a recently emerging paradigm aimed at finding a solution that is driven by sample data but is protected against sampling errors. An increasingly popular approach, known as Wasserstein distributionally robust optimization (DRO), achieves this by applying the Wasserstein metric to construct a ball centred at the empirical distribution and finding a solution that performs well against the most adversarial distribution from the ball. In this paper, we present a general framework for studying different choices of a Wasserstein metric and point out the limitations of existing choices. In particular, while choosing a Wasserstein metric of a higher order is desirable from a data-driven perspective, given its less conservative nature, such a choice comes with a high price from a robustness perspective: it is no longer applicable to many heavy-tailed distributions of practical concern. We show that this seemingly inevitable trade-off can be resolved by our framework, where a new class of Wasserstein metrics, called coherent Wasserstein metrics, is introduced. Like Wasserstein DRO, distributionally robust optimization using the coherent Wasserstein metrics, termed generalized Wasserstein distributionally robust optimization (GW-DRO), has all the desirable performance guarantees: finite-sample guarantee, asymptotic consistency, and computational tractability. The worst-case expectation problem in GW-DRO is in general a nonconvex optimization problem, yet we provide new analysis to prove its tractability without relying on the common duality scheme. Our framework, as shown in this paper, offers a fruitful opportunity to design novel Wasserstein DRO models that can be applied in various contexts such as operations management, finance, and machine learning.
Submitted 19 July, 2022;
originally announced July 2022.
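To make the role of the order $p$ concrete, here is a sketch of the standard Wasserstein distance $W_p$ (not the coherent Wasserstein metrics introduced in the paper) between two empirical distributions on the real line with the same number of atoms. In one dimension the optimal coupling pairs the sorted samples, so $W_p^p=\frac{1}{n}\sum_i |x_{(i)}-y_{(i)}|^p$. The cost of a displacement grows like its $p$-th power, which is why $W_p$ is only finite for distributions with a finite $p$-th moment. The function name and data are illustrative assumptions.

```python
def wasserstein_p_1d(xs, ys, p=1.0):
    """W_p between two empirical distributions on R with equally many atoms.

    On the real line the optimal coupling matches sorted samples, so
    W_p^p = (1/n) * sum_i |x_(i) - y_(i)|^p.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    xs, ys = sorted(xs), sorted(ys)
    mean = sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)
    return mean ** (1.0 / p)

a = [0.0, 0.0, 0.0, 0.0]
b = [0.0, 0.0, 0.0, 4.0]  # one atom moved far out, mimicking a tail observation
print(wasserstein_p_1d(a, b, 1))  # 1.0
print(wasserstein_p_1d(a, b, 2))  # 2.0
```

The same single displacement costs twice as much under $W_2$ as under $W_1$ here; a ball of fixed radius in the higher-order metric therefore admits less tail mass, which matches the trade-off described in the abstract.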
-
Rota-Baxter Lie bialgebras, classical Yang-Baxter equations and special L-dendriform bialgebras
Authors:
Chengming Bai,
Li Guo,
Guilai Liu,
Tianshui Ma
Abstract:
We establish a bialgebra structure on Rota-Baxter Lie algebras following the Manin triple approach to Lie bialgebras. Explicitly, Rota-Baxter Lie bialgebras are characterized by generalizing matched pairs of Lie algebras and Manin triples of Lie algebras to the context of Rota-Baxter Lie algebras. The coboundary case leads to the introduction of the admissible classical Yang-Baxter equation (CYBE) in Rota-Baxter Lie algebras, for which the antisymmetric solutions give rise to Rota-Baxter Lie bialgebras. The notions of $\mathcal{O}$-operators on Rota-Baxter Lie algebras and Rota-Baxter pre-Lie algebras are introduced to produce antisymmetric solutions of the admissible CYBE. Furthermore, extending the well-known property that a Rota-Baxter Lie algebra of weight zero induces a pre-Lie algebra, the Rota-Baxter Lie bialgebra of weight zero induces a bialgebra structure of independent interest, namely the special L-dendriform bialgebra, which is equivalent to a Lie group with a left-invariant flat pseudo-metric in geometry. This induction is also characterized as the inductions between the corresponding Manin triples and matched pairs. Finally, antisymmetric solutions of the admissible CYBE in a Rota-Baxter Lie algebra of weight zero give special L-dendriform bialgebras. In particular, both Rota-Baxter algebras of weight zero and Rota-Baxter pre-Lie algebras of weight zero can be used to construct special L-dendriform algebras.
Submitted 18 July, 2022;
originally announced July 2022.
-
Double crossed biproducts and related structures
Authors:
Tianshui Ma,
Jie Li,
Haiyan Yang,
Shuanhong Wang
Abstract:
Let $H$ be a bialgebra. Let $σ: H\otimes H\to A$ be a linear map, where $A$ is a left $H$-comodule coalgebra, and an algebra with a left $H$-weak action $\triangleright$. Let $τ: H\otimes H\to B$ be a linear map, where $B$ is a right $H$-comodule coalgebra, and an algebra with a right $H$-weak action $\triangleleft$. In this paper, we improve the necessary conditions for the two-sided crossed product algebra $A\#^σ H~{^τ\#} B$ and the two-sided smash coproduct coalgebra $A\times H\times B$ to form a bialgebra (called double crossed biproduct) such that the condition $b_{[1]}\triangleright a_0\otimes b_{[0]}\triangleleft a_{-1}=a\otimes b$ in Majid's double biproduct (or double-bosonization) is one of the necessary conditions. On the other hand, we provide a more general two-sided crossed product algebra structure via Brzeziński's crossed product and give some applications.
Submitted 12 May, 2022;
originally announced May 2022.
-
Tight toughness, isolated toughness and binding number bounds for the $\{K_2,C_n\}$-factors
Authors:
Xiaxia Guan,
Tianlong Ma,
Chao Shi
Abstract:
A $\{K_2,C_n\}$-factor of a graph is a spanning subgraph each of whose components is either $K_2$ or $C_n$. In this paper, a sufficient condition in terms of tight toughness, isolated toughness and binding number bounds that guarantees the existence of a $\{K_2,C_{2i+1}| i\geq 2 \}$-factor in any graph is obtained, which answers a problem due to Gao and Wang (J. Oper. Res. Soc. China (2021), https://doi.org/10.1007/s40305-021-00357-6).
Submitted 8 April, 2022;
originally announced April 2022.
-
A direct and elementary proof of the well-definedness of the interior and exterior polynomials of hypergraphs
Authors:
Xiaxia Guan,
Xian'an Jin,
Tianlong Ma
Abstract:
T. Kálmán (A version of Tutte's polynomial for hypergraphs, Adv. Math. 244 (2013) 823-873) introduced the interior and exterior polynomials, which generalize to hypergraphs the Tutte polynomial $T(x,y)$ evaluated at the points $(1/x,1)$ and $(1,1/y)$. The two polynomials are defined under a fixed ordering of hyperedges and were proved to be independent of the ordering using techniques from polytopes. In this paper, in the spirit of Tutte's original proof, we provide a direct and elementary proof of the well-definedness of the interior and exterior polynomials of hypergraphs.
Submitted 28 January, 2022;
originally announced January 2022.
-
Transposed BiHom-Poisson algebras
Authors:
Tianshui Ma,
Bei Li
Abstract:
In this paper, we introduce the concept of transposed BiHom-Poisson (abbr. TBP) algebras, which can be constructed from BiHom-Novikov-Poisson algebras. Several useful identities for TBP algebras are provided. We also prove that the tensor product of two (T)BP algebras is again a (T)BP algebra. The notions of BP 3-Lie algebras and TBP 3-Lie algebras are presented, and TBP algebras are shown to induce TBP 3-Lie algebras in two ways. Finally, we give some examples of TBP algebras of dimension 2.
Submitted 1 January, 2022;
originally announced January 2022.