-
Exploration from a Primal-Dual Lens: Value-Incentivized Actor-Critic Methods for Sample-Efficient Online RL
Authors:
Tong Yang,
Bo Dai,
Lin Xiao,
Yuejie Chi
Abstract:
Online reinforcement learning (RL) with complex function approximations such as transformers and deep neural networks plays a significant role in the modern practice of artificial intelligence. Despite its popularity and importance, balancing the fundamental trade-off between exploration and exploitation remains a long-standing challenge; in particular, we still lack efficient and practical schemes that are backed by theoretical performance guarantees. Motivated by recent developments in exploration via optimistic regularization, this paper provides an interpretation of the principle of optimism through the lens of primal-dual optimization. From this fresh perspective, we set forth a new value-incentivized actor-critic (VAC) method, which optimizes a single easy-to-optimize objective integrating exploration and exploitation -- it promotes state-action and policy estimates that are both consistent with collected data transitions and result in higher value functions. Theoretically, the proposed VAC method has near-optimal regret guarantees under linear Markov decision processes (MDPs) in both finite-horizon and infinite-horizon settings, which can be extended to the general function approximation setting under appropriate assumptions.
Submitted 27 June, 2025;
originally announced June 2025.
-
Multiplicative and mining property for stability numbers of graphs
Authors:
Metrose Metsidik,
Lixiao Xiao
Abstract:
The $f$-vertex stability number is defined as $vs_f(G)=\min\{|X|: X\subseteq V(G) \enspace \text{and} \enspace f(G-X)\neq f(G)\}$, and the $f$-edge stability number is defined similarly by setting $X\subseteq E(G)$. In this paper, for a multiplicative and mining invariant $f$, we give some general bounds for the $f$-vertex/edge stability numbers of graphs and some results about the relations between the $f$-vertex/edge stability numbers of graphs and those of their components.
Submitted 16 May, 2025;
originally announced May 2025.
-
Asymptotic representations for Spearman's footrule correlation coefficient
Authors:
Liqi Xia,
Sami Ullah,
Li Guan
Abstract:
In order to address the theoretical challenges arising from the dependence structure of ranks in Spearman's footrule correlation coefficient, we propose two asymptotic representations under the null hypothesis of independence. The first representation simplifies the dependence structure by replacing empirical distribution functions with their population counterparts. The second representation leverages the Hájek projection technique to decompose the initial form into a sum of independent components, thereby rigorously justifying asymptotic normality. A simulation study demonstrates the appropriateness of these asymptotic representations and their potential as a foundation for extending nonparametric inference techniques, such as large-sample hypothesis testing and confidence intervals.
Submitted 3 May, 2025;
originally announced May 2025.
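As a reading aid for the entry above: for n paired observations without ties, Spearman's footrule coefficient is commonly written as 1 - 3 * sum_i |R_i - S_i| / (n^2 - 1), where R_i and S_i are the ranks of the two samples. The sketch below computes only that statistic. The function names are hypothetical, ties are assumed absent, and the asymptotic representations studied in the paper are not implemented.

```python
def ranks(values):
    """Return 1-based ranks of the values, assuming there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def footrule_coefficient(x, y):
    """Spearman's footrule coefficient for paired samples x and y (no ties)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rank_x, rank_y = ranks(x), ranks(y)
    # Footrule distance: sum of absolute rank differences.
    distance = sum(abs(a - b) for a, b in zip(rank_x, rank_y))
    # The factor 3 / (n^2 - 1) centers the statistic at 0 under independence.
    return 1.0 - 3.0 * distance / (n * n - 1)
```

Identical rankings give a coefficient of 1; unlike Spearman's rho, fully reversed rankings do not give -1.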
-
Whittaker modules for $U_q(\mathfrak{sl}_3)$
Authors:
Xiangqian Guo,
Xuewen Liu,
Limeng Xia
Abstract:
In this paper, we study the Whittaker modules for the quantum enveloping algebra $U_q(\mathfrak{sl}_3)$ with respect to a fixed Whittaker function. We construct the universal Whittaker module, find all its Whittaker vectors and investigate the submodules generated by subsets of Whittaker vectors and corresponding quotient modules. We also find Whittaker vectors and determine the irreducibility of these quotient modules and show that they exhaust all irreducible Whittaker modules. Finally, we determine all maximal submodules of the universal Whittaker module. The Whittaker model of $U_q(\mathfrak{sl}_3)$ is quite different from that of $U_q(\mathfrak{sl}_2)$ and of finite-dimensional simple Lie algebras, since the center of our algebra is not a polynomial algebra.
Submitted 14 April, 2025;
originally announced April 2025.
-
An isoperimetric type inequality in De Sitter space
Authors:
Ling Xiao
Abstract:
In this paper, we prove an optimal isoperimetric inequality for spacelike, compact, star-shaped, and $2$-convex hypersurfaces in de Sitter space.
Submitted 29 March, 2025;
originally announced March 2025.
-
Closed minimal hypersurfaces in $\mathbb S^5$ with constant $S$ and $A_3$
Authors:
Joel Spruck,
Ling Xiao
Abstract:
In this paper, we prove that a closed minimally immersed hypersurface $M^4\subset\mathbb S^5$ with constant $S:=\sum\limits_{i=1}^4λ_i^2$ and $A_3:=\sum\limits_{i=1}^4λ_i^3$ whose scalar curvature $R_M$ is nonnegative must be isoparametric. Moreover, $S$ can only be $0, 4,$ or $12.$ That is, $M^4$ is either an equatorial $4$-sphere, a Clifford torus, or a Cartan minimal hypersurface.
Submitted 29 March, 2025;
originally announced March 2025.
-
Policy Optimization and Multi-agent Reinforcement Learning for Mean-variance Team Stochastic Games
Authors:
Junkai Hu,
Li Xia
Abstract:
We study a long-run mean-variance team stochastic game (MV-TSG), where each agent shares a common mean-variance objective for the system and takes actions independently to maximize it. MV-TSG has two main challenges. First, the variance metric is neither additive nor Markovian in a dynamic setting. Second, simultaneous policy updates of all agents lead to a non-stationary environment for each individual agent. Both challenges make dynamic programming inapplicable. In this paper, we study MV-TSGs from the perspective of sensitivity-based optimization. The performance difference and performance derivative formulas for joint policies are derived, which provide optimization information for MV-TSGs. We prove the existence of a deterministic Nash policy for this problem. Subsequently, we propose a Mean-Variance Multi-Agent Policy Iteration (MV-MAPI) algorithm with a sequential update scheme, where individual agent policies are updated one by one in a given order. We prove that the MV-MAPI algorithm converges to a first-order stationary point of the objective function. By analyzing the local geometry of stationary points, we derive specific conditions for stationary points to be (local) Nash equilibria, and further, strict local optima. To solve large-scale MV-TSGs in scenarios with unknown environmental parameters, we extend the idea of trust region methods to MV-MAPI and develop a multi-agent reinforcement learning algorithm named Mean-Variance Multi-Agent Trust Region Policy Optimization (MV-MATRPO). We derive a performance lower bound for each update of joint policies. Finally, numerical experiments on energy management in multiple microgrid systems are conducted.
Submitted 12 June, 2025; v1 submitted 28 March, 2025;
originally announced March 2025.
-
PARQ: Piecewise-Affine Regularized Quantization
Authors:
Lisa Jin,
Jianhao Ma,
Zechun Liu,
Andrey Gromov,
Aaron Defazio,
Lin Xiao
Abstract:
We develop a principled method for quantization-aware training (QAT) of large-scale machine learning models. Specifically, we show that convex, piecewise-affine regularization (PAR) can effectively induce the model parameters to cluster towards discrete values. We minimize PAR-regularized loss functions using an aggregate proximal stochastic gradient method (AProx) and prove that it has last-iterate convergence. Our approach provides an interpretation of the straight-through estimator (STE), a widely used heuristic for QAT, as the asymptotic form of PARQ. We conduct experiments to demonstrate that PARQ obtains competitive performance on convolution- and transformer-based vision tasks.
Submitted 19 March, 2025;
originally announced March 2025.
-
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Authors:
Tong Yang,
Bo Dai,
Lin Xiao,
Yuejie Chi
Abstract:
Multi-agent reinforcement learning (MARL) lies at the heart of a plethora of applications involving the interaction of a group of agents in a shared unknown environment. A prominent framework for studying MARL is Markov games, with the goal of finding various notions of equilibria in a sample-efficient manner, such as the Nash equilibrium (NE) and the coarse correlated equilibrium (CCE). However, existing sample-efficient approaches either require tailored uncertainty estimation under function approximation, or careful coordination of the players. In this paper, we propose a novel model-based algorithm, called VMG, that incentivizes exploration by biasing the empirical estimate of the model parameters towards those with higher collective best-response values of all the players when fixing the other players' policies, thus encouraging the policy to deviate from its current equilibrium for more exploration. VMG is oblivious to different forms of function approximation, and permits simultaneous and uncoupled policy updates of all players. Theoretically, we also establish that VMG achieves near-optimal regret for finding both the NEs of two-player zero-sum Markov games and CCEs of multi-player general-sum Markov games under linear function approximation in an online environment, which nearly matches its counterparts with sophisticated uncertainty quantification.
Submitted 13 February, 2025;
originally announced February 2025.
-
Service Deployment in the On-Demand Economy: Employees, Contractors, or Both?
Authors:
Lijian Lu,
Xin Weng,
Li Xiao
Abstract:
The recent advancements in mobile/data technology have fostered a widespread adoption of on-demand or gig service platforms. The increasingly available data and independent contractors have enabled these platforms to design customized services and a cost-efficient workforce to effectively match demand and supply. In practice, a diverse landscape of the workforce has been observed: some rely solely on either employees or contractors, others use a blended workforce with both types of workers. In this paper, we consider a profit-maximizing service provider (SP) that decides to offer a single service or two differentiated services, along with the pricing and staffing of the workforce with employees and/or contractors, to price- and waiting-sensitive customers. Contractors independently determine whether or not to participate in the marketplace based on private reservation rates and per-service wage offered by the SP, while it controls the number of employees who receive per-hour wage. Under a single service, we show that the SP relies on either employees or contractors and identify sufficient and necessary conditions in which one workforce is better than the other. Under the optimal service deployment, we show that the SP offers either a single service relying solely on employees or contractors, or two differentiated services with a hybrid workforce depending on the service value and cost efficiencies of employees and contractors. Our analysis suggests that proliferating services with a blended workforce could improve the SP's profit significantly, and identifies conditions in which this value is significant. Our results provide an in-depth understanding and insightful guidance to on-demand platforms on the design of service differentiation and workforce models.
Submitted 11 November, 2024;
originally announced November 2024.
-
Quasihyperbolic metric and Gromov hyperbolic spaces I
Authors:
Hongjun Liu,
Ling Xia,
Shasha Yan
Abstract:
In this paper, we introduce the concepts of short arc and length map in quasihyperbolic metric spaces, and obtain some geometric characterizations of Gromov hyperbolicity for quasihyperbolic metric spaces in terms of the properties of short arc and length map.
Submitted 9 November, 2024; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Consistent complete independence test in high dimensions based on Chatterjee correlation coefficient
Authors:
Liqi Xia,
Ruiyuan Cao,
Jiang Du,
Jun Dai
Abstract:
In this article, we consider the complete independence test of high-dimensional data. Based on the Chatterjee coefficient, we pioneer the development of a quadratic test and an extreme value test, which possess good testing performance for oscillatory data, and establish the corresponding large sample properties under both null hypotheses and alternative hypotheses. In order to overcome the shortcomings of the quadratic statistic and the extreme value statistic, we propose a testing method termed the power enhancement test by adding a screening statistic to the quadratic statistic. The proposed method does not reduce the testing power under dense alternative hypotheses, but can enhance the power significantly under sparse alternative hypotheses. Three synthetic data examples and two real data examples are further used to illustrate the performance of our proposed methods.
Submitted 16 September, 2024;
originally announced September 2024.
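As a reading aid for the entry above: for n pairs without ties, the Chatterjee coefficient is usually written as 1 - 3 * sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1), where the pairs are sorted by x and r_i is the rank of the i-th y value in that order. The sketch below computes this pairwise coefficient only. The function name is hypothetical, ties are assumed absent, and the quadratic, extreme value, and power enhancement tests from the paper are not implemented.

```python
def chatterjee_xi(x, y):
    """Chatterjee's rank coefficient xi_n(x, y), assuming no ties in x or y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Reorder the y values so that the matching x values are increasing.
    order_by_x = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order_by_x]
    # r[i] is the 1-based rank of y_sorted[i] among all y values.
    order_by_y = sorted(range(n), key=lambda i: y_sorted[i])
    r = [0] * n
    for rank, index in enumerate(order_by_y, start=1):
        r[index] = rank
    # Sum of absolute differences between consecutive ranks.
    total = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)
```

The coefficient is not symmetric in its arguments: it measures how well y is explained as a function of x, so `chatterjee_xi(x, y)` and `chatterjee_xi(y, x)` can differ.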
-
More results on stack-sorting for set partitions
Authors:
Samanyu Ganesh,
Lanxuan Xia,
Bole Ying
Abstract:
Let a sock be an element of an ordered finite alphabet A and a sequence of these elements be a sock sequence. In 2023, Xia introduced a deterministic version of Defant and Kravitz's stack-sorting map by defining the $φ_σ$ and $φ_{\overlineσ}$ pattern-avoidance stack-sorting maps for sock sequences. Xia showed that the $φ_{aba}$ map is the only one that eventually sorts all set partitions; in this paper, we prove deeper results regarding $φ_{aba}$ and $φ_{\overline{aba}}$ as a natural next step. We newly define two algorithms with time complexity $O(n^3)$ that determine if any given sock sequence is in the image of $φ_{aba}$ or $φ_{\overline{aba}}$ respectively. We also show that the maximum number of preimages that a sock sequence of length $n$ has grows at least exponentially under both the $φ_{aba}$ and $φ_{\overline{aba}}$ maps. Additionally, we prove results regarding fertility numbers (introduced by Defant) in the context of set partitions and multiple-pattern-avoiding stacks.
Submitted 9 August, 2024;
originally announced August 2024.
-
An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes
Authors:
Antonio Orvieto,
Lin Xiao
Abstract:
We consider the problem of minimizing the average of a large number of smooth but possibly non-convex functions. In the context of most machine learning applications, each loss function is non-negative and thus can be expressed as the composition of a square and its real-valued square root. This reformulation allows us to apply the Gauss-Newton method, or the Levenberg-Marquardt method when adding a quadratic regularization. The resulting algorithm, while being computationally as efficient as the vanilla stochastic gradient method, is highly adaptive and can automatically warm up and decay the effective stepsize while tracking the non-negative loss landscape. We provide a tight convergence analysis, leveraging new techniques, in the stochastic convex and non-convex settings. In particular, in the convex case, the method does not require access to the gradient Lipschitz constant for convergence, and is guaranteed to never diverge. The convergence rates and empirical evaluations compare favorably to the classical (stochastic) gradient method as well as to several other adaptive methods.
Submitted 5 July, 2024;
originally announced July 2024.
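The reformulation described in the entry above can be worked through by hand. Write a non-negative loss as f = r^2 with r = sqrt(f), linearize r around the current point x, and minimize (r(x) + grad_r(x)^T p)^2 + ||p||^2 / (2 * sigma) over the step p. Solving this quadratic gives p = -gamma * grad_f(x) with gamma = sigma / (1 + sigma * ||grad_f(x)||^2 / (2 * f(x))). The sketch below implements that single step. The regularization weight 1 / (2 * sigma), the function name, and the argument names are assumptions made for illustration, and the paper's exact parameterization may differ.

```python
def gauss_newton_step(x, loss, grad, sigma):
    """One regularized Gauss-Newton step for a non-negative loss f = r^2.

    x is a list of floats, loss(x) returns f(x) >= 0, grad(x) returns the
    gradient of f as a list, and sigma > 0 is the largest allowed stepsize.
    """
    f = loss(x)
    g = grad(x)
    if f <= 0.0:
        # f(x) = 0 is a global minimum of a non-negative loss: stay put.
        return list(x)
    grad_norm_sq = sum(gi * gi for gi in g)
    # The stepsize shrinks when the gradient is large relative to the loss
    # and approaches sigma as the gradient vanishes.
    gamma = sigma / (1.0 + sigma * grad_norm_sq / (2.0 * f))
    return [xi - gamma * gi for xi, gi in zip(x, g)]
```

For f(x) = x^2 the square root r(x) = |x| is linear away from zero, so the step from x = 2 with sigma = 1 is the exact minimizer of (2 + p)^2 + p^2 / 2, namely p = -4/3.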
-
Iwasawa's main conjecture for Rankin-Selberg motives in the anticyclotomic case
Authors:
Yifeng Liu,
Yichao Tian,
Liang Xiao
Abstract:
In this article, we study the Iwasawa theory for cuspidal automorphic representations of $\mathrm{GL}(n)\times\mathrm{GL}(n+1)$ over CM fields along anticyclotomic directions, in the framework of the Gan--Gross--Prasad conjecture for unitary groups. We prove one-side divisibility of the corresponding Iwasawa main conjecture: when the global root number is $1$, the $p$-adic $L$-function belongs to the characteristic ideal of the Iwasawa Bloch--Kato Selmer group; when the global root number is $-1$, the square of the characteristic ideal of a certain Iwasawa module is contained in the characteristic ideal of the torsion part of the Iwasawa Bloch--Kato Selmer group (analogous to Perrin-Riou's Heegner point main conjecture).
Submitted 25 December, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
4+3 Phases of Compute-Optimal Neural Scaling Laws
Authors:
Elliot Paquette,
Courtney Paquette,
Lechao Xiao,
Jeffrey Pennington
Abstract:
We consider the solvable neural scaling model with three parameters: data complexity, target complexity, and model-parameter-count. We use this neural scaling model to derive new predictions about the compute-limited, infinite-data scaling law regime. To train the neural scaling model, we run one-pass stochastic gradient descent on a mean-squared loss. We derive a representation of the loss curves which holds over all iteration counts and improves in accuracy as the model parameter count grows. We then analyze the compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in the data-complexity/target-complexity phase-plane. The phase boundaries are determined by the relative importance of model capacity, optimizer noise, and embedding of the features. We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of these phases, in particular computing the optimal model-parameter-count as a function of floating point operation budget.
Submitted 18 April, 2025; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Existence of solutions for a class of Kirchhoff-type equations with indefinite potential
Authors:
Linlian Xiao,
Jiaqian Yuan,
Jian Zhou,
Yunshun Wu
Abstract:
In this paper, we consider the existence of solutions of the following Kirchhoff-type problem \[
\left\{
\begin{array}[c]{ll}
-\left(a+b\int_{\mathbb{R}^3}|\nabla u|^2dx\right)\Delta u+ V(x)u=f(x,u),~{\rm{in}}~ \mathbb{R}^{3},\\
u\in H^1(\mathbb{R}^3),
\end{array} \right. \] where $a,b$ are positive constants, and the potential $V(x)$ is continuous and indefinite in sign. Under some suitable assumptions on $V(x)$ and $f$, we obtain the existence of solutions by the Symmetric Mountain Pass Theorem.
Submitted 28 March, 2024;
originally announced March 2024.
-
Non-convexity of level sets for $k$-Hessian equations in convex ring
Authors:
Zhizhang Wang,
Ling Xiao
Abstract:
In this paper we construct explicit examples that show the sublevel sets of the solution of a $k$-Hessian equation defined on a convex ring do not have to be convex.
Submitted 6 February, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Whittaker modules and hyperbolic Toda lattices
Authors:
Limeng Xia
Abstract:
Let $\sg$ be a complex finite-dimensional simple Lie algebra and let $\sg_l$ be the corresponding generalized Takiff algebra. This paper studies the affine variety $\ssf+\sb_l$ where $\ssf$ is similar to a principal nilpotent element of $\sg$ and $\sb_l$ is a subalgebra corresponding to the Borel subalgebra $\sb$ of $\sg$. Inspired by Kostant's work, we then deal with two questions. The first is to construct the Whittaker model for the $G_l$-invariants of the symmetric algebra $S(\sg_l)$, where $G_l$ is the adjoint group of $\sg_l$ and $G_l$ acts on $S(\sg_l)$ by coadjoint action, and then to classify all nonsingular Whittaker modules over $\sg_l$. The second is to describe the symplectic structure of the manifold $Z\subseteq\ssf+\sb_l$ of normalized Jacobi elements. The Hamiltonian corresponding to a fundamental invariant then provides a class of hyperbolic Toda lattices. In particular, the simplest example describes the state of a dynamical system consisting of a positive mass particle and a negative mass particle.
Submitted 1 January, 2024;
originally announced January 2024.
-
AdamL: A fast adaptive gradient method incorporating loss function
Authors:
Lu Xia,
Stefano Massei
Abstract:
Adaptive first-order optimizers are fundamental tools in deep learning, although they may suffer from poor generalization due to nonuniform gradient scaling. In this work, we propose AdamL, a novel variant of the Adam optimizer that takes into account the loss function information to attain better generalization results. We provide sufficient conditions that, together with the Polyak-Lojasiewicz inequality, ensure the linear convergence of AdamL. As a byproduct of our analysis, we prove similar convergence properties for the EAdam and AdaBelief optimizers. Experimental results on benchmark functions show that AdamL typically achieves either the fastest convergence or the lowest objective function values when compared to Adam, EAdam, and AdaBelief. These superior performances are confirmed when considering deep learning tasks such as training convolutional neural networks, training generative adversarial networks using vanilla convolutional neural networks, and long short-term memory networks. Finally, in the case of vanilla convolutional neural networks, AdamL stands out from the other Adam variants and does not require the manual adjustment of the learning rate during the later stage of the training.
Submitted 23 December, 2023;
originally announced December 2023.
-
On the Maximization of Long-Run Reward CVaR for Markov Decision Processes
Authors:
Li Xia,
Zhihui Yu,
Peter W. Glynn
Abstract:
This paper studies the optimization of Markov decision processes (MDPs) from a risk-seeking perspective, where the risk is measured by conditional value-at-risk (CVaR). The objective is to find a policy that maximizes the long-run CVaR of instantaneous rewards over an infinite horizon across all history-dependent randomized policies. By establishing two optimality inequalities of opposing directions, we prove that the maximum of long-run CVaR of MDPs over the set of history-dependent randomized policies can be found within the class of stationary randomized policies. In contrast to classical MDPs, we find that there may not exist an optimal stationary deterministic policy for maximizing CVaR. Instead, we prove the existence of an optimal stationary randomized policy that requires randomizing over at most two actions. Via a convex optimization representation of CVaR, we convert the long-run CVaR maximization MDP into a minimax problem, where we prove the interchangeability of minimum and maximum and the related existence of saddle point solutions. Furthermore, we propose an algorithm that finds the saddle point solution by solving two linear programs. These results are then extended to objectives that involve maximizing some combination of mean and CVaR of rewards simultaneously. Finally, we conduct numerical experiments to demonstrate the main results.
Submitted 3 December, 2023;
originally announced December 2023.
-
A relaxation method for binary optimizations on constrained Stiefel manifold
Authors:
Lianghai Xiao,
Yitian Qian,
Shaohua Pan
Abstract:
This paper focuses on a class of binary orthogonal optimization problems frequently arising in semantic hashing. This class of problems may have an empty feasible set, rendering them not well-defined. We therefore introduce an equivalent model involving a restricted Stiefel manifold and a matrix box set, and then investigate its penalty problems induced by the $\ell_1$-distance from the box set and its Moreau envelope. The two penalty problems are always well-defined. Moreover, they serve as global exact penalties provided that the original feasible set is non-empty. Notably, the penalty problem induced by the Moreau envelope is a smooth optimization over an embedded submanifold with a favorable structure. We develop a retraction-based line-search Riemannian gradient method to address the penalty problem. Finally, the proposed method is applied to supervised and unsupervised hashing tasks and is compared with several popular methods on the MNIST and CIFAR-10 datasets. The numerical comparisons reveal that our algorithm is significantly superior to other solvers in terms of feasibility violation, and it is comparable or even superior to others in terms of evaluation metrics related to the Hamming distance.
Submitted 7 July, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Smooth modules over the N=1 Bondi-Metzner-Sachs superalgebra
Authors:
Dong Liu,
Yufeng Pei,
Limeng Xia,
Kaiming Zhao
Abstract:
In this paper, we present a determinant formula for the contravariant form on Verma modules over the N=1 Bondi-Metzner-Sachs (BMS) superalgebra. This formula establishes a necessary and sufficient condition for the irreducibility of the Verma modules. We then introduce and characterize a class of simple smooth modules that generalize both Verma and Whittaker modules over the N=1 BMS superalgebra. We also utilize the Heisenberg-Clifford vertex superalgebra to construct a free field realization for the N=1 BMS superalgebra. This free field realization allows us to obtain a family of natural smooth modules over the N=1 BMS superalgebra, which includes Fock modules and certain Whittaker modules.
Submitted 26 July, 2023;
originally announced July 2023.
-
Simple smooth modules over the superconformal current algebra
Authors:
Dong Liu,
Yufeng Pei,
Limeng Xia,
Kaiming Zhao
Abstract:
In this paper, we classify simple smooth modules over the superconformal current algebra $\frak g$. More precisely, we first classify simple smooth modules over the Heisenberg-Clifford algebra, and then prove that any simple smooth $\frak g$-module is a tensor product of such modules for the super Virasoro algebra and the Heisenberg-Clifford algebra, or an induced module from a simple module over some finite-dimensional solvable Lie superalgebras. As a byproduct, we provide characterizations for both simple highest weight $\frak g$-modules and simple Whittaker $\frak g$-modules. Additionally, we present several examples of simple smooth $\frak g$-modules that are not tensor products of modules over the super Virasoro algebra and the Heisenberg-Clifford algebra.
Submitted 28 May, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Noisy recovery from random linear observations: Sharp minimax rates under elliptical constraints
Authors:
Reese Pathak,
Martin J. Wainwright,
Lin Xiao
Abstract:
Estimation problems with constrained parameter spaces arise in various settings. In many of these problems, the observations available to the statistician can be modelled as arising from the noisy realization of the image of a random linear operator; an important special case is random design regression. We derive sharp rates of estimation for arbitrary compact elliptical parameter sets and demonstrate how they depend on the distribution of the random linear operator. Our main result is a functional that characterizes the minimax rate of estimation in terms of the noise level, the law of the random operator, and elliptical norms that define the error metric and the parameter space. This nonasymptotic result is sharp up to an explicit universal constant, and it becomes asymptotically exact as the radius of the parameter space is allowed to grow. We demonstrate the generality of the result by applying it to both parametric and nonparametric regression problems, including those involving distribution shift or dependent covariates.
Submitted 22 March, 2023;
originally announced March 2023.
-
Global Algorithms for Mean-Variance Optimization in Markov Decision Processes
Authors:
Li Xia,
Shuai Ma
Abstract:
Dynamic optimization of mean and variance in Markov decision processes (MDPs) is a long-standing challenge caused by the failure of dynamic programming. In this paper, we propose a new approach to find the globally optimal policy for combined metrics of steady-state mean and variance in an infinite-horizon undiscounted MDP. By introducing the concepts of pseudo mean and pseudo variance, we convert the original problem to a bilevel MDP problem, where the inner one is a standard MDP optimizing pseudo mean-variance and the outer one is a single parameter selection problem optimizing pseudo mean. We use the sensitivity analysis of MDPs to derive the properties of this bilevel problem. By solving inner standard MDPs for pseudo mean-variance optimization, we can identify worse policy spaces dominated by optimal policies of the pseudo problems. We propose an optimization algorithm which can find the globally optimal policy by repeatedly removing worse policy spaces. The convergence and complexity of the algorithm are studied. Another policy dominance property is also proposed to further improve the algorithm efficiency. Numerical experiments demonstrate the performance and efficiency of our algorithms. To the best of our knowledge, our algorithm is the first that efficiently finds the globally optimal policy of mean-variance optimization in MDPs. These results are also valid for solely minimizing the variance metrics in MDPs.
Submitted 27 February, 2023;
originally announced February 2023.
-
Slopes of modular forms and geometry of eigencurves
Authors:
Ruochuan Liu,
Nha Xuan Truong,
Liang Xiao,
Bin Zhao
Abstract:
Under a stronger genericity condition, we prove the local analogue of the ghost conjecture of Bergdall and Pollack. As applications, we deduce in this case (a) a folklore conjecture of Breuil--Buzzard--Emerton on the crystalline slopes of Kisin's crystabelian deformation spaces, (b) Gouvea's $\lfloor\frac{k-1}{p+1}\rfloor$-conjecture on slopes of modular forms, and (c) the finiteness of irreducible components of the eigencurve. In addition, applying combinatorial arguments by Bergdall and Pollack, and by Ren, we deduce as corollaries in the reducible and strongly generic case, (d) the Gouvea--Mazur conjecture, (e) a variant of Gouvea's conjecture on slope distributions, and (f) a refined version of Coleman's spectral halo conjecture.
Submitted 26 January, 2025; v1 submitted 15 February, 2023;
originally announced February 2023.
-
On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
Authors:
Lu Xia,
Michiel E. Hochstenbach,
Stefano Massei
Abstract:
When training neural networks with low-precision computation, rounding errors often cause stagnation or are detrimental to the convergence of the optimizers; in this paper we study the influence of rounding errors on the convergence of the gradient descent method for problems satisfying the Polyak-Łojasiewicz inequality. Within this context, we show that, in contrast, biased stochastic rounding errors may be beneficial since choosing a proper rounding strategy eliminates the vanishing gradient problem and forces the rounding bias in a descent direction. Furthermore, we obtain a bound on the convergence rate that is stricter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performances of various rounding strategies when optimizing several examples using low-precision fixed-point number formats.
Submitted 18 January, 2025; v1 submitted 23 January, 2023;
originally announced January 2023.
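The stagnation effect described in the abstract above can be illustrated with a minimal sketch. It assumes a toy objective $f(x) = x^2/2$, a fixed-point grid of spacing `delta`, and the textbook unbiased stochastic rounding rule; the function names, step size, and grid spacing are illustrative choices, and the biased schemes analyzed in the paper are not reproduced here.

```python
import math
import random

def round_nearest(x, delta):
    """Round x to the nearest multiple of delta (deterministic)."""
    return delta * math.floor(x / delta + 0.5)

def round_stochastic(x, delta, rng):
    """Unbiased stochastic rounding: round up with probability equal to
    the fractional distance of x above the lower grid point."""
    lower = math.floor(x / delta)
    p_up = x / delta - lower
    return delta * (lower + (1 if rng.random() < p_up else 0))

def gradient_descent(x0, lr, steps, rounder):
    """Gradient descent on f(x) = x^2 / 2 (gradient is x), rounding each iterate."""
    x = x0
    for _ in range(steps):
        x = rounder(x - lr * x)
    return x

delta = 2.0 ** -4            # hypothetical fixed-point spacing
rng = random.Random(0)       # seeded for reproducibility

# The update lr * x = 0.025 is smaller than delta / 2, so round-to-nearest
# snaps every iterate back to the starting point.
x_nearest = gradient_descent(0.25, 0.1, 1000, lambda v: round_nearest(v, delta))

# Stochastic rounding keeps small updates alive in expectation.
x_stochastic = gradient_descent(0.25, 0.1, 1000, lambda v: round_stochastic(v, delta, rng))

print(x_nearest, x_stochastic)
```

In this toy setting the deterministic run never leaves its starting point, while the stochastically rounded run drifts down the grid toward the minimizer at zero.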
-
$U(\frak h)$-free modules over the Lie algebras of differential operators
Authors:
Munayim Dilxat,
Shoulan Gao,
Dong Liu,
Limeng Xia
Abstract:
In this paper, we consider some non-weight modules over the Lie algebra of Weyl type. First, we determine the modules whose restriction to $U(\frak h)$ is free of rank $1$ over the Lie algebra of differential operators on the circle. Then we determine the necessary and sufficient conditions for the tensor products of quasi-finite highest weight modules and $U(\frak h)$-free modules to be irreducible, and obtain that any two such tensor products are isomorphic if and only if the corresponding highest weight modules and $U(\frak h)$-free modules are isomorphic. Finally, we extend such results to the Lie algebras of differential operators in the general case.
Submitted 5 December, 2022;
originally announced December 2022.
-
Heisenberg double of the generalized quantum euclidean group and its representations
Authors:
Limeng Xia
Abstract:
The generalized quantum Euclidean group $\oq(\frak{b}_{m,n})$ is a natural generalization of the quantum Euclidean group $\oq(\frak{b}_{1,1})$. The Heisenberg double $\od(\frak{b}_{m,n})$ of $\oq(\frak{b}_{m,n})$ is the smash product of $\oq(\frak{b}_{m,n})$ with its Hopf dual $\ou(\frak{b}_{m,n})$. In this paper, we study the weight modules, the prime spectrum and the automorphism group of the Heisenberg double $\od(\frak{b}_{m,n})$.
Submitted 4 November, 2022;
originally announced November 2022.
-
A Profit-Maximizing Strategy for Advertising on the e-Commerce Platforms
Authors:
Lianghai Xiao,
Yixing Zhao,
Jiwei Chen
Abstract:
The online advertising management platform has become increasingly popular among e-commerce vendors/advertisers, offering a streamlined approach to reach target customers. Despite its advantages, configuring advertising strategies correctly remains a challenge for online vendors, particularly those with limited resources. Ineffective strategies often result in a surge of unproductive "just looking" clicks, leading to disproportionately high advertising expenses compared to the growth of sales. In this paper, we present a novel profit-maximizing strategy for targeting options of online advertising. The proposed model aims to find the optimal set of features to maximize the probability of converting targeted audiences into actual buyers. We address the optimization challenge by reformulating it as a multiple-choice knapsack problem (MCKP). We conduct an empirical study featuring real-world data from Tmall to show that our proposed method can effectively optimize the advertising strategy with budgetary constraints.
Submitted 21 August, 2023; v1 submitted 30 October, 2022;
originally announced November 2022.
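For readers unfamiliar with the multiple-choice knapsack problem mentioned in the abstract above, the following is a minimal dynamic-programming sketch of the generic problem: pick exactly one option from each class so that total value is maximized within an integer budget. The function name `solve_mckp` and the toy data are hypothetical; this is not the authors' model or their Tmall data.

```python
def solve_mckp(classes, budget):
    """Generic multiple-choice knapsack via dynamic programming.

    classes: list of classes, each a list of (cost, value) options with
             non-negative integer costs; exactly one option per class is chosen.
    budget:  non-negative integer budget.
    Returns the best total value, or -inf if no feasible selection exists.
    """
    neg_inf = float("-inf")
    # best[b] = best value achievable with total cost exactly b so far
    best = [0.0] + [neg_inf] * budget
    for options in classes:
        nxt = [neg_inf] * (budget + 1)
        for spent in range(budget + 1):
            if best[spent] == neg_inf:
                continue
            for cost, value in options:
                total = spent + cost
                if total <= budget and best[spent] + value > nxt[total]:
                    nxt[total] = best[spent] + value
        best = nxt
    return max(best)

# Toy instance: two targeting "classes", two options each, budget of 5.
toy_classes = [
    [(1, 1.0), (3, 4.0)],
    [(2, 2.0), (4, 5.0)],
]
print(solve_mckp(toy_classes, 5))
```

The table has one entry per budget level, so the running time is proportional to the budget times the total number of options, which is practical only when costs can be discretized to modest integers.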
-
Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion
Authors:
Li Xia,
Peter W. Glynn
Abstract:
CVaR (Conditional Value at Risk) is a risk metric widely used in finance. However, dynamically optimizing CVaR is difficult since it is not a standard Markov decision process (MDP) and the principle of dynamic programming fails. In this paper, we study the infinite-horizon discrete-time MDP with a long-run CVaR criterion, from the view of sensitivity-based optimization. By introducing a pseudo CVaR metric, we derive a CVaR difference formula which quantifies the difference of long-run CVaR under any two policies. The optimality of deterministic policies is derived. We obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for local optimal policies and only necessary for global optimal policies. A CVaR derivative formula is also derived for providing more sensitivity information. Then we develop a policy iteration type algorithm to efficiently optimize CVaR, which is shown to converge to local optima in the mixed policy space. We further discuss some extensions including the mean-CVaR optimization and the maximization of CVaR. Finally, we conduct numerical experiments relating to portfolio management to demonstrate the main results. Our work may shed light on dynamically optimizing CVaR from a sensitivity viewpoint.
Submitted 17 October, 2022;
originally announced October 2022.
-
Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
Authors:
Rui Yuan,
Simon S. Du,
Robert M. Gower,
Alessandro Lazaric,
Lin Xiao
Abstract:
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/ε^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size.
Submitted 21 February, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games
Authors:
Shicong Cen,
Yuejie Chi,
Simon S. Du,
Lin Xiao
Abstract:
Multi-Agent Reinforcement Learning (MARL) -- where multiple agents learn to interact in a shared dynamic environment -- permeates a wide range of critical applications. While there has been substantial progress on understanding the global convergence of policy optimization methods in single-agent RL, the design and analysis of efficient policy optimization algorithms in the MARL setting present significant challenges, which, unfortunately, remain inadequately addressed by existing theory. In this paper, we focus on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games, and study equilibrium finding algorithms in both the infinite-horizon discounted setting and the finite-horizon episodic setting. We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method and the value is updated on a slower timescale. We show that, in the full-information tabular setting, the proposed method achieves a finite-time last-iterate linear convergence to the quantal response equilibrium of the regularized problem, which translates to a sublinear last-iterate convergence to the Nash equilibrium by controlling the amount of regularization. Our convergence results improve upon the best known iteration complexities, and lead to a better understanding of policy optimization in competitive Markov games.
Submitted 3 October, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models
Authors:
Ye Tian,
Haolei Weng,
Lucy Xia,
Yang Feng
Abstract:
Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that effectively utilizes unknown similarities between related tasks and is robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Additionally, iterative unsupervised multi-task and transfer learning methods may suffer from an initialization alignment problem, and two alignment algorithms are proposed to resolve the issue. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.
Submitted 2 August, 2024; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Dominant Eigenvalue-Eigenvector Pair Estimation via Graph Infection
Authors:
Kaiyuan Yang,
Li Xia,
Y. C. Tay
Abstract:
We present a novel method to estimate the dominant eigenvalue and eigenvector pair of any non-negative real matrix via graph infection. The key idea in our technique lies in approximating the solution to the first-order matrix ordinary differential equation (ODE) with the Euler method. A graph, which can be weighted, directed, and with loops, is first converted to its adjacency matrix A. Then, by a naive infection model for graphs, we establish the corresponding first-order matrix ODE, through which A's dominant eigenvalue is revealed by the fastest growing term. When there are multiple dominant eigenvalues of the same magnitude, the classical power iteration method can fail. In contrast, our method can converge to the dominant eigenvalue even when same-magnitude counterparts exist, be they complex or opposite in sign. We conduct several experiments comparing the convergence between our method and power iteration. Our results show clear advantages over power iteration for tree graphs, bipartite graphs, directed graphs with periods, and Markov chains with spider-traps. To our knowledge, this is the first work that estimates the dominant eigenvalue and eigenvector pair from the perspective of a dynamical system and matrix ODE. We believe our method can be adopted as an alternative to power iteration, especially for graphs.
Submitted 7 May, 2023; v1 submitted 1 August, 2022;
originally announced August 2022.
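The contrast drawn in the abstract above between power iteration and an Euler-discretized ODE can be sketched on a small bipartite graph. The sketch assumes the ODE takes the form $dx/dt = Ax$ and uses a plain forward Euler step $x \leftarrow x + hAx$ with renormalization; the function names, the step size `h`, and the test graph are illustrative assumptions, not the authors' algorithm.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def l1_normalize(x):
    """Scale x so that its absolute entries sum to one."""
    s = sum(abs(v) for v in x)
    return [v / s for v in x]

def power_iteration(A, x0, steps):
    """Classical power iteration: x <- A x, renormalized each step."""
    x = l1_normalize(x0)
    for _ in range(steps):
        x = l1_normalize(matvec(A, x))
    return x

def euler_infection(A, x0, h, steps):
    """Forward Euler on dx/dt = A x, assuming non-negative A and x0.

    Each step multiplies by (I + h A), which shifts the eigenvalues to
    1 + h * lambda, so eigenvalues of equal magnitude but opposite sign
    no longer tie. The per-step growth of the l1 norm gives the estimate.
    """
    x = l1_normalize(x0)
    growth = 1.0
    for _ in range(steps):
        ax = matvec(A, x)
        y = [xi + h * axi for xi, axi in zip(x, ax)]
        growth = sum(abs(v) for v in y)   # x has unit l1 norm before the step
        x = [v / growth for v in y]
    return (growth - 1.0) / h, x

# Path graph on three nodes (a tree, hence bipartite): eigenvalues are
# +sqrt(2), 0, -sqrt(2), so two eigenvalues share the largest magnitude.
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

print(power_iteration(path_graph, [1, 0, 0], 50))   # oscillates between two vectors
print(power_iteration(path_graph, [1, 0, 0], 51))
lam, vec = euler_infection(path_graph, [1, 0, 0], 0.1, 300)
print(lam, vec)
```

On this graph power iteration alternates between two vectors indefinitely, whereas the Euler iterate settles on a single direction and its growth rate recovers the positive dominant eigenvalue.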
-
Generalized Minkowski inequality via degenerate Hessian equations on exterior domains
Authors:
Ling Xiao
Abstract:
In this paper, we prove a generalized Minkowski inequality holds for any smooth, $(k-1)$-convex, starshaped domain $Ω.$ Our proof relies on the solvability of the degenerate $k$-Hessian equation on the exterior domain $\mathbb R^n\setminusΩ.$
Submitted 12 July, 2022;
originally announced July 2022.
-
Entire $σ_k$ curvature flow in Minkowski space
Authors:
Zhizhang Wang,
Ling Xiao
Abstract:
In this paper, we study the $σ_k$ curvature flow of noncompact spacelike hypersurfaces in Minkowski space. We prove that if the initial hypersurface satisfies certain conditions, then the flow exists for all time. Moreover, we show that after rescaling, the flow converges to a self-expander.
Submitted 10 July, 2022;
originally announced July 2022.
-
Simple weight modules for Yangian $\operatorname{Y}(\mathfrak{sl}_{2})$
Authors:
Yikun Zhou,
Yilan Tan,
Limeng Xia
Abstract:
Let $\mathfrak{g}$ be a finite-dimensional simple Lie algebra over $\mathbb{C}$. A $\operatorname{Y}(\mathfrak{g})$-module is said to be weight if it is a weight $\mathfrak{g}$-module. We give a complete classification of simple weight modules for $\operatorname{Y}(\mathfrak{sl}_2)$ that admit a one-dimensional weight space. We prove that there are four classes of such modules: finite, highest weight, lowest weight and dense modules. In contrast to the classical $\mathfrak{sl}_{2}$ representation theory, we show that there exists a class of irreducible $\operatorname{Y}(\mathfrak{sl}_{2})$-modules which have uniformly 2-dimensional weight spaces.
Submitted 4 August, 2022; v1 submitted 10 July, 2022;
originally announced July 2022.
-
A local analogue of the ghost conjecture of Bergdall-Pollack
Authors:
Ruochuan Liu,
Nha Xuan Truong,
Liang Xiao,
Bin Zhao
Abstract:
We formulate a local analogue of the ghost conjecture of Bergdall and Pollack, which essentially relies purely on the representation theory of GL_2(Q_p). We further study the combinatorial properties of the ghost series as well as its Newton polygon, in particular, giving a characterization of the vertices of the Newton polygon and proving an integrality result of the slopes. In a forthcoming sequel, we will prove this local ghost conjecture under some mild hypothesis and give arithmetic applications.
Submitted 29 November, 2022; v1 submitted 30 June, 2022;
originally announced June 2022.
-
Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method
Authors:
Aaron Defazio,
Baoyu Zhou,
Lin Xiao
Abstract:
The classical AdaGrad method adapts the learning rate by dividing by the square root of a sum of squared gradients. Because this sum on the denominator is increasing, the method can only decrease step sizes over time, and requires a learning rate scaling hyper-parameter to be carefully tuned. To overcome this restriction, we introduce GradaGrad, a method in the same family that naturally grows or shrinks the learning rate based on a different accumulation in the denominator, one that can both increase and decrease. We show that it obeys a similar convergence rate as AdaGrad and demonstrate its non-monotone adaptation capability with experiments.
Submitted 14 June, 2022;
originally announced June 2022.
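The monotone step-size behaviour of classical AdaGrad described in the abstract above can be seen in a minimal scalar sketch. The objective, learning-rate scale `lr`, and the small constant `eps` are illustrative assumptions; the GradaGrad accumulation rule itself is not given in the abstract and is not reproduced here.

```python
import math

def adagrad(grad, x0, lr, steps, eps=1e-8):
    """Scalar AdaGrad: divide the learning rate by the square root of the
    running sum of squared gradients. Returns the final iterate and the
    sequence of effective step sizes."""
    x = x0
    accum = 0.0
    step_sizes = []
    for _ in range(steps):
        g = grad(x)
        accum += g * g                        # the sum can only grow
        step = lr / (math.sqrt(accum) + eps)  # so the step can only shrink
        step_sizes.append(step)
        x -= step * g
    return x, step_sizes

# Toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3).
x_final, step_sizes = adagrad(lambda x: 2.0 * (x - 3.0), x0=0.0, lr=1.0, steps=200)
print(x_final, step_sizes[0], step_sizes[-1])
```

Because the accumulator never decreases, the recorded step sizes form a non-increasing sequence, which is the restriction the abstract says GradaGrad is designed to remove.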
-
Entire self-expanders for power of $σ_k$ curvature flow in Minkowski space
Authors:
Zhizhang Wang,
Ling Xiao
Abstract:
In [19], we prove that if an entire, spacelike, convex hypersurface $\mathcal{M}_{u_0}$ has bounded principal curvatures, then the $σ_k^{1/α}$ (power of $σ_k$) curvature flow starting from $\mathcal{M}_{u_0}$ admits a smooth convex solution $u$ for $t>0.$ Moreover, after rescaling, the flow converges to a convex self-expander $\tilde{\mathcal{M}}=\{(x, \tilde{u}(x))\mid x\in\mathbb{R}^n\}$ that satisfies $σ_k(κ[\tilde{\mathcal{M}}])=(-\left<X_0, ν_0\right>)^α.$ Unfortunately, the existence of self-expanders for the power of $σ_k$ curvature flow in Minkowski space has not been studied before. In this paper, we fill the gap.
Submitted 13 May, 2022;
originally announced May 2022.
-
Entire convex curvature flow in Minkowski space
Authors:
Zhizhang Wang,
Ling Xiao
Abstract:
In this paper, we study fully nonlinear curvature flows of noncompact spacelike hypersurfaces in Minkowski space. We prove that if the initial hypersurface satisfies certain conditions, then the flow exists for all time. Moreover, we show that after rescaling the flow converges to the future timelike hyperboloid, which is a self-expander.
Submitted 13 May, 2022;
originally announced May 2022.
-
FedShuffle: Recipes for Better Use of Local Work in Federated Learning
Authors:
Samuel Horváth,
Maziar Sanjabi,
Lin Xiao,
Peter Richtárik,
Michael Rabbat
Abstract:
The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). Such methods are usually implemented by having clients perform one or more epochs of local training per round while randomly reshuffling their finite dataset in each epoch. Data imbalance, where clients have different numbers of local training samples, is ubiquitous in FL applications, resulting in different clients performing different numbers of local updates in each round. In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in this regime encompassing random reshuffling and heterogeneity. FedShuffle is the first local update method with theoretical convergence guarantees that incorporates random reshuffling, data imbalance, and client sampling - features that are essential in large-scale cross-device FL. We present a comprehensive theoretical analysis of FedShuffle and show, both theoretically and empirically, that it does not suffer from the objective function mismatch that is present in FL methods that assume homogeneous updates in heterogeneous FL setups, such as FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. Similar to Mime (Karimireddy et al., 2020), we show that FedShuffle with momentum variance reduction (Cutkosky & Orabona, 2019) improves upon non-local methods under a Hessian similarity assumption.
Submitted 27 September, 2022; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Federated Learning with Partial Model Personalization
Authors:
Krishna Pillutla,
Kshitiz Malik,
Abdelrahman Mohamed,
Michael Rabbat,
Maziar Sanjabi,
Lin Xiao
Abstract:
We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the literature, but their convergence properties are not fully understood, especially for the alternating variant. We provide convergence analyses of both algorithms in the general nonconvex setting with partial participation and delineate the regime where one dominates the other. Our experiments on real-world image, text, and speech datasets demonstrate that (a) partial personalization can obtain most of the benefits of full model personalization with a small fraction of personal parameters, and (b) the alternating update algorithm often outperforms the simultaneous update algorithm by a small but consistent margin.
Submitted 15 August, 2022; v1 submitted 7 April, 2022;
originally announced April 2022.
-
On the influence of stochastic roundoff errors and their bias on the convergence of the gradient descent method with low-precision floating-point computation
Authors:
Lu Xia,
Stefano Massei,
Michiel E. Hochstenbach,
Barry Koren
Abstract:
When implementing the gradient descent method in low precision, the employment of stochastic rounding schemes helps to prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.
Submitted 25 February, 2023; v1 submitted 24 February, 2022;
originally announced February 2022.
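The unbiased stochastic rounding mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a uniform grid with spacing `step` in place of a real low-precision floating-point format, and the names `stochastic_round`, `step`, and `update` are hypothetical, not taken from the paper. The paper's two proposed biased schemes are not reproduced here.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Round x to a multiple of `step` (assumed uniform grid).

    Rounds up with probability equal to the fractional distance from the
    lower grid point, so the expected result equals x (zero bias).
    """
    lower = math.floor(x / step) * step
    frac = (x - lower) / step  # in [0, 1)
    return lower + step if rng.random() < frac else lower


rng = random.Random(0)
step = 0.25    # grid spacing, standing in for the low-precision format
update = 0.05  # a small update, below half the grid spacing

# Round-to-nearest always discards this update (stagnation).
nearest = round(update / step) * step

# Stochastic rounding keeps it with probability update / step = 0.2.
samples = [stochastic_round(update, step, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
```

Here `nearest` is zero, while the average of `samples` stays close to `update`, which is the sense in which small updates are preserved on average.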
-
On the Convergence Rates of Policy Gradient Methods
Authors:
Lin Xiao
Abstract:
We consider infinite-horizon discounted Markov decision problems with finite state and action spaces and study the convergence rates of the projected policy gradient method and a general class of policy mirror descent methods, all with direct parametrization in the policy space. First, we develop a theory of weak gradient-mapping dominance and use it to prove a sharper sublinear convergence rate for the projected policy gradient method. Then we show that with geometrically increasing step sizes, a general class of policy mirror descent methods, including the natural policy gradient method and a projected Q-descent method, all enjoy a linear rate of convergence without relying on entropy or other strongly convex regularization. Finally, we also analyze the convergence rate of an inexact policy mirror descent method and estimate its sample complexity under a simple generative model.
Submitted 6 March, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
Authors:
Shuai Ma,
Xiaoteng Ma,
Li Xia
Abstract:
This paper studies the risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The involved variance metric concerns reward variability during the whole process, and future deviations are discounted to their present values. This discounted mean-variance optimization yields a reward function dependent on a discounted mean, and this dependency renders traditional dynamic programming methods inapplicable since it suppresses a crucial property -- time consistency. To deal with this unorthodox problem, we introduce a pseudo mean to transform the untreatable MDP to a standard one with a redefined reward function in standard form and derive a discounted mean-variance performance difference formula. With the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for the discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems including, but not limited to, risk-averse variance and mean-variance optimizations in discounted and average MDPs. Furthermore, the convergence analyses missing from the literature can be complemented with the proposed framework as well. Taking the value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm.
Submitted 14 January, 2022;
originally announced January 2022.
-
Non-splitting Neyman-Pearson Classifiers
Authors:
Jingming Wang,
Lucy Xia,
Zhigang Bao,
Xin Tong
Abstract:
The Neyman-Pearson (NP) binary classification paradigm constrains the more severe type of error (e.g., the type I error) under a preferred level while minimizing the other (e.g., the type II error). This paradigm is suitable for applications such as severe disease diagnosis, fraud detection, among others. A series of NP classifiers have been developed to guarantee the type I error control with high probability. However, these existing classifiers involve a sample splitting step: a mixture of class 0 and class 1 observations to construct a scoring function and some left-out class 0 observations to construct a threshold. This splitting enables classifier construction built upon independence, but it amounts to insufficient use of data for training and a potentially higher type II error. Leveraging a canonical linear discriminant analysis model, we derive a quantitative CLT for a certain functional of quadratic forms of the inverse of sample and population covariance matrices, and based on this result, develop for the first time NP classifiers without splitting the training sample. Numerical experiments have confirmed the advantages of our new non-splitting parametric strategy.
Submitted 4 June, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Error bound and exact penalty method for optimization problems with nonnegative orthogonal constraint
Authors:
Yitian Qian,
Shaohua Pan,
Lianghai Xiao
Abstract:
This paper is concerned with a class of optimization problems with the nonnegative orthogonal constraint, in which the objective function is $L$-smooth on an open set containing the Stiefel manifold ${\rm St}(n,r)$. We derive a locally Lipschitzian error bound for the feasible points without zero rows when $n>r>1$, and a global Lipschitzian error bound when $n>r=1$ or $n=r$. Then, we show that the penalty problem induced by the elementwise $\ell_1$-norm distance to the nonnegative cone is a global exact penalty, and so is the one induced by its Moreau envelope under a lower second-order calmness of the objective function. A practical penalty algorithm is developed by approximately solving a series of smooth penalty problems with a retraction-based nonmonotone line-search proximal gradient method, and any cluster point of the generated sequence is shown to be a stationary point of the original problem. Numerical comparisons with the ALM \citep{Wen13} and the exact penalty method \citep{JiangM22} indicate that our penalty method has an advantage in terms of the quality of solutions despite taking a little more time.
Submitted 4 February, 2025; v1 submitted 5 November, 2021;
originally announced November 2021.