-
A Variational-Calculus Approach to Online Algorithm Design and Analysis
Authors:
Pan Xu
Abstract:
Factor-revealing linear programs (LPs) and policy-revealing LPs arise in various contexts of algorithm design and analysis. They are commonly used techniques for analyzing the performance of approximation and online algorithms, especially when direct performance evaluation is challenging. The main idea is to characterize the worst-case performance as a family of LPs parameterized by an integer $n \ge 1$, representing the size of the input instance. To obtain the best possible bounds on the target ratio (e.g., approximation or competitive ratios), we often need to determine the optimal objective value (and the corresponding optimal solution) of a family of LPs as $n \to \infty$. One common method, called the Primal-Dual approach, involves examining the constraint structure in the primal and dual programs, then developing feasible analytical solutions to both that achieve equal or nearly equal objective values. Another approach, known as \emph{strongly factor-revealing LPs}, similarly requires careful investigation of the constraint structure in the primal program. In summary, both methods rely on \emph{instance-specific techniques}, which are difficult to generalize from one instance to another.
In this paper, we introduce a general variational-calculus approach that enables us to analytically study the optimal value and solution to a family of LPs as their size approaches infinity. The main idea is to first reformulate the LP in the limit, as its size grows infinitely large, as a variational-calculus instance and then apply existing methods, such as the Euler-Lagrange equation and Lagrange multipliers, to solve it. We demonstrate the power of our approach through three case studies of online optimization problems and anticipate broader applications of this method.
Submitted 18 March, 2025;
originally announced March 2025.
-
Probabilistic degenerate poly-Bell polynomials associated with random variables
Authors:
Pengxiang Xue,
Yuankui Ma,
Taekyun Kim,
Dae San Kim,
Wenpeng Zhang
Abstract:
Let Y be a random variable whose moment generating function exists in a neighborhood of the origin. The aim of this paper is to study the probabilistic degenerate poly-Bell polynomials associated with the random variable Y, arising from the degenerate polyexponential functions, which are probabilistic extensions of degenerate versions of the poly-Bell polynomials. We derive several explicit expressions and some related identities for them. In addition, we consider the special cases that Y is the Bernoulli random variable with probability of success p or the gamma random variable with parameters 1,1.
Submitted 9 March, 2025;
originally announced March 2025.
-
Explicit formula for the $(\text{GL}_2, \text{GL}_2)$ theta lift via Bruhat decomposition
Authors:
Peter Xu
Abstract:
Using combinations of weight-1 and weight-2 Kronecker-Eisenstein series to construct currents in the distributional de Rham complex of a squared elliptic curve, we find a simple explicit formula for the type II $(\text{GL}_2, \text{GL}_2)$ theta lift without smoothing, analogous to the classical formula of Siegel for periods of Eisenstein series. For $K$ a CM field, the same technique applies without change to obtain an analogous formula for the $(\text{GL}_2(K),K^\times)$ theta correspondence.
Submitted 19 September, 2024; v1 submitted 10 September, 2024;
originally announced September 2024.
-
Whittaker modules for a subalgebra of N=2 superconformal algebra
Authors:
Naihuan Jing,
Pengfa Xu,
Honglian Zhang
Abstract:
In this paper, Whittaker modules are studied for a subalgebra $\mathfrak{q}_ε$ of the $N=2$ superconformal algebra.
The Whittaker modules are classified by central characters.
Additionally, criteria for the irreducibility of the Whittaker modules are given.
Submitted 4 September, 2024;
originally announced September 2024.
-
Statistical inference on kurtosis of elliptical distributions
Authors:
Bowen Zhou,
Peirong Xu,
Cheng Wang
Abstract:
Multivariate elliptically-contoured distributions are widely used for modeling correlated and non-Gaussian data. In this work, we study the kurtosis of the elliptical model, which is an important parameter in many statistical analyses. Based on U-statistics, we develop an estimation method. Theoretically, we show that the proposed estimator is consistent under regular conditions; in particular, we relax a moment condition and the restriction that the data dimension and the sample size are of the same order. Furthermore, we derive the asymptotic normality of the estimator and evaluate the asymptotic variance through several examples, which allows us to construct a confidence interval. The performance of our method is validated by extensive simulations and real data analysis.
Submitted 22 August, 2024;
originally announced August 2024.
-
Classical periods of Eisenstein series and Bernoulli polynomials in the equivariant cohomology of a torus
Authors:
Peter Xu
Abstract:
We find group cochains valued in currents giving explicit representatives for the $\text{GL}_2$-equivariant polylogarithm class of a torus. Based on the construction of weight-$2$ Eisenstein series for $\text{GL}_2$ from this polylogarithm class, we give a geometrically-flavored derivation of the classical formulas for the associated Dedekind-Rademacher homomorphisms, i.e. the periods of $E^2_{α,β}$ for various nonzero torsion sections $(α, β)$.
Submitted 27 August, 2024; v1 submitted 1 August, 2024;
originally announced August 2024.
-
Optimal Batched Linear Bandits
Authors:
Xuanfei Ren,
Tianyuan Jin,
Pan Xu
Abstract:
We introduce the E$^4$ algorithm for the batched linear bandit problem, incorporating an Explore-Estimate-Eliminate-Exploit framework. With a proper choice of exploration rate, we prove E$^4$ achieves the finite-time minimax optimal regret with only $O(\log\log T)$ batches, and the asymptotically optimal regret with only $3$ batches as $T\rightarrow\infty$, where $T$ is the time horizon. We further prove a lower bound on the batch complexity of linear contextual bandits showing that any asymptotically optimal algorithm must require at least $3$ batches in expectation as $T\rightarrow\infty$, which indicates E$^4$ achieves the asymptotic optimality in regret and batch complexity simultaneously. To the best of our knowledge, E$^4$ is the first algorithm for linear bandits that simultaneously achieves the minimax and asymptotic optimality in regret with the corresponding optimal batch complexities. In addition, we show that with another choice of exploration rate E$^4$ achieves an instance-dependent regret bound requiring at most $O(\log T)$ batches, and maintains the minimax optimality and asymptotic optimality. We conduct thorough experiments to evaluate our algorithm on randomly generated instances and the challenging \textit{End of Optimism} instances \citep{lattimore2017end} which were shown to be hard to learn for optimism based algorithms. Empirical results show that E$^4$ consistently outperforms baseline algorithms with respect to regret minimization, batch complexity, and computational efficiency.
Submitted 6 June, 2024;
originally announced June 2024.
-
Long cycles and spectral radii in planar graphs
Authors:
Ping Xu,
Huiqiu Lin,
Longfei Fang
Abstract:
There is a rich history of studying the existence of cycles in planar graphs. The famous Tutte theorem on the Hamilton cycle states that every 4-connected planar graph contains a Hamilton cycle. Later on, Thomassen (1983), Thomas and Yu (1994) and Sanders (1996) proved that every 4-connected planar graph contains a cycle of length $n-1$, $n-2$ and $n-3$, respectively. Chen, Fan and Yu (2004) further conjectured that every 4-connected planar graph contains a cycle of length $\ell$ for $\ell\in\{n,n-1,\ldots,n-25\}$, and they verified the conjecture for $\ell\in \{n-4, n-5, n-6\}$. When the ``4-connected'' condition is removed, how can one guarantee the existence of a long cycle in a planar graph? A natural approach is to add a spectral radius condition: what is the smallest constant $C$ such that, for sufficiently large $n$, every planar graph $G$ of order $n$ with spectral radius greater than $C$ contains a long cycle? In this paper, we give a stronger answer to the above question. Let $G$ be a planar graph of order $n\geq 1.8\times 10^{17}$ and let $k\leq \lfloor\log_2(n-3)\rfloor-8$ be a non-negative integer. We show that if $ρ(G)\geq ρ(K_2\vee(P_{n-2k-4}\cup 2P_{k+1}))$, then $G$ contains a cycle of length $\ell$ for every $\ell\in \{n-k, n-k-1, \ldots, 3\}$ unless $G\cong K_2\vee(P_{n-2k-4}\cup 2P_{k+1})$.
Submitted 31 May, 2024;
originally announced May 2024.
-
Variable Projection Algorithms: Theoretical Insights and A Novel Approach for Problems with Large Residual
Authors:
Guangyong Chen,
Peng Xue,
Min Gan,
Jing Chen,
Wenzhong Guo,
C. L. Philip. Chen
Abstract:
This paper presents an in-depth study of the Variable Projection (VP) algorithm, a powerful tool for solving separable nonlinear optimization problems across multiple domains, including system identification, image processing, and machine learning. We first establish a theoretical framework to examine the effect of the approximate treatment of the coupling relationship among parameters on the local convergence of the VP algorithm, and we prove that Kaufman's VP algorithm can achieve a convergence rate similar to that of the Golub \& Pereyra form. These results fill a gap in the existing convergence theory and provide a solid foundation for understanding the mechanism of the VP algorithm and broadening its range of applications. Furthermore, drawing on these theoretical results, we design a refined VP algorithm, called VPLR, for separable nonlinear optimization problems with large residuals. VPLR improves convergence by addressing the interdependence of parameters within the separable model and by continually correcting the approximated Hessian matrix to counteract the influence of large residuals during the iterative process. The effectiveness of this refined algorithm is corroborated through numerical experiments.
Submitted 6 January, 2025; v1 submitted 21 February, 2024;
originally announced February 2024.
-
A note on the universal supersingular quotients of $U(2,1)$
Authors:
Peng Xu
Abstract:
Let $G$ be the unramified unitary group $U(2, 1)(E/F)$ defined over a non-archimedean local field $F$ of residue characteristic $p\neq 2$. In this note, we prove the universal supersingular quotients of $G$ are not irreducible in general.
Submitted 15 February, 2024;
originally announced February 2024.
-
Explicit arithmetic Eisenstein cocycles: the toric case
Authors:
Peter Xu
Abstract:
We construct cocycles for $GL_n(\mathbb{Q})$ valued in cup products of units on the $n$-dimensional algebraic torus $\mathbb{G}_m^n$ over an arbitrary DVR, viewed in Milnor $K$-theory or motivic cohomology. We show how various combinatorially-defined complexes encoding linear algebraic data capture the structure of these cup products inside motivic complexes, and examine some properties of the resulting classes. This generalizes work of Sharifi and Venkatesh in the case $n=2$, in which case pullbacks of the resulting cocycle to an arithmetic base are of central importance in the Sharifi conjectures.
Submitted 20 May, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs
Authors:
Tianyuan Jin,
Hao-Lun Hsu,
William Chang,
Pan Xu
Abstract:
We study the multi-agent multi-armed bandit (MAMAB) problem, where $m$ agents are factored into $ρ$ overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards. Previous work introduced the multi-agent Thompson sampling (MATS) algorithm \citep{verstraeten2020multiagent} and derived a Bayesian regret bound. However, it remains an open problem how to derive a frequentist regret bound for Thompson sampling in this multi-agent setting. To address this problem, we propose an efficient variant of MATS, the $ε$-exploring Multi-Agent Thompson Sampling ($ε$-MATS) algorithm, which performs MATS exploration with probability $ε$ and adopts a greedy policy otherwise. We prove that $ε$-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size. We also derive a lower bound for this setting, which implies that our frequentist regret upper bound is optimal up to constant and logarithmic factors when the hypergraph is sufficiently sparse. Thorough experiments on standard MAMAB problems demonstrate the superior performance and the improved computational efficiency of $ε$-MATS compared with existing algorithms in the same setting.
Submitted 24 December, 2023;
originally announced December 2023.
-
Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization
Authors:
Zhen Qin,
Zhishuai Liu,
Pan Xu
Abstract:
signSGD is popular in nonconvex optimization due to its communication efficiency. Yet, existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration, contradicting the practical implementation where data are randomly reshuffled and sequentially fed into the algorithm. We bridge this gap by proving the first convergence result of signSGD with random reshuffling (SignRR) for nonconvex optimization. Given the dataset size $n$, the number of epochs of data passes $T$, and the variance bound of a stochastic gradient $σ^2$, we show that SignRR has the same convergence rate $O(\log(nT)/\sqrt{nT} + \|σ\|_1)$ as signSGD \citep{bernstein2018signsgd}. We then present SignRVR and SignRVM, which leverage variance-reduced gradients and momentum updates respectively, both converging at $O(\log (nT)/\sqrt{nT} + \log (nT)\sqrt{n}/\sqrt{T})$. In contrast with the analysis of signSGD, our results do not require an extremely large batch size in each iteration, of the same order as the total number of iterations \citep{bernstein2018signsgd}, nor do they require that the signs of stochastic and true gradients match element-wise with a minimum probability of 1/2 \citep{safaryan2021stochastic}. We also extend our algorithms to cases where data are distributed across different machines, yielding dist-SignRVR and dist-SignRVM, both converging at $O(\log (n_0T)/\sqrt{n_0T} + \log (n_0T)\sqrt{n_0}/\sqrt{T})$, where $n_0$ is the dataset size of a single machine. We back up our theoretical findings through experiments on simulated and real-world problems, verifying that randomly reshuffled sign methods match or surpass existing baselines.
Submitted 27 December, 2023; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Differential graded manifolds of finite positive amplitude
Authors:
Kai Behrend,
Hsuan-Yi Liao,
Ping Xu
Abstract:
We prove that dg manifolds of finite positive amplitude, i.e. bundles of positively graded curved $L_\infty[1]$-algebras, form a category of fibrant objects. As a main step in the proof, we obtain a factorization theorem using path spaces. First we construct an infinite-dimensional factorization of a diagonal morphism using actual path spaces motivated by the AKSZ construction. Then we cut down to finite dimensions using the Fiorenza-Manetti method. The main ingredient in our method is the homotopy transfer theorem for curved $L_\infty[1]$-algebras. As an application, we study the derived intersections of manifolds.
Submitted 7 February, 2024; v1 submitted 16 July, 2023;
originally announced July 2023.
-
On the structure of étale fibrations of $L_\infty$-bundles
Authors:
Kai Behrend,
Hsuan-Yi Liao,
Ping Xu
Abstract:
We prove that an étale fibration between $L_\infty$-bundles has local sections made up of several elementary morphisms of particularly simple and accessible type. As applications, we prove an inverse function theorem for $L_\infty$-bundles, and give an elementary proof that every weak equivalence of $L_\infty$-bundles induces a quasi-isomorphism of the differential graded algebras of global functions. In addition, we apply this inverse function theorem to prove that the homotopy category of $L_\infty$-bundles has a simple description in terms of homotopy classes of morphisms, when we restrict $L_\infty$-bundles to their germs about their classical loci.
Submitted 16 July, 2023;
originally announced July 2023.
-
A Theory on Adam Instability in Large-Scale Machine Learning
Authors:
Igor Molybog,
Peter Albert,
Moya Chen,
Zachary DeVito,
David Esiobu,
Naman Goyal,
Punit Singh Koura,
Sharan Narang,
Andrew Poulton,
Ruan Silva,
Binh Tang,
Diana Liskovich,
Puxin Xu,
Yuchen Zhang,
Melanie Kambadur,
Stephen Roller,
Susan Zhang
Abstract:
We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent on the training loss landscape, leading to divergence. This artifact is more likely to be observed in the training of a deep model with a large batch size, which is the typical setting of large-scale language model training. To support the theory, we present observations from training runs of language models of different scales: 7 billion, 30 billion, 65 billion, and 546 billion parameters.
Submitted 25 April, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
-
The Space of Closed $G_2$-Structures. I. Connections
Authors:
Pengfei Xu,
Kai Zheng
Abstract:
In this article, we develop foundational theory for geometries of the space of closed $G_2$-structures in a given cohomology class as an infinite-dimensional manifold. We introduce Sobolev-type metrics, construct their Levi-Civita connections, formulate geodesic equations, and analyse the variational structures of torsion free $G_2$-structures under these Sobolev-type metrics.
Submitted 4 December, 2022;
originally announced December 2022.
-
Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
Authors:
Yizhou Zhang,
Guannan Qu,
Pan Xu,
Yiheng Lin,
Zaiwei Chen,
Adam Wierman
Abstract:
We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its $κ$-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in $κ$. In addition, we show the finite-sample convergence of LPI to the global optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing $κ$. Numerical simulations demonstrate the effectiveness of LPI.
Submitted 30 November, 2022;
originally announced November 2022.
-
$A_\infty$-Algebras from Lie Pairs
Authors:
Mathieu Stiénon,
Luca Vitagliano,
Ping Xu
Abstract:
Given an inclusion $A\hookrightarrow L$ of Lie algebroids sharing the same base manifold $M$, i.e. a Lie pair, we prove that the space $Γ(Λ^\bullet A^\vee)\otimes_{R} \frac{U(L)}{U(L)\cdotΓ(A)}$, where $R=C^\infty(M)$, admits an $A_\infty$-algebra structure, unique up to $A_\infty$-isomorphisms. As a consequence, the Chevalley-Eilenberg cohomology $H^\bullet_{CE} \big( A, \frac{U(L)}{U(L)\cdotΓ(A)} \big)$ admits a canonical associative algebra structure. This $A_\infty$-algebra can be considered as the universal enveloping algebra of the $L_\infty$-algebroid $A[1]\times_M L/A$. Our construction is based on the homotopy equivalence of the $L_\infty$-algebroid $A[1]\times_M L/A$ and the dg Lie algebroid corresponding to the comma double Lie algebroid of Jotz-Mackenzie.
Submitted 7 February, 2025; v1 submitted 30 October, 2022;
originally announced October 2022.
-
Quantization of (-1)-Shifted Derived Poisson Manifolds
Authors:
Kai Behrend,
Matt Peddie,
Ping Xu
Abstract:
We investigate the quantization problem of $(-1)$-shifted derived Poisson manifolds in terms of $BV_\infty$-operators on the space of Berezinian half-densities. We prove that quantizing such a $(-1)$-shifted derived Poisson manifold is equivalent to the lifting of a consecutive sequence of Maurer-Cartan elements of short exact sequences of differential graded Lie algebras, where the obstruction is a certain class in the second Poisson cohomology. Consequently, a $(-1)$-shifted derived Poisson manifold is quantizable if the second Poisson cohomology group vanishes. We also prove that for any $L_\infty$-algebroid $\Cc{\aV}$, its corresponding linear $(-1)$-shifted derived Poisson manifold $\Cc{\aV}^\vee[-1]$ admits a canonical quantization. Finally, given a Lie algebroid $A$ and a one-cocycle $s\in Γ(A^\vee)$, the $(-1)$-shifted derived Poisson manifold corresponding to the derived intersection of coisotropic submanifolds determined by the graph of $s$ and the zero section of the Lie Poisson $A^\vee$ is shown to admit a canonical quantization in terms of the Evens-Lu-Weinstein module.
Submitted 26 June, 2023; v1 submitted 4 June, 2022;
originally announced June 2022.
-
$δ$-$r$-Hyperideals and $φ$-$δ$-$r$-Hyperideals of Commutative Krasner Hyperrings
Authors:
Peng Xu,
Melis Bolat,
Elif Kaya,
Serkan Onar,
Bayram Ali Ersoy,
Kostaq Hila
Abstract:
In this paper, our purpose is to define the expansion of $r$-hyperideals and extend this concept to $φ$-$δ$-$r$-hyperideals. Let $\Re$ be a commutative Krasner hyperring with nonzero identity. Given an expansion $δ$ of hyperideals, a proper hyperideal $N$ of $\Re$ is called a $δ$-$r$-hyperideal if $a\cdot b\in N$ with $ann(a)=0$ implies that $b\in δ(N)$, for all $a,b\in\Re$. Similarly, given an expansion $δ$ of hyperideals and a hyperideal reduction $φ$, a proper hyperideal $N$ of $\Re$ is called a $φ$-$δ$-$r$-hyperideal if $a\cdot b\in N-φ(N)$ with $ann(a)=0$ implies that $b\inδ(N)$, for all $a,b\in\Re$. We investigate some of their properties and give some examples.
Submitted 5 April, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
$r$-Hyperideals and Generalizations of $r$-Hyperideals in Krasner Hyperrings
Authors:
Peng Xu,
Melis Bolat,
Elif Kaya,
Serkan Onar,
Bayram Ali Ersoy,
Kostaq Hila
Abstract:
In this study, we examine some properties of $r$-hyperideals in commutative Krasner hyperrings. Some properties of $pr$-hyperideals are also studied. The relation between prime hyperideals and $r$-hyperideals is investigated. We show that the image and the inverse image of an $r$-hyperideal are also $r$-hyperideals. We also introduce a generalization of $r$-hyperideals and prove some of its properties.
Submitted 2 February, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
Inexact Newton-CG Algorithms With Complexity Guarantees
Authors:
Zhewei Yao,
Peng Xu,
Fred Roosta,
Stephen J. Wright,
Michael W. Mahoney
Abstract:
We consider variants of a recently-developed Newton-CG algorithm for nonconvex problems \citep{royer2018newton} in which inexact estimates of the gradient and the Hessian information are used for various steps. Under certain conditions on the inexactness measures, we derive iteration complexity bounds for achieving $ε$-approximate second-order optimality that match best-known lower bounds. Our inexactness condition on the gradient is adaptive, allowing for crude accuracy in regions with large gradients. We describe two variants of our approach, one in which the step-size along the computed search direction is chosen adaptively and another in which the step-size is pre-defined. To obtain second-order optimality, our algorithms will make use of a negative curvature direction on some steps. These directions can be obtained, with high-probability, using a certain randomized algorithm. In this sense, all of our results hold with high-probability over the run of the algorithm. We evaluate the performance of our proposed algorithms empirically on several machine learning models.
Submitted 10 April, 2022; v1 submitted 28 September, 2021;
originally announced September 2021.
-
NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning
Authors:
Minghan Yang,
Dong Xu,
Qiwen Cui,
Zaiwen Wen,
Pengxiang Xu
Abstract:
In this paper, a novel second-order method called NG+ is proposed. By following the rule ``the shape of the gradient equals the shape of the parameter", we define a generalized Fisher information matrix (GFIM) using the products of gradients in the matrix form rather than the traditional vectorization. Then, our generalized natural gradient direction is simply the inverse of the GFIM multiplied by the gradient in the matrix form. Moreover, the GFIM and its inverse are kept the same for multiple steps so that the computational cost can be controlled and is comparable with that of first-order methods. Global convergence is established under some mild conditions and a regret bound is also given for the online learning setting. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, neural machine translation with Transformer and recommendation system with DLRM illustrate that NG+ is competitive with the state-of-the-art methods.
Submitted 14 June, 2021;
originally announced June 2021.
-
Dg manifolds, formal exponential maps and homotopy Lie algebras
Authors:
Seokbong Seol,
Mathieu Stiénon,
Ping Xu
Abstract:
This paper is devoted to the study of the relation between `formal exponential maps,' the Atiyah class, and Kapranov $L_\infty[1]$ algebras associated with dg manifolds in the $C^\infty$ context. Given a dg manifold, we prove that a `formal exponential map' exists if and only if the Atiyah class vanishes. Inspired by Kapranov's construction of a homotopy Lie algebra associated with the holomorphic tangent bundle of a complex manifold, we prove that the space of vector fields on a dg manifold admits an $L_\infty[1]$ algebra structure, unique up to isomorphism, whose unary bracket is the Lie derivative w.r.t. the homological vector field, whose binary bracket is a 1-cocycle representative of the Atiyah class, and whose higher multibrackets can be computed by a recursive formula. For the dg manifold $(T_X^{0,1}[1],\bar{\partial})$ arising from a complex manifold $X$, we prove that this $L_\infty[1]$ algebra structure is quasi-isomorphic to the standard $L_\infty[1]$ algebra structure on the Dolbeault complex $Ω^{0,\bullet}(T^{1,0}_X)$.
Submitted 19 November, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Hochschild cohomology of dg manifolds associated to integrable distributions
Authors:
Zhuo Chen,
Maosong Xiang,
Ping Xu
Abstract:
For the field $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$, and an integrable distribution $F \subseteq T_M \otimes_{\mathbb{R}} \mathbb{K}$ on a smooth manifold $M$, we study the Hochschild cohomology of the dg manifold $(F[1],d_F)$ and establish a canonical isomorphism with the Hochschild cohomology of the algebra of functions on leaf space in terms of transversal polydifferential operators of $F$. In particular, for the dg manifold $(T_X^{0,1}[1],\bar{\partial})$ associated with a complex manifold $X$, we prove that its Hochschild cohomology is canonically isomorphic to the Hochschild cohomology $HH^{\bullet}(X)$ of the complex manifold $X$. As an application, we show that the Duflo-Kontsevich type theorem for the dg manifold $(T_X^{0,1}[1],\bar{\partial})$ implies the Duflo-Kontsevich theorem for complex manifolds.
Submitted 3 August, 2022; v1 submitted 14 March, 2021;
originally announced March 2021.
-
High-bandwidth nonlinear control for soft actuators with recursive network models
Authors:
Sarah Aguasvivas Manzano,
Patricia Xu,
Khoi Ly,
Robert Shepherd,
Nikolaus Correll
Abstract:
We present a high-bandwidth, lightweight, and nonlinear output tracking technique for soft actuators that combines parsimonious recursive layers for forward output predictions and online optimization using Newton-Raphson. This technique allows for reduced model sizes and increased control loop frequencies when compared with conventional RNN models. Experimental results of this controller prototype on a single soft actuator with soft positional sensors indicate effective tracking of referenced spatial trajectories and rejection of mechanical and electromagnetic disturbances. These are evidenced by root mean squared path tracking errors (RMSE) of 1.8mm using a fully connected (FC) substructure, 1.62mm using a gated recurrent unit (GRU) and 2.11mm using a long short term memory (LSTM) unit, all averaged over three tasks. Among these models, the highest flash memory requirement is 2.22kB enabling co-location of controller and actuator.
Submitted 4 January, 2021;
originally announced January 2021.
-
Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling
Authors:
Difan Zou,
Pan Xu,
Quanquan Gu
Abstract:
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain. Under certain conditions on the target distribution, we prove that $\tilde O(d^4ε^{-2})$ stochastic gradient evaluations suffice to guarantee $ε$-sampling error in terms of the total variation distance, where $d$ is the problem dimension. This improves existing results on the convergence rate of SGLD (Raginsky et al., 2017; Xu et al., 2018). We further show that provided an additional Hessian Lipschitz condition on the log-density function, SGLD is guaranteed to achieve $ε$-sampling error within $\tilde O(d^{15/4}ε^{-3/2})$ stochastic gradient evaluations. Our proof technique provides a new way to study the convergence of Langevin-based algorithms and sheds some light on the design of fast stochastic gradient-based sampling algorithms.
Submitted 23 February, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Sketchy Empirical Natural Gradient Methods for Deep Learning
Authors:
Minghan Yang,
Dong Xu,
Zaiwen Wen,
Mengyun Chen,
Pengxiang Xu
Abstract:
In this paper, we develop an efficient sketchy empirical natural gradient method (SENG) for large-scale deep learning problems. The empirical Fisher information matrix is usually low-rank since the sampling is only practical on a small amount of data at each iteration. Although the corresponding natural gradient direction lies in a small subspace, both the computational cost and memory requirement are still not tractable due to the high dimensionality. We design randomized techniques for different neural network structures to resolve these challenges. For layers with a reasonable dimension, sketching can be performed on a regularized least squares subproblem. Otherwise, since the gradient is a vectorization of the product between two matrices, we apply sketching on the low-rank approximations of these matrices to compute the most expensive parts. A distributed version of SENG is also developed for extremely large-scale applications. Global convergence to stationary points is established under some mild assumptions and a fast linear convergence is analyzed under the neural tangent kernel (NTK) case. Extensive experiments on convolutional neural networks show the competitiveness of SENG compared with the state-of-the-art methods. On the task ResNet50 with ImageNet-1k, SENG achieves 75.9\% Top-1 testing accuracy within 41 epochs. Experiments on the distributed large-batch training show that the scaling efficiency is quite reasonable.
Submitted 25 March, 2021; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Derived Differentiable Manifolds
Authors:
Kai Behrend,
Hsuan-Yi Liao,
Ping Xu
Abstract:
We develop the theory of derived differential geometry in terms of bundles of curved $L_\infty[1]$-algebras, i.e. dg manifolds of positive amplitudes. We prove that the category of derived manifolds is a category of fibrant objects. Therefore, we can make sense of "homotopy fibered product" and "derived intersection" of submanifolds in a smooth manifold in the homotopy category of derived manifolds. We construct a factorization of the diagonal using path spaces. First we construct an infinite-dimensional factorization using actual path spaces motivated by the AKSZ construction, then we cut down to finite dimensions using the Fiorenza-Manetti method. The main ingredient is the homotopy transfer theorem for curved $L_\infty[1]$-algebras.
We also prove the inverse function theorem for derived manifolds, and investigate the relationship between weak equivalence and quasi-isomorphism for derived manifolds.
Submitted 13 June, 2021; v1 submitted 2 June, 2020;
originally announced June 2020.
-
A Finite Time Analysis of Two Time-Scale Actor Critic Methods
Authors:
Yue Wu,
Weitong Zhang,
Pan Xu,
Quanquan Gu
Abstract:
Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the learning policy and the critic uses temporal difference learning to estimate the policy gradient. Under the two time-scale learning rate schedule, the asymptotic convergence of AC has been well studied in the literature. However, the non-asymptotic convergence and finite sample complexity of actor-critic methods are largely open. In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under the non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., $\|\nabla J(\boldsymbolθ)\|_2^2 \le ε$) of the non-concave performance function $J(\boldsymbolθ)$, with $\mathcal{\tilde{O}}(ε^{-2.5})$ sample complexity. To the best of our knowledge, this is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
Submitted 10 October, 2022; v1 submitted 4 May, 2020;
originally announced May 2020.
-
MOTS: Minimax Optimal Thompson Sampling
Authors:
Tianyuan Jin,
Pan Xu,
Jieming Shi,
Xiaokui Xiao,
Quanquan Gu
Abstract:
Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it has remained an open problem whether Thompson sampling can match the minimax lower bound $Ω(\sqrt{KT})$ for $K$-armed bandit problems, where $T$ is the total time horizon. In this paper, we solve this long open problem by proposing a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step. We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound $O(\sqrt{KT})$ for finite time horizon $T$, as well as the asymptotic optimal regret bound for Gaussian rewards when $T$ approaches infinity. To our knowledge, MOTS is the first Thompson sampling type algorithm that achieves the minimax optimality for multi-armed bandit problems.
Submitted 1 October, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
-
A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Authors:
Pan Xu,
Quanquan Gu
Abstract:
Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.
Submitted 3 March, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Authors:
Pan Xu,
Felicia Gao,
Quanquan Gu
Abstract:
Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/ε^{3/2})$ episodes to find an $ε$-approximate stationary point of the nonconcave performance function $J(\boldsymbolθ)$ (i.e., $\boldsymbolθ$ such that $\|\nabla J(\boldsymbolθ)\|_2^2\leqε$). This sample complexity improves the existing result $O(1/ε^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/ε^{1/6})$. In addition, we also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.
Submitted 1 August, 2021; v1 submitted 18 September, 2019;
originally announced September 2019.
-
Mod-$p$ maximal compact inductions do not have irreducible admissible subrepresentations
Authors:
Peng Xu
Abstract:
Let $p$ be a prime number. We show in this short note that mod-$p$ maximal compact inductions of a $p$-adic split reductive group do not have irreducible admissible subrepresentations.
Submitted 26 July, 2019;
originally announced July 2019.
-
A Class of Distributed Event-Triggered Average Consensus Algorithms for Multi-Agent Systems
Authors:
Ping Xu,
Cameron Nowzari,
Zhi Tian
Abstract:
This paper proposes a class of distributed event-triggered algorithms that solve the average consensus problem in multi-agent systems. By designing events such that a specifically chosen Lyapunov function is monotonically decreasing, event-triggered algorithms succeed in reducing communications among agents while still ensuring that the entire system converges to the desired state. However, depending on the chosen Lyapunov function the transient behaviors can be very different. Moreover, performance requirements also vary from application to application. Consequently, we are instead interested in considering a class of Lyapunov functions such that each Lyapunov function produces a different event-triggered coordination algorithm to solve the multi-agent average consensus problem. The proposed class of algorithms all guarantee exponential convergence of the resulting system and exclusion of Zeno behaviors. This allows us to easily implement different algorithms that all guarantee correctness to meet varying performance needs. We show that our findings can be applied to the practical clock synchronization problem in wireless sensor networks (WSNs) and further corroborate their effectiveness with simulation results.
Submitted 23 November, 2019; v1 submitted 6 June, 2019;
originally announced June 2019.
-
An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
Authors:
Pan Xu,
Felicia Gao,
Quanquan Gu
Abstract:
We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $ε$-approximate stationary point of the performance function within $O(1/ε^{5/3})$ trajectories. This sample complexity improves upon the best known result $O(1/ε^2)$ by a factor of $O(1/ε^{1/3})$. At the core of our analysis is (i) a tighter upper bound for the variance of importance sampling weights, where we prove that the variance can be controlled by the parameter distance between different policies; and (ii) a fine-grained analysis of the epoch length and batch size parameters such that we can significantly reduce the number of trajectories required in each iteration of SVRPG. We also empirically demonstrate the effectiveness of our theoretical claims of batch sizes on reinforcement learning benchmark tasks.
Submitted 29 May, 2019;
originally announced May 2019.
-
On certain Iwahori--Hecke modules of $GL_3$ in characteristic $p$
Authors:
Peng Xu
Abstract:
In this note, we show that the natural analogue of certain finiteness result of Barthel--Livn$\acute{\text{e}}$ on $GL_2$ fails for $GL_3$. More precisely, within the pro-$p$-Iwahori invariants of a maximal compact induction of $GL_3$, we show there exist non-zero Iwahori--Hecke submodules of \emph{infinite} codimension.
Submitted 7 April, 2020; v1 submitted 6 February, 2019;
originally announced February 2019.
-
Restriction of $p$-modular representations of $U(2, 1)$ to a Borel subgroup
Authors:
Peng Xu
Abstract:
Let $G$ be the unramified unitary group $U(2, 1)(E/F)$ defined over a non-archimedean local field $F$ of odd residue characteristic $p$, and $B$ be the standard Borel subgroup of $G$. In this note, we study the problem of the restriction of irreducible smooth $\overline{\mathbf{F}}_p$-representations of $G$ to $B$, and we obtain various results which are analogous to those of Pa$\check{\text{s}}$k$\bar{\text{u}}$nas on $GL_2 (F)$ (\cite{Pas07}).
Submitted 7 January, 2025; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Polyvector fields and polydifferential operators associated with Lie pairs
Authors:
Ruggero Bandiera,
Mathieu Stiénon,
Ping Xu
Abstract:
We prove that the spaces $\operatorname{tot}\big(Γ(Λ^\bullet A^\vee \otimes_R\mathcal{T}_{\operatorname{poly}}^{\bullet}\big)$ and $\operatorname{tot}\big(Γ(Λ^\bullet A^\vee)\otimes_R\mathcal{D}_{\operatorname{poly}}^{\bullet}\big)$ associated with a Lie pair $(L,A)$ each carry an $L_\infty$ algebra structure canonical up to an $L_\infty$ isomorphism with the identity map as linear part. These two spaces serve, respectively, as replacements for the spaces of formal polyvector fields and formal polydifferential operators on the Lie pair $(L,A)$. Consequently, both $\mathbb{H}^\bullet_{\operatorname{CE}}(A,\mathcal{T}_{\operatorname{poly}}^{\bullet})$ and $\mathbb{H}^\bullet_{\operatorname{CE}}(A,\mathcal{D}_{\operatorname{poly}}^{\bullet})$ admit unique Gerstenhaber algebra structures. Our approach is based on homotopy transfer and the construction of a Fedosov dg Lie algebroid (i.e. a dg foliation on a Fedosov dg manifold).
Submitted 19 May, 2021; v1 submitted 14 January, 2019;
originally announced January 2019.
-
Sample Efficient Stochastic Variance-Reduced Cubic Regularization Method
Authors:
Dongruo Zhou,
Pan Xu,
Quanquan Gu
Abstract:
We propose a sample efficient stochastic variance-reduced cubic regularization (Lite-SVRC) algorithm for finding the local minimum efficiently in nonconvex optimization. The proposed algorithm achieves a lower sample complexity of Hessian matrix computation than existing cubic regularization based methods. At the heart of our analysis is the choice of a constant batch size of Hessian matrix computation at each iteration and the stochastic variance reduction techniques. In detail, for a nonconvex function with $n$ component functions, Lite-SVRC converges to the local minimum within $\tilde{O}(n+n^{2/3}/ε^{3/2})$ Hessian sample complexity, which is faster than all existing cubic regularization based methods. Numerical experiments with different nonconvex optimization problems conducted on real datasets validate our theoretical results.
Submitted 29 November, 2018;
originally announced November 2018.
-
P-moves between pants-block decompositions of 3-manifolds
Authors:
Pengcheng Xu
Abstract:
A pants-block decomposition of a 3-manifold is similar to a triangulation of a 3-manifold in many aspects. In this paper we show that any two pants-block decompositions of a 3-manifold are related by a finite sequence of moves which are called P-moves. The P-moves between pants-block decompositions are similar to the Pachner moves between triangulations. Moreover, we also give a list of types of P-moves. The main tools we used in this paper are the Morse 2-functions, Reeb complexes and a new 2-dimensional complex called P-complex.
Submitted 2 October, 2018;
originally announced October 2018.
-
Newton-MR: Inexact Newton Method With Minimum Residual Sub-problem Solver
Authors:
Fred Roosta,
Yang Liu,
Peng Xu,
Michael W. Mahoney
Abstract:
We consider a variant of inexact Newton Method, called Newton-MR, in which the least-squares sub-problems are solved approximately using Minimum Residual method. By construction, Newton-MR can be readily applied for unconstrained optimization of a class of non-convex problems known as invex, which subsumes convexity as a sub-class. For invex optimization, instead of the classical Lipschitz continuity assumptions on gradient and Hessian, Newton-MR's global convergence can be guaranteed under a weaker notion of joint regularity of Hessian and gradient. We also obtain Newton-MR's problem-independent local convergence to the set of minima. We show that fast local/global convergence can be guaranteed under a novel inexactness condition, which, to our knowledge, is much weaker than the prior related works. Numerical results demonstrate the performance of Newton-MR as compared with several other Newton-type alternatives on a few machine learning problems.
Submitted 5 May, 2022; v1 submitted 29 September, 2018;
originally announced October 2018.
-
The Kirchhoff Index of Enhanced Hypercubes
Authors:
Ping Xu,
Qiongxiang Huang
Abstract:
Let $\{e_{1},\ldots,e_{n}\}$ be the standard basis of the abelian group $Z_{2}^{n}$, which can also be viewed as a linear space of dimension $n$ over the Galois field $F_{2}$, and $ε_{k}=e_k+e_{k+1}+\cdots+e_n$ for some $1\le k\le n-1$. It is well known that the so-called enhanced hypercube $Q_{n, k}$ $(1\le k \le n-1)$ is just the Cayley graph $Cay(Z_{2}^{n},S)$ where $S=\{e_{1},\ldots, e_{n},ε_{k}\}$. In this paper, we obtain the spectrum of $Q_{n, k}$, from which we give an exact formula of the Kirchhoff index of the enhanced hypercube $Q_{n, k}$. Furthermore, we prove that, for a given $n$, $Kf(Q_{n, k})$ increases with $k$. Finally, we get $\lim\limits_{n\to\infty}\frac{Kf(Q_{n, k})}{\frac{2^{2n}}{n+1}}=1$.
Submitted 17 September, 2018;
originally announced September 2018.
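The quantities named in the abstract above can be checked numerically for small $n$. The sketch below is not the paper's closed-form formula. It relies on two standard facts: the Kirchhoff index of a connected graph on $N$ vertices equals $N\sum 1/λ$ over the nonzero Laplacian eigenvalues $λ$, and the eigenvalues of a Cayley graph on $Z_{2}^{n}$ are given by its characters. The function names are hypothetical, and vectors of $Z_{2}^{n}$ are encoded as bitmasks.

```python
def enhanced_hypercube_generators(n, k):
    """Generating set S = {e_1, ..., e_n, eps_k} of Q_{n,k}, encoded as bitmasks."""
    if not 1 <= k <= n - 1:
        raise ValueError("Q_{n,k} requires 1 <= k <= n-1")
    unit_vectors = [1 << i for i in range(n)]            # e_{i+1} is bit i
    eps_k = sum(1 << i for i in range(k - 1, n))         # e_k + ... + e_n
    return unit_vectors + [eps_k]


def kirchhoff_index(n, k):
    """Kirchhoff index of Q_{n,k} computed from its Laplacian spectrum."""
    gens = enhanced_hypercube_generators(n, k)
    order = 1 << n                                       # number of vertices, 2^n
    total = 0.0
    for a in range(1, order):                            # nontrivial characters of Z_2^n
        # Adjacency eigenvalue: sum over generators s of (-1)^{a . s}
        adjacency_eig = sum(-1 if bin(a & s).count("1") % 2 else 1 for s in gens)
        laplacian_eig = len(gens) - adjacency_eig        # graph is (n+1)-regular
        total += 1.0 / laplacian_eig
    return order * total


values = [kirchhoff_index(4, k) for k in range(1, 4)]
print(values)
```

As a sanity check, $Q_{2,1}$ is the complete graph on four vertices, whose Kirchhoff index is 3, and the values for $n=4$ grow with $k$, in line with the monotonicity stated in the abstract.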
-
An Exact Upper Bound on the $L^p$ Lebesgue Constant and The $\infty$-Rényi Entropy Power Inequality for Integer Valued Random Variables
Authors:
Peng Xu,
Mokshay Madiman,
James Melbourne
Abstract:
In this paper, we prove an exact, asymptotically sharp upper bound on the $L^p$ Lebesgue constant (i.e., the $L^p$ norm of the Dirichlet kernel) for $p\ge 2$. As an application, we also verify the implication of a new $\infty$-Rényi entropy power inequality for integer-valued random variables.
Submitted 23 August, 2018;
originally announced August 2018.
-
Stochastic Nested Variance Reduction for Nonconvex Optimization
Authors:
Dongruo Zhou,
Pan Xu,
Quanquan Gu
Abstract:
We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient to further reduce its variance in each iteration. For smooth nonconvex functions, the proposed algorithm converges to an $ε$-approximate first-order stationary point (i.e., $\|\nabla F(\mathbf{x})\|_2\leq ε$) within $\tilde O(n\land ε^{-2}+ε^{-3}\land n^{1/2}ε^{-2})$ stochastic gradient evaluations. This improves on the best known gradient complexity of SVRG, $O(n+n^{2/3}ε^{-2})$, and that of SCSG, $O(n\land ε^{-2}+ε^{-10/3}\land n^{2/3}ε^{-2})$. For gradient dominated functions, our algorithm also achieves better gradient complexity than the state-of-the-art algorithms. Thorough experimental results on different nonconvex optimization problems back up our theory.
Submitted 19 October, 2020; v1 submitted 20 June, 2018;
originally announced June 2018.
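The abstract above contrasts its method with the SVRG baseline, whose semi-stochastic gradient combines the current iterate with a single snapshot point. A minimal sketch of that baseline estimator follows. It is not the nested $K+1$-point scheme the paper proposes, and the toy nonconvex components $f_i(x) = (x^2 - a_i)^2/4$, the data values, and all function names are assumptions made for illustration.

```python
def grad_i(x, a_i):
    """Gradient of the toy component f_i(x) = (x**2 - a_i)**2 / 4."""
    return x * (x * x - a_i)


def full_grad(x, data):
    """Gradient of the finite-sum objective F(x) = (1/n) * sum_i f_i(x)."""
    return sum(grad_i(x, a_i) for a_i in data) / len(data)


def svrg_estimate(x, x_ref, g_ref, a_i):
    """SVRG semi-stochastic gradient at x, using snapshot x_ref with g_ref = full_grad(x_ref)."""
    return grad_i(x, a_i) - grad_i(x_ref, a_i) + g_ref


data = [1.0, 2.0, 4.0]            # hypothetical problem data a_1, ..., a_n
x_ref = 0.5                       # snapshot (reference) point
g_ref = full_grad(x_ref, data)    # full gradient, computed once per snapshot
x = 0.7                           # current iterate

estimates = [svrg_estimate(x, x_ref, g_ref, a_i) for a_i in data]
mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate, full_grad(x, data))
```

Averaged over a uniformly sampled index, the estimator equals the full gradient at the current iterate, and it coincides with the snapshot gradient whenever the iterate equals the snapshot, which is why its variance shrinks as the two points approach each other.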
-
On the pro-$p$-Iwahori invariants of supersingular representations of unramified $U(2, 1)$
Authors:
Peng Xu
Abstract:
Let $G$ be the unramified unitary group $U(2, 1)(E/F)$ over a non-archimedean local field $F$ of odd residue characteristic $p$. In this paper, for any supersingular representation of $G$ that contains the Steinberg weight, we prove that its space of pro-$p$-Iwahori invariants, as a right module over the pro-$p$-Iwahori--Hecke algebra of $G$, is \emph{not} simple.
Submitted 8 June, 2018; v1 submitted 25 March, 2018;
originally announced March 2018.
-
Shifted Poisson structures on differentiable stacks
Authors:
Francesco Bonechi,
Nicola Ciccoli,
Camille Laurent-Gengoux,
Ping Xu
Abstract:
The purpose of this paper is to investigate shifted $(+1)$ Poisson structures in the context of differential geometry. The relevant notion is shifted $(+1)$ Poisson structures on differentiable stacks. More precisely, we develop the notion of Morita equivalence of quasi-Poisson groupoids. Thus isomorphism classes of $(+1)$ Poisson stacks correspond to Morita equivalence classes of quasi-Poisson groupoids. In the process, we carry out the following programs of independent interest:
(1) We introduce a $\mathbb Z$-graded Lie 2-algebra of polyvector fields on a given Lie groupoid and prove that its homotopy equivalence class is invariant under Morita equivalence of Lie groupoids, thus can be considered as polyvector fields on the corresponding differentiable stack ${\mathfrak X}$. It turns out that shifted $(+1)$ Poisson structures on ${\mathfrak X}$ correspond exactly to elements of the Maurer-Cartan moduli set of the corresponding dgla.
(2) We introduce the notions of the tangent complex $T_{\mathfrak X}$ and the cotangent complex $L_{\mathfrak X}$ of a differentiable stack ${\mathfrak X}$ in terms of any Lie groupoid $Γ{\rightrightarrows} M$ representing ${\mathfrak X}$. They correspond to homotopy classes of 2-term homotopy $Γ$-modules $A[1]\rightarrow TM$ and $T^\vee M\rightarrow A^\vee[-1]$, respectively. We prove that a $(+1)$-shifted Poisson structure on a differentiable stack ${\mathfrak X}$ defines a morphism ${L_{\mathfrak X}}[1]\to {T_{\mathfrak X}}$.
Submitted 29 September, 2020; v1 submitted 18 March, 2018;
originally announced March 2018.
-
Inexact Non-Convex Newton-Type Methods
Authors:
Zhewei Yao,
Peng Xu,
Farbod Roosta-Khorasani,
Michael W. Mahoney
Abstract:
For solving large-scale non-convex problems, we propose inexact variants of trust region and adaptive cubic regularization methods, which, to increase efficiency, incorporate various approximations. In particular, in addition to approximate sub-problem solves, both the Hessian and the gradient are suitably approximated. Under rather mild conditions on such approximations, we show that our proposed inexact methods achieve optimal worst-case iteration complexities similar to those of their exact counterparts. Our proposed algorithms, and their respective theoretical analyses, do not require knowledge of any unknowable problem-related quantities, and hence are easily implementable in practice. In the context of finite-sum problems, we then explore randomized sub-sampling methods as ways to construct the gradient and Hessian approximations and examine the empirical performance of our algorithms on some real datasets.
Submitted 19 February, 2018;
originally announced February 2018.
-
Stochastic Variance-Reduced Cubic Regularized Newton Method
Authors:
Dongruo Zhou,
Pan Xu,
Quanquan Gu
Abstract:
We propose a stochastic variance-reduced cubic regularized Newton method for non-convex optimization. At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for the cubic regularization method. We show that our algorithm is guaranteed to converge to an $(ε,\sqrtε)$-approximate local minimum within $\tilde{O}(n^{4/5}/ε^{3/2})$ second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms, including subsampled cubic regularization. Our work also sheds light on the application of variance reduction techniques to high-order non-convex optimization methods. Thorough experiments on various non-convex optimization problems support our theory.
Submitted 13 February, 2018;
originally announced February 2018.