-
Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models
Authors:
Yuchen Liang,
Renxiang Huang,
Lifeng Lai,
Ness Shroff,
Yingbin Liang
Abstract:
Discrete state space diffusion models have shown significant advantages in applications involving discrete data, such as text and image generation. It has also been observed that their performance is highly sensitive to the choice of rate matrices, particularly between uniform and absorbing rate matrices. While empirical results suggest that absorbing rate matrices often yield better generation quality compared to uniform rate matrices, existing theoretical works have largely focused on the uniform rate matrices case. Notably, convergence guarantees and error analyses for absorbing diffusion models are still missing. In this work, we provide the first finite-time error bounds and convergence rate analysis for discrete diffusion models using absorbing rate matrices. We begin by deriving an upper bound on the KL divergence of the forward process, introducing a surrogate initialization distribution to address the challenge posed by the absorbing stationary distribution, which is a singleton and causes the KL divergence to be ill-defined. We then establish the first convergence guarantees for both the $τ$-leaping and uniformization samplers under absorbing rate matrices, demonstrating improved rates over their counterparts using uniform rate matrices. Furthermore, under suitable assumptions, we provide convergence guarantees without early stopping. Our analysis introduces several new technical tools to address challenges unique to absorbing rate matrices. These include a Jensen-type argument for bounding forward process convergence, novel techniques for bounding absorbing score functions, and a non-divergent upper bound on the score near initialization that removes the need for early stopping.
Submitted 2 June, 2025;
originally announced June 2025.
-
On the irrationality of certain $p$-adic zeta values
Authors:
Li Lai,
Cezar Lupu,
Johannes Sprang
Abstract:
A famous theorem of Zudilin states that at least one of the Riemann zeta values $ζ(5), ζ(7), ζ(9), ζ(11)$ is irrational. In this paper, we establish the $p$-adic analogue of Zudilin's theorem. As a weaker form of our result, it is proved that for any prime number $p \geqslant 5$ there exists an odd integer $i$ in the interval $[3,p+p/\log p+5]$ such that the $p$-adic zeta value $ζ_p(i)$ is irrational.
Submitted 29 May, 2025;
originally announced May 2025.
-
A partial result towards the Chowla--Milnor conjecture
Authors:
Li Lai,
Jia Li
Abstract:
The Chowla--Milnor conjecture predicts the linear independence of certain Hurwitz zeta values. In this paper, we prove that for any fixed integer $k \geqslant 2$, the dimension of the $\mathbb{Q}$-linear span of $ζ(k,a/q)-(-1)^{k}ζ(k,1-a/q)$ ($1 \leqslant a < q/2$, $\gcd(a,q)=1$) is at least $(c_k -o(1)) \cdot \log q$ as the positive integer $q \to +\infty$ for some constant $c_k>0$ depending only on $k$. It is well known that $ζ(k,a/q)+(-1)^{k}ζ(k,1-a/q) \in \overline{\mathbb{Q}}π^k$, but much less was previously known about $ζ(k,a/q)-(-1)^{k}ζ(k,1-a/q)$. Our proof is similar to those of Ball--Rivoal (2001) and Zudilin (2002) concerning the linear independence of Riemann zeta values. However, we use a new type of rational function to construct linear forms.
Submitted 19 May, 2025;
originally announced May 2025.
-
A note on the irrationality of $ζ_2(5)$
Authors:
Li Lai,
Johannes Sprang,
Wadim Zudilin
Abstract:
In the spirit of Apéry's proof of the irrationality of $ζ(3)$, we construct a sequence $p_n/q_n$ of rational approximations to the $2$-adic zeta value $ζ_2(5)$ which satisfy $0 < |ζ_2(5)-p_n/q_n|_2 < \max\{|p_n|,|q_n|\}^{-1-δ}$ for an explicit constant $δ>0$. This leads to a new proof of the irrationality of $ζ_2(5)$, a result established recently by Calegari, Dimitrov and Tang using a different method. Furthermore, our approximations allow us to obtain an upper bound for the irrationality measure of this $2$-adic quantity; namely, we show that $μ(ζ_2(5)) \le (16\log2)/(8\log2-5) = 20.342\dots$.
Submitted 8 May, 2025;
originally announced May 2025.
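The closed-form bound on the irrationality measure quoted in the abstract above can be evaluated numerically as a quick sanity check. The following is only an arithmetic sketch; the variable names are illustrative and do not come from the paper.

```python
import math

# Bound quoted in the abstract: mu(zeta_2(5)) <= (16 log 2) / (8 log 2 - 5).
# Here "log" is the natural logarithm.
log2 = math.log(2)
mu_bound = (16 * log2) / (8 * log2 - 5)

# The abstract reports this value as 20.342...
print(f"{mu_bound:.6f}")
```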
-
A note on the number of irrational odd zeta values, II
Authors:
Li Lai
Abstract:
We prove that there are at least $1.284 \cdot \sqrt{s/\log s}$ irrational numbers among $ζ(3)$, $ζ(5)$, $ζ(7)$, $\ldots$, $ζ(s-1)$ for any sufficiently large even integer $s$. This result improves upon the previous finding by a constant factor. The proof combines the elimination technique of Fischler-Sprang-Zudilin (2019) with the $Φ_n$ factor method of Zudilin (2001).
Submitted 12 January, 2025; v1 submitted 9 January, 2025;
originally announced January 2025.
-
Stability of first-order methods in tame optimization
Authors:
Lexiao Lai
Abstract:
Modern data science applications demand solving large-scale optimization problems. The prevalent approaches are first-order methods, valued for their scalability. These methods are implemented to tackle highly irregular problems where assumptions of convexity and smoothness are untenable.
Seeking to deepen the understanding of these methods, we study first-order methods with constant step size for minimizing locally Lipschitz tame functions. To do so, we propose notions of discrete Lyapunov stability for optimization methods. Concerning common first-order methods, we provide necessary and sufficient conditions for stability. We also show that certain local minima can be unstable, without additional noise in the method. Our analysis relies on the connection between the iterates of the first-order methods and continuous-time dynamics.
Submitted 30 November, 2024;
originally announced December 2024.
-
Nonsmooth rank-one symmetric matrix factorization landscape
Authors:
Cédric Josz,
Lexiao Lai
Abstract:
We consider nonsmooth rank-one symmetric matrix factorization. It has no spurious second-order stationary points.
Submitted 26 February, 2025; v1 submitted 22 October, 2024;
originally announced October 2024.
-
Transformers Handle Endogeneity in In-Context Linear Regression
Authors:
Haodong Liang,
Krishnakumar Balasubramanian,
Lifeng Lai
Abstract:
We explore the capability of transformers to address endogeneity in in-context linear regression. Our main finding is that transformers inherently possess a mechanism to handle endogeneity effectively using instrumental variables (IV). First, we demonstrate that the transformer architecture can emulate a gradient-based bi-level optimization procedure that converges to the widely used two-stage least squares $(\textsf{2SLS})$ solution at an exponential rate. Next, we propose an in-context pretraining scheme and provide theoretical guarantees showing that the global minimizer of the pre-training loss achieves a small excess loss. Our extensive experiments validate these theoretical findings, showing that the trained transformer provides more robust and reliable in-context predictions and coefficient estimates than the $\textsf{2SLS}$ method, in the presence of endogeneity.
Submitted 10 May, 2025; v1 submitted 2 October, 2024;
originally announced October 2024.
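For readers unfamiliar with the two-stage least squares (2SLS) baseline named in the abstract above, here is a minimal sketch of the classical estimator in the simplest case: one instrument, one regressor, no intercept. It does not reproduce the paper's transformer construction or its pretraining scheme. The synthetic data-generating process, the seed, and all function and variable names are assumptions made for illustration.

```python
import random


def two_stage_least_squares(z, x, y):
    """Scalar 2SLS with a single instrument z and a single regressor x."""
    # Stage 1: regress x on z, then form the fitted values x_hat.
    pi_hat = sum(zi * xi for zi, xi in zip(z, x)) / sum(zi * zi for zi in z)
    x_hat = [pi_hat * zi for zi in z]
    # Stage 2: regress y on the fitted values from stage 1.
    return sum(xh * yi for xh, yi in zip(x_hat, y)) / sum(xh * xh for xh in x_hat)


def ordinary_least_squares(x, y):
    """Plain least squares slope, which is biased when x is endogenous."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical synthetic data: the unobserved u drives both x and y,
# which makes x endogenous; z affects y only through x.
rng = random.Random(0)
true_beta = 2.0
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

beta_2sls = two_stage_least_squares(z, x, y)
beta_ols = ordinary_least_squares(x, y)
print("2SLS:", beta_2sls, "OLS:", beta_ols)
```

In this toy setup the least squares slope is pulled away from `true_beta` by the confounder, while the 2SLS estimate stays close to it.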
-
Proximal random reshuffling under local Lipschitz continuity
Authors:
Cedric Josz,
Lexiao Lai,
Xiaopeng Li
Abstract:
We study proximal random reshuffling for minimizing the sum of locally Lipschitz functions and a proper lower semicontinuous convex function without assuming coercivity or the existence of limit points. The algorithmic guarantees pertaining to near approximate stationarity rely on a new tracking lemma linking the iterates to trajectories of conservative fields. One of the novelties in the analysis consists in handling conservative fields with unbounded values.
Submitted 13 August, 2024;
originally announced August 2024.
-
Small improvements on the Ball-Rivoal theorem and its $p$-adic variant
Authors:
Li Lai
Abstract:
We prove that the dimension of the $\mathbb{Q}$-linear span of $1,ζ(3),ζ(5),\ldots,ζ(s-1)$ is at least $(1.119 \cdot \log s)/(1+\log 2)$ for any sufficiently large even integer $s$. This slightly refines a well-known result of Rivoal (2000) or Ball-Rivoal (2001). Quite unexpectedly, the proof only involves inserting the arithmetic observation of Zudilin (2001) into the original proof of Ball-Rivoal. Although this result is covered by a recent development of Fischler (2021+), our proof has the advantages of being simple and providing explicit non-vanishing small linear forms in $1$ and odd zeta values.
Moreover, we establish the $p$-adic variant: for any prime number $p$, the dimension of the $\mathbb{Q}$-linear span of $1,ζ_p(3),ζ_p(5),\ldots,ζ_p(s-1)$ is at least $(1.119 \cdot \log s)/(1+\log 2)$ for any sufficiently large even integer $s$. This is new; it slightly refines a result of Sprang (2020).
Submitted 29 January, 2025; v1 submitted 19 July, 2024;
originally announced July 2024.
-
Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk
Authors:
Xinyi Ni,
Lifeng Lai
Abstract:
Robust Markov Decision Processes (RMDPs) have received significant research interest, offering an alternative to standard Markov Decision Processes (MDPs) that often assume fixed transition probabilities. RMDPs address this by optimizing for the worst-case scenarios within ambiguity sets. While earlier studies on RMDPs have largely centered on risk-neutral reinforcement learning (RL), with the goal of minimizing expected total discounted costs, in this paper, we analyze the robustness of CVaR-based risk-sensitive RL under RMDP. First, we consider predetermined ambiguity sets. Based on the coherency of CVaR, we establish a connection between robustness and risk sensitivity, so that techniques in risk-sensitive RL can be adopted to solve the proposed problem. Furthermore, motivated by the existence of decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets. To solve this, we define a new risk measure named NCVaR and establish the equivalence of NCVaR optimization and robust CVaR optimization. We further propose value iteration algorithms and validate our approach in simulation experiments.
Submitted 2 May, 2024;
originally announced May 2024.
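As background for the risk measure named in the abstract above, the sketch below computes an empirical Conditional Value-at-Risk (CVaR) of sampled costs. It assumes the common convention that CVaR at level alpha is the mean of the worst (1 - alpha) fraction of costs. Conventions differ across the literature, and neither this function nor its name comes from the paper; the paper's NCVaR measure is not modelled here.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of the sampled costs.

    Assumed convention: larger cost is worse, and alpha lies in [0, 1).
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs, reverse=True)
    # Round before taking the ceiling to avoid floating-point drift
    # such as (1 - 0.8) * 10 evaluating to 1.9999999999999996.
    tail = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail]) / tail


costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(empirical_cvar(costs, 0.8))  # mean of the two largest costs: 9.5
print(empirical_cvar(costs, 0.0))  # plain mean of all costs: 5.5
```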
-
Global stability of first-order methods for coercive tame functions
Authors:
Cédric Josz,
Lexiao Lai
Abstract:
We consider first-order methods with constant step size for minimizing locally Lipschitz coercive functions that are tame in an o-minimal structure on the real field. We prove that if the method is approximated by subgradient trajectories, then the iterates eventually remain in a neighborhood of a connected component of the set of critical points. Under suitable method-dependent regularity assumptions, this result applies to the subgradient method with momentum, the stochastic subgradient method with random reshuffling and momentum, and the random-permutations cyclic coordinate descent method.
Submitted 1 August, 2023;
originally announced August 2023.
-
Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning
Authors:
Guanlin Liu,
Lifeng Lai
Abstract:
Due to the broad range of applications of multi-agent reinforcement learning (MARL), understanding the effects of adversarial attacks against MARL models is essential for the safe application of these models. Motivated by this, we investigate the impact of adversarial attacks on MARL. In the considered setup, there is an exogenous attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them. The attacker aims to guide each agent into a target policy or maximize the cumulative rewards under some specific reward function chosen by the attacker, while minimizing the amount of manipulation on feedback and action. We first show the limitations of action-poisoning-only attacks and reward-poisoning-only attacks. We then introduce a mixed attack strategy with both action poisoning and reward poisoning. We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.
Submitted 14 July, 2023;
originally announced July 2023.
-
Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty
Authors:
Guanlin Liu,
Zhihan Zhou,
Han Liu,
Lifeng Lai
Abstract:
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance in the face of uncertainties. In this paper, we focus on action robust RL with probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent will take the action specified by the policy with probability $1-ρ$ and an alternative adversarial action with probability $ρ$. We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. Furthermore, we develop the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax optimal regret and sample complexity. Finally, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.
Submitted 20 July, 2023; v1 submitted 14 July, 2023;
originally announced July 2023.
-
Convergence of the momentum method for semialgebraic functions with locally Lipschitz gradients
Authors:
Cédric Josz,
Lexiao Lai,
Xiaopeng Li
Abstract:
We propose a new length formula that governs the iterates of the momentum method when minimizing differentiable semialgebraic functions with locally Lipschitz gradients. It enables us to establish local convergence, global convergence, and convergence to local minimizers without assuming global Lipschitz continuity of the gradient, coercivity, and a global growth condition, as is done in the literature. As a result, we provide the first convergence guarantee of the momentum method starting from arbitrary initial points when applied to principal component analysis, matrix sensing, and linear neural networks.
Submitted 7 January, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Many $p$-adic odd zeta values are irrational
Authors:
Li Lai,
Johannes Sprang
Abstract:
For any prime $p$ and $\varepsilon>0$ we prove that for any sufficiently large positive odd integer $s$ at least $(c_p-\varepsilon) \sqrt{\frac{s}{\log s}}$ of the $p$-adic zeta values $ζ_p(3),ζ_p(5),\dots,ζ_p(s)$ are irrational. The constant $c_p$ is positive and depends only on $p$. This result establishes a $p$-adic version of the elimination technique used by Fischler--Sprang--Zudilin and Lai--Yu to prove a similar result on classical zeta values. The main difficulty consists in proving the non-vanishing of the resulting linear forms. We overcome this problem by using a new irrationality criterion.
Submitted 17 February, 2025; v1 submitted 17 June, 2023;
originally announced June 2023.
-
On the irrationality of certain $2$-adic zeta values
Authors:
Li Lai
Abstract:
Let $ζ_2(\cdot)$ be the Kubota-Leopoldt $2$-adic zeta function. We prove that, for every nonnegative integer $s$, there exists an odd integer $j$ in the interval $[s+3,3s+5]$ such that $ζ_2(j)$ is irrational. In particular, at least one of $ζ_2(7),ζ_2(9),ζ_2(11),ζ_2(13)$ is irrational.
Our approach is inspired by the recent work of Sprang. We construct explicit rational functions. The Volkenborn integrals of these rational functions' (higher-order) derivatives produce good linear combinations of $1$ and $2$-adic Hurwitz zeta values. The most difficult step is proving that certain Volkenborn integrals are nonzero, which is resolved by carefully manipulating the binomial coefficients.
Submitted 3 April, 2023;
originally announced April 2023.
-
Sufficient conditions for instability of the subgradient method with constant step size
Authors:
Cédric Josz,
Lexiao Lai
Abstract:
We provide sufficient conditions for instability of the subgradient method with constant step size around a local minimum of a locally Lipschitz semi-algebraic function. They are satisfied by several spurious local minima arising in robust principal component analysis and neural networks.
Submitted 29 June, 2023; v1 submitted 27 November, 2022;
originally announced November 2022.
-
Lyapunov stability of the subgradient method with constant step size
Authors:
Cédric Josz,
Lexiao Lai
Abstract:
We consider the subgradient method with constant step size for minimizing locally Lipschitz semi-algebraic functions. In order to analyze the behavior of its iterates in the vicinity of a local minimum, we introduce a notion of discrete Lyapunov stability and propose necessary and sufficient conditions for stability.
Submitted 6 March, 2023; v1 submitted 27 November, 2022;
originally announced November 2022.
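To make the setting of the abstract above concrete, the following toy sketch runs the subgradient method with constant step size on the one-dimensional function f(x) = |x|, which is locally Lipschitz and semi-algebraic. The example, the starting point, and the step size are illustrative assumptions and are not taken from the paper. It only shows the qualitative behaviour that motivates a discrete notion of stability: the iterates do not converge to the minimizer 0 but end up confined to a neighborhood whose radius is governed by the step size.

```python
def subgradient_abs(x):
    """One valid subgradient of f(x) = |x|; at x = 0 the choice 0 is used."""
    return (x > 0) - (x < 0)


def subgradient_method(x0, step, iterations):
    """Iterate x_{k+1} = x_k - step * g_k with g_k a subgradient at x_k."""
    xs = [x0]
    for _ in range(iterations):
        xs.append(xs[-1] - step * subgradient_abs(xs[-1]))
    return xs


step = 0.1
iterates = subgradient_method(x0=0.35, step=step, iterations=50)

# After a short transient the iterates oscillate around the minimizer 0
# without leaving the interval [-step, step].
tail = iterates[10:]
print(max(abs(x) for x in tail))
```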
-
Nonsmooth rank-one matrix factorization landscape
Authors:
Cédric Josz,
Lexiao Lai
Abstract:
We provide the first positive result on the nonsmooth optimization landscape of robust principal component analysis, to the best of our knowledge. It is the object of several conjectures and remains mostly uncharted territory. We identify a necessary and sufficient condition for the absence of spurious local minima in the rank-one case. Our proof exploits the subdifferential regularity of the objective function in order to eliminate the existence quantifier from the first-order optimality condition known as Fermat's rule.
Submitted 27 November, 2022;
originally announced November 2022.
-
On two conjectures of Sun concerning Apéry-like series
Authors:
Steven Charlton,
Herbert Gangl,
Li Lai,
Ce Xu,
Jianqiang Zhao
Abstract:
In this paper, we shall prove two conjectures of Z.-W. Sun concerning Apéry-like series. One of the series is alternating whereas the other one is not. Our main strategy is to convert the series (resp. the alternating series) to log-sine-cosine (resp. log-sinh-cosh) integrals. Then we express all these integrals in terms of single-valued Bloch-Wigner-Ramakrishnan-Wojtkowiak-Zagier polylogarithms. The conjectures then follow from a few highly non-trivial functional equations of the polylogarithms of weight $3$ and $4$.
Submitted 21 October, 2022;
originally announced October 2022.
-
Elementary proofs of Zagier's formula for multiple zeta values and its odd variant
Authors:
Li Lai,
Cezar Lupu,
Derek Orr
Abstract:
In this paper, we give elementary proofs of Zagier's formula for multiple zeta values involving Hoffman element and its odd variant due to Murakami. Zagier's formula was a key ingredient in the proof of Hoffman's conjecture. Moreover, using the same approach, we prove Murakami's formula for multiple $t$-values. This formula is essential in proving a Brown type result which asserts that each multiple zeta value is a $\mathbb{Q}$-linear combination of multiple $t$-values of the same weight involving $2$'s and $3$'s.
Submitted 29 January, 2022; v1 submitted 23 January, 2022;
originally announced January 2022.
-
Efficient Action Poisoning Attacks on Linear Contextual Bandits
Authors:
Guanlin Liu,
Lifeng Lai
Abstract:
Contextual bandit algorithms have many applications in a variety of scenarios. In order to develop trustworthy contextual bandit systems, understanding the impacts of various adversarial attacks on contextual bandit algorithms is essential. In this paper, we propose a new class of attacks: action poisoning attacks, where an adversary can change the action signal selected by the agent. We design action poisoning attack schemes against linear contextual bandit algorithms in both white-box and black-box settings. We further analyze the cost of the proposed attack strategies for a very popular and widely used bandit algorithm: LinUCB. We show that, in both white-box and black-box settings, the proposed attack schemes can force the LinUCB agent to pull a target arm very frequently by spending only logarithmic cost.
Submitted 10 December, 2021;
originally announced December 2021.
-
Coupling and Simulation of Fluid-Structure Interaction Problems for Automotive Sun-roof on Graphics Processing Unit
Authors:
Liang S. Lai,
Choi-Hong Lai,
Abal-Kassim Cheik Ahamed,
Frederic Magoules
Abstract:
In this paper, the authors propose an analysis of the frequency response function in a car compartment, subject to a fluctuating pressure distribution along the open cavity of the sun-roof at the top of a car. The coupling of a computational fluid dynamics code and a computational acoustics code is considered to simulate the acoustic fluid-structure interaction problem. Iterative Krylov methods and domain decomposition methods, tuned for the Graphics Processing Unit (GPU), are considered to solve the acoustic problem in double-precision complex arithmetic. Numerical simulations illustrate the efficiency, robustness and accuracy of the proposed approaches.
Submitted 30 November, 2021;
originally announced December 2021.
-
Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning
Authors:
Guanlin Liu,
Lifeng Lai
Abstract:
Due to the broad range of applications of reinforcement learning (RL), understanding the effects of adversarial attacks against RL models is essential for the safe application of these models. Prior theoretical works on adversarial attacks against RL mainly focus on either observation poisoning attacks or environment poisoning attacks. In this paper, we introduce a new class of attacks named action poisoning attacks, where an adversary can change the action signal selected by the agent. Compared with existing attack models, the attacker's ability in the proposed action poisoning attack model is more restricted, which brings some design challenges. We study the action poisoning attack in both white-box and black-box settings. We introduce an adaptive attack scheme called LCB-H, which works for most RL agents in the black-box setting. We prove that the LCB-H attack can force any efficient RL agent, whose dynamic regret scales sublinearly with the total number of steps taken, to choose actions according to a policy selected by the attacker very frequently, with only sublinear cost. In addition, we apply the LCB-H attack against a popular model-free RL algorithm: UCB-H. We show that, even in the black-box setting, by spending only logarithmic cost, the proposed LCB-H attack scheme can force the UCB-H agent to choose actions according to the policy selected by the attacker very frequently.
Submitted 26 October, 2021; v1 submitted 9 October, 2021;
originally announced October 2021.
-
On the Convergence of Projected Alternating Maximization for Equitable and Optimal Transport
Authors:
Minhui Huang,
Shiqian Ma,
Lifeng Lai
Abstract:
This paper studies the equitable and optimal transport (EOT) problem, which has many applications such as fair division problems and optimal transport with multiple agents. In the case of discrete distributions, the EOT problem can be formulated as a linear program (LP). Since this LP is prohibitively large for general LP solvers, Scetbon et al. \cite{scetbon2021equitable} suggest perturbing the problem by adding an entropy regularization. They proposed a projected alternating maximization algorithm (PAM) to solve the dual of the entropy regularized EOT. In this paper, we provide the first convergence analysis of PAM. A novel rounding procedure is proposed to help construct the primal solution for the original EOT problem. We also propose a variant of PAM by incorporating the extrapolation technique that can numerically improve the performance of PAM. Results in this paper may shed light on block coordinate (gradient) descent methods for general optimization problems.
Submitted 30 September, 2021; v1 submitted 29 September, 2021;
originally announced September 2021.
-
On the largest prime divisor of $n!+1$
Authors:
Li Lai
Abstract:
For an integer $m >1$, we denote by $P(m)$ the largest prime divisor of $m$. We prove that $\limsup_{n \rightarrow +\infty} P(n!+1)/n \geqslant 1+9\log 2>7.238$, which improves a result of Stewart. More generally, for any nonzero polynomial $f(X)$ with integer coefficients, we show that $\limsup_{n \rightarrow +\infty} P(n!+f(n))/n \geqslant 1+9\log2$. This improves a result of Luca and Shparlinski. These improvements come from adding a combinatorial idea to the works mentioned above.
Submitted 27 March, 2021;
originally announced March 2021.
-
Exploration Enhancement of Nature-Inspired Swarm-based Optimization Algorithms
Authors:
Kwok Pui Choi,
Enzio Hai Hong Kam,
Tze Leung Lai,
Xin T. Tong,
Weng Kee Wong
Abstract:
Nature-inspired swarm-based algorithms have been widely applied to tackle high-dimensional and complex optimization problems across many disciplines. They are general-purpose optimization algorithms that are easy to use and implement, flexible and assumption-free. A common drawback of these algorithms is premature convergence, so that the solution found may not be a global optimum. We provide sufficient conditions for an algorithm to converge almost surely (a.s.) to a global optimum. We then propose a general, simple and effective strategy, called Perturbation-Projection (PP), to enhance an algorithm's exploration capability so that our convergence conditions are guaranteed to hold. We illustrate this approach using three widely used nature-inspired swarm-based optimization algorithms: particle swarm optimization (PSO), bat algorithm (BAT) and competitive swarm optimizer (CSO). Extensive numerical experiments show that each of the three algorithms with the enhanced PP strategy outperforms the original version in a number of notable ways.
Submitted 20 March, 2021;
originally announced March 2021.
-
At least two of $ζ(5),ζ(7),\ldots,ζ(35)$ are irrational
Authors:
Li Lai,
Li Zhou
Abstract:
Let $ζ(s)$ be the Riemann zeta function. We prove the statement in the title, which improves a recent result of Rivoal and Zudilin by lowering $69$ to $35$. We also prove that at least one of $β(2),β(4),\ldots,β(10)$ is irrational, where $β(s) = L(s,χ_4)$ and $χ_4$ is the Dirichlet character with conductor $4$.
Submitted 18 October, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance
Authors:
Minhui Huang,
Shiqian Ma,
Lifeng Lai
Abstract:
The Wasserstein distance has become increasingly important in machine learning and deep learning. Despite its popularity, the Wasserstein distance is hard to approximate because of the curse of dimensionality. A recently proposed approach to alleviate the curse of dimensionality is to project the sampled data from the high dimensional probability distribution onto a lower-dimensional subspace, and then compute the Wasserstein distance between the projected data. However, this approach requires solving a max-min problem over the Stiefel manifold, which is very challenging in practice. The only existing work that solves this problem directly is the RGAS (Riemannian Gradient Ascent with Sinkhorn Iteration) algorithm, which requires solving an entropy-regularized optimal transport problem in each iteration, and thus can be costly for large-scale problems. In this paper, we propose a Riemannian block coordinate descent (RBCD) method to solve this problem, which is based on a novel reformulation of the regularized max-min problem over the Stiefel manifold. We show that the complexity of arithmetic operations for RBCD to obtain an $ε$-stationary point is $O(ε^{-3})$. This significantly improves the corresponding complexity of RGAS, which is $O(ε^{-12})$. Moreover, our RBCD has very low per-iteration complexity, and hence is suitable for large-scale problems. Numerical results on both synthetic and real datasets demonstrate that our method is more efficient than existing methods, especially when the number of sampled data is very large.
Submitted 27 September, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
On the Adversarial Robustness of LASSO Based Feature Selection
Authors:
Fuwei Li,
Lifeng Lai,
Shuguang Cui
Abstract:
In this paper, we investigate the adversarial robustness of feature selection based on the $\ell_1$ regularized linear regression model, namely LASSO. In the considered model, there is a malicious adversary who can observe the whole dataset, and then will carefully modify the response values or the feature matrix in order to manipulate the selected features. We formulate the modification strategy of the adversary as a bi-level optimization problem. Because the $\ell_1$ norm is non-differentiable at zero, we reformulate the $\ell_1$ norm regularizer as linear inequality constraints. We employ the interior-point method to solve this reformulated LASSO problem and obtain the gradient information. Then we use the projected gradient descent method to design the modification strategy. In addition, we demonstrate that this method can be extended to other $\ell_1$ based feature selection methods, such as group LASSO and sparse group LASSO. Numerical examples with synthetic and real data illustrate that our method is efficient and effective.
Submitted 20 October, 2020;
originally announced October 2020.
-
Robust Low-rank Matrix Completion via an Alternating Manifold Proximal Gradient Continuation Method
Authors:
Minhui Huang,
Shiqian Ma,
Lifeng Lai
Abstract:
Robust low-rank matrix completion (RMC), or robust principal component analysis with partially observed data, has been studied extensively for computer vision, signal processing and machine learning applications. This problem aims to decompose a partially observed matrix into the superposition of a low-rank matrix and a sparse matrix, where the sparse matrix captures the grossly corrupted entries of the matrix. A widely used approach to tackle RMC is to consider a convex formulation, which minimizes the nuclear norm of the low-rank matrix (to promote low-rankness) and the $\ell_1$ norm of the sparse matrix (to promote sparsity). In this paper, motivated by some recent works on low-rank matrix completion and Riemannian optimization, we formulate this problem as a nonsmooth Riemannian optimization problem over the Grassmann manifold. This new formulation is scalable because the low-rank matrix is factorized into the product of two much smaller matrices. We then propose an alternating manifold proximal gradient continuation (AManPGC) method to solve the proposed new formulation. The convergence rate of the proposed algorithm is rigorously analyzed. Numerical results on both synthetic data and real data on background extraction from surveillance videos are reported to demonstrate the advantages of the proposed new formulation and algorithm over several popular existing approaches.
Submitted 18 August, 2020;
originally announced August 2020.
-
Action-Manipulation Attacks Against Stochastic Bandits: Attacks and Defense
Authors:
Guanlin Liu,
Lifeng Lai
Abstract:
Due to the broad range of applications of the stochastic multi-armed bandit model, understanding the effects of adversarial attacks and designing bandit algorithms robust to attacks are essential for the safe application of this model. In this paper, we introduce a new class of attacks named action-manipulation attacks. In this attack, an adversary can change the action signal selected by the user. We show that without knowledge of the mean rewards of arms, our proposed attack can manipulate the Upper Confidence Bound (UCB) algorithm, a widely used bandit algorithm, into pulling a target arm very frequently by spending only logarithmic cost. To defend against this class of attacks, we introduce a novel algorithm that is robust to action-manipulation attacks when an upper bound for the total attack cost is given. We prove that our algorithm has a pseudo-regret upper bounded by $\mathcal{O}(\max\{\log T,A\})$, where $T$ is the total number of rounds and $A$ is the upper bound of the total attack cost.
Submitted 21 February, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
On the rigidity of stationary charged black holes: small perturbations of the non-extremal Kerr-Newman family
Authors:
Li Lai,
Jiong-Yue Li,
Pin Yu
Abstract:
We prove a perturbative result concerning the uniqueness of the Kerr-Newman family of black holes: given an asymptotically flat space-time with bifurcate horizons, if it agrees with a non-extremal Kerr-Newman space-time asymptotically flat at infinity and it is sufficiently close to the Kerr-Newman family, then the space-time must be one of the Kerr-Newman solutions. The closeness to the Kerr-Newman family is measured by the smallness of a pair of Mars-Simon type tensors, which were introduced by Wong in \cite{Wong_09} to detect the Kerr-Newman family.
Submitted 24 November, 2019;
originally announced November 2019.
-
A note on the number of irrational odd zeta values
Authors:
Li Lai,
Pin Yu
Abstract:
It is proved that, for all odd integers $s \geqslant s_0(\varepsilon)$, there are at least $\big( c_0 - \varepsilon \big) \frac{s^{1/2}}{(\log s)^{1/2}} $ many irrational numbers among the following odd zeta values: $ζ(3),ζ(5),ζ(7),\cdots,ζ(s)$. The constant $c_0 = 1.192507\ldots$ can be expressed in closed form. The work is based on the previous work of Fischler, Sprang and Zudilin [FSZ19] and improves the lower bound $2^{(1-\varepsilon)\frac{\log s}{\log\log s}}$ therein. The main new ingredient is an optimal design for the zeros of the auxiliary rational functions, which relates to the inverse of Euler's totient function.
Submitted 16 January, 2020; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Minimax Rate Optimal Adaptive Nearest Neighbor Classification and Regression
Authors:
Puning Zhao,
Lifeng Lai
Abstract:
The k Nearest Neighbor (kNN) method is a simple and popular statistical method for classification and regression. For both classification and regression problems, existing works have shown that, if the distribution of the feature vector has bounded support and the probability density function is bounded away from zero in its support, the convergence rate of the standard kNN method, in which k is the same for all test samples, is minimax optimal. On the contrary, if the distribution has unbounded support, we show that there is a gap between the convergence rate achieved by the standard kNN method and the minimax bound. To close this gap, we propose an adaptive kNN method, in which a different k is selected for different samples. Our selection rule does not require precise knowledge of the underlying distribution of features. The newly proposed method significantly outperforms the standard one. We characterize the convergence rate of the proposed adaptive method, and show that it matches the minimax lower bound.
Submitted 22 October, 2019;
originally announced October 2019.
-
On the Adversarial Robustness of Multivariate Robust Estimation
Authors:
Erhan Bayraktar,
Lifeng Lai
Abstract:
In this paper, we investigate the adversarial robustness of multivariate $M$-Estimators. In the considered model, after observing the whole dataset, an adversary can modify all data points with the goal of maximizing inference errors. We use the adversarial influence function (AIF) to measure the asymptotic rate at which the adversary can change the inference result. We first characterize the adversary's optimal modification strategy and its corresponding AIF. From the defender's perspective, we would like to design an estimator that has a small AIF. For the joint location and scale estimation problem, we characterize the optimal $M$-estimator that has the smallest AIF. We further identify a tradeoff between robustness against adversarial modifications and robustness against outliers, and derive the optimal $M$-estimator that achieves the best tradeoff.
Submitted 26 March, 2019;
originally announced March 2019.
-
Time-Dependent Surveillance-Evasion Games
Authors:
Elliot Cartee,
Lexiao Lai,
Qianli Song,
Alexander Vladimirsky
Abstract:
Surveillance-Evasion (SE) games form an important class of adversarial trajectory-planning problems. We consider time-dependent SE games, in which an Evader is trying to reach its target while minimizing the cumulative exposure to a moving enemy Observer. That Observer is simultaneously aiming to maximize the same exposure by choosing how often to use each of its predefined patrol trajectories. Following the framework introduced in Gilles and Vladimirsky (arXiv:1812.10620), we develop efficient algorithms for finding Nash Equilibrium policies for both players by blending techniques from semi-infinite game theory, convex optimization, and multi-objective dynamic programming on continuous planning spaces. We illustrate our method on several examples with Observers using omnidirectional and angle-restricted sensors on a domain with occluding obstacles.
Submitted 5 September, 2019; v1 submitted 4 March, 2019;
originally announced March 2019.
-
How did Donald Trump Surprisingly Win the 2016 United States Presidential Election? an Information-Theoretic Perspective (Clean Sensing for Big Data Analytics:Optimal Strategies,Estimation Error Bounds Tighter than the Cramér-Rao Bound)
Authors:
Weiyu Xu,
Lifeng Lai,
Amin Khajehnejad
Abstract:
Donald Trump was lagging behind in nearly all opinion polls leading up to the 2016 US presidential election, but he surprisingly won the election. This raises the following important questions: 1) why were most opinion polls inaccurate in 2016? and 2) how can the accuracy of opinion polls be improved? In this paper, we study the inaccuracies of opinion polls in the 2016 election through the lens of information theory. We first propose a general framework of parameter estimation, called clean sensing (polling), which performs optimal parameter estimation with sensing cost constraints, from heterogeneous and potentially distorted data sources. We then cast opinion polling as a problem of parameter estimation from potentially distorted heterogeneous data sources, and derive the optimal polling strategy using heterogeneous and possibly distorted data under cost constraints. Our results show that a larger number of data samples does not necessarily lead to better polling accuracy, which gives a possible explanation of the inaccuracies of opinion polls in 2016. The optimal sensing strategy should instead optimally allocate sensing resources over heterogeneous data sources according to several factors including data quality, and, moreover, for a particular data source, it should strike an optimal balance between the quality of data samples and the quantity of data samples.
As a byproduct of this research, in a general setting, we derive a group of new lower bounds on the mean-squared errors of general unbiased and biased parameter estimators. These new lower bounds can be tighter than the classical Cramér-Rao bound (CRB) and Chapman-Robbins bound. Our derivations are via studying the Lagrange dual problems of certain convex programs. The classical Cramér-Rao bound and Chapman-Robbins bound follow naturally from our results for special cases of these convex programs.
Submitted 31 December, 2018;
originally announced December 2018.
-
Analysis of KNN Information Estimators for Smooth Distributions
Authors:
Puning Zhao,
Lifeng Lai
Abstract:
The KSG mutual information estimator, which is based on the distances of each sample to its k-th nearest neighbor, is widely used to estimate mutual information between two continuous random variables. Existing work has analyzed the convergence rate of this estimator for random variables whose densities are bounded away from zero in their support. In practice, however, the KSG estimator also performs well for a much broader class of distributions, including not only those with bounded support and densities bounded away from zero, but also those with bounded support but densities approaching zero, and those with unbounded support. In this paper, we analyze the convergence rate of the error of the KSG estimator for smooth distributions, whose densities may have either bounded or unbounded support. As the KSG mutual information estimator can be viewed as an adaptive recombination of KL entropy estimators, in our analysis, we also provide convergence analysis of the KL entropy estimator for a broad class of distributions.
Submitted 24 October, 2019; v1 submitted 26 October, 2018;
originally announced October 2018.
-
On the adversarial robustness of robust estimators
Authors:
Lifeng Lai,
Erhan Bayraktar
Abstract:
Motivated by recent data analytics applications, we study the adversarial robustness of robust estimators. Instead of assuming that only a fraction of the data points are outliers as considered in the classic robust estimation setup, in this paper, we consider an adversarial setup in which an attacker can observe the whole dataset and can modify all data samples in an adversarial manner so as to maximize the estimation error caused by his attack. We characterize the attacker's optimal attack strategy, and further introduce the adversarial influence function (AIF) to quantify an estimator's sensitivity to such adversarial attacks. We provide an approach to characterize the AIF for any given robust estimator, and then design the optimal estimator that minimizes the AIF, which makes it the least sensitive to, and hence the most robust against, adversarial attacks. From this characterization, we identify a tradeoff between AIF (i.e., robustness against adversarial attack) and influence function, a quantity used in classic robust estimators to measure robustness against outliers, and design estimators that strike a desirable tradeoff between these two quantities.
Submitted 30 March, 2020; v1 submitted 11 June, 2018;
originally announced June 2018.
-
Efficient Byzantine Sequential Change Detection
Authors:
Georgios Fellouris,
Erhan Bayraktar,
Lifeng Lai
Abstract:
In the multisensor sequential change detection problem, a disruption occurs in an environment monitored by multiple sensors. This disruption induces a change in the observations of an unknown subset of sensors. In the Byzantine version of this problem, which is the focus of this work, it is further assumed that the postulated change-point model may be misspecified for an unknown subset of sensors. The problem then is to detect the change quickly and reliably, for any possible subset of affected sensors, even if the misspecified sensors are controlled by an adversary. Given a user-specified upper bound on the number of compromised sensors, we propose and study three families of sequential change-detection rules for this problem. These are designed and evaluated under a generalization of Lorden's criterion, where conditional expected detection delay and expected time to false alarm are both computed in the worst-case scenario for the compromised sensors. The first-order asymptotic performance of these procedures is characterized as the worst-case false alarm rate goes to 0. The insights from these theoretical results are corroborated by a simulation study.
Submitted 23 August, 2017; v1 submitted 9 September, 2016;
originally announced September 2016.
-
Precise Phase Transition of Total Variation Minimization
Authors:
Bingwen Zhang,
Weiyu Xu,
Jian-Feng Cai,
Lifeng Lai
Abstract:
Characterizing the phase transitions of convex optimizations in recovering structured signals or data is of central importance in compressed sensing, machine learning and statistics. The phase transitions of many convex optimization signal recovery methods such as $\ell_1$ minimization and nuclear norm minimization are well understood through recent years' research. However, rigorously characterizing the phase transition of total variation (TV) minimization in recovering sparse-gradient signals is still open. In this paper, we fully characterize the phase transition curve of the TV minimization. Our proof builds on Donoho, Johnstone and Montanari's conjectured phase transition curve for the TV approximate message passing algorithm (AMP), together with the linkage between the minimax mean square error of a denoising problem and the high-dimensional convex geometry for TV minimization.
Submitted 14 September, 2015;
originally announced September 2015.
-
A general theory of particle filters in hidden Markov models and some applications
Authors:
Hock Peng Chan,
Tze Leung Lai
Abstract:
By making use of martingale representations, we derive the asymptotic normality of particle filters in hidden Markov models and a relatively simple formula for their asymptotic variances. Although repeated resamplings result in complicated dependence among the sample paths, the asymptotic variance formula and martingale representations lead to consistent estimates of the standard errors of the particle filter estimates of the hidden states.
Submitted 18 December, 2013;
originally announced December 2013.
-
On the definition and K-theory realization of a modular functor
Authors:
Igor Kriz,
Luhang Lai
Abstract:
We present a definition of a (super)-modular functor which includes certain interesting cases that previous definitions do not allow. We also introduce a notion of topological twisting of a modular functor, and construct formally a realization by a 2-dimensional topological field theory valued in twisted K-modules. We discuss, among other things, the N=1-supersymmetric minimal models from the point of view of this formalism.
Submitted 7 January, 2018; v1 submitted 18 October, 2013;
originally announced October 2013.
-
Byzantine Fault Tolerant Distributed Quickest Change Detection
Authors:
Erhan Bayraktar,
Lifeng Lai
Abstract:
We introduce and solve the problem of Byzantine fault tolerant distributed quickest change detection in both continuous and discrete time setups. In this problem, multiple sensors sequentially observe random signals from the environment and send their observations to a control center that will determine whether there is a change in the statistical behavior of the observations. We assume that the signals are independent and identically distributed across sensors. An unknown subset of sensors is compromised and will send arbitrarily modified and even artificially generated signals to the control center. It is shown that the performance of the so-called CUSUM statistic, which is optimal when all sensors are honest, will be significantly degraded in the presence of even a single dishonest sensor. In particular, instead of growing logarithmically, the detection delay grows linearly with the average run length (ARL) to false alarm. To mitigate such a performance degradation, we propose a fully distributed low complexity detection scheme. We show that the proposed scheme can recover the log scaling. We also propose a centralized group-wise scheme that can further reduce the detection delay.
Submitted 29 December, 2014; v1 submitted 9 June, 2013;
originally announced June 2013.
-
Evaluating probability forecasts
Authors:
Tze Leung Lai,
Shulamith T. Gross,
David Bo Shen
Abstract:
Probability forecasts of events are routinely used in climate predictions, in forecasting default probabilities on bank loans or in estimating the probability of a patient's positive response to treatment. Scoring rules have long been used to assess the efficacy of the forecast probabilities after observing the occurrence, or nonoccurrence, of the predicted events. We develop herein a statistical theory for scoring rules and propose an alternative approach to the evaluation of probability forecasts. This approach uses loss functions relating the predicted to the actual probabilities of the events and applies martingale theory to exploit the temporal structure between the forecast and the subsequent occurrence or nonoccurrence of the event.
Submitted 23 February, 2012;
originally announced February 2012.
-
A sequential Monte Carlo approach to computing tail probabilities in stochastic models
Authors:
Hock Peng Chan,
Tze Leung Lai
Abstract:
Sequential Monte Carlo methods which involve sequential importance sampling and resampling are shown to provide a versatile approach to computing probabilities of rare events. By making use of martingale representations of the sequential Monte Carlo estimators, we show how resampling weights can be chosen to yield logarithmically efficient Monte Carlo estimates of large deviation probabilities for multidimensional Markov random walks.
Submitted 21 February, 2012;
originally announced February 2012.
-
Multistage tests of multiple hypotheses
Authors:
Jay Bartroff,
Tze Leung Lai
Abstract:
Conventional multiple hypothesis tests use step-up, step-down, or closed testing methods to control the overall error rates. We will discuss marrying these methods with adaptive multistage sampling rules and stopping rules to perform efficient multiple hypothesis testing in sequential experimental designs. The result is a multistage step-down procedure that adaptively tests multiple hypotheses while preserving the family-wise error rate, and extends Holm's (1979) step-down procedure to the sequential setting, yielding substantial savings in sample size with small loss in power.
Submitted 10 July, 2011;
originally announced July 2011.
-
Modern Sequential Analysis and its Applications to Computerized Adaptive Testing
Authors:
Jay Bartroff,
Matthew Finkelman,
Tze Leung Lai
Abstract:
After a brief review of recent advances in sequential analysis involving sequential generalized likelihood ratio tests, we discuss their use in psychometric testing and extend the asymptotic optimality theory of these sequential tests to the case of sequentially generated experiments, of particular interest in computerized adaptive testing. We then show how these methods can be used to design adaptive mastery tests, which are asymptotically optimal and are also shown to provide substantial improvements over currently used sequential and fixed-length tests.
Submitted 13 June, 2011;
originally announced June 2011.