Search | arXiv e-print repository

Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models

Authors: Yuchen Liang, Renxiang Huang, Lifeng Lai, Ness Shroff, Yingbin Liang

Abstract: Discrete state space diffusion models have shown significant advantages in applications involving discrete data, such as text and image generation. It has also been observed that their performance is highly sensitive to the choice of rate matrices, particularly between uniform and absorbing rate matrices. While empirical results suggest that absorbing rate matrices often yield better generation quality compared to uniform rate matrices, existing theoretical works have largely focused on the uniform rate matrices case. Notably, convergence guarantees and error analyses for absorbing diffusion models are still missing. In this work, we provide the first finite-time error bounds and convergence rate analysis for discrete diffusion models using absorbing rate matrices. We begin by deriving an upper bound on the KL divergence of the forward process, introducing a surrogate initialization distribution to address the challenge posed by the absorbing stationary distribution, which is a singleton and causes the KL divergence to be ill-defined. We then establish the first convergence guarantees for both the $τ$-leaping and uniformization samplers under absorbing rate matrices, demonstrating improved rates over their counterparts using uniform rate matrices. Furthermore, under suitable assumptions, we provide convergence guarantees without early stopping. Our analysis introduces several new technical tools to address challenges unique to absorbing rate matrices. These include a Jensen-type argument for bounding forward process convergence, novel techniques for bounding absorbing score functions, and a non-divergent upper bound on the score near initialization that removes the need for early stopping.

Submitted 2 June, 2025; originally announced June 2025.

arXiv:2505.23088 [pdf, ps, other]

On the irrationality of certain $p$-adic zeta values

Authors: Li Lai, Cezar Lupu, Johannes Sprang

Abstract: A famous theorem of Zudilin states that at least one of the Riemann zeta values $ζ(5), ζ(7), ζ(9), ζ(11)$ is irrational. In this paper, we establish the $p$-adic analogue of Zudilin's theorem. As a weaker form of our result, it is proved that for any prime number $p \geqslant 5$ there exists an odd integer $i$ in the interval $[3,p+p/\log p+5]$ such that the $p$-adic zeta value $ζ_p(i)$ is irrational.

Submitted 29 May, 2025; originally announced May 2025.

Comments: 24 pages, 1 table

MSC Class: 11J72 (primary); 11M06; 33C20 (secondary)

arXiv:2505.12687 [pdf, ps, other]

A partial result towards the Chowla--Milnor conjecture

Authors: Li Lai, Jia Li

Abstract: The Chowla--Milnor conjecture predicts the linear independence of certain Hurwitz zeta values. In this paper, we prove that for any fixed integer $k \geqslant 2$, the dimension of the $\mathbb{Q}$-linear span of $ζ(k,a/q)-(-1)^{k}ζ(k,1-a/q)$ ($1 \leqslant a < q/2$, $\gcd(a,q)=1$) is at least $(c_k -o(1)) \cdot \log q$ as the positive integer $q \to +\infty$ for some constant $c_k>0$ depending only on $k$. It is well known that $ζ(k,a/q)+(-1)^{k}ζ(k,1-a/q) \in \overline{\mathbb{Q}}π^k$, but much less was previously known about $ζ(k,a/q)-(-1)^{k}ζ(k,1-a/q)$. Our proof is similar to those of Ball--Rivoal (2001) and Zudilin (2002) concerning the linear independence of Riemann zeta values. However, we use a new type of rational function to construct linear forms.

Submitted 19 May, 2025; originally announced May 2025.

Comments: 39 pages, 7 figures

MSC Class: 11J72 (primary); 11M35; 33C20 (secondary)

arXiv:2505.05005 [pdf, ps, other]

A note on the irrationality of $ζ_2(5)$

Authors: Li Lai, Johannes Sprang, Wadim Zudilin

Abstract: In the spirit of Apéry's proof of the irrationality of $ζ(3)$, we construct a sequence $p_n/q_n$ of rational approximations to the $2$-adic zeta value $ζ_2(5)$ which satisfy $0 < |ζ_2(5)-p_n/q_n|_2 < \max\{|p_n|,|q_n|\}^{-1-δ}$ for an explicit constant $δ>0$. This leads to a new proof of the irrationality of $ζ_2(5)$, a result established recently by Calegari, Dimitrov and Tang using a different method. Furthermore, our approximations allow us to obtain an upper bound for the irrationality measure of this $2$-adic quantity; namely, we show that $μ(ζ_2(5)) \le (16\log2)/(8\log2-5) = 20.342\dots$.

Submitted 8 May, 2025; originally announced May 2025.

Comments: 2^2 x 5 pages

Report number: MPIM-Bonn-2025 MSC Class: 11J72; 11J82; 11M06; 33C20
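
As a quick numerical check of the irrationality-measure bound quoted in the abstract above, the following Python sketch evaluates $(16\log 2)/(8\log 2-5)$ with the natural logarithm. The function name `irrationality_measure_bound` is hypothetical and used only for illustration; it does not come from the paper.

```python
import math


def irrationality_measure_bound() -> float:
    """Evaluate (16 log 2) / (8 log 2 - 5), the bound stated in the abstract.

    Assumption: "log" denotes the natural logarithm.
    """
    log2 = math.log(2.0)
    return (16.0 * log2) / (8.0 * log2 - 5.0)


bound = irrationality_measure_bound()
# The leading digits should agree with the value 20.342... in the abstract.
print(f"{bound:.6f}")
```

The denominator $8\log 2-5$ is only about $0.545$, which is why the resulting bound is fairly large.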

arXiv:2501.05321 [pdf, ps, other]

A note on the number of irrational odd zeta values, II

Authors: Li Lai

Abstract: We prove that there are at least $1.284 \cdot \sqrt{s/\log s}$ irrational numbers among $ζ(3)$, $ζ(5)$, $ζ(7)$, $\ldots$, $ζ(s-1)$ for any sufficiently large even integer $s$. This result improves upon the previous finding by a constant factor. The proof combines the elimination technique of Fischler-Sprang-Zudilin (2019) with the $Φ_n$ factor method of Zudilin (2001).

Submitted 12 January, 2025; v1 submitted 9 January, 2025; originally announced January 2025.

Comments: 17 pages, 1 table. v2 added a new observation (Rmk. 3.6) and improved the final constant

MSC Class: 11J72 (primary); 11M06; 33C20 (secondary)

arXiv:2412.00640 [pdf, other]

Stability of first-order methods in tame optimization

Authors: Lexiao Lai

Abstract: Modern data science applications demand solving large-scale optimization problems. The prevalent approaches are first-order methods, valued for their scalability. These methods are implemented to tackle highly irregular problems where assumptions of convexity and smoothness are untenable. Seeking to deepen the understanding of these methods, we study first-order methods with constant step size for minimizing locally Lipschitz tame functions. To do so, we propose notions of discrete Lyapunov stability for optimization methods. Concerning common first-order methods, we provide necessary and sufficient conditions for stability. We also show that certain local minima can be unstable, without additional noise in the method. Our analysis relies on the connection between the iterates of the first-order methods and continuous-time dynamics.

Submitted 30 November, 2024; originally announced December 2024.

Comments: PhD thesis

arXiv:2410.17487 [pdf, ps, other]

Nonsmooth rank-one symmetric matrix factorization landscape

Authors: Cédric Josz, Lexiao Lai

Abstract: We consider nonsmooth rank-one symmetric matrix factorization. It has no spurious second-order stationary points.

Submitted 26 February, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

arXiv:2410.01265 [pdf, other]

Transformers Handle Endogeneity in In-Context Linear Regression

Authors: Haodong Liang, Krishnakumar Balasubramanian, Lifeng Lai

Abstract: We explore the capability of transformers to address endogeneity in in-context linear regression. Our main finding is that transformers inherently possess a mechanism to handle endogeneity effectively using instrumental variables (IV). First, we demonstrate that the transformer architecture can emulate a gradient-based bi-level optimization procedure that converges to the widely used two-stage least squares $(\textsf{2SLS})$ solution at an exponential rate. Next, we propose an in-context pretraining scheme and provide theoretical guarantees showing that the global minimizer of the pre-training loss achieves a small excess loss. Our extensive experiments validate these theoretical findings, showing that the trained transformer provides more robust and reliable in-context predictions and coefficient estimates than the $\textsf{2SLS}$ method, in the presence of endogeneity.

Submitted 10 May, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

Comments: 37 pages, 8 figures

arXiv:2408.07182 [pdf, ps, other]

Proximal random reshuffling under local Lipschitz continuity

Authors: Cedric Josz, Lexiao Lai, Xiaopeng Li

Abstract: We study proximal random reshuffling for minimizing the sum of locally Lipschitz functions and a proper lower semicontinuous convex function without assuming coercivity or the existence of limit points. The algorithmic guarantees pertaining to near approximate stationarity rely on a new tracking lemma linking the iterates to trajectories of conservative fields. One of the novelties in the analysis consists in handling conservative fields with unbounded values.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2407.14236 [pdf, ps, other]

Small improvements on the Ball-Rivoal theorem and its $p$-adic variant

Authors: Li Lai

Abstract: We prove that the dimension of the $\mathbb{Q}$-linear span of $1,ζ(3),ζ(5),\ldots,ζ(s-1)$ is at least $(1.119 \cdot \log s)/(1+\log 2)$ for any sufficiently large even integer $s$. This slightly refines a well-known result of Rivoal (2000) or Ball-Rivoal (2001). Quite unexpectedly, the proof only involves inserting the arithmetic observation of Zudilin (2001) into the original proof of Ball-Rivoal. Although this result is covered by a recent development of Fischler (2021+), our proof has the advantages of being simple and providing explicit non-vanishing small linear forms in $1$ and odd zeta values. Moreover, we establish the $p$-adic variant: for any prime number $p$, the dimension of the $\mathbb{Q}$-linear span of $1,ζ_p(3),ζ_p(5),\ldots,ζ_p(s-1)$ is at least $(1.119 \cdot \log s)/(1+\log 2)$ for any sufficiently large even integer $s$. This is new; it slightly refines a result of Sprang (2020).

Submitted 29 January, 2025; v1 submitted 19 July, 2024; originally announced July 2024.

Comments: 51 pages, 1 figure, 2 tables; v2 added p-adic analogue, improved the constant 1.108 to 1.119

MSC Class: 11J72; 11M06; 33C20

arXiv:2405.01718 [pdf, other]

Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk

Authors: Xinyi Ni, Lifeng Lai

Abstract: Robust Markov Decision Processes (RMDPs) have received significant research interest, offering an alternative to standard Markov Decision Processes (MDPs) that often assume fixed transition probabilities. RMDPs address this by optimizing for the worst-case scenarios within ambiguity sets. While earlier studies on RMDPs have largely centered on risk-neutral reinforcement learning (RL), with the goal of minimizing expected total discounted costs, in this paper we analyze the robustness of CVaR-based risk-sensitive RL under RMDP. First, we consider predetermined ambiguity sets. Based on the coherency of CVaR, we establish a connection between robustness and risk sensitivity, so that techniques in risk-sensitive RL can be adopted to solve the proposed problem. Furthermore, motivated by the existence of decision-dependent uncertainty in real-world problems, we study problems with state-action-dependent ambiguity sets. To solve this, we define a new risk measure named NCVaR and build the equivalence of NCVaR optimization and robust CVaR optimization. We further propose value iteration algorithms and validate our approach in simulation experiments.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2308.00899 [pdf, other]

Global stability of first-order methods for coercive tame functions

Authors: Cédric Josz, Lexiao Lai

Abstract: We consider first-order methods with constant step size for minimizing locally Lipschitz coercive functions that are tame in an o-minimal structure on the real field. We prove that if the method is approximated by subgradient trajectories, then the iterates eventually remain in a neighborhood of a connected component of the set of critical points. Under suitable method-dependent regularity assumptions, this result applies to the subgradient method with momentum, the stochastic subgradient method with random reshuffling and momentum, and the random-permutations cyclic coordinate descent method.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 30 pages, 1 figure

arXiv:2307.07670 [pdf, other]

Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning

Authors: Guanlin Liu, Lifeng Lai

Abstract: Due to the broad range of applications of multi-agent reinforcement learning (MARL), understanding the effects of adversarial attacks against MARL models is essential for the safe application of these models. Motivated by this, we investigate the impact of adversarial attacks on MARL. In the considered setup, there is an exogenous attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them. The attacker aims to guide each agent into a target policy or maximize the cumulative rewards under some specific reward function chosen by the attacker, while minimizing the amount of manipulation on feedback and action. We first show the limitations of attacks that use only action poisoning and of attacks that use only reward poisoning. We then introduce a mixed attack strategy with both action poisoning and reward poisoning. We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.

Submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.07666 [pdf, other]

Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty

Authors: Guanlin Liu, Zhihan Zhou, Han Liu, Lifeng Lai

Abstract: Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance in the face of uncertainties. In this paper, we focus on action robust RL with probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent will take the action specified by the policy with probability $1-ρ$ and an alternative adversarial action with probability $ρ$. We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. Furthermore, we develop the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax optimal regret and sample complexity. Finally, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.

Submitted 20 July, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.03331 [pdf, ps, other]

Convergence of the momentum method for semialgebraic functions with locally Lipschitz gradients

Authors: Cédric Josz, Lexiao Lai, Xiaopeng Li

Abstract: We propose a new length formula that governs the iterates of the momentum method when minimizing differentiable semialgebraic functions with locally Lipschitz gradients. It enables us to establish local convergence, global convergence, and convergence to local minimizers without assuming global Lipschitz continuity of the gradient, coercivity, and a global growth condition, as is done in the literature. As a result, we provide the first convergence guarantee of the momentum method starting from arbitrary initial points when applied to principal component analysis, matrix sensing, and linear neural networks.

Submitted 7 January, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: 33 pages. Accepted for publication at SIAM Journal on Optimization

arXiv:2306.10393 [pdf, ps, other]

Many $p$-adic odd zeta values are irrational

Authors: Li Lai, Johannes Sprang

Abstract: For any prime $p$ and $\varepsilon>0$ we prove that for any sufficiently large positive odd integer $s$ at least $(c_p-\varepsilon) \sqrt{\frac{s}{\log s}}$ of the $p$-adic zeta values $ζ_p(3),ζ_p(5),\dots,ζ_p(s)$ are irrational. The constant $c_p$ is positive and depends only on $p$. This result establishes a $p$-adic version of the elimination technique used by Fischler--Sprang--Zudilin and Lai--Yu to prove a similar result on classical zeta values. The main difficulty consists in proving the non-vanishing of the resulting linear forms. We overcome this problem by using a new irrationality criterion.

Submitted 17 February, 2025; v1 submitted 17 June, 2023; originally announced June 2023.

Comments: 36 pages, to appear in Michigan Math. J

MSC Class: 11J72 (Primary) 11F85; 11M06 (Secondary)

arXiv:2304.00816 [pdf, ps, other]

On the irrationality of certain $2$-adic zeta values

Authors: Li Lai

Abstract: Let $ζ_2(\cdot)$ be the Kubota-Leopoldt $2$-adic zeta function. We prove that, for every nonnegative integer $s$, there exists an odd integer $j$ in the interval $[s+3,3s+5]$ such that $ζ_2(j)$ is irrational. In particular, at least one of $ζ_2(7),ζ_2(9),ζ_2(11),ζ_2(13)$ is irrational. Our approach is inspired by the recent work of Sprang. We construct explicit rational functions. The Volkenborn integrals of these rational functions' (higher-order) derivatives produce good linear combinations of $1$ and $2$-adic Hurwitz zeta values. The most difficult step is proving that certain Volkenborn integrals are nonzero, which is resolved by carefully manipulating the binomial coefficients.

Submitted 3 April, 2023; originally announced April 2023.

Comments: 21 pages

MSC Class: 11J72 (Primary) 11F85; 11M06 (Secondary)
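
The "in particular" statement in the abstract above can be read as a special case of the interval $[s+3,3s+5]$. The short Python sketch below lists the odd integers in that interval for a given $s$. The choice $s=3$ is our reading of how the special case arises (the abstract does not name the value), and the helper name `odd_integers_in_interval` is hypothetical.

```python
def odd_integers_in_interval(s: int) -> list[int]:
    """Return the odd integers j with s + 3 <= j <= 3s + 5."""
    # range() excludes its upper end, so use 3s + 6 to include 3s + 5.
    return [j for j in range(s + 3, 3 * s + 6) if j % 2 == 1]


# Assumed choice s = 3, giving the interval [6, 14].
candidates = odd_integers_in_interval(3)
print(candidates)  # [7, 9, 11, 13]
```

With $s=3$ the candidates are exactly the indices $7, 9, 11, 13$ that appear in the abstract.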

arXiv:2211.14852 [pdf, other]

Sufficient conditions for instability of the subgradient method with constant step size

Authors: Cédric Josz, Lexiao Lai

Abstract: We provide sufficient conditions for instability of the subgradient method with constant step size around a local minimum of a locally Lipschitz semi-algebraic function. They are satisfied by several spurious local minima arising in robust principal component analysis and neural networks.

Submitted 29 June, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

Comments: 18 pages, 5 figures

arXiv:2211.14850 [pdf, other]

doi 10.1007/s10107-023-01936-6

Lyapunov stability of the subgradient method with constant step size

Authors: Cédric Josz, Lexiao Lai

Abstract: We consider the subgradient method with constant step size for minimizing locally Lipschitz semi-algebraic functions. In order to analyze the behavior of its iterates in the vicinity of a local minimum, we introduce a notion of discrete Lyapunov stability and propose necessary and sufficient conditions for stability.

Submitted 6 March, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

Comments: 11 pages, 2 figures

MSC Class: 65K05 90C30

Journal ref: Mathematical Programming 2023

arXiv:2211.14848 [pdf, other]

Nonsmooth rank-one matrix factorization landscape

Authors: Cédric Josz, Lexiao Lai

Abstract: We provide the first positive result on the nonsmooth optimization landscape of robust principal component analysis, to the best of our knowledge. It is the object of several conjectures and remains mostly uncharted territory. We identify a necessary and sufficient condition for the absence of spurious local minima in the rank-one case. Our proof exploits the subdifferential regularity of the objective function in order to eliminate the existential quantifier from the first-order optimality condition known as Fermat's rule.

Submitted 27 November, 2022; originally announced November 2022.

Comments: 23 pages, 5 figures

arXiv:2210.14704 [pdf, ps, other]

doi 10.1515/forum-2022-0325

On two conjectures of Sun concerning Apéry-like series

Authors: Steven Charlton, Herbert Gangl, Li Lai, Ce Xu, Jianqiang Zhao

Abstract: In this paper, we shall prove two conjectures of Z.-W. Sun concerning Apéry-like series. One of the series is alternating whereas the other one is not. Our main strategy is to convert the series (resp. the alternating series) to log-sine-cosine (resp. log-sinh-cosh) integrals. Then we express all these integrals in terms of single-valued Bloch-Wigner-Ramakrishnan-Wojtkowiak-Zagier polylogarithms. The conjectures then follow from a few highly non-trivial functional equations of the polylogarithms of weight $3$ and $4$.

Submitted 21 October, 2022; originally announced October 2022.

Comments: 19 pages

Journal ref: Forum Math. 2023

arXiv:2201.09262 [pdf, ps, other]

Elementary proofs of Zagier's formula for multiple zeta values and its odd variant

Authors: Li Lai, Cezar Lupu, Derek Orr

Abstract: In this paper, we give elementary proofs of Zagier's formula for multiple zeta values involving Hoffman element and its odd variant due to Murakami. Zagier's formula was a key ingredient in the proof of Hoffman's conjecture. Moreover, using the same approach, we prove Murakami's formula for multiple $t$-values. This formula is essential in proving a Brown type result which asserts that each multiple zeta value is a $\mathbb{Q}$-linear combination of multiple $t$-values of the same weight involving $2$'s and $3$'s.

Submitted 29 January, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

Comments: 15 pages, new version

MSC Class: Primary 11M06; 11M32. Secondary 11B65; 11B68

arXiv:2112.05367 [pdf, ps, other]

Efficient Action Poisoning Attacks on Linear Contextual Bandits

Authors: Guanlin Liu, Lifeng Lai

Abstract: Contextual bandit algorithms have many applications in a variety of scenarios. In order to develop trustworthy contextual bandit systems, understanding the impacts of various adversarial attacks on contextual bandit algorithms is essential. In this paper, we propose a new class of attacks: action poisoning attacks, where an adversary can change the action signal selected by the agent. We design action poisoning attack schemes against linear contextual bandit algorithms in both white-box and black-box settings. We further analyze the cost of the proposed attack strategies for a very popular and widely used bandit algorithm: LinUCB. We show that, in both white-box and black-box settings, the proposed attack schemes can force the LinUCB agent to pull a target arm very frequently by spending only logarithmic cost.

Submitted 10 December, 2021; originally announced December 2021.

arXiv:2112.00087 [pdf, other]

Coupling and Simulation of Fluid-Structure Interaction Problems for Automotive Sun-roof on Graphics Processing Unit

Authors: Liang S. Lai, Choi-Hong Lai, Abal-Kassim Cheik Ahamed, Frederic Magoules

Abstract: In this paper, the authors propose an analysis of the frequency response function in a car compartment, subject to some fluctuating pressure distribution along the open cavity of the sun-roof at the top of a car. Coupling of a computational fluid dynamics code and a computational acoustics code is considered to simulate the acoustic fluid-structure interaction problem. Iterative Krylov methods and domain decomposition methods, tuned for the Graphics Processing Unit (GPU), are considered to solve the acoustic problem in double-precision complex arithmetic. Numerical simulations illustrate the efficiency, robustness and accuracy of the proposed approaches.

Submitted 30 November, 2021; originally announced December 2021.

arXiv:2110.04471 [pdf, ps, other]

Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning

Authors: Guanlin Liu, Lifeng Lai

Abstract: Due to the broad range of applications of reinforcement learning (RL), understanding the effects of adversarial attacks against RL models is essential for the safe application of these models. Prior theoretical works on adversarial attacks against RL mainly focus on either observation poisoning attacks or environment poisoning attacks. In this paper, we introduce a new class of attacks named action poisoning attacks, where an adversary can change the action signal selected by the agent. Compared with existing attack models, the attacker's ability in the proposed action poisoning attack model is more restricted, which brings some design challenges. We study the action poisoning attack in both white-box and black-box settings. We introduce an adaptive attack scheme called LCB-H, which works for most RL agents in the black-box setting. We prove that the LCB-H attack can force any efficient RL agent, whose dynamic regret scales sublinearly with the total number of steps taken, to choose actions according to a policy selected by the attacker very frequently, with only sublinear cost. In addition, we apply the LCB-H attack against a popular model-free RL algorithm: UCB-H. We show that, even in the black-box setting, by spending only logarithmic cost, the proposed LCB-H attack scheme can force the UCB-H agent to choose actions according to the policy selected by the attacker very frequently.

Submitted 26 October, 2021; v1 submitted 9 October, 2021; originally announced October 2021.

arXiv:2109.15030 [pdf, other]

On the Convergence of Projected Alternating Maximization for Equitable and Optimal Transport

Authors: Minhui Huang, Shiqian Ma, Lifeng Lai

Abstract: This paper studies the equitable and optimal transport (EOT) problem, which has many applications such as fair division problems and optimal transport with multiple agents. In the discrete distributions case, the EOT problem can be formulated as a linear program (LP). Since this LP is prohibitively large for general LP solvers, Scetbon et al. \cite{scetbon2021equitable} suggest perturbing the problem by adding an entropy regularization. They proposed a projected alternating maximization algorithm (PAM) to solve the dual of the entropy regularized EOT. In this paper, we provide the first convergence analysis of PAM. A novel rounding procedure is proposed to help construct the primal solution for the original EOT problem. We also propose a variant of PAM by incorporating the extrapolation technique that can numerically improve the performance of PAM. Results in this paper may shed light on block coordinate (gradient) descent methods for general optimization problems.

Submitted 30 September, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

arXiv:2103.14894 [pdf, ps, other]

On the largest prime divisor of $n!+1$

Authors: Li Lai

Abstract: For an integer $m >1$, we denote by $P(m)$ the largest prime divisor of $m$. We prove that $\limsup_{n \rightarrow +\infty} P(n!+1)/n \geqslant 1+9\log 2>7.238$, which improves a result of Stewart. More generally, for any nonzero polynomial $f(X)$ with integer coefficients, we show that $\limsup_{n \rightarrow +\infty} P(n!+f(n))/n \geqslant 1+9\log2$. This improves a result of Luca and Shparlinski. These improvements come from an additional combinatoric idea to the works mentioned above.

Submitted 27 March, 2021; originally announced March 2021.

Comments: 11 pages

MSC Class: 11D75; 11J25

arXiv:2103.11113 [pdf, other]

Exploration Enhancement of Nature-Inspired Swarm-based Optimization Algorithms

Authors: Kwok Pui Choi, Enzio Hai Hong Kam, Tze Leung Lai, Xin T. Tong, Weng Kee Wong

Abstract: Nature-inspired swarm-based algorithms have been widely applied to tackle high-dimensional and complex optimization problems across many disciplines. They are general purpose optimization algorithms, easy to use and implement, flexible and assumption-free. A common drawback of these algorithms is premature convergence and the solution found is not a global optimum. We provide sufficient conditions for an algorithm to converge almost surely (a.s.) to a global optimum. We then propose a general, simple and effective strategy, called Perturbation-Projection (PP), to enhance an algorithm's exploration capability so that our convergence conditions are guaranteed to hold. We illustrate this approach using three widely used nature-inspired swarm-based optimization algorithms: particle swarm optimization (PSO), bat algorithm (BAT) and competitive swarm optimizer (CSO). Extensive numerical experiments show that each of the three algorithms with the enhanced PP strategy outperforms the original version in a number of notable ways.

Submitted 20 March, 2021; originally announced March 2021.

Comments: 20 pages, 9 figures

arXiv:2103.00904 [pdf, ps, other]

At least two of $ζ(5),ζ(7),\ldots,ζ(35)$ are irrational

Authors: Li Lai, Li Zhou

Abstract: Let $ζ(s)$ be the Riemann zeta function. We prove the statement in the title, which improves a recent result of Rivoal and Zudilin by lowering $69$ to $35$. We also prove that at least one of $β(2),β(4),\ldots,β(10)$ is irrational, where $β(s) = L(s,χ_4)$ and $χ_4$ is the Dirichlet character with conductor $4$.

Submitted 18 October, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: 14 pages. A better collection of parameters for Thm. 6.1 (motivated by one referee). Some minor corrections according to referee reports

MSC Class: 11J72 (primary); 11M06; 33C20 (secondary)

arXiv:2012.05199 [pdf, other]

A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance

Authors: Minhui Huang, Shiqian Ma, Lifeng Lai

Abstract: The Wasserstein distance has become increasingly important in machine learning and deep learning. Despite its popularity, the Wasserstein distance is hard to approximate because of the curse of dimensionality. A recently proposed approach to alleviate the curse of dimensionality is to project the sampled data from the high dimensional probability distribution onto a lower-dimensional subspace, and then compute the Wasserstein distance between the projected data. However, this approach requires solving a max-min problem over the Stiefel manifold, which is very challenging in practice. The only existing work that solves this problem directly is the RGAS (Riemannian Gradient Ascent with Sinkhorn Iteration) algorithm, which requires solving an entropy-regularized optimal transport problem in each iteration, and thus can be costly for large-scale problems. In this paper, we propose a Riemannian block coordinate descent (RBCD) method to solve this problem, which is based on a novel reformulation of the regularized max-min problem over the Stiefel manifold. We show that the complexity of arithmetic operations for RBCD to obtain an $ε$-stationary point is $O(ε^{-3})$. This significantly improves the corresponding complexity of RGAS, which is $O(ε^{-12})$. Moreover, our RBCD has very low per-iteration complexity, and hence is suitable for large-scale problems. Numerical results on both synthetic and real datasets demonstrate that our method is more efficient than existing methods, especially when the number of sampled data is very large.

Submitted 27 September, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

arXiv:2010.10045 [pdf, ps, other]

doi 10.1109/TSP.2021.3115943

On the Adversarial Robustness of LASSO Based Feature Selection

Authors: Fuwei Li, Lifeng Lai, Shuguang Cui

Abstract: In this paper, we investigate the adversarial robustness of feature selection based on the $\ell_1$ regularized linear regression model, namely LASSO. In the considered model, there is a malicious adversary who can observe the whole dataset, and then will carefully modify the response values or the feature matrix in order to manipulate the selected features. We formulate the modification strategy of the adversary as a bi-level optimization problem. Because the $\ell_1$ norm is non-differentiable at zero, we reformulate the $\ell_1$ norm regularizer as linear inequality constraints. We employ the interior-point method to solve this reformulated LASSO problem and obtain the gradient information. Then we use the projected gradient descent method to design the modification strategy. In addition, we demonstrate that this method can be extended to other $\ell_1$ based feature selection methods, such as group LASSO and sparse group LASSO. Numerical examples with synthetic and real data illustrate that our method is efficient and effective.

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2008.07740 [pdf, other]

doi 10.1109/TSP.2021.3073544

Robust Low-rank Matrix Completion via an Alternating Manifold Proximal Gradient Continuation Method

Authors: Minhui Huang, Shiqian Ma, Lifeng Lai

Abstract: Robust low-rank matrix completion (RMC), or robust principal component analysis with partially observed data, has been studied extensively for computer vision, signal processing and machine learning applications. This problem aims to decompose a partially observed matrix into the superposition of a low-rank matrix and a sparse matrix, where the sparse matrix captures the grossly corrupted entries of the matrix. A widely used approach to tackle RMC is to consider a convex formulation, which minimizes the nuclear norm of the low-rank matrix (to promote low-rankness) and the $\ell_1$ norm of the sparse matrix (to promote sparsity). In this paper, motivated by some recent works on low-rank matrix completion and Riemannian optimization, we formulate this problem as a nonsmooth Riemannian optimization problem over the Grassmann manifold. This new formulation is scalable because the low-rank matrix is factorized into the product of two much smaller matrices. We then propose an alternating manifold proximal gradient continuation (AManPGC) method to solve the proposed new formulation. The convergence rate of the proposed algorithm is rigorously analyzed. Numerical results on both synthetic data and real data on background extraction from surveillance videos are reported to demonstrate the advantages of the proposed new formulation and algorithm over several popular existing approaches.

Submitted 18 August, 2020; originally announced August 2020.

arXiv:2002.08000 [pdf, other]

doi 10.1109/TSP.2020.3021525

Action-Manipulation Attacks Against Stochastic Bandits: Attacks and Defense

Authors: Guanlin Liu, Lifeng Lai

Abstract: Due to the broad range of applications of the stochastic multi-armed bandit model, understanding the effects of adversarial attacks and designing bandit algorithms robust to attacks are essential for the safe applications of this model. In this paper, we introduce a new class of attack named action-manipulation attack. In this attack, an adversary can change the action signal selected by the user. We show that without knowledge of mean rewards of arms, our proposed attack can manipulate the Upper Confidence Bound (UCB) algorithm, a widely used bandit algorithm, into pulling a target arm very frequently by spending only logarithmic cost. To defend against this class of attacks, we introduce a novel algorithm that is robust to action-manipulation attacks when an upper bound for the total attack cost is given. We prove that our algorithm has a pseudo-regret upper bounded by $\mathcal{O}(\max\{\log T,A\})$, where $T$ is the total number of rounds and $A$ is the upper bound of the total attack cost.

Submitted 21 February, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

Comments: 13 pages, 7 figures, submitted to IEEE Transactions on Signal Processing

arXiv:1911.10560 [pdf, other]

On the rigidity of stationary charged black holes: small perturbations of the non-extremal Kerr-Newman family

Authors: Li Lai, Jiong-Yue Li, Pin Yu

Abstract: We prove a perturbative result concerning the uniqueness of the Kerr-Newman family of black holes: given an asymptotically flat space-time with bifurcate horizons, if it agrees with a non-extremal Kerr-Newman space-time asymptotically flat at infinity and it is sufficiently close to the Kerr-Newman family, then the space-time must be one of the Kerr-Newman solutions. The closeness to the Kerr-Newman family is measured by the smallness of a pair of Mars-Simon type tensors, which were introduced by Wong in \cite{Wong_09} to detect the Kerr-Newman family.

Submitted 24 November, 2019; originally announced November 2019.

Comments: 36 pages, 4 figures

arXiv:1911.08458 [pdf, ps, other]

doi 10.1112/S0010437X20007307

A note on the number of irrational odd zeta values

Authors: Li Lai, Pin Yu

Abstract: It is proved that, for all odd integers $s \geqslant s_0(\varepsilon)$, there are at least $\big( c_0 - \varepsilon \big) \frac{s^{1/2}}{(\log s)^{1/2}} $ many irrational numbers among the following odd zeta values: $ζ(3),ζ(5),ζ(7),\cdots,ζ(s)$. The constant $c_0 = 1.192507\ldots$ can be expressed in closed form. The work is based on the previous work of Fischler, Sprang and Zudilin [FSZ19], and improves the lower bound $2^{(1-\varepsilon)\frac{\log s}{\log\log s}}$ therein. The main new ingredient is an optimal design for the zeros of the auxiliary rational functions, which relates to the inverse of the Euler totient function.

Submitted 16 January, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

Comments: 15 pages, corrected typos, improved the constant 1/10 to about 1.19

MSC Class: 11J72 (primary); 11M06; 33C20 (secondary)

Journal ref: Compositio Math. 156 (2020) 1699-1717

arXiv:1910.10513 [pdf, other]

Minimax Rate Optimal Adaptive Nearest Neighbor Classification and Regression

Authors: Puning Zhao, Lifeng Lai

Abstract: The k Nearest Neighbor (kNN) method is a simple and popular statistical method for classification and regression. For both classification and regression problems, existing works have shown that, if the distribution of the feature vector has bounded support and the probability density function is bounded away from zero in its support, the convergence rate of the standard kNN method, in which k is the same for all test samples, is minimax optimal. In contrast, if the distribution has unbounded support, we show that there is a gap between the convergence rate achieved by the standard kNN method and the minimax bound. To close this gap, we propose an adaptive kNN method, in which a different k is selected for different samples. Our selection rule does not require precise knowledge of the underlying distribution of features. The newly proposed method significantly outperforms the standard one. We characterize the convergence rate of the proposed adaptive method, and show that it matches the minimax lower bound.

Submitted 22 October, 2019; originally announced October 2019.

arXiv:1903.11220 [pdf, ps, other]

On the Adversarial Robustness of Multivariate Robust Estimation

Authors: Erhan Bayraktar, Lifeng Lai

Abstract: In this paper, we investigate the adversarial robustness of multivariate $M$-Estimators. In the considered model, after observing the whole dataset, an adversary can modify all data points with the goal of maximizing inference errors. We use adversarial influence function (AIF) to measure the asymptotic rate at which the adversary can change the inference result. We first characterize the adversary's optimal modification strategy and its corresponding AIF. From the defender's perspective, we would like to design an estimator that has a small AIF. For the case of joint location and scale estimation problem, we characterize the optimal $M$-estimator that has the smallest AIF. We further identify a tradeoff between robustness against adversarial modifications and robustness against outliers, and derive the optimal $M$-estimator that achieves the best tradeoff.

Submitted 26 March, 2019; originally announced March 2019.

arXiv:1903.01332 [pdf, other]

Time-Dependent Surveillance-Evasion Games

Authors: Elliot Cartee, Lexiao Lai, Qianli Song, Alexander Vladimirsky

Abstract: Surveillance-Evasion (SE) games form an important class of adversarial trajectory-planning problems. We consider time-dependent SE games, in which an Evader is trying to reach its target while minimizing the cumulative exposure to a moving enemy Observer. That Observer is simultaneously aiming to maximize the same exposure by choosing how often to use each of its predefined patrol trajectories. Following the framework introduced in Gilles and Vladimirsky (arXiv:1812.10620), we develop efficient algorithms for finding Nash Equilibrium policies for both players by blending techniques from semi-infinite game theory, convex optimization, and multi-objective dynamic programming on continuous planning spaces. We illustrate our method on several examples with Observers using omnidirectional and angle-restricted sensors on a domain with occluding obstacles.

Submitted 5 September, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

Comments: 6 pages, Latex, final version for IEEE CDC 2019

arXiv:1812.11891 [pdf, other]

How did Donald Trump Surprisingly Win the 2016 United States Presidential Election? an Information-Theoretic Perspective (Clean Sensing for Big Data Analytics: Optimal Strategies, Estimation Error Bounds Tighter than the Cramér-Rao Bound)

Authors: Weiyu Xu, Lifeng Lai, Amin Khajehnejad

Abstract: Donald Trump was lagging behind in nearly all opinion polls leading up to the 2016 US presidential election, but he surprisingly won the election. This raises the following important questions: 1) why were most opinion polls inaccurate in 2016? and 2) how can the accuracy of opinion polls be improved? In this paper, we study the inaccuracies of opinion polls in the 2016 election through the lens of information theory. We first propose a general framework of parameter estimation, called clean sensing (polling), which performs optimal parameter estimation with sensing cost constraints, from heterogeneous and potentially distorted data sources. We then cast the opinion polling as a problem of parameter estimation from potentially distorted heterogeneous data sources, and derive the optimal polling strategy using heterogeneous and possibly distorted data under cost constraints. Our results show that a larger number of data samples does not necessarily lead to better polling accuracy, which gives a possible explanation of the inaccuracies of opinion polls in 2016. The optimal sensing strategy should instead optimally allocate sensing resources over heterogeneous data sources according to several factors including data quality, and, moreover, for a particular data source, it should strike an optimal balance between the quality of data samples, and the quantity of data samples. As a byproduct of this research, in a general setting, we derive a group of new lower bounds on the mean-squared errors of general unbiased and biased parameter estimators. These new lower bounds can be tighter than the classical Cramér-Rao bound (CRB) and Chapman-Robbins bound. Our derivations are via studying the Lagrange dual problems of certain convex programs. The classical Cramér-Rao bound and Chapman-Robbins bound follow naturally from our results for special cases of these convex programs.

Submitted 31 December, 2018; originally announced December 2018.

Comments: 45 pages

arXiv:1810.11571 [pdf, other]

Analysis of KNN Information Estimators for Smooth Distributions

Authors: Puning Zhao, Lifeng Lai

Abstract: The KSG mutual information estimator, which is based on the distances of each sample to its k-th nearest neighbor, is widely used to estimate mutual information between two continuous random variables. Existing work has analyzed the convergence rate of this estimator for random variables whose densities are bounded away from zero in their support. In practice, however, the KSG estimator also performs well for a much broader class of distributions, including not only those with bounded support and densities bounded away from zero, but also those with bounded support but densities approaching zero, and those with unbounded support. In this paper, we analyze the convergence rate of the error of the KSG estimator for smooth distributions, whose support of density can be both bounded and unbounded. As the KSG mutual information estimator can be viewed as an adaptive recombination of KL entropy estimators, in our analysis, we also provide convergence analysis of the KL entropy estimator for a broad class of distributions.

Submitted 24 October, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

arXiv:1806.03801 [pdf, other]

On the adversarial robustness of robust estimators

Authors: Lifeng Lai, Erhan Bayraktar

Abstract: Motivated by recent data analytics applications, we study the adversarial robustness of robust estimators. Instead of assuming that only a fraction of the data points are outliers as considered in the classic robust estimation setup, in this paper, we consider an adversarial setup in which an attacker can observe the whole dataset and can modify all data samples in an adversarial manner so as to maximize the estimation error caused by his attack. We characterize the attacker's optimal attack strategy, and further introduce the adversarial influence function (AIF) to quantify an estimator's sensitivity to such adversarial attacks. We provide an approach to characterize the AIF for any given robust estimator, and then design the optimal estimator that minimizes the AIF, which implies it is least sensitive to adversarial attacks and hence is most robust against adversarial attacks. From this characterization, we identify a tradeoff between AIF (i.e., robustness against adversarial attack) and influence function, a quantity used in classic robust estimators to measure robustness against outliers, and design estimators that strike a desirable tradeoff between these two quantities.

Submitted 30 March, 2020; v1 submitted 11 June, 2018; originally announced June 2018.

arXiv:1609.02661 [pdf, other]

Efficient Byzantine Sequential Change Detection

Authors: Georgios Fellouris, Erhan Bayraktar, Lifeng Lai

Abstract: In the multisensor sequential change detection problem, a disruption occurs in an environment monitored by multiple sensors. This disruption induces a change in the observations of an unknown subset of sensors. In the Byzantine version of this problem, which is the focus of this work, it is further assumed that the postulated change-point model may be misspecified for an unknown subset of sensors. The problem then is to detect the change quickly and reliably, for any possible subset of affected sensors, even if the misspecified sensors are controlled by an adversary. Given a user-specified upper bound on the number of compromised sensors, we propose and study three families of sequential change-detection rules for this problem. These are designed and evaluated under a generalization of Lorden's criterion, where conditional expected detection delay and expected time to false alarm are both computed in the worst-case scenario for the compromised sensors. The first-order asymptotic performance of these procedures is characterized as the worst-case false alarm rate goes to 0. The insights from these theoretical results are corroborated by a simulation study.

Submitted 23 August, 2017; v1 submitted 9 September, 2016; originally announced September 2016.

Comments: 36 pages, 4 figures

MSC Class: 62L10; 60G40

arXiv:1509.04376 [pdf, ps, other]

Precise Phase Transition of Total Variation Minimization

Authors: Bingwen Zhang, Weiyu Xu, Jian-Feng Cai, Lifeng Lai

Abstract: Characterizing the phase transitions of convex optimizations in recovering structured signals or data is of central importance in compressed sensing, machine learning and statistics. The phase transitions of many convex optimization signal recovery methods such as $\ell_1$ minimization and nuclear norm minimization are well understood through recent years' research. However, rigorously characterizing the phase transition of total variation (TV) minimization in recovering sparse-gradient signals is still open. In this paper, we fully characterize the phase transition curve of the TV minimization. Our proof builds on Donoho, Johnstone and Montanari's conjectured phase transition curve for the TV approximate message passing algorithm (AMP), together with the linkage between the minimax Mean Square Error of a denoising problem and the high-dimensional convex geometry for TV minimization.

Submitted 14 September, 2015; originally announced September 2015.

Comments: 6 pages

arXiv:1312.5114 [pdf, ps, other]

doi 10.1214/13-AOS1172

A general theory of particle filters in hidden Markov models and some applications

Authors: Hock Peng Chan, Tze Leung Lai

Abstract: By making use of martingale representations, we derive the asymptotic normality of particle filters in hidden Markov models and a relatively simple formula for their asymptotic variances. Although repeated resamplings result in complicated dependence among the sample paths, the asymptotic variance formula and martingale representations lead to consistent estimates of the standard errors of the particle filter estimates of the hidden states.

Submitted 18 December, 2013; originally announced December 2013.

Comments: Published at http://dx.doi.org/10.1214/13-AOS1172 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1172

Journal ref: Annals of Statistics 2013, Vol. 41, No. 6, 2877-2904

arXiv:1310.5174 [pdf, ps, other]

doi 10.1142/S0129055X18500083

On the definition and K-theory realization of a modular functor

Authors: Igor Kriz, Luhang Lai

Abstract: We present a definition of a (super)-modular functor which includes certain interesting cases that previous definitions do not allow. We also introduce a notion of topological twisting of a modular functor, and construct formally a realization by a 2-dimensional topological field theory valued in twisted K-modules. We discuss, among other things, the N=1-supersymmetric minimal models from the point of view of this formalism.

Submitted 7 January, 2018; v1 submitted 18 October, 2013; originally announced October 2013.

Comments: Accepted for publication in the Reviews in Mathematical Physics

MSC Class: 57R56; 19L50; 19K56

arXiv:1306.2086 [pdf, ps, other]

Byzantine Fault Tolerant Distributed Quickest Change Detection

Authors: Erhan Bayraktar, Lifeng Lai

Abstract: We introduce and solve the problem of Byzantine fault tolerant distributed quickest change detection in both continuous and discrete time setups. In this problem, multiple sensors sequentially observe random signals from the environment and send their observations to a control center that will determine whether there is a change in the statistical behavior of the observations. We assume that the signals are independent and identically distributed across sensors. An unknown subset of sensors is compromised and will send arbitrarily modified and even artificially generated signals to the control center. It is shown that the performance of the so-called CUSUM statistic, which is optimal when all sensors are honest, will be significantly degraded in the presence of even a single dishonest sensor. In particular, instead of growing logarithmically, the detection delay grows linearly with the average run length (ARL) to false alarm. To mitigate such a performance degradation, we propose a fully distributed low-complexity detection scheme. We show that the proposed scheme can recover the log scaling. We also propose a centralized group-wise scheme that can further reduce the detection delay.

Submitted 29 December, 2014; v1 submitted 9 June, 2013; originally announced June 2013.

Comments: Final version. To appear in the SIAM Journal on Control and Optimization. Keywords: (Non-Bayesian) quickest change detection, Byzantine fault tolerance, distributed sensor network, robust optimal stopping in continuous and discrete time

arXiv:1202.5140 [pdf, ps, other]

doi 10.1214/11-AOS902

Evaluating probability forecasts

Authors: Tze Leung Lai, Shulamith T. Gross, David Bo Shen

Abstract: Probability forecasts of events are routinely used in climate predictions, in forecasting default probabilities on bank loans or in estimating the probability of a patient's positive response to treatment. Scoring rules have long been used to assess the efficacy of the forecast probabilities after observing the occurrence, or nonoccurrence, of the predicted events. We develop herein a statistical theory for scoring rules and propose an alternative approach to the evaluation of probability forecasts. This approach uses loss functions relating the predicted to the actual probabilities of the events and applies martingale theory to exploit the temporal structure between the forecast and the subsequent occurrence or nonoccurrence of the event.

Submitted 23 February, 2012; originally announced February 2012.

Comments: Published at http://dx.doi.org/10.1214/11-AOS902 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS902

Journal ref: Annals of Statistics 2011, Vol. 39, No. 5, 2356-2382

arXiv:1202.4582 [pdf, ps, other]

doi 10.1214/10-AAP758

A sequential Monte Carlo approach to computing tail probabilities in stochastic models

Authors: Hock Peng Chan, Tze Leung Lai

Abstract: Sequential Monte Carlo methods which involve sequential importance sampling and resampling are shown to provide a versatile approach to computing probabilities of rare events. By making use of martingale representations of the sequential Monte Carlo estimators, we show how resampling weights can be chosen to yield logarithmically efficient Monte Carlo estimates of large deviation probabilities for multidimensional Markov random walks.

Submitted 21 February, 2012; originally announced February 2012.

Comments: Published at http://dx.doi.org/10.1214/10-AAP758 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP758

Journal ref: Annals of Applied Probability 2011, Vol. 21, No. 6, 2315-2342

arXiv:1107.1919 [pdf, ps, other]

Multistage tests of multiple hypotheses

Authors: Jay Bartroff, Tze Leung Lai

Abstract: Conventional multiple hypothesis tests use step-up, step-down, or closed testing methods to control the overall error rates. We will discuss marrying these methods with adaptive multistage sampling rules and stopping rules to perform efficient multiple hypothesis testing in sequential experimental designs. The result is a multistage step-down procedure that adaptively tests multiple hypotheses while preserving the family-wise error rate, and extends Holm's (1979) step-down procedure to the sequential setting, yielding substantial savings in sample size with small loss in power.

Submitted 10 July, 2011; originally announced July 2011.

arXiv:1106.2559 [pdf, other]

doi 10.1007/s11336-007-9053-9

Modern Sequential Analysis and its Applications to Computerized Adaptive Testing

Authors: Jay Bartroff, Matthew Finkelman, Tze Leung Lai

Abstract: After a brief review of recent advances in sequential analysis involving sequential generalized likelihood ratio tests, we discuss their use in psychometric testing and extend the asymptotic optimality theory of these sequential tests to the case of sequentially generated experiments, of particular interest in computerized adaptive testing. We then show how these methods can be used to design adaptive mastery tests, which are asymptotically optimal and are also shown to provide substantial improvements over currently used sequential and fixed length tests.

Submitted 13 June, 2011; originally announced June 2011.

Journal ref: Psychometrika 73 (2008) 473-486

Showing 1–50 of 61 results for author: Lai, L