Online Policy Learning via a Self-Normalized Maximal Inequality

Girard, Samuel; Bibaut, Aurélien; Zenati, Houssam


arXiv:2510.15483 (stat)

[Submitted on 17 Oct 2025]


Abstract: Adaptive experiments produce dependent data that break i.i.d. assumptions that underlie classical concentration bounds and invalidate standard learning guarantees. In this paper, we develop a self-normalized maximal inequality for martingale empirical processes. Building on this, we first propose an adaptive sample-variance penalization procedure which balances empirical loss and sample variance, valid for general dependent data. Next, this allows us to derive a new variance-regularized pessimistic off-policy learning objective, for which we establish excess-risk guarantees. Subsequently, we show that, when combined with sequential updates and under standard complexity and margin conditions, the resulting estimator achieves fast convergence rates in both parametric and nonparametric regimes, improving over the usual $1/\sqrt{n}$ baseline. We complement our theoretical findings with numerical simulations that illustrate the practical gains of our approach.
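
The trade-off between empirical loss and sample variance mentioned in the abstract can be illustrated with a minimal sketch. It uses the classical i.i.d.-style penalty (mean loss plus a multiple of sqrt(sample variance / n)), not the paper's adaptive procedure for dependent data, whose exact form is not given in this listing. The function names, the weight `lam`, and the loss values are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, variance


def svp_objective(losses, lam=1.0):
    """Mean loss plus lam * sqrt(sample variance / n) (illustrative penalty)."""
    n = len(losses)
    if n < 2:
        raise ValueError("need at least two losses to estimate a variance")
    return mean(losses) + lam * math.sqrt(variance(losses) / n)


def select_by_svp(candidate_losses, lam=1.0):
    """Return the name of the candidate with the smallest penalized objective."""
    return min(candidate_losses, key=lambda k: svp_objective(candidate_losses[k], lam))


# Hypothetical per-sample losses for two candidate policies:
# "noisy" has the lower mean loss but a much larger sample variance.
candidate_losses = {
    "stable": [0.50, 0.52, 0.48, 0.51, 0.49],
    "noisy": [0.00, 1.00, 0.00, 1.00, 0.40],
}

erm_choice = select_by_svp(candidate_losses, lam=0.0)  # plain empirical risk
svp_choice = select_by_svp(candidate_losses, lam=2.0)  # variance-penalized
print(erm_choice, svp_choice)
```

With `lam=0.0` the selection reduces to plain empirical risk minimization and picks the lower-mean candidate; with a positive `lam` the variance term can reverse that choice in favor of the candidate whose losses are more stable.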

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2510.15483 [stat.ML]
	(or arXiv:2510.15483v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2510.15483

Submission history

From: Samuel Girard
[v1] Fri, 17 Oct 2025 09:53:42 UTC (464 KB)
