-
Optimal single threshold stopping rules and sharp prophet inequalities
Authors:
Alexander Goldenshluger,
Yaakov Malinovsky,
Assaf Zeevi
Abstract:
This paper considers a finite horizon optimal stopping problem for a sequence of independent and identically distributed random variables. The objective is to design stopping rules that attempt to select the random variable with the highest value in the sequence. The performance of any stopping rule may be benchmarked relative to the selection of a "prophet" that has perfect foreknowledge of the largest value. Such comparisons are typically stated in the form of "prophet inequalities." In this paper we characterize sharp prophet inequalities for single threshold stopping rules as solutions to infinite two person zero sum games on the unit square with special payoff kernels. The proposed game theoretic characterization allows one to derive sharp non-asymptotic prophet inequalities for different classes of distributions. This, in turn, gives rise to a simple and computationally tractable algorithmic paradigm for deriving optimal single threshold stopping rules.
Submitted 19 July, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
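To illustrate the kind of rule the abstract above refers to, here is a minimal simulation sketch of a single threshold stopping rule benchmarked against the prophet. It is not the authors' method: the function names, the Uniform(0,1) distribution, the threshold value, and the convention of accepting the last observation when nothing exceeds the threshold are all assumptions made for illustration.

```python
import random


def single_threshold_stop(values, threshold):
    """Return the first value exceeding `threshold`.

    Assumption for this sketch: if no value exceeds the threshold,
    the rule is forced to accept the last observation.
    """
    for v in values:
        if v > threshold:
            return v
    return values[-1]


def estimate_ratio(n, threshold, trials=20000, seed=0):
    """Monte Carlo estimate of E[X_tau] / E[max X_t] for i.i.d. Uniform(0,1)."""
    rng = random.Random(seed)
    stop_total = 0.0
    max_total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        stop_total += single_threshold_stop(xs, threshold)
        max_total += max(xs)  # the prophet always collects the maximum
    return stop_total / max_total


if __name__ == "__main__":
    # Hypothetical parameters: horizon n = 10, threshold 0.8.
    print(estimate_ratio(n=10, threshold=0.8))
```

Because the stopped value can never exceed the sample maximum, the estimated ratio is at most 1; a prophet inequality bounds how far below 1 it can fall over a class of distributions.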
-
Bayesian Design Principles for Frequentist Sequential Learning
Authors:
Yunbei Xu,
Assaf Zeevi
Abstract:
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization approach to generate "algorithmic beliefs" at each round, and use Bayesian posteriors to make decisions. The optimization objective to create "algorithmic beliefs," which we term "Algorithmic Information Ratio," represents an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. To the best of our knowledge, this is the first systematic approach to make Bayesian-type algorithms prior-free and applicable to adversarial settings, in a generic and optimal manner. Moreover, the algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves the "best-of-all-worlds" empirical performance in the stochastic, adversarial, and non-stationary environments. We also illustrate how these principles can be used in linear bandits, bandit convex optimization, and reinforcement learning.
Submitted 8 February, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory
Authors:
Yunbei Xu,
Assaf Zeevi
Abstract:
We study problem-dependent rates, i.e., generalization errors that scale near-optimally with the variance, the effective loss, or the gradient norms evaluated at the "best hypothesis." We introduce a principled framework dubbed "uniform localized convergence," and characterize sharp problem-dependent rates for central statistical learning problems. From a methodological viewpoint, our framework resolves several fundamental limitations of existing uniform convergence and localization analysis approaches. It also provides improvements and some level of unification in the study of localized complexities, one-sided uniform inequalities, and sample-based iterative algorithms. In the so-called "slow rate" regime, we provide the first (moment-penalized) estimator that achieves the optimal variance-dependent rate for general "rich" classes; we also establish an improved loss-dependent rate for standard empirical risk minimization. In the "fast rate" regime, we establish finite-sample problem-dependent bounds that are comparable to precise asymptotics. In addition, we show that iterative algorithms like gradient descent and first-order Expectation-Maximization can achieve optimal generalization error in several representative problems across the areas of non-convex learning, stochastic optimization, and learning with missing data.
Submitted 23 December, 2020; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Upper Counterfactual Confidence Bounds: a New Optimism Principle for Contextual Bandits
Authors:
Yunbei Xu,
Assaf Zeevi
Abstract:
The principle of optimism in the face of uncertainty is one of the most widely used and successful ideas in multi-armed bandits and reinforcement learning. However, existing optimistic algorithms (primarily UCB and its variants) often struggle to deal with general function classes and large context spaces. In this paper, we study general contextual bandits with an offline regression oracle and propose a simple, generic principle to design optimistic algorithms, dubbed "Upper Counterfactual Confidence Bounds" (UCCB). The key innovation of UCCB is building confidence bounds in policy space, rather than in action space as is done in UCB. We demonstrate that these algorithms are provably optimal and computationally efficient in handling general function classes and large context spaces. Furthermore, we illustrate that the UCCB principle can be seamlessly extended to infinite-action general contextual bandits, providing the first solutions to these settings when employing an offline regression oracle.
Submitted 9 March, 2024; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Discriminative Learning via Adaptive Questioning
Authors:
Achal Bassamboo,
Vikas Deep,
Sandeep Juneja,
Assaf Zeevi
Abstract:
We consider the problem of designing an adaptive sequence of questions that optimally classify a candidate's ability into one of several categories or discriminative grades. A candidate's ability is modeled as an unknown parameter, which, together with the difficulty of the question asked, determines the likelihood with which s/he is able to answer a question correctly. The learning algorithm is only able to observe these noisy responses to its queries. We consider this problem from a fixed confidence-based $δ$-correct framework, that in our setting seeks to arrive at the correct ability discrimination at the fastest possible rate while guaranteeing that the probability of error is less than a pre-specified and small $δ$. In this setting we develop lower bounds on any sequential questioning strategy and develop geometrical insights into the problem structure both from primal and dual formulation. In addition, we arrive at algorithms that essentially match these lower bounds. Our key conclusions are that, asymptotically, any candidate needs to be asked questions at most at two (candidate ability-specific) levels, although, in a reasonably general framework, questions need to be asked only at a single level. Further, and interestingly, the problem structure facilitates endogenous exploration, so there is no need for a separately designed exploration stage in the algorithm.
Submitted 11 April, 2020;
originally announced April 2020.
-
A Unified Approach for Solving Sequential Selection Problems
Authors:
Alexander Goldenshluger,
Yaakov Malinovsky,
Assaf Zeevi
Abstract:
In this paper we develop a unified approach for solving a wide class of sequential selection problems. This class includes, but is not limited to, selection problems with no-information, rank-dependent rewards, and considers both fixed as well as random problem horizons. The proposed framework is based on a reduction of the original selection problem to one of optimal stopping for a sequence of judiciously constructed independent random variables. We demonstrate that our approach allows exact and efficient computation of optimal policies and various performance metrics thereof for a variety of sequential selection problems, several of which have not been solved to date.
Submitted 23 January, 2020; v1 submitted 14 January, 2019;
originally announced January 2019.
-
Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards
Authors:
Omar Besbes,
Yonatan Gur,
Assaf Zeevi
Abstract:
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (exploration) while simultaneously optimizing immediate rewards (exploitation); the price paid due to this trade-off is often referred to as the regret, and the main question is how small this price can be as a function of the horizon length T. This problem has been studied extensively when the reward distributions do not change over time, an assumption that supports a sharp characterization of the regret, yet is often violated in practical settings. In this paper, we focus on a MAB formulation which allows for a broad range of temporal uncertainties in the rewards, while still maintaining mathematical tractability. We fully characterize the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret. Our analysis draws some connections between two rather disparate strands of literature: the adversarial and the stochastic MAB frameworks.
Submitted 6 June, 2019; v1 submitted 13 May, 2014;
originally announced May 2014.
-
Non-stationary Stochastic Optimization
Authors:
O. Besbes,
Y. Gur,
A. Zeevi
Abstract:
We consider a non-stationary variant of a sequential stochastic optimization problem, in which the underlying cost functions may change along the horizon. We propose a measure, termed variation budget, that controls the extent of said change, and study how restrictions on this budget impact achievable performance. We identify sharp conditions under which it is possible to achieve long-run-average optimality and more refined performance measures such as rate optimality that fully characterize the complexity of such problems. In doing so, we also establish a strong connection between two rather disparate strands of literature: adversarial online convex optimization; and the more traditional stochastic approximation paradigm (couched in a non-stationary setting). This connection is the key to deriving well performing policies in the latter, by leveraging structure of optimal policies in the former. Finally, tight bounds on the minimax regret allow us to quantify the "price of non-stationarity," which mathematically captures the added complexity embedded in a temporally changing environment versus a stationary one.
Submitted 22 December, 2014; v1 submitted 20 July, 2013;
originally announced July 2013.
-
Nonparametric Bandits with Covariates
Authors:
Philippe Rigollet,
Assaf Zeevi
Abstract:
We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is to maximize cumulative expected reward. We derive general lower bounds on the performance of any admissible policy, and develop an algorithm whose performance achieves the order of said lower bound up to logarithmic terms. This is done by decomposing the global problem into suitably "localized" bandit problems. Proofs blend ideas from nonparametric statistics and traditional methods used in the bandit literature.
Submitted 8 March, 2010;
originally announced March 2010.
-
Woodroofe's one-armed bandit problem revisited
Authors:
Alexander Goldenshluger,
Assaf Zeevi
Abstract:
We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist. Assoc. 74 (1979) 799--806], which involves sequential sampling from two populations: one whose characteristics are known, and one which depends on an unknown parameter and incorporates a covariate. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that involve suitable modifications of the myopic rule. It is shown that the regret, as well as the rate of sampling from the inferior population, can be finite or grow at various rates with the time horizon of the problem, depending on "local" properties of the covariate distribution. Proofs rely on martingale methods and information theoretic arguments.
Submitted 1 September, 2009;
originally announced September 2009.
-
Recovering convex boundaries from blurred and noisy observations
Authors:
Alexander Goldenshluger,
Assaf Zeevi
Abstract:
We consider the problem of estimating convex boundaries from blurred and noisy observations. In our model, the convolution of an intensity function $f$ is observed with additive Gaussian white noise. The function $f$ is assumed to have convex support $G$ whose boundary is to be recovered. Rather than directly estimating the intensity function, we develop a procedure which is based on estimating the support function of the set $G$. This approach is closely related to the method of geometric hyperplane probing, a well-known technique in computer vision applications. We establish bounds that reveal how the estimation accuracy depends on the ill-posedness of the convolution operator and the behavior of the intensity function near the boundary.
Submitted 1 August, 2006;
originally announced August 2006.
-
The Hough transform estimator
Authors:
Alexander Goldenshluger,
Assaf Zeevi
Abstract:
This article pursues a statistical study of the Hough transform, the celebrated computer vision algorithm used to detect the presence of lines in a noisy image. We first study asymptotic properties of the Hough transform estimator, whose objective is to find the line that "best" fits a set of planar points. In particular, we establish strong consistency and rates of convergence, and characterize the limiting distribution of the Hough transform estimator. While the convergence rates are seen to be slower than those found in some standard regression methods, the Hough transform estimator is shown to be more robust as measured by its breakdown point. We next study the Hough transform in the context of the problem of detecting multiple lines. This is addressed via the framework of excess mass functionals and modality testing. Throughout, several numerical examples help illustrate various properties of the estimator. Relations between the Hough transform and more mainstream statistical paradigms and methods are discussed as well.
Submitted 29 March, 2005;
originally announced March 2005.
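As a rough illustration of the idea in the abstract above (finding the line that "best" fits a set of planar points), the sketch below implements a basic grid-search Hough transform for a single line. It is not the estimator analyzed in the paper: the normal-form parametrization x cos(theta) + y sin(theta) = rho, the grid resolution, the strip half-width `width`, and the function name `hough_line` are all assumptions chosen for this example.

```python
import math


def hough_line(points, n_theta=180, rho_max=10.0, n_rho=201, width=0.05):
    """Grid-search Hough transform for a single line.

    Each candidate line is x*cos(theta) + y*sin(theta) = rho. Its score is
    the number of points whose distance to the line is at most `width`.
    Returns (count, theta, rho) for the first highest-scoring grid cell.
    """
    best = (0, 0.0, 0.0)
    for i in range(n_theta):
        theta = math.pi * i / n_theta  # angles in [0, pi)
        c, s = math.cos(theta), math.sin(theta)
        for j in range(n_rho):
            rho = -rho_max + 2.0 * rho_max * j / (n_rho - 1)
            count = sum(1 for x, y in points if abs(x * c + y * s - rho) <= width)
            if count > best[0]:
                best = (count, theta, rho)
    return best


if __name__ == "__main__":
    # Six points on the line y = x plus three outliers (made-up data).
    pts = [(float(k), float(k)) for k in range(6)] + [(1.0, 4.0), (3.0, 0.0), (5.0, 2.0)]
    print(hough_line(pts))
```

The outliers do not move the selected line because only points inside the strip are counted, which gives some intuition for the robustness (breakdown point) property mentioned in the abstract.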
-
Validity of heavy traffic steady-state approximations in generalized Jackson Networks
Authors:
David Gamarnik,
Assaf Zeevi
Abstract:
We consider a single class open queueing network, also known as a generalized Jackson network (GJN). A classical result in heavy-traffic theory asserts that the sequence of normalized queue length processes of the GJN converges weakly to a reflected Brownian motion (RBM) in the orthant, as the traffic intensity approaches unity. However, barring simple instances, it is still not known whether the stationary distribution of RBM provides a valid approximation for the steady-state of the original network. In this paper we resolve this open problem by proving that the re-scaled stationary distribution of the GJN converges to the stationary distribution of the RBM, thus validating a so-called "interchange-of-limits" for this class of networks. Our method of proof involves a combination of Lyapunov function techniques, strong approximations and tail probability bounds that yield tightness of the sequence of stationary distributions of the GJN.
Submitted 9 March, 2006; v1 submitted 4 October, 2004;
originally announced October 2004.
-
Optimal change-point estimation from indirect observations
Authors:
A. Goldenshluger,
A. Tsybakov,
A. Zeevi
Abstract:
We study nonparametric change-point estimation from indirect noisy observations. Focusing on the white noise convolution model, we consider two classes of functions that are smooth apart from the change-point. We establish lower bounds on the minimax risk in estimating the change-point and develop rate optimal estimation procedures. The results demonstrate that the best achievable rates of convergence are determined both by smoothness of the function away from the change-point and by the degree of ill-posedness of the convolution operator. Optimality is obtained by introducing a new technique that involves, as a key element, detection of zero crossings of an estimate of the properly smoothed second derivative of the underlying function.
Submitted 18 May, 2006; v1 submitted 23 July, 2004;
originally announced July 2004.