-
Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss
Authors:
Yi-Shan Wu,
Yijie Zhang,
Badr-Eddine Chérief-Abdellatif,
Yevgeny Seldin
Abstract:
PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning. It was inspired by Bayesian learning, which allows sequential data processing and naturally turns posteriors from one processing step into priors for the next. However, despite two and a half decades of research, the ability to update priors sequentially without losing confidence information along the…
▽ More
PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning. It was inspired by Bayesian learning, which allows sequential data processing and naturally turns posteriors from one processing step into priors for the next. However, despite two and a half decades of research, the ability to update priors sequentially without losing confidence information along the way remained elusive for PAC-Bayes. While PAC-Bayes allows construction of data-informed priors, the final confidence intervals depend only on the number of points that were not used for the construction of the prior, whereas confidence information in the prior, which is related to the number of points used to construct the prior, is lost. This limits the possibility and benefit of sequential prior updates, because the final bounds depend only on the size of the final batch.
We present a novel and, in retrospect, surprisingly simple and powerful PAC-Bayesian procedure that allows sequential prior updates with no information loss. The procedure is based on a novel decomposition of the expected loss of randomized classifiers. The decomposition rewrites the loss of the posterior as an excess loss relative to a downscaled loss of the prior plus the downscaled loss of the prior, which is bounded recursively. As a side result, we also present a generalization of the split-kl and PAC-Bayes-split-kl inequalities to discrete random variables, which we use for bounding the excess losses, and which can be of independent interest. In empirical evaluation the new procedure significantly outperforms state-of-the-art.
△ Less
Submitted 8 April, 2025; v1 submitted 23 May, 2024;
originally announced May 2024.
-
A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays
Authors:
Saeed Masoudian,
Julian Zimmert,
Yevgeny Seldin
Abstract:
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback. In contrast to prior work, which required prior knowledge of the maximal delay $d_{\mathrm{max}}$ and had a linear dependence of the regret on it, our algorithm can tolerate arbitrary excessive delays up to order $T$ (where $T$ is the time horizon). The algorithm is based on three technical innovations, whic…
▽ More
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback. In contrast to prior work, which required prior knowledge of the maximal delay $d_{\mathrm{max}}$ and had a linear dependence of the regret on it, our algorithm can tolerate arbitrary excessive delays up to order $T$ (where $T$ is the time horizon). The algorithm is based on three technical innovations, which may all be of independent interest: (1) We introduce the first implicit exploration scheme that works in best-of-both-worlds setting. (2) We introduce the first control of distribution drift that does not rely on boundedness of delays. The control is based on the implicit exploration scheme and adaptive skipping of observations with excessive delays. (3) We introduce a procedure relating standard regret with drifted regret that does not rely on boundedness of delays. At the conceptual level, we demonstrate that complexity of best-of-both-worlds bandits with delayed feedback is characterized by the amount of information missing at the time of decision making (measured by the number of outstanding observations) rather than the time that the information is missing (measured by the delays).
△ Less
Submitted 27 May, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Delayed Bandits: When Do Intermediate Observations Help?
Authors:
Emmanuel Esposito,
Saeed Masoudian,
Hao Qiu,
Dirk van der Hoeven,
Nicolò Cesa-Bianchi,
Yevgeny Seldin
Abstract:
We study a $K$-armed bandit with delayed feedback and intermediate observations. We consider a model where intermediate observations have a form of a finite state, which is observed immediately after taking an action, whereas the loss is observed after an adversarially chosen delay. We show that the regime of the mapping of states to losses determines the complexity of the problem, irrespective of…
▽ More
We study a $K$-armed bandit with delayed feedback and intermediate observations. We consider a model where intermediate observations have a form of a finite state, which is observed immediately after taking an action, whereas the loss is observed after an adversarially chosen delay. We show that the regime of the mapping of states to losses determines the complexity of the problem, irrespective of whether the mapping of actions to states is stochastic or adversarial. If the mapping of states to losses is adversarial, then the regret rate is of order $\sqrt{(K+d)T}$ (within log factors), where $T$ is the time horizon and $d$ is a fixed delay. This matches the regret rate of a $K$-armed bandit with delayed feedback and without intermediate observations, implying that intermediate observations are not helpful. However, if the mapping of states to losses is stochastic, we show that the regret grows at a rate of $\sqrt{\big(K+\min\{|\mathcal{S}|,d\}\big)T}$ (within log factors), implying that if the number $|\mathcal{S}|$ of states is smaller than the delay, then intermediate observations help. We also provide refined high-probability regret upper bounds for non-uniform delays, together with experimental validation of our algorithms.
△ Less
Submitted 30 May, 2023;
originally announced May 2023.
-
A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback
Authors:
Saeed Masoudian,
Julian Zimmert,
Yevgeny Seldin
Abstract:
We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays. Specifically, the adversarial regret guarantee is…
▽ More
We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays. Specifically, the adversarial regret guarantee is $\mathcal{O}(\sqrt{TK} + \sqrt{dT\log K})$, where $T$ is the time horizon, $K$ is the number of arms, and $d$ is the fixed delay, whereas the stochastic regret guarantee is $\mathcal{O}\left(\sum_{i \neq i^*}(\frac{1}{Δ_i} \log(T) + \frac{d}{Δ_{i}\log K}) + d K^{1/3}\log K\right)$, where $Δ_i$ are the suboptimality gaps. We also present an extension of the algorithm to the case of arbitrary delays, which is based on an oracle knowledge of the maximal delay $d_{max}$ and achieves $\mathcal{O}(\sqrt{TK} + \sqrt{D\log K} + d_{max}K^{1/3} \log K)$ regret in the adversarial regime, where $D$ is the total delay, and $\mathcal{O}\left(\sum_{i \neq i^*}(\frac{1}{Δ_i} \log(T) + \frac{σ_{max}}{Δ_{i}\log K}) + d_{max}K^{1/3}\log K\right)$ regret in the stochastic regime, where $σ_{max}$ is the maximal number of outstanding observations. Finally, we present a lower bound that matches regret upper bound achieved by the skipping technique of Zimmert and Seldin [2020] in the adversarial setting.
△ Less
Submitted 29 June, 2022;
originally announced June 2022.
-
Split-kl and PAC-Bayes-split-kl Inequalities for Ternary Random Variables
Authors:
Yi-Shan Wu,
Yevgeny Seldin
Abstract:
We present a new concentration of measure inequality for sums of independent bounded random variables, which we name a split-kl inequality. The inequality is particularly well-suited for ternary random variables, which naturally show up in a variety of problems, including analysis of excess losses in classification, analysis of weighted majority votes, and learning with abstention. We demonstrate…
▽ More
We present a new concentration of measure inequality for sums of independent bounded random variables, which we name a split-kl inequality. The inequality is particularly well-suited for ternary random variables, which naturally show up in a variety of problems, including analysis of excess losses in classification, analysis of weighted majority votes, and learning with abstention. We demonstrate that for ternary random variables the inequality is simultaneously competitive with the kl inequality, the Empirical Bernstein inequality, and the Unexpected Bernstein inequality, and in certain regimes outperforms all of them. It resolves an open question by Tolstikhin and Seldin [2013] and Mhammedi et al. [2019] on how to match simultaneously the combinatorial power of the kl inequality when the distribution happens to be close to binary and the power of Bernstein inequalities to exploit low variance when the probability mass is concentrated on the middle value. We also derive a PAC-Bayes-split-kl inequality and compare it with the PAC-Bayes-kl, PAC-Bayes-Empirical-Bennett, and PAC-Bayes-Unexpected-Bernstein inequalities in an analysis of excess losses and in an analysis of a weighted majority vote for several UCI datasets. Last but not least, our study provides the first direct comparison of the Empirical Bernstein and Unexpected Bernstein inequalities and their PAC-Bayes extensions.
△ Less
Submitted 17 January, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
Authors:
Chloé Rouyer,
Dirk van der Hoeven,
Nicolò Cesa-Bianchi,
Yevgeny Seldin
Abstract:
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is…
▽ More
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is $\tilde{O} (\sqrt{αT})$, where $T$ is the time horizon and $α$ is the independence number of the feedback graph. The bound against stochastic environments is $O\big( (\ln T)^2 \max_{S\in \mathcal I(G)} \sum_{i \in S} Δ_i^{-1}\big)$ where $\mathcal I(G)$ is the family of all independent sets in a suitably defined undirected version of the graph and $Δ_i$ are the suboptimality gaps. The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtain best-of-both-worlds guarantees with feedback graphs. We also extend our algorithm and results to a setting where the feedback graphs are allowed to change over time.
△ Less
Submitted 1 June, 2022;
originally announced June 2022.
-
Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote
Authors:
Yi-Shan Wu,
Andrés R. Masegosa,
Stephan S. Lorenzen,
Christian Igel,
Yevgeny Seldin
Abstract:
We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev- Cantelli inequality (a.k.a. one-sided Chebyshev's), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain e…
▽ More
We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev- Cantelli inequality (a.k.a. one-sided Chebyshev's), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain et al., 2015], and, at the same time, it improves on the oracle bound based on second order Markov's inequality introduced by Masegosa et al. [2020]. We also derive a new concentration of measure inequality, which we name PAC-Bayes-Bennett, since it combines PAC-Bayesian bounding with Bennett's inequality. We use it for empirical estimation of the oracle bound. The PAC-Bayes-Bennett inequality improves on the PAC-Bayes-Bernstein inequality of Seldin et al. [2012]. We provide an empirical evaluation demonstrating that the new bounds can improve on the work of Masegosa et al. [2020]. Both the parametric form of the Chebyshev-Cantelli inequality and the PAC-Bayes-Bennett inequality may be of independent interest for the study of concentration of measure in other domains.
△ Less
Submitted 17 January, 2023; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions
Authors:
Saeed Masoudian,
Yevgeny Seldin
Abstract:
We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). We show that in adversarial regimes with a $(Δ,C,T)$ self-bounding constraint the algorithm achieves…
▽ More
We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). We show that in adversarial regimes with a $(Δ,C,T)$ self-bounding constraint the algorithm achieves $\mathcal{O}\left(\left(\sum_{i\neq i^*} \frac{1}{Δ_i}\right)\log_+\left(\frac{(K-1)T}{\left(\sum_{i\neq i^*} \frac{1}{Δ_i}\right)^2}\right)+\sqrt{C\left(\sum_{i\neq i^*}\frac{1}{Δ_i}\right)\log_+\left(\frac{(K-1)T}{C\sum_{i\neq i^*}\frac{1}{Δ_i}}\right)}\right)$ regret bound, where $T$ is the time horizon, $K$ is the number of arms, $Δ_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x) = \max\left(1,\log x\right)$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. Additionally, we provide a general analysis, which allows to achieve the same kind of improvement for generalizations of Tsallis-INF to other settings beyond multiarmed bandits.
△ Less
Submitted 13 September, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
An Algorithm for Stochastic and Adversarial Bandits with Switching Costs
Authors:
Chloé Rouyer,
Yevgeny Seldin,
Nicolò Cesa-Bianchi
Abstract:
We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $λ$ every time it switches the arm being played. Our algorithm is based on adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax opt…
▽ More
We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $λ$ every time it switches the arm being played. Our algorithm is based on adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax optimal regret bound of $O\big((λK)^{1/3}T^{2/3} + \sqrt{KT}\big)$, where $T$ is the time horizon and $K$ is the number of arms. In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\left(\big((λK)^{2/3} T^{1/3} + \ln T\big)\sum_{i \neq i^*} Δ_i^{-1}\right)$, where $Δ_i$ are the suboptimality gaps and $i^*$ is a unique optimal arm. In the special case of $λ= 0$ (no switching costs), both bounds are minimax optimal within constants. We also explore variants of the problem, where switching cost is allowed to change over time. We provide experimental evaluation showing competitiveness of our algorithm with the relevant baselines in the stochastic, stochastically constrained adversarial, and adversarial regimes with fixed switching cost.
△ Less
Submitted 19 February, 2021;
originally announced February 2021.
-
Second Order PAC-Bayesian Bounds for the Weighted Majority Vote
Authors:
Andrés R. Masegosa,
Stephan S. Lorenzen,
Christian Igel,
Yevgeny Seldin
Abstract:
We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows to ex…
▽ More
We present a novel analysis of the expected risk of weighted majority vote in multiclass classification. The analysis takes correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows to exploit additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve weighting of trees in random forests and show that, in contrast to the commonly used first order bound, minimization of the new bound typically does not lead to degradation of the test error of the ensemble.
△ Less
Submitted 17 December, 2020; v1 submitted 1 July, 2020;
originally announced July 2020.
-
An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays
Authors:
Julian Zimmert,
Yevgeny Seldin
Abstract:
We propose a new algorithm for adversarial multi-armed bandits with unrestricted delays. The algorithm is based on a novel hybrid regularizer applied in the Follow the Regularized Leader (FTRL) framework. It achieves $\mathcal{O}(\sqrt{kn}+\sqrt{D\log(k)})$ regret guarantee, where $k$ is the number of arms, $n$ is the number of rounds, and $D$ is the total delay. The result matches the lower bound…
▽ More
We propose a new algorithm for adversarial multi-armed bandits with unrestricted delays. The algorithm is based on a novel hybrid regularizer applied in the Follow the Regularized Leader (FTRL) framework. It achieves $\mathcal{O}(\sqrt{kn}+\sqrt{D\log(k)})$ regret guarantee, where $k$ is the number of arms, $n$ is the number of rounds, and $D$ is the total delay. The result matches the lower bound within constants and requires no prior knowledge of $n$ or $D$. Additionally, we propose a refined tuning of the algorithm, which achieves $\mathcal{O}(\sqrt{kn}+\min_{S}|S|+\sqrt{D_{\bar S}\log(k)})$ regret guarantee, where $S$ is a set of rounds excluded from delay counting, $\bar S = [n]\setminus S$ are the counted rounds, and $D_{\bar S}$ is the total delay in the counted rounds. If the delays are highly unbalanced, the latter regret guarantee can be significantly tighter than the former. The result requires no advance knowledge of the delays and resolves an open problem of Thune et al. (2019). The new FTRL algorithm and its refined tuning are anytime and require no doubling, which resolves another open problem of Thune et al. (2019).
△ Less
Submitted 16 June, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Nonstochastic Multiarmed Bandits with Unrestricted Delays
Authors:
Tobias Sommer Thune,
Nicolò Cesa-Bianchi,
Yevgeny Seldin
Abstract:
We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded. We first prove that "delayed" Exp3 achieves the $O(\sqrt{(KT + D)\ln K} )$ regret bound conjectured by Cesa-Bianchi et al. [2019] in the case of variable, but bounded delays. Here, $K$ is the number of actions and $D$ is the total delay over $T$ rounds. We then introduce a new algorithm…
▽ More
We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded. We first prove that "delayed" Exp3 achieves the $O(\sqrt{(KT + D)\ln K} )$ regret bound conjectured by Cesa-Bianchi et al. [2019] in the case of variable, but bounded delays. Here, $K$ is the number of actions and $D$ is the total delay over $T$ rounds. We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays. The new algorithm maintains the same regret bound, but similar to its predecessor requires prior knowledge of $D$ and $T$. For this algorithm we then construct a novel doubling scheme that forgoes the prior knowledge requirement under the assumption that the delays are available at action time (rather than at loss observation time). This assumption is satisfied in a broad range of applications, including interaction with servers and service providers. The resulting oracle regret bound is of order $\min_β(|S_β|+β\ln K + (KT + D_β)/β)$, where $|S_β|$ is the number of observations with delay exceeding $β$, and $D_β$ is the total delay of observations with delay below $β$. The bound relaxes to $O (\sqrt{(KT + D)\ln K} )$, but we also provide examples where $D_β\ll D$ and the oracle bound has a polynomially better dependence on the problem parameters.
△ Less
Submitted 19 November, 2019; v1 submitted 3 June, 2019;
originally announced June 2019.
-
On PAC-Bayesian Bounds for Random Forests
Authors:
Stephan Sloth Lorenzen,
Christian Igel,
Yevgeny Seldin
Abstract:
Existing guarantees in terms of rigorous upper bounds on the generalization error for the original random forest algorithm, one of the most frequently used machine learning methods, are unsatisfying. We discuss and evaluate various PAC-Bayesian approaches to derive such bounds. The bounds do not require additional hold-out data, because the out-of-bag samples from the bagging in the training proce…
▽ More
Existing guarantees in terms of rigorous upper bounds on the generalization error for the original random forest algorithm, one of the most frequently used machine learning methods, are unsatisfying. We discuss and evaluate various PAC-Bayesian approaches to derive such bounds. The bounds do not require additional hold-out data, because the out-of-bag samples from the bagging in the training process can be exploited. A random forest predicts by taking a majority vote of an ensemble of decision trees. The first approach is to bound the error of the vote by twice the error of the corresponding Gibbs classifier (classifying with a single member of the ensemble selected at random). However, this approach does not take into account the effect of averaging out of errors of individual classifiers when taking the majority vote. This effect provides a significant boost in performance when the errors are independent or negatively correlated, but when the correlations are strong the advantage from taking the majority vote is small. The second approach based on PAC-Bayesian C-bounds takes dependencies between ensemble members into account, but it requires estimating correlations between the errors of the individual classifiers. When the correlations are high or the estimation is poor, the bounds degrade. In our experiments, we compute generalization bounds for random forests on various benchmark data sets. Because the individual decision trees already perform well, their predictions are highly correlated and the C-bounds do not lead to satisfactory results. For the same reason, the bounds based on the analysis of Gibbs classifiers are typically superior and often reasonably tight. Bounds based on a validation set coming at the cost of a smaller training set gave better performance guarantees, but worse performance in most experiments.
△ Less
Submitted 6 March, 2019; v1 submitted 23 October, 2018;
originally announced October 2018.
-
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits
Authors:
Julian Zimmert,
Yevgeny Seldin
Abstract:
We derive an algorithm that achieves the optimal (within constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent (OMD) with Tsallis entropy regularization with power $α=1/2$ and reduced-variance loss estimators. More generally, we define an adversarial regime with a self-…
▽ More
We derive an algorithm that achieves the optimal (within constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent (OMD) with Tsallis entropy regularization with power $α=1/2$ and reduced-variance loss estimators. More generally, we define an adversarial regime with a self-bounding constraint, which includes stochastic regime, stochastically constrained adversarial regime (Wei and Luo), and stochastic regime with adversarial corruptions (Lykouris et al.) as special cases, and show that the algorithm achieves logarithmic regret guarantee in this regime and all of its special cases simultaneously with the adversarial regret guarantee.} The algorithm also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting. We provide empirical evaluation of the algorithm demonstrating that it significantly outperforms UCB1 and EXP3 in stochastic environments. We also provide examples of adversarial environments, where UCB1 and Thompson Sampling exhibit almost linear regret, whereas our algorithm suffers only logarithmic regret. To the best of our knowledge, this is the first example demonstrating vulnerability of Thompson Sampling in adversarial environments. Last, but not least, we present a general stochastic analysis and a general adversarial analysis of OMD algorithms with Tsallis entropy regularization for $α\in[0,1]$ and explain the reason why $α=1/2$ works best.
△ Less
Submitted 2 March, 2022; v1 submitted 19 July, 2018;
originally announced July 2018.
-
Factored Bandits
Authors:
Julian Zimmert,
Yevgeny Seldin
Abstract:
We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to cons…
▽ More
We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits and up to constants matching upper and lower regret bounds for the problem. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state of the art algorithms (the additive terms are dominating up to time horizons which are exponential in the number of arms).
△ Less
Submitted 29 October, 2018; v1 submitted 4 July, 2018;
originally announced July 2018.
-
Adaptation to Easy Data in Prediction with Limited Advice
Authors:
Tobias Sommer Thune,
Yevgeny Seldin
Abstract:
We derive an online learning algorithm with improved regret guarantees for `easy' loss sequences. We consider two types of `easiness': (a) stochastic loss sequences and (b) adversarial loss sequences with small effective range of the losses. While a number of algorithms have been proposed for exploiting small effective range in the full information setting, Gerchinovitz and Lattimore [2016] have s…
▽ More
We derive an online learning algorithm with improved regret guarantees for `easy' loss sequences. We consider two types of `easiness': (a) stochastic loss sequences and (b) adversarial loss sequences with small effective range of the losses. While a number of algorithms have been proposed for exploiting small effective range in the full information setting, Gerchinovitz and Lattimore [2016] have shown the impossibility of regret scaling with the effective range of the losses in the bandit setting. We show that just one additional observation per round is sufficient to circumvent the impossibility result. The proposed Second Order Difference Adjustments (SODA) algorithm requires no prior knowledge of the effective range of the losses, $\varepsilon$, and achieves an $O(\varepsilon \sqrt{KT \ln K}) + \tilde{O}(\varepsilon K \sqrt[4]{T})$ expected regret guarantee, where $T$ is the time horizon and $K$ is the number of actions. The scaling with the effective loss range is achieved under significantly weaker assumptions than those made by Cesa-Bianchi and Shamir [2018] in an earlier attempt to circumvent the impossibility result. We also provide a regret lower bound of $Ω(\varepsilon\sqrt{T K})$, which almost matches the upper bound. In addition, we show that in the stochastic setting SODA achieves an $O\left(\sum_{a:Δ_a>0} \frac{K^3 \varepsilon^2}{Δ_a}\right)$ pseudo-regret bound that holds simultaneously with the adversarial regret guarantee. In other words, SODA is safe against an unrestricted oblivious adversary and provides improved regret guarantees for at least two different types of `easiness' simultaneously.
△ Less
Submitted 27 August, 2019; v1 submitted 2 July, 2018;
originally announced July 2018.
-
An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits
Authors:
Yevgeny Seldin,
Gábor Lugosi
Abstract:
We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $Δe^{1/Δ^2}$, where $Δ$ is the minimal gap of a problem instance. In the ad…
▽ More
We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $Δe^{1/Δ^2}$, where $Δ$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.
△ Less
Submitted 9 May, 2017; v1 submitted 20 February, 2017;
originally announced February 2017.
-
Multi-Dueling Bandits and Their Application to Online Ranker Evaluation
Authors:
Brian Brost,
Yevgeny Seldin,
Ingemar J. Cox,
Christina Lioma
Abstract:
New ranking algorithms are continually being developed and refined, necessitating the development of efficient methods for evaluating these rankers. Online ranker evaluation focuses on the challenge of efficiently determining, from implicit user feedback, which ranker out of a finite set of rankers is the best. Online ranker evaluation can be modeled by dueling ban- dits, a mathematical model for…
▽ More
New ranking algorithms are continually being developed and refined, necessitating the development of efficient methods for evaluating these rankers. Online ranker evaluation focuses on the challenge of efficiently determining, from implicit user feedback, which ranker out of a finite set of rankers is the best. Online ranker evaluation can be modeled by dueling ban- dits, a mathematical model for online learning under limited feedback from pairwise comparisons. Comparisons of pairs of rankers is performed by interleaving their result sets and examining which documents users click on. The dueling bandits model addresses the key issue of which pair of rankers to compare at each iteration, thereby providing a solution to the exploration-exploitation trade-off. Recently, methods for simultaneously comparing more than two rankers have been developed. However, the question of which rankers to compare at each iteration was left open. We address this question by proposing a generalization of the dueling bandits model that uses simultaneous comparisons of an unrestricted number of rankers. We evaluate our algorithm on synthetic data and several standard large-scale online ranker evaluation datasets. Our experimental results show that the algorithm yields orders of magnitude improvement in performance compared to stateof- the-art dueling bandit algorithms.
△ Less
Submitted 22 August, 2016;
originally announced August 2016.
-
A Strongly Quasiconvex PAC-Bayesian Bound
Authors:
Niklas Thiemann,
Christian Igel,
Olivier Wintenberger,
Yevgeny Seldin
Abstract:
We propose a new PAC-Bayesian bound and a way of constructing a hypothesis space, so that the bound is convex in the posterior distribution and also convex in a trade-off parameter between empirical performance of the posterior distribution and its complexity. The complexity is measured by the Kullback-Leibler divergence to a prior. We derive an alternating procedure for minimizing the bound. We s…
▽ More
We propose a new PAC-Bayesian bound and a way of constructing a hypothesis space, so that the bound is convex in the posterior distribution and also convex in a trade-off parameter between empirical performance of the posterior distribution and its complexity. The complexity is measured by the Kullback-Leibler divergence to a prior. We derive an alternating procedure for minimizing the bound. We show that the bound can be rewritten as a one-dimensional function of the trade-off parameter and provide sufficient conditions under which the function has a single global minimum. When the conditions are satisfied the alternating minimization is guaranteed to converge to the global minimum of the bound. We provide experimental results demonstrating that rigorous minimization of the bound is competitive with cross-validation in tuning the trade-off between complexity and empirical performance. In all our experiments the trade-off turned to be quasiconvex even when the sufficient conditions were violated.
△ Less
Submitted 24 August, 2017; v1 submitted 19 August, 2016;
originally announced August 2016.
-
An Improved Multileaving Algorithm for Online Ranker Evaluation
Authors:
Brian Brost,
Ingemar J. Cox,
Yevgeny Seldin,
Christina Lioma
Abstract:
Online ranker evaluation is a key challenge in information retrieval. An important task in the online evaluation of rankers is using implicit user feedback for inferring preferences between rankers. Interleaving methods have been found to be efficient and sensitive, i.e. they can quickly detect even small differences in quality. It has recently been shown that multileaving methods exhibit similar…
▽ More
Online ranker evaluation is a key challenge in information retrieval. An important task in the online evaluation of rankers is using implicit user feedback for inferring preferences between rankers. Interleaving methods have been found to be efficient and sensitive, i.e. they can quickly detect even small differences in quality. It has recently been shown that multileaving methods exhibit similar sensitivity but can be more efficient than interleaving methods. This paper presents empirical results demonstrating that existing multileaving methods either do not scale well with the number of rankers, or, more problematically, can produce results which substantially differ from evaluation measures like NDCG. The latter problem is caused by the fact that they do not correctly account for the similarities that can occur between rankers being multileaved. We propose a new multileaving method for handling this problem and demonstrate that it substantially outperforms existing methods, in some cases reducing errors by as much as 50%.
△ Less
Submitted 2 August, 2016;
originally announced August 2016.
-
Advice-Efficient Prediction with Expert Advice
Authors:
Yevgeny Seldin,
Peter Bartlett,
Koby Crammer
Abstract:
Advice-efficient prediction with expert advice (in analogy to label-efficient prediction) is a variant of prediction with expert advice game, where on each round of the game we are allowed to ask for advice of a limited number $M$ out of $N$ experts. This setting is especially interesting when asking for advice of every expert on every round is expensive. We present an algorithm for advice-efficie…
▽ More
Advice-efficient prediction with expert advice (in analogy to label-efficient prediction) is a variant of prediction with expert advice game, where on each round of the game we are allowed to ask for advice of a limited number $M$ out of $N$ experts. This setting is especially interesting when asking for advice of every expert on every round is expensive. We present an algorithm for advice-efficient prediction with expert advice that achieves $O(\sqrt{\frac{N}{M}T\ln N})$ regret on $T$ rounds of the game.
△ Less
Submitted 12 April, 2013;
originally announced April 2013.
-
PAC-Bayesian Inequalities for Martingales
Authors:
Yevgeny Seldin,
François Laviolette,
Nicolò Cesa-Bianchi,
John Shawe-Taylor,
Peter Auer
Abstract:
We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales opening the way for its application to importance weighted sampling, reinforcement learning, and ot…
▽ More
We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales opening the way for its application to importance weighted sampling, reinforcement learning, and other interactive learning domains, as well as many other domains in probability theory and statistics, where martingales are encountered.
We also present a comparison inequality that bounds the expectation of a convex function of a martingale difference sequence shifted to the [0,1] interval by the expectation of the same function of independent Bernoulli variables. This inequality is applied to derive a tighter analog of Hoeffding-Azuma's inequality.
△ Less
Submitted 30 July, 2012; v1 submitted 31 October, 2011;
originally announced October 2011.
-
PAC-Bayes-Bernstein Inequality for Martingales and its Application to Multiarmed Bandits
Authors:
Yevgeny Seldin,
Nicolò Cesa-Bianchi,
Peter Auer,
François Laviolette,
John Shawe-Taylor
Abstract:
We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The first ingredient is a new concentration inequality that makes it possible to control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. The s…
▽ More
We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The first ingredient is a new concentration inequality that makes it possible to control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. The second ingredient is an application of this inequality to the exploration-exploitation trade-off via importance weighted sampling. We apply the new tool to the stochastic multiarmed bandit problem, however, the main importance of this paper is the development and understanding of the new tool rather than improvement of existing algorithms for stochastic multiarmed bandits. In the follow-up work we demonstrate that the new tool can improve over state-of-the-art in structurally richer problems, such as stochastic multiarmed bandits with side information (Seldin et al., 2011a).
△ Less
Submitted 30 January, 2012; v1 submitted 31 October, 2011;
originally announced October 2011.
-
PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off
Authors:
Yevgeny Seldin,
Nicolò Cesa-Bianchi,
François Laviolette,
Peter Auer,
John Shawe-Taylor,
Jan Peters
Abstract:
We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolvin…
▽ More
We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales.
△ Less
Submitted 23 May, 2011;
originally announced May 2011.
-
PAC-Bayesian Analysis of Martingales and Multiarmed Bandits
Authors:
Yevgeny Seldin,
François Laviolette,
John Shawe-Taylor,
Jan Peters,
Peter Auer
Abstract:
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that enables to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to Hoeffding-Azuma inequality to bound concen…
▽ More
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that enables to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to Hoeffding-Azuma inequality to bound concentration of martingale values. Our second approach is based on integration of Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situation of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many other fields, where martingales and limited feedback are encountered.
△ Less
Submitted 19 May, 2011; v1 submitted 12 May, 2011;
originally announced May 2011.
-
A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering
Authors:
Yevgeny Seldin
Abstract:
We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian anal…
▽ More
We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering a more accurate way to deal with finite sample issues. We derive a bound minimization algorithm and show that it provides good results in real-life problems and that the derived PAC-Bayesian bound is reasonably tight.
△ Less
Submitted 2 September, 2010;
originally announced September 2010.