arXiv:1505.00369
Batched bandit problems
Abstract: Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost fo…
Submitted 29 March, 2016; v1 submitted 2 May, 2015; originally announced May 2015.
Comments: Published at http://dx.doi.org/10.1214/15-AOS1381 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Report number: IMS-AOS-AOS1381
Journal ref: Annals of Statistics 2016, Vol. 44, No. 2, 660-681
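To illustrate the batch constraint described in the abstract, here is a minimal sketch of a generic batched successive-elimination policy for a two-armed Bernoulli bandit. It is not claimed to be the policy proposed in the paper. The helper names `geometric_grid` and `batched_elimination`, the roughly geometric batch grid t_i ≈ T^(i/M), and the confidence radius sqrt(2 log T / n) are all illustrative assumptions. The defining feature is that the pulls within a batch are fixed before any of that batch's rewards are seen, so the policy can change only at the M batch boundaries.

```python
import math
import random


def geometric_grid(T, M):
    """Hypothetical helper: M batch end-points t_1 <= ... <= t_M = T, with t_i ~ T**(i/M)."""
    grid = [min(T, max(1, int(round(T ** (i / M))))) for i in range(1, M + 1)]
    grid[-1] = T  # the last batch always ends at the horizon
    return grid


def batched_elimination(means, T, M, seed=0):
    """Hypothetical two-armed batched successive elimination (illustrative only).

    Returns the arms still active at the end and the pseudo-regret,
    i.e. the sum over pulls of (best mean - mean of the pulled arm).
    """
    rng = random.Random(seed)
    grid = geometric_grid(T, M)
    active = [0, 1]
    sums = [0.0, 0.0]
    counts = [0, 0]
    best = max(means)
    regret = 0.0
    prev = 0
    for t_end in grid:
        # Within a batch the allocation is fixed in advance: alternate over the
        # active arms. Rewards are recorded but not acted on until the batch ends.
        for s in range(t_end - prev):
            arm = active[s % len(active)]
            reward = 1.0 if rng.random() < means[arm] else 0.0
            sums[arm] += reward
            counts[arm] += 1
            regret += best - means[arm]
        prev = t_end
        # Batch boundary: the only point where the policy may change.
        if len(active) == 2 and min(counts) > 0:
            est = [sums[a] / counts[a] for a in range(2)]
            rad = [math.sqrt(2 * math.log(T) / counts[a]) for a in range(2)]
            lead = 0 if est[0] >= est[1] else 1
            other = 1 - lead
            # Eliminate the trailing arm once the confidence intervals separate.
            if est[lead] - rad[lead] > est[other] + rad[other]:
                active = [lead]
    return active, regret


active, regret = batched_elimination([0.9, 0.1], T=10_000, M=4)
print(active, round(regret, 6))
```

With T = 10,000 and M = 4, the grid is [10, 100, 1000, 10000]. In this sketch the weaker arm is pulled 500 times before it can be eliminated at the third boundary, and the final batch commits to the survivor. The grid is the main design knob: fewer or coarser batches delay elimination and increase regret.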