Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors
Authors:
Prateek Jaiswal,
Debdeep Pati,
Anirban Bhattacharya,
Bani K. Mallick
Abstract:
Thompson sampling (TS) is one of the most popular and earliest algorithms to solve stochastic multi-armed bandit problems. We consider a variant of TS, named $α$-TS, where we use a fractional or $α$-posterior ($α\in(0,1)$) instead of the standard posterior distribution. To compute an $α$-posterior, the likelihood in the definition of the standard posterior is tempered with a factor $α$. For $α$-TS…
▽ More
Thompson sampling (TS) is one of the most popular and earliest algorithms to solve stochastic multi-armed bandit problems. We consider a variant of TS, named $α$-TS, where we use a fractional or $α$-posterior ($α\in(0,1)$) instead of the standard posterior distribution. To compute an $α$-posterior, the likelihood in the definition of the standard posterior is tempered with a factor $α$. For $α$-TS we obtain both instance-dependent $\mathcal{O}\left(\sum_{k \neq i^*} Δ_k\left(\frac{\log(T)}{C(α)Δ_k^2} + \frac{1}{2} \right)\right)$ and instance-independent $\mathcal{O}(\sqrt{KT\log K})$ frequentist regret bounds under very mild conditions on the prior and reward distributions, where $Δ_k$ is the gap between the true mean rewards of the $k^{th}$ and the best arms, and $C(α)$ is a known constant. Both the sub-Gaussian and exponential family models satisfy our general conditions on the reward distribution. Our conditions on the prior distribution just require its density to be positive, continuous, and bounded. We also establish another instance-dependent regret upper bound that matches (up to constants) to that of improved UCB [Auer and Ortner, 2010]. Our regret analysis carefully combines recent theoretical developments in the non-asymptotic concentration analysis and Bernstein-von Mises type results for the $α$-posterior distribution. Moreover, our analysis does not require additional structural properties such as closed-form posteriors or conjugate priors.
△ Less
Submitted 12 September, 2023;
originally announced September 2023.
AI Assisted Annotator using Reinforcement Learning
Authors:
V. Ratna Saripalli,
Gopal Avinash,
Dibyajyoti Pati,
Michael Potter,
Charles W. Anderson
Abstract:
Healthcare data suffers from both noise and lack of ground truth. The cost of data increases as it is cleaned and annotated in healthcare. Unlike other data sets, medical data annotation, which is critical to accurate ground truth, requires medical domain expertise for a better patient outcome. In this work, we report on the use of reinforcement learning to mimic the decision making process of ann…
▽ More
Healthcare data suffers from both noise and lack of ground truth. The cost of data increases as it is cleaned and annotated in healthcare. Unlike other data sets, medical data annotation, which is critical to accurate ground truth, requires medical domain expertise for a better patient outcome. In this work, we report on the use of reinforcement learning to mimic the decision making process of annotators for medical events, to automate annotation and labelling. The reinforcement agent learns to annotate alarm data based on annotations done by an expert. Our method shows promising results on medical alarm data sets. We trained DQN and A2C agents using the data from monitoring devices annotated by an expert. Initial results from these RL agents learning the expert annotation behavior are promising. The A2C agent performs better in terms of learning the sparse events in a given state, thereby choosing more right actions compared to DQN agent. To the best of our knowledge, this is the first reinforcement learning application for the automation of medical events annotation, which has far-reaching practical use.
△ Less
Submitted 11 June, 2020; v1 submitted 2 October, 2019;
originally announced October 2019.