Showing 1–15 of 15 results for author: Auer, P

Searching in archive cs.
  1. arXiv:2505.24193  [pdf, ps, other]

    cs.LG

    Improved Best-of-Both-Worlds Regret for Bandits with Delayed Feedback

    Authors: Ofir Schlisselberg, Tal Lancewicki, Peter Auer, Yishay Mansour

    Abstract: We study the multi-armed bandit problem with adversarially chosen delays in the Best-of-Both-Worlds (BoBW) framework, which aims to achieve near-optimal performance in both stochastic and adversarial environments. While prior work has made progress toward this goal, existing algorithms suffer from significant gaps to the known lower bounds, especially in the stochastic settings. Our main contribut…

    Submitted 30 May, 2025; originally announced May 2025.

  2. arXiv:2503.22557  [pdf, other]

    cs.CV

    MO-CTranS: A unified multi-organ segmentation model learning from multiple heterogeneously labelled datasets

    Authors: Zhendi Gong, Susan Francis, Eleanor Cox, Stamatios N. Sotiropoulos, Dorothee P. Auer, Guoping Qiu, Andrew P. French, Xin Chen

    Abstract: Multi-organ segmentation holds paramount significance in many clinical tasks. In practice, compared to large fully annotated datasets, multiple small datasets are often more accessible and organs are not labelled consistently. Normally, an individual model is trained for each of these datasets, which is not an effective way of using data for model learning. It remains challenging to train a single…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Accepted by International Symposium on Biomedical Imaging (ISBI) 2025 as an oral presentation

    ACM Class: I.2; I.4.6

  3. arXiv:1910.08446  [pdf, ps, other]

    cs.LG stat.ML

    Autonomous exploration for navigating in non-stationary CMPs

    Authors: Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

    Abstract: We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change. For this setting, we propose a performance measure called exploration steps which counts the time steps at which the learner lacks sufficient knowledge to navigate its environment efficiently. We devise a learning meta-algorithm, MNM and prov…

    Submitted 18 October, 2019; originally announced October 2019.

  4. arXiv:1905.05857  [pdf, ps, other]

    cs.LG stat.ML

    Variational Regret Bounds for Reinforcement Learning

    Authors: Pratik Gajane, Ronald Ortner, Peter Auer

    Abstract: We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms…

    Submitted 10 September, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: Presented at UAI 2019

  5. arXiv:1805.10066  [pdf, other]

    cs.LG stat.ML

    A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

    Authors: Pratik Gajane, Ronald Ortner, Peter Auer

    Abstract: We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitab…

    Submitted 25 May, 2018; originally announced May 2018.

  6. arXiv:1709.10128  [pdf, other]

    cs.NI cs.CR cs.LG

    Online Learning with Randomized Feedback Graphs for Optimal PUE Attacks in Cognitive Radio Networks

    Authors: Monireh Dabaghchian, Amir Alipour-Fanid, Kai Zeng, Qingsi Wang, Peter Auer

    Abstract: In a cognitive radio network, a secondary user learns the spectrum environment and dynamically accesses the channel where the primary user is inactive. At the same time, a primary user emulation (PUE) attacker can send falsified primary user signals and prevent the secondary user from utilizing the available channel. The best attacking strategies that an attacker can apply have not been well studi…

    Submitted 19 March, 2018; v1 submitted 28 September, 2017; originally announced September 2017.

  7. arXiv:1605.08722  [pdf, ps, other]

    cs.LG

    An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

    Authors: Peter Auer, Chao-Kai Chiang

    Abstract: We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is $O(K\sqrt{n \log n})$ and against stochastic bandits the pseudo-regret is $O(\sum_i (\log n)/\Delta_i)$. We also show that no algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected r…

    Submitted 27 May, 2016; originally announced May 2016.

  8. arXiv:1507.04523  [pdf, ps, other]

    cs.LG

    Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

    Authors: Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos

    Abstract: In this paper, we study the problem of estimating uniformly well the mean values of several distributions given a finite budget of samples. If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance. However, in the more realistic case where the distributions ar…

    Submitted 16 July, 2015; originally announced July 2015.

    Comments: 30 pages, 2 Postscript figures, uses elsarticle.cls, earlier, shorter version published in Proceedings of the 22nd International Conference, Algorithmic Learning Theory

    ACM Class: G.3

  9. arXiv:1410.0471  [pdf, other]

    cs.IR cs.AI

    PinView: Implicit Feedback in Content-Based Image Retrieval

    Authors: Zakria Hussain, Arto Klami, Jussi Kujala, Alex P. Leung, Kitsuchart Pasupa, Peter Auer, Samuel Kaski, Jorma Laaksonen, John Shawe-Taylor

    Abstract: This paper describes PinView, a content-based image retrieval system that exploits implicit relevance feedback collected during a search session. PinView contains several novel methods to infer the intent of the user. From relevance feedback, such as eye movements or pointer clicks, and visual features of images, PinView learns a similarity metric between images which depends on the current intere…

    Submitted 2 October, 2014; originally announced October 2014.

    Comments: 12 pages

  10. arXiv:1209.2693  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Regret Bounds for Restless Markov Bandits

    Authors: Ronald Ortner, Daniil Ryabko, Peter Auer, Rémi Munos

    Abstract: We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ steps achieves $\tilde{O}(\sqrt{T})$ regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addi…

    Submitted 12 September, 2012; originally announced September 2012.

    Comments: In proceedings of The 23rd International Conference on Algorithmic Learning Theory (ALT 2012)

    Journal ref: Proceedings of ALT, Lyon, France, LNCS 7568, pp.214-228, 2012

  11. arXiv:1202.3741  [pdf]

    cs.AI

    Noisy Search with Comparative Feedback

    Authors: Shiau Hong Lim, Peter Auer

    Abstract: We present theoretical results in terms of lower and upper bounds on the query complexity of noisy search with comparative feedback. In this search model, the noise in the feedback depends on the distance between query points and the search target. Consequently, the error probability in the feedback is not fixed but varies for the queries posed by the search algorithm. Our results show that a targ…

    Submitted 14 February, 2012; originally announced February 2012.

    Report number: UAI-P-2011-PG-445-452

  12. arXiv:1110.6886  [pdf, other]

    cs.LG cs.IT stat.ML

    PAC-Bayesian Inequalities for Martingales

    Authors: Yevgeny Seldin, François Laviolette, Nicolò Cesa-Bianchi, John Shawe-Taylor, Peter Auer

    Abstract: We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales opening the way for its application to importance weighted sampling, reinforcement learning, and ot…

    Submitted 30 July, 2012; v1 submitted 31 October, 2011; originally announced October 2011.

  13. arXiv:1110.6755  [pdf, other]

    cs.LG

    PAC-Bayes-Bernstein Inequality for Martingales and its Application to Multiarmed Bandits

    Authors: Yevgeny Seldin, Nicolò Cesa-Bianchi, Peter Auer, François Laviolette, John Shawe-Taylor

    Abstract: We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The first ingredient is a new concentration inequality that makes it possible to control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. The s…

    Submitted 30 January, 2012; v1 submitted 31 October, 2011; originally announced October 2011.

  14. arXiv:1105.4585  [pdf, ps, other]

    cs.LG stat.ML

    PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

    Authors: Yevgeny Seldin, Nicolò Cesa-Bianchi, François Laviolette, Peter Auer, John Shawe-Taylor, Jan Peters

    Abstract: We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolvin…

    Submitted 23 May, 2011; originally announced May 2011.

    Comments: On-line Trading of Exploration and Exploitation 2 - ICML-2011 workshop. http://explo.cs.ucl.ac.uk/workshop/

  15. arXiv:1105.2416  [pdf, ps, other]

    cs.LG stat.ML

    PAC-Bayesian Analysis of Martingales and Multiarmed Bandits

    Authors: Yevgeny Seldin, François Laviolette, John Shawe-Taylor, Jan Peters, Peter Auer

    Abstract: We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to Hoeffding-Azuma inequality to bound concen…

    Submitted 19 May, 2011; v1 submitted 12 May, 2011; originally announced May 2011.