Showing 1–8 of 8 results for author: Mladenov, M

  1. arXiv:2302.02061  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Reinforcement Learning with History-Dependent Dynamic Contexts

    Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

    Abstract: We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveragin…

    Submitted 17 May, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Published in ICML 2023

  2. arXiv:2102.06129  [pdf, other]

    cs.LG stat.ML

    Meta-Thompson Sampling

    Authors: Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari

    Abstract: Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns the prior and thus we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit…

    Submitted 23 June, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: Proceedings of the 38th International Conference on Machine Learning

  3. arXiv:2008.00104  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach

    Authors: Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier

    Abstract: Most recommender systems (RS) research assumes that a user's utility can be maximized independently of the utility of the other agents (e.g., other users, content providers). In realistic settings, this is often not true---the dynamics of an RS ecosystem couple the long-term utility of all agents. In this work, we explore settings in which content providers cannot remain viable unless they receive…

    Submitted 18 August, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

  4. arXiv:2006.05094  [pdf, other]

    cs.LG stat.ML

    Meta-Learning Bandit Policies by Gradient Ascent

    Authors: Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier

    Abstract: Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters. The former are often too conservative in practical settings, while the latter require assumptions that are hard to verify in practice. We study bandit problems that fall…

    Submitted 5 January, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  5. arXiv:2002.06772  [pdf, other]

    cs.LG stat.ML

    Differentiable Bandit Exploration

    Authors: Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

    Abstract: Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution $\mathcal{P}$. In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$. Our approach is a form of meta-learning and exploits properties of $\mathcal{P}$ without making strong assumptions about its form. To do this, we param…

    Submitted 9 June, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

  6. arXiv:1909.04847  [pdf, other]

    cs.LG cs.HC cs.IR stat.ML

    RecSim: A Configurable Simulation Platform for Recommender Systems

    Authors: Eugene Ie, Chih-wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, Craig Boutilier

    Abstract: We propose RecSim, a configurable platform for authoring simulation environments for recommender systems (RSs) that naturally supports sequential interaction with users. RecSim allows the creation of new environments that reflect particular aspects of user behavior and item structure at a level of abstraction well-suited to pushing the limits of current reinforcement learning (RL) and RS technique…

    Submitted 26 September, 2019; v1 submitted 11 September, 2019; originally announced September 2019.

  7. arXiv:1905.13559  [pdf, other]

    cs.LG cs.AI stat.ML

    Advantage Amplification in Slowly Evolving Latent-State Environments

    Authors: Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier

    Abstract: Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL). In this work, we identify and analyze several key hurdles for RL in such environments, including belief state error and small action advantage. We develop a general principle of advantage amplification that can overcome these hurdles through the use…

    Submitted 29 May, 2019; originally announced May 2019.

  8. arXiv:1904.02664  [pdf, other]

    cs.LG stat.ML

    Empirical Bayes Regret Minimization

    Authors: Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvari

    Abstract: Most bandit algorithm designs are purely theoretical. Therefore, they have strong regret guarantees, but also are often too conservative in practice. In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes regret, the average regret over problem instances sampled from a known distribution. We focus on a tractable instance of this problem, the confidence interval and…

    Submitted 10 June, 2020; v1 submitted 4 April, 2019; originally announced April 2019.