Showing 1–3 of 3 results for author: Martic, M

Searching in archive stat.
  1. arXiv:1811.07871

    cs.LG cs.AI cs.NE stat.ML

    Scalable agent alignment via reward modeling: a research direction

    Authors: Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

    Abstract: One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task objective. This gives rise to the agent alignment problem: how do we create agents that behave in accordance with the user's intentions? We outline a high-leve…

    Submitted 19 November, 2018; originally announced November 2018.

  2. arXiv:1806.01186

    cs.LG cs.AI stat.ML

    Penalizing side effects using stepwise relative reachability

    Authors: Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg

    Abstract: How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? We show that current approaches to penalizing side effects can introduce bad incentives, e.g. to prevent any irreversible changes in the environment, including the actions of other agents. To isolate the source of such undesirable incentives, we break down side effects penalties into two c…

    Submitted 8 March, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

  3. arXiv:1706.03741

    stat.ML cs.AI cs.HC cs.LG

    Deep reinforcement learning from human preferences

    Authors: Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

    Abstract: For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari…

    Submitted 17 February, 2023; v1 submitted 12 June, 2017; originally announced June 2017.