Showing 1–50 of 58 results for author: Foerster, N

  1. arXiv:2506.09659  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Intent Factored Generation: Unleashing the Diversity in Your Language Model

    Authors: Eltayeb Ahmed, Uljad Berdica, Martha Elliott, Danijela Horak, Jakob N. Foerster

    Abstract: Obtaining multiple meaningfully diverse, high-quality samples from Large Language Models for a fixed prompt remains an open challenge. Current methods for increasing diversity often only operate at the token level, paraphrasing the same response. This is problematic because it leads to poor exploration on reasoning problems and to unengaging, repetitive conversational agents. To address this we pr…

    Submitted 11 June, 2025; originally announced June 2025.

  2. arXiv:2506.04051  [pdf, ps, other]

    cs.CL cs.AI

    High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

    Authors: Tim Franzmeyer, Archie Sravankumar, Lijuan Liu, Yuning Mao, Rui Hou, Sinong Wang, Jakob N. Foerster, Luke Zettlemoyer, Madian Khabsa

    Abstract: Large Language Models (LLMs) currently respond to every prompt. However, they can produce incorrect answers when they lack knowledge or capability -- a problem known as hallucination. We instead propose post-training an LLM to generate content only when confident in its correctness and to otherwise (partially) abstain. Specifically, our method, HALT, produces capability-aligned post-training data…

    Submitted 4 June, 2025; originally announced June 2025.

  3. arXiv:2506.01687  [pdf, ps, other]

    cs.CL

    StochasTok: Improving Fine-Grained Subword Understanding in LLMs

    Authors: Anya Sims, Thom Foster, Klara Kaleb, Tuan-Duy H. Nguyen, Joseph Lee, Jakob N. Foerster, Yee Whye Teh, Cong Lu

    Abstract: Subword-level understanding is integral to numerous tasks, including understanding multi-digit numbers, spelling mistakes, abbreviations, rhyming, and wordplay. Despite this, current large language models (LLMs) still often struggle with seemingly simple subword-level tasks like "How many 'r's in 'strawberry'?". A key factor behind these failures is tokenization, which obscures the fine-grained struc…

    Submitted 10 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

  4. arXiv:2505.22442  [pdf, ps, other]

    cs.LG cs.AI

    SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning

    Authors: Mattie Fellows, Clarisse Wibault, Uljad Berdica, Johannes Forkel, Michael A. Osborne, Jakob N. Foerster

    Abstract: Sample efficiency remains a major obstacle for real-world adoption of reinforcement learning (RL): success has been limited to settings where simulators provide access to essentially unlimited environment interactions, which in reality are typically costly or dangerous to obtain. Offline RL in principle offers a solution by exploiting offline data to learn a near-optimal policy before deployment.…

    Submitted 29 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  5. arXiv:2505.20659  [pdf, ps, other]

    cs.LG

    An Optimisation Framework for Unsupervised Environment Design

    Authors: Nathan Monette, Alistair Letcher, Michael Beukman, Matthew T. Jackson, Alexander Rutherford, Alexander D. Goldie, Jakob N. Foerster

    Abstract: For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One method for improving robustness is unsupervised environment design (UED), a suite of methods aiming to maximise an agent's generalisability across configurations of an environment. In this work, we study UED from an optimisation perspective, providing st…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Reinforcement Learning Conference 2025

  6. arXiv:2504.11453  [pdf, other]

    cs.LG cs.AI cs.RO

    A Clean Slate for Offline Reinforcement Learning

    Authors: Matthew Thomas Jackson, Uljad Berdica, Jarek Liesen, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: Progress in offline reinforcement learning (RL) has been impeded by ambiguous problem definitions and entangled algorithmic designs, resulting in inconsistent implementations, insufficient ablations, and unfair evaluations. Although offline RL explicitly avoids environment interaction, prior methods frequently employ extensive, undocumented online evaluation for hyperparameter tuning, complicating…

    Submitted 15 April, 2025; originally announced April 2025.

  7. arXiv:2503.17821  [pdf, other]

    cs.AI

    OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination

    Authors: Tobias Gessler, Tin Dizdarevic, Ani Calinescu, Benjamin Ellis, Andrei Lupu, Jakob Nicolaus Foerster

    Abstract: AI agents hold the potential to transform everyday life by helping humans achieve their goals. To do this successfully, agents need to be able to coordinate with novel partners without prior interaction, a setting known as zero-shot coordination (ZSC). Overcooked has become one of the most popular benchmarks for evaluating coordination capabilities of AI agents and learning algorithms. In this wor…

    Submitted 22 March, 2025; originally announced March 2025.

  8. SensPS: Sensing Personal Space Comfortable Distance between Human-Human Using Multimodal Sensors

    Authors: Ko Watanabe, Nico Förster, Shoya Ishimaru

    Abstract: Personal space, also known as peripersonal space, is crucial in human social interaction, influencing comfort, communication, and social stress. Estimating and respecting personal space is essential for enhancing human-computer interaction (HCI) and smart environments. Personal space preferences vary due to individual traits, cultural background, and contextual factors. Advanced multimodal sensing…

    Submitted 11 February, 2025; originally announced February 2025.

  9. arXiv:2502.00757  [pdf, other]

    cs.CR cs.AI cs.NE

    AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement

    Authors: J Rosser, Jakob Nicolaus Foerster

    Abstract: Scaffolding Large Language Models (LLMs) into multi-agent systems often improves performance on complex tasks, but the safety impact of such scaffolds has not been thoroughly explored. We introduce AgentBreeder, a framework for multi-objective self-improving evolutionary search over scaffolds. We evaluate discovered scaffolds on widely recognized reasoning, mathematics, and safety benchmarks and c…

    Submitted 14 April, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

    MSC Class: 68T42; 68T50 ACM Class: I.2.11

  10. arXiv:2411.13543  [pdf, other]

    cs.AI

    BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

    Authors: Davide Paglieri, Bartłomiej Cupiał, Samuel Coward, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pignatelli, Łukasz Kuciński, Lerrel Pinto, Rob Fergus, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel

    Abstract: Large Language Models (LLMs) and Vision Language Models (VLMs) possess extensive knowledge and exhibit promising reasoning abilities; however, they still struggle to perform well in complex, dynamic environments. Real-world tasks require handling intricate interactions, advanced spatial reasoning, long-term planning, and continuous exploration of new strategies, areas in which we lack effective met…

    Submitted 1 April, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: Published as a conference paper at ICLR 2025

  11. arXiv:2411.06568  [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-Learning Objectives for Preference Optimization

    Authors: Carlo Alfano, Silvia Sapora, Jakob Nicolaus Foerster, Patrick Rebeschini, Yee Whye Teh

    Abstract: Evaluating preference optimization (PO) algorithms on LLM alignment is a challenging task that presents prohibitive costs, noise, and several variables like model size and hyper-parameters. In this work, we show that it is possible to gain insights on the efficacy of PO algorithms on much simpler benchmarks. We design a diagnostic suite of MuJoCo tasks and datasets, which we use to systematically e…

    Submitted 4 February, 2025; v1 submitted 10 November, 2024; originally announced November 2024.

  12. arXiv:2411.00666  [pdf, other]

    cs.LG cs.AI

    Beyond the Boundaries of Proximal Policy Optimization

    Authors: Charlie B. Tan, Edan Toledo, Benjamin Ellis, Jakob N. Foerster, Ferenc Huszár

    Abstract: Proximal policy optimization (PPO) is a widely used algorithm for on-policy reinforcement learning. This work offers an alternative perspective of PPO, in which it is decomposed into the inner-loop estimation of update vectors, and the outer-loop application of updates using gradient ascent with unity learning rate. Using this insight we propose outer proximal policy optimization (outer-PPO); a fr…

    Submitted 1 November, 2024; originally announced November 2024.

  13. arXiv:2409.10588  [pdf, ps, other]

    q-bio.PE cs.AI cs.GT cs.MA

    ADIOS: Antibody Development via Opponent Shaping

    Authors: Sebastian Towers, Aleksandra Kalisz, Philippe A. Robert, Alicia Higueruelo, Francesca Vianello, Ming-Han Chloe Tsai, Harrison Steel, Jakob N. Foerster

    Abstract: Anti-viral therapies are typically designed to target only the current strains of a virus, a myopic response. However, therapy-induced selective pressures drive the emergence of new viral strains, against which the original myopic therapies are no longer effective. This evolutionary response presents an opportunity: our therapies could both defend against and actively influence viral evolution. Th…

    Submitted 6 June, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: Accepted at ICML 2025

    MSC Class: 92-08 ACM Class: I.2.1; J.3

  14. arXiv:2407.07082  [pdf, other]

    cs.LG cs.AI

    Can Learned Optimization Make Reinforcement Learning Less Difficult?

    Authors: Alexander David Goldie, Chris Lu, Matthew Thomas Jackson, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees of plasticity loss; and requires exploration to prevent premature convergence to local optima and maximize return. In this paper, we consider whet…

    Submitted 15 April, 2025; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Added metadata for NeurIPS 2024

    Journal ref: Advances in Neural Information Processing Systems 37 (2024) 5454-5497

  15. arXiv:2407.04811  [pdf, other]

    cs.LG

    Simplifying Deep Temporal Difference Learning

    Authors: Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, Mario Martin

    Abstract: Q-learning played a foundational role in the field of reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like deep neural networks require several additional tricks to stabilise training, primarily a large replay buffer and target networks. Unfortunately, the delayed updating of frozen network parameters in the target netw…

    Submitted 21 April, 2025; v1 submitted 5 July, 2024; originally announced July 2024.

  16. arXiv:2406.12589  [pdf, other]

    cs.LG

    Discovering Minimal Reinforcement Learning Environments

    Authors: Jarek Liesen, Chris Lu, Andrei Lupu, Jakob N. Foerster, Henning Sprekeler, Robert T. Lange

    Abstract: Reinforcement learning (RL) agents are commonly trained and evaluated in the same environment. In contrast, humans often train in a specialized environment before being evaluated, such as studying a book before taking an exam. The potential of such specialized training environments is still vastly underexplored, despite their capacity to dramatically speed up training. The framework of synthetic…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures

  17. arXiv:2406.11905  [pdf, other]

    cs.NE cs.LG

    EvIL: Evolution Strategies for Generalisable Imitation Learning

    Authors: Silvia Sapora, Gokul Swamy, Chris Lu, Yee Whye Teh, Jakob Nicolaus Foerster

    Abstract: Oftentimes in imitation learning (IL), the environment we collect expert demonstrations in and the environment we want to deploy our learned policy in aren't exactly the same (e.g. demonstrations collected in simulation but deployment in the real world). Compared to policy-centric approaches to IL like behavioural cloning, reward-centric approaches like inverse reinforcement learning (IRL) often…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, ICML 2024

  18. arXiv:2406.03428  [pdf, other]

    cs.LG

    HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits

    Authors: Tim Franzmeyer, Aleksandar Shtedritski, Samuel Albanie, Philip Torr, João F. Henriques, Jakob N. Foerster

    Abstract: Benchmarks have been essential for driving progress in machine learning. A better understanding of LLM capabilities on real-world tasks is vital for safe development. Designing adequate LLM benchmarks is challenging: data from real-world tasks is hard to collect, public availability of static evaluation data results in test data contamination and benchmark overfitting, and periodically generating…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  19. arXiv:2402.05828  [pdf, other]

    cs.LG cs.AI

    Discovering Temporally-Aware Reinforcement Learning Algorithms

    Authors: Matthew Thomas Jackson, Chris Lu, Louis Kirsch, Robert Tjarko Lange, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: Recent advancements in meta-learning have enabled the automatic discovery of novel reinforcement learning algorithms parameterized by surrogate objective functions. To improve upon manually designed algorithms, the parameterization of this learned objective function must be expressive enough to represent novel principles of learning (instead of merely recovering already established ones) while sti…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Published at ICLR 2024

  20. arXiv:2401.11488  [pdf, other]

    eess.SY cs.LG physics.app-ph

    HARDCORE: H-field and power loss estimation for arbitrary waveforms with residual, dilated convolutional neural networks in ferrite cores

    Authors: Wilhelm Kirchgässner, Nikolas Förster, Till Piepenbrock, Oliver Schweins, Oliver Wallscheid

    Abstract: The MagNet Challenge 2023 calls upon competitors to develop data-driven models for the material-specific, waveform-agnostic estimation of steady-state power losses in toroidal ferrite cores. The following HARDCORE (H-field and power loss estimation for Arbitrary waveforms with Residual, Dilated convolutional neural networks in ferrite COREs) approach shows that a residual convolutional neural netw…

    Submitted 23 January, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Competition submission version, slightly changed author order

  21. arXiv:2311.10090  [pdf, other]

    cs.LG cs.AI cs.MA

    JaxMARL: Multi-Agent RL Environments and Algorithms in JAX

    Authors: Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Ravi Hammond, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster

    Abstract: Benchmarks are crucial in the development of machine learning algorithms, with available environments significantly influencing reinforcement learning (RL) research. Traditionally, RL environments run on the CPU, which limits their scalability with typical academic compute. However, recent advancements in JAX have enabled the wider use of hardware acceleration, enabling massively parallel RL train…

    Submitted 2 November, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  22. arXiv:2310.02782  [pdf, other]

    cs.LG cs.AI

    Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design

    Authors: Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: The past decade has seen vast progress in deep reinforcement learning (RL) on the back of algorithms manually designed by human researchers. Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks. Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), th…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Published at NeurIPS 2023

  23. arXiv:2306.01460  [pdf, other]

    cs.LG

    ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

    Authors: Andrew Jesson, Chris Lu, Gunshi Gupta, Nicolas Beltran-Velez, Angelos Filos, Jakob Nicolaus Foerster, Yarin Gal

    Abstract: This paper proposes a step toward approximate Bayesian inference in on-policy actor-critic deep reinforcement learning. It is implemented through three changes to the Asynchronous Advantage Actor-Critic (A3C) algorithm: (1) applying a ReLU function to advantage estimates, (2) spectral normalization of actor-critic weights, and (3) incorporating dropout as a Bayesian approximation. We prove…

    Submitted 10 October, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

  24. arXiv:2303.10733  [pdf, other]

    cs.AI cs.MA

    Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning

    Authors: Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson

    Abstract: By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior. Most existing approaches facilitate inter-agent communication by allowing agents to send messages to each other through free communication channels, i.e., cheap talk channels. Current methods require these channels to be co…

    Submitted 19 March, 2023; originally announced March 2023.

    Comments: The 11th International Conference on Learning Representations (ICLR)

  25. arXiv:2212.07489  [pdf, other]

    cs.LG cs.MA

    SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning

    Authors: Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob N. Foerster, Shimon Whiteson

    Abstract: The availability of challenging benchmarks has played a key role in the recent progress of machine learning. In cooperative multi-agent reinforcement learning, the StarCraft Multi-Agent Challenge (SMAC) has become a popular testbed for centralised training with decentralised execution. However, after years of sustained improvement on SMAC, algorithms now achieve near-perfect performance. In this w…

    Submitted 17 October, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

  26. arXiv:2210.16175  [pdf, other]

    cs.GT cs.AI

    Game-Theoretical Perspectives on Active Equilibria: A Preferred Solution Concept over Nash Equilibria

    Authors: Dong-Ki Kim, Matthew Riemer, Miao Liu, Jakob N. Foerster, Gerald Tesauro, Jonathan P. How

    Abstract: Multiagent learning settings are inherently more difficult than single-agent learning because each agent interacts with other simultaneously learning agents in a shared environment. An effective approach in multiagent reinforcement learning is to consider the learning process of agents and influence their future policies toward desirable behaviors from each agent's perspective. Importantly, if eac…

    Submitted 28 October, 2022; originally announced October 2022.

  27. arXiv:2210.10125  [pdf, other]

    cs.LG cs.AI cs.MA

    Proximal Learning With Opponent-Learning Awareness

    Authors: Stephen Zhao, Chris Lu, Roger Baker Grosse, Jakob Nicolaus Foerster

    Abstract: Learning With Opponent-Learning Awareness (LOLA) (Foerster et al. [2018a]) is a multi-agent reinforcement learning algorithm that typically learns reciprocity-based cooperation in partially competitive environments. However, LOLA often fails to learn such behaviour on more complex policy spaces parameterized by neural networks, partly because the update rule is sensitive to the policy parameteriza…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: 24 pages (10 pages main paper), 5 figures, to be published in 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  28. arXiv:2210.06171  [pdf, other]

    cs.LG

    Learning to Optimize Quasi-Newton Methods

    Authors: Isaac Liao, Rumen R. Dangovski, Jakob N. Foerster, Marin Soljačić

    Abstract: Fast gradient-based optimization algorithms have become increasingly essential for the computationally efficient training of machine learning models. One technique is to multiply the gradient by a preconditioner matrix to produce a step, but it is unclear what the best preconditioner matrix is. This paper introduces a novel machine learning optimizer called LODO, which tries to online meta-learn t…

    Submitted 11 September, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    ACM Class: I.2.6

  29. arXiv:2207.12322  [pdf, other]

    cs.AI cs.LG

    Self-Explaining Deviations for Coordination

    Authors: Hengyuan Hu, Samuel Sokota, David Wu, Anton Bakhtin, Andrei Lupu, Brandon Cui, Jakob N. Foerster

    Abstract: Fully cooperative, partially observable multi-agent problems are ubiquitous in the real world. In this paper, we focus on a specific subclass of coordination problems in which humans are able to discover self-explaining deviations (SEDs). SEDs are actions that deviate from the common understanding of what reasonable behavior would be in normal circumstances. They are taken with the intention of ca…

    Submitted 13 July, 2022; originally announced July 2022.

  30. arXiv:2207.10170  [pdf, other]

    cs.AI

    Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks

    Authors: Tim Franzmeyer, Stephen McAleer, João F. Henriques, Jakob N. Foerster, Philip H. S. Torr, Adel Bibi, Christian Schroeder de Witt

    Abstract: Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detect…

    Submitted 6 May, 2024; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: ICLR 2024 Spotlight (top 5%)

  31. arXiv:2207.07166  [pdf, other]

    cs.AI cs.LG cs.MA

    K-level Reasoning for Zero-Shot Coordination in Hanabi

    Authors: Brandon Cui, Hengyuan Hu, Luis Pineda, Jakob N. Foerster

    Abstract: The standard problem setting in cooperative multi-agent settings is self-play (SP), where the goal is to train a team of agents that works well together. However, optimal SP policies commonly contain arbitrary conventions ("handshakes") and are not compatible with other, independently trained agents or humans. This latter desideratum was recently formalized by Hu et al. 2020 as the zero-shot coordi…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2021. 15 pages. 2 figures

    Journal ref: Advances in Neural Information Processing Systems 2021. Vol 34. 8215--8228

  32. arXiv:2203.03535  [pdf, other]

    cs.LG cs.AI cs.MA

    Influencing Long-Term Behavior in Multiagent Reinforcement Learning

    Authors: Dong-Ki Kim, Matthew Riemer, Miao Liu, Jakob N. Foerster, Michael Everett, Chuangchuang Sun, Gerald Tesauro, Jonathan P. How

    Abstract: The main challenge of multiagent reinforcement learning is the difficulty of learning useful policies in the presence of other simultaneously learning agents whose changing behaviors jointly affect the environment's transition and reward dynamics. An effective approach that has recently emerged for addressing this non-stationarity is for each agent to anticipate the learning of other agents and in…

    Submitted 15 October, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: Accepted to NeurIPS 2022. The earlier version was presented at the Gamification and Multiagent Solutions Workshop (ICLR 2022) with a spotlight. Code at https://github.com/dkkim93/further and videos at https://sites.google.com/view/further-marl

  33. arXiv:2002.04676  [pdf, other]

    cs.LG cs.AI stat.ML

    Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization

    Authors: Dmitrii Beloborodov, A. E. Ulanov, Jakob N. Foerster, Shimon Whiteson, A. I. Lvovsky

    Abstract: Quantum hardware and quantum-inspired algorithms are becoming increasingly popular for combinatorial optimization. However, these algorithms may require careful hyperparameter tuning for each problem instance. We use a reinforcement learning agent in conjunction with a quantum-inspired algorithm to solve the Ising energy minimization problem, which is equivalent to the Maximum Cut problem. The age…

    Submitted 14 February, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: Submitted to ICML 2020. 9 pages, 3 pdf figures. V2: fixed acknowledgements

    Journal ref: Machine Learning: Science and Technology, 2, 025009 (2021)

  34. arXiv:1912.02288  [pdf, other]

    cs.AI

    Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning

    Authors: Hengyuan Hu, Jakob N Foerster

    Abstract: In recent years we have seen fast progress on a number of benchmark problems in AI, with modern methods achieving near-human or superhuman performance in Go, Poker and Dota. One common aspect of all of these challenges is that they are by design adversarial or, technically speaking, zero-sum. In contrast to these settings, success in the real world commonly requires humans to collaborate and communicat…

    Submitted 12 May, 2021; v1 submitted 4 December, 2019; originally announced December 2019.

  35. arXiv:1910.10537  [pdf, other]

    cs.LG cs.AI

    Robust Visual Domain Randomization for Reinforcement Learning

    Authors: Reda Bahi Slaoui, William R. Clements, Jakob N. Foerster, Sébastien Toth

    Abstract: Producing agents that can generalize to a wide range of visually different environments is a significant challenge in reinforcement learning. One method for overcoming this issue is visual domain randomization, whereby at the start of each training episode some visual aspects of the environment are randomized so that the agent is exposed to many possible variations. However, domain randomization i…

    Submitted 6 March, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: Accepted at the BeTR-RL Workshop at ICLR 2020

  36. arXiv:1909.04063  [pdf, other]

    cs.LG cs.AI stat.ML

    Exploratory Combinatorial Optimization with Reinforcement Learning

    Authors: Thomas D. Barrett, William R. Clements, Jakob N. Foerster, A. I. Lvovsky

    Abstract: Many real-world problems can be reduced to combinatorial optimization on a graph, where the subset or ordering of vertices that maximizes some objective function must be found. With such tasks often NP-hard and analytically intractable, reinforcement learning (RL) has shown promise as a framework with which efficient heuristic methods to tackle these problems can be learned. Previous works construc…

    Submitted 31 January, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: In Proceedings of the 34th National Conference on Artificial Intelligence, AAAI 2020

    Journal ref: Proceedings of Thirty-fourth AAAI conference on artificial intelligence, 3243-3250 (2020)

  37. The Hanabi Challenge: A New Frontier for AI Research

    Authors: Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling

    Abstract: From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains…

    Submitted 6 December, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 32 pages, 5 figures, In Press (Artificial Intelligence)

  38. arXiv:1811.01458  [pdf, other]

    cs.MA cs.AI cs.LG

    Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

    Authors: Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling

    Abstract: When observing the actions of others, humans make inferences about why they acted as they did, and what this implies about the world; humans also use the fact that their actions will be interpreted in this manner, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player,…

    Submitted 10 September, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

  39. arXiv:1810.11702  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Multi-Agent Common Knowledge Reinforcement Learning

    Authors: Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson

    Abstract: Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can recons…

    Submitted 11 January, 2020; v1 submitted 27 October, 2018; originally announced October 2018.

    Comments: Advances in Neural Information Processing Systems, 9924-9935

  40. arXiv:1710.03459  [pdf, ps, other]

    physics.ins-det hep-ex

    Complete event-by-event $α$/$γ(β)$ separation in a full-size TeO$_2$ CUORE bolometer by Neganov-Luke-magnified light detection

    Authors: L. Bergé, M. Chapellier, M. de Combarieu, L. Dumoulin, A. Giuliani, M. Gros, P. de Marcillac, S. Marnieros, C. Nones, V. Novati, E. Olivieri, B. Paul, D. V. Poda, T. Redon, B. Siebenborn, A. S. Zolotarova, E. Armengaud, C. Augier, A. Benoît, J. Billard, A. Broniatowski, P. Camus, A. Cazes, F. Charlieux, M. De Jesus , et al. (19 additional authors not shown)

    Abstract: In the present work, we describe the results obtained with a large ($\approx 133$ cm$^3$) TeO$_2$ bolometer, with a view to a search for neutrinoless double-beta decay ($0νββ$) of $^{130}$Te. We demonstrate an efficient $α$ particle discrimination (99.9\%) with a high acceptance of the $0νββ$ signal (about 96\%), expected at $\approx 2.5$ MeV. This unprecedented result was possible thanks to the s…

    Submitted 25 April, 2018; v1 submitted 10 October, 2017; originally announced October 2017.

    Comments: The second version reflects the changes made after PRC referees' comments

    Journal ref: Phys. Rev. C 97, 032501(R) (2018)

  41. arXiv:1709.04326  [pdf, other]

    cs.AI cs.GT

    Learning with Opponent-Learning Awareness

    Authors: Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch

    Abstract: Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but also can be extended to hierarchical RL, generative adversarial networks and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstab…

    Submitted 19 September, 2018; v1 submitted 13 September, 2017; originally announced September 2017.

  42. arXiv:1707.04308  [pdf, other]

    physics.ins-det astro-ph.CO

    Optimizing EDELWEISS detectors for low-mass WIMP searches

    Authors: EDELWEISS Collaboration, Q. Arnaud, E. Armengaud, C. Augier, A. Benoît, L. Bergé, J. Billard, A. Broniatowski, P. Camus, A. Cazes, M. Chapellier, F. Charlieux, M. De Jésus, L. Dumoulin, K. Eitel, N. Foerster, J. Gascon, A. Giuliani, M. Gros, L. Hehn, Y. Jin, A. Juillard, M. Kleifges, V. Kozlov, H. Kraus, et al. (18 additional authors not shown)

    Abstract: The physics potential of EDELWEISS detectors for the search of low-mass Weakly Interacting Massive Particles (WIMPs) is studied. Using a data-driven background model, projected exclusion limits are computed using frequentist and multivariate analysis approaches, namely profile likelihood and boosted decision tree. Both current and achievable experimental performance are considered. The optimal str…

    Submitted 11 July, 2017; originally announced July 2017.

    Comments: 21 pages, 12 figures, submitted to Phys. Rev. D

    Journal ref: Phys. Rev. D 97, 022003 (2018)

  43. arXiv:1706.01070  [pdf, other]

    physics.ins-det astro-ph.IM

    Performance of the EDELWEISS-III experiment for direct dark matter searches

    Authors: E. Armengaud, Q. Arnaud, C. Augier, A. Benoît, L. Bergé, T. Bergmann, J. Billard, T. de Boissière, G. Bres, A. Broniatowski, V. Brudanin, P. Camus, A. Cazes, M. Chapellier, F. Charlieux, M. De Jésus, L. Dumoulin, K. Eitel, D. Filosofov, N. Foerster, N. Fourches, G. Garde, J. Gascon, A. Giuliani, M. Grollier, et al. (38 additional authors not shown)

    Abstract: We present the results of measurements demonstrating the efficiency of the EDELWEISS-III array of cryogenic germanium detectors for direct dark matter searches. The experimental setup and the FID (Fully Inter-Digitized) detector array is described, as well as the efficiency of the double measurement of heat and ionization signals in background rejection. For the whole set of 24 FID detectors used…

    Submitted 4 June, 2017; originally announced June 2017.

  44. arXiv:1704.01758  [pdf, ps, other]

    physics.ins-det nucl-ex

    Development of $^{100}$Mo-containing scintillating bolometers for a high-sensitivity neutrinoless double-beta decay search

    Authors: E. Armengaud, C. Augier, A. S. Barabash, J. W. Beeman, T. B. Bekker, F. Bellini, A. Benoît, L. Bergé, T. Bergmann, J. Billard, R. S. Boiko, A. Broniatowski, V. Brudanin, P. Camus, S. Capelli, L. Cardani, N. Casali, A. Cazes, M. Chapellier, F. Charlieux, D. M. Chernyak, M. de Combarieu, N. Coron, F. A. Danevich, I. Dafinei, et al. (77 additional authors not shown)

    Abstract: This paper reports on the development of a technology involving $^{100}$Mo-enriched scintillating bolometers, compatible with the goals of CUPID, a proposed next-generation bolometric experiment to search for neutrinoless double-beta decay. Large mass ($\sim$1~kg), high optical quality, radiopure $^{100}$Mo-containing zinc and lithium molybdate crystals have been produced and used to develop high…

    Submitted 4 October, 2017; v1 submitted 6 April, 2017; originally announced April 2017.

    Comments: 25 pages, 12 figures, 8 tables; submitted to EPJC

    Journal ref: Eur. Phys. J. C 77 (2017) 785

  45. arXiv:1611.09434  [pdf, other]

    cs.AI cs.CL cs.LG cs.NE

    Input Switched Affine Networks: An RNN Architecture Designed for Interpretability

    Authors: Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo

    Abstract: There exist many problem domains where the interpretability of neural network models is essential for deployment. Here we introduce a recurrent architecture composed of input-switched affine transformations - in other words an RNN without any explicit nonlinearities, but with input-dependent recurrent weights. This simple form allows the RNN to be analyzed via straightforward linear methods: we ca…

    Submitted 12 June, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

    Comments: ICLR 2017 submission: https://openreview.net/forum?id=H1MjAnqxg

  46. arXiv:1607.04560  [pdf, other]

    astro-ph.CO hep-ex nucl-ex physics.ins-det

    Measurement of the cosmogenic activation of germanium detectors in EDELWEISS-III

    Authors: The EDELWEISS Collaboration, E. Armengaud, Q. Arnaud, C. Augier, A. Benoît, L. Bergé, J. Billard, J. Blümer, T. de Boissière, A. Broniatowski, P. Camus, A. Cazes, M. Chapellier, F. Charlieux, M. De Jésus, L. Dumoulin, K. Eitel, N. Foerster, J. Gascon, A. Giuliani, M. Gros, L. Hehn, G. Heuermann, Y. Jin, A. Juillard, et al. (24 additional authors not shown)

    Abstract: We present a measurement of the cosmogenic activation in the germanium cryogenic detectors of the EDELWEISS III direct dark matter search experiment. The decay rates measured in detectors with different exposures to cosmic rays above ground are converted into production rates of different isotopes. The measured production rates in units of nuclei/kg/day are 82 $\pm$ 21 for $^3$H, 2.8 $\pm$ 0.6 for…

    Submitted 15 July, 2016; originally announced July 2016.

  47. arXiv:1607.03367  [pdf, other]

    astro-ph.CO hep-ex physics.ins-det

    Improved EDELWEISS-III sensitivity for low-mass WIMPs using a profile likelihood approach

    Authors: EDELWEISS Collaboration, L. Hehn, E. Armengaud, Q. Arnaud, C. Augier, A. Benoît, L. Bergé, J. Billard, J. Blümer, T. de Boissière, A. Broniatowski, P. Camus, A. Cazes, M. Chapellier, F. Charlieux, M. De Jésus, L. Dumoulin, K. Eitel, N. Foerster, J. Gascon, A. Giuliani, M. Gros, G. Heuermann, Y. Jin, A. Juillard, et al. (24 additional authors not shown)

    Abstract: We report on a dark matter search for a Weakly Interacting Massive Particle (WIMP) in the mass range $m_χ\in [4, 30]\,\mathrm{GeV}/c^2$ with the EDELWEISS-III experiment. A 2D profile likelihood analysis is performed on data from eight selected detectors with the lowest energy thresholds leading to a combined fiducial exposure of 496 kg-days. External backgrounds from $γ$- and $β$-radiation, recoi…

    Submitted 20 September, 2016; v1 submitted 12 July, 2016; originally announced July 2016.

    Comments: 11 pages, 6 figures, 2 tables (updated to accepted version)

    Journal ref: EPJ C (2016) 76:548

  48. arXiv:1606.08097  [pdf, other]

    physics.ins-det astro-ph.CO astro-ph.IM

    Signals induced by charge-trapping in EDELWEISS FID detectors: analytical modeling and applications

    Authors: The EDELWEISS Collaboration, Q. Arnaud, E. Armengaud, C. Augier, A. Benoît, L. Bergé, J. Billard, J. Blümer, T. de Boissière, A. Broniatowski, P. Camus, A. Cazes, M. Chapellier, F. Charlieux, L. Dumoulin, K. Eitel, N. Foerster, N. Fourches, J. Gascon, A. Giuliani, M. Gros, L. Hehn, G. Heuermann, M. De Jésus, Y. Jin, et al. (25 additional authors not shown)

    Abstract: The EDELWEISS-III direct dark matter search experiment uses cryogenic HP-Ge detectors Fully covered with Inter-Digitized electrodes (FID). They are operated at low fields ($<1\;\mathrm{V/cm}$), and as a consequence charge-carrier trapping significantly affects both the ionization and heat energy measurements. This paper describes an analytical model of the signals induced by trapped charges in FID…

    Submitted 29 June, 2016; v1 submitted 26 June, 2016; originally announced June 2016.

    Comments: 17 pages, 12 figures, submitted to JINST, author list updated

  49. arXiv:1605.06676  [pdf, other]

    cs.AI cs.LG cs.MA

    Learning to Communicate with Deep Multi-Agent Reinforcement Learning

    Authors: Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

    Abstract: We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communicati…

    Submitted 24 May, 2016; v1 submitted 21 May, 2016; originally announced May 2016.

  50. arXiv:1603.05120  [pdf, other]

    astro-ph.CO hep-ex physics.ins-det

    Constraints on low-mass WIMPs from the EDELWEISS-III dark matter search

    Authors: EDELWEISS Collaboration, E. Armengaud, Q. Arnaud, C. Augier, A. Benoît, A. Benoît, L. Bergé, T. Bergmann, J. Billard, J. Blümer, T. de Boissière, G. Bres, A. Broniatowski, V. Brudanin, P. Camus, A. Cazes, M. Chapellier, F. Charlieux, L. Dumoulin, K. Eitel, D. Filosofov, N. Foerster, N. Fourches, G. Garde, J. Gascon , et al. (42 additional authors not shown)

    Abstract: We present the results of a search for elastic scattering from galactic dark matter in the form of Weakly Interacting Massive Particles (WIMPs) in the 4-30 GeV/$c^2$ mass range. We make use of a 582 kg-day fiducial exposure from an array of 800 g Germanium bolometers equipped with a set of interleaved electrodes with full surface coverage. We searched specifically for $\sim 2.5-20$ keV nuclear rec…

    Submitted 9 May, 2016; v1 submitted 16 March, 2016; originally announced March 2016.

    Comments: Matches published version

    Journal ref: JCAP 05 (2016) 019