Showing 1–50 of 164 results for author: Foerster, J

Searching in archive cs.
  1. arXiv:2507.02554  [pdf, ps, other]

    cs.AI cs.LG

    AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

    Authors: Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Kelvin Niu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H. Miller, Abhishek Charnalia, Derek Dunfield, Carole-Jean Wu, Pontus Stenetorp, Nicola Cancedda, Jakob Nicolaus Foerster, Yoram Bachrach

    Abstract: AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search polic…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Code: https://github.com/facebookresearch/aira-dojo

  2. arXiv:2506.22419  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements

    Authors: Bingchen Zhao, Despoina Magka, Minqi Jiang, Xian Li, Roberta Raileanu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Kelvin Niu, Shagun Sodhani, Michael Shvartsman, Andrei Lupu, Alisia Lupidi, Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Thomas Foster, Lucia Cipolina-Kun, Abhishek Charnalia, Derek Dunfield, Alexander H. Miller, Oisin Mac Aodha, Jakob Foerster, Yoram Bachrach

    Abstract: Rapid advancements in large language models (LLMs) have the potential to assist in scientific progress. A critical capability toward this endeavor is the ability to reproduce existing work. To evaluate the ability of AI agents to reproduce results in an active research area, we introduce the Automated LLM Speedrunning Benchmark, leveraging the research community contributions on the NanoGPT speedr…

    Submitted 30 June, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  3. arXiv:2506.21490  [pdf, ps, other]

    cs.AI cs.HC cs.MA

    Ad-Hoc Human-AI Coordination Challenge

    Authors: Tin Dizdarević, Ravi Hammond, Tobias Gessler, Anisoara Calinescu, Jonathan Cook, Matteo Gallici, Andrei Lupu, Darius Muglich, Johannes Forkel, Jakob Nicolaus Foerster

    Abstract: Achieving seamless coordination between AI agents and humans is crucial for real-world applications, yet it remains a significant open challenge. Hanabi is a cooperative card game featuring imperfect information, constrained communication, theory of mind requirements, and coordinated action -- making it an ideal testbed for human-AI coordination. However, its use for human-AI interaction has been…

    Submitted 29 June, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

    Comments: Published at ICML 2025

  4. arXiv:2506.20664  [pdf, ps, other]

    cs.AI cs.CL cs.HC cs.MA

    The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind

    Authors: Andrei Lupu, Timon Willi, Jakob Foerster

    Abstract: As Large Language Models (LLMs) gain agentic abilities, they will have to navigate complex multi-agent scenarios, interacting with human users and other agents in cooperative and competitive settings. This will require new reasoning skills, chief amongst them being theory of mind (ToM), or the ability to reason about the "mental" states of other agents. However, ToM and other multi-agent abilities…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 41 pages, 19 figures

  5. arXiv:2506.18777  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training

    Authors: Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktaschel, Jakob Foerster, Laura Ruis

    Abstract: Training large language models (LLMs) on source code significantly enhances their general-purpose reasoning abilities, but the mechanisms underlying this generalisation are poorly understood. In this paper, we propose Programming by Backprop (PBB) as a potential driver of this effect - teaching a model to evaluate a program for inputs by training on its source code alone, without ever seeing I/O e…

    Submitted 23 June, 2025; originally announced June 2025.

  6. arXiv:2506.09659  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Intent Factored Generation: Unleashing the Diversity in Your Language Model

    Authors: Eltayeb Ahmed, Uljad Berdica, Martha Elliott, Danijela Horak, Jakob N. Foerster

    Abstract: Obtaining multiple meaningfully diverse, high quality samples from Large Language Models for a fixed prompt remains an open challenge. Current methods for increasing diversity often only operate at the token-level, paraphrasing the same response. This is problematic because it leads to poor exploration on reasoning problems and to unengaging, repetitive conversational agents. To address this we pr…

    Submitted 11 June, 2025; originally announced June 2025.

  7. arXiv:2506.04051  [pdf, ps, other]

    cs.CL cs.AI

    High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

    Authors: Tim Franzmeyer, Archie Sravankumar, Lijuan Liu, Yuning Mao, Rui Hou, Sinong Wang, Jakob N. Foerster, Luke Zettlemoyer, Madian Khabsa

    Abstract: Large Language Models (LLMs) currently respond to every prompt. However, they can produce incorrect answers when they lack knowledge or capability -- a problem known as hallucination. We instead propose post-training an LLM to generate content only when confident in its correctness and to otherwise (partially) abstain. Specifically, our method, HALT, produces capability-aligned post-training data…

    Submitted 4 June, 2025; originally announced June 2025.

  8. arXiv:2506.01687  [pdf, ps, other]

    cs.CL

    StochasTok: Improving Fine-Grained Subword Understanding in LLMs

    Authors: Anya Sims, Thom Foster, Klara Kaleb, Tuan-Duy H. Nguyen, Joseph Lee, Jakob N. Foerster, Yee Whye Teh, Cong Lu

    Abstract: Subword-level understanding is integral to numerous tasks, including understanding multi-digit numbers, spelling mistakes, abbreviations, rhyming, and wordplay. Despite this, current large language models (LLMs) still often struggle with seemingly simple subword-level tasks like "How many 'r's in 'strawberry'?". A key factor behind these failures is tokenization which obscures the fine-grained struc…

    Submitted 10 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

  9. arXiv:2505.22442  [pdf, ps, other]

    cs.LG cs.AI

    SOReL and TOReL: Two Methods for Fully Offline Reinforcement Learning

    Authors: Mattie Fellows, Clarisse Wibault, Uljad Berdica, Johannes Forkel, Michael A. Osborne, Jakob N. Foerster

    Abstract: Sample efficiency remains a major obstacle for real-world adoption of reinforcement learning (RL): success has been limited to settings where simulators provide access to essentially unlimited environment interactions, which in reality are typically costly or dangerous to obtain. Offline RL in principle offers a solution by exploiting offline data to learn a near-optimal policy before deployment.…

    Submitted 29 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  10. arXiv:2505.20659  [pdf, ps, other]

    cs.LG

    An Optimisation Framework for Unsupervised Environment Design

    Authors: Nathan Monette, Alistair Letcher, Michael Beukman, Matthew T. Jackson, Alexander Rutherford, Alexander D. Goldie, Jakob N. Foerster

    Abstract: For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One method for improving robustness is unsupervised environment design (UED), a suite of methods aiming to maximise an agent's generalisability across configurations of an environment. In this work, we study UED from an optimisation perspective, providing st…

    Submitted 9 July, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Reinforcement Learning Conference 2025

  11. arXiv:2504.11453  [pdf, other]

    cs.LG cs.AI cs.RO

    A Clean Slate for Offline Reinforcement Learning

    Authors: Matthew Thomas Jackson, Uljad Berdica, Jarek Liesen, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: Progress in offline reinforcement learning (RL) has been impeded by ambiguous problem definitions and entangled algorithmic designs, resulting in inconsistent implementations, insufficient ablations, and unfair evaluations. Although offline RL explicitly avoids environment interaction, prior methods frequently employ extensive, undocumented online evaluation for hyperparameter tuning, complicating…

    Submitted 15 April, 2025; originally announced April 2025.

  12. arXiv:2504.08066  [pdf, other]

    cs.AI cs.CL cs.LG

    The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

    Authors: Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, David Ha

    Abstract: AI is increasingly playing a pivotal role in transforming how scientific discoveries are made. We introduce The AI Scientist-v2, an end-to-end agentic system capable of producing the first entirely AI generated peer-review-accepted workshop paper. This system iteratively formulates scientific hypotheses, designs and executes experiments, analyzes and visualizes data, and autonomously authors scien…

    Submitted 10 April, 2025; originally announced April 2025.

  13. arXiv:2503.17821  [pdf, other]

    cs.AI

    OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination

    Authors: Tobias Gessler, Tin Dizdarevic, Ani Calinescu, Benjamin Ellis, Andrei Lupu, Jakob Nicolaus Foerster

    Abstract: AI agents hold the potential to transform everyday life by helping humans achieve their goals. To do this successfully, agents need to be able to coordinate with novel partners without prior interaction, a setting known as zero-shot coordination (ZSC). Overcooked has become one of the most popular benchmarks for evaluating coordination capabilities of AI agents and learning algorithms. In this wor…

    Submitted 22 March, 2025; originally announced March 2025.

  14. arXiv:2503.10492  [pdf, other]

    quant-ph cond-mat.mes-hall cs.LG physics.comp-ph

    Meta-learning characteristics and dynamics of quantum systems

    Authors: Lucas Schorling, Pranav Vaidhyanathan, Jonas Schuff, Miguel J. Carballido, Dominik Zumbühl, Gerard Milburn, Florian Marquardt, Jakob Foerster, Michael A. Osborne, Natalia Ares

    Abstract: While machine learning holds great promise for quantum technologies, most current methods focus on predicting or controlling a specific quantum system. Meta-learning approaches, however, can adapt to new systems for which little data is available, by leveraging knowledge obtained from previous data associated with similar systems. In this paper, we meta-learn dynamics and characteristics of closed…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 6+1 pages, 4 figures. L. Schorling and P. Vaidhyanathan contributed equally to this work

  15. arXiv:2502.14499  [pdf, other]

    cs.CL cs.AI cs.LG

    MLGym: A New Framework and Benchmark for Advancing AI Research Agents

    Authors: Deepak Nathani, Lovish Madaan, Nicholas Roberts, Nikolay Bashlykov, Ajay Menon, Vincent Moens, Amar Budhiraja, Despoina Magka, Vladislav Vorotilov, Gaurav Chaurasia, Dieuwke Hupkes, Ricardo Silveira Cabral, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach, William Yang Wang, Roberta Raileanu

    Abstract: We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents. MLGym-Bench consists of 13 diverse and open-ended AI research tasks from diverse domains such as computer vision,…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 35 pages, 12 figures, 10 tables

  16. arXiv:2502.14143  [pdf, other]

    cs.MA cs.AI cs.CY cs.ET cs.LG

    Multi-Agent Risks from Advanced AI

    Authors: Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Quentin Feuillade-Montixi, Matija Franklin, Esben Kran , et al. (19 additional authors not shown)

    Abstract: The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, a…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Cooperative AI Foundation, Technical Report #1

  17. arXiv:2502.12272  [pdf]

    cs.LG cs.AI cs.CL

    Learning to Reason at the Frontier of Learnability

    Authors: Thomas Foster, Jakob Foerster

    Abstract: Reinforcement learning is now widely adopted as the final stage of large language model training, especially for reasoning-style tasks such as maths problems. Typically, models attempt each question many times during a single training step and attempt to learn from their successes and failures. However, we demonstrate that throughout training with two popular algorithms (PPO and VinePPO) on two wi…

    Submitted 24 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  18. arXiv:2502.09172  [pdf, ps, other]

    cs.LG cs.CE q-fin.CP q-fin.TR

    LOB-Bench: Benchmarking Generative AI for Finance -- an Application to Limit Order Book Data

    Authors: Peer Nagy, Sascha Frey, Kang Li, Bidipta Sarkar, Svitlana Vyetrenko, Stefan Zohren, Ani Calinescu, Jakob Foerster

    Abstract: While financial data presents one of the most challenging and interesting sequence modelling tasks due to high noise, heavy tails, and strategic interactions, progress in this area has been hindered by the lack of consensus on quantitative evaluation paradigms. To address this, we present LOB-Bench, a benchmark, implemented in Python, designed to evaluate the quality and realism of generative mess…

    Submitted 16 June, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Journal ref: Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025

  19. arXiv:2502.01711  [pdf, other]

    cs.MA

    Expected Return Symmetries

    Authors: Darius Muglich, Johannes Forkel, Elise van der Pol, Jakob Foerster

    Abstract: Symmetry is an important inductive bias that can improve model robustness and generalization across many deep learning domains. In multi-agent settings, a priori known symmetries have been shown to address a fundamental coordination failure mode known as mutually incompatible symmetry breaking; e.g. in a game where two independent agents can choose to move "left" or "right", and where a reward o…

    Submitted 12 March, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: Published at ICLR 2025

  20. arXiv:2502.00757  [pdf, ps, other]

    cs.CR cs.AI cs.NE

    AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement

    Authors: J Rosser, Jakob Nicolaus Foerster

    Abstract: Scaffolding Large Language Models (LLMs) into multi-agent systems often improves performance on complex tasks, but the safety impact of such scaffolds has not been thoroughly explored. We introduce AgentBreeder, a framework for multi-objective self-improving evolutionary search over scaffolds. We evaluate discovered scaffolds on widely recognized reasoning, mathematics, and safety benchmarks and c…

    Submitted 25 June, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

    MSC Class: 68T42; 68T50 ACM Class: I.2.11

  21. arXiv:2502.00075  [pdf, other]

    cs.CL cs.LG

    BTS: Harmonizing Specialized Experts into a Generalist LLM

    Authors: Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis

    Abstract: We present Branch-Train-Stitch (BTS), an efficient and flexible training algorithm for combining independently trained large language model (LLM) experts into a single, capable generalist model. Following Li et al., we start with a single seed language model which is branched into domain-specific (e.g., coding or math) experts with continual pretraining. BTS combines experts into a generalist mode…

    Submitted 31 January, 2025; originally announced February 2025.

  22. arXiv:2412.17113  [pdf, other]

    cs.LG math.OC

    Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps

    Authors: Benjamin Ellis, Matthew T. Jackson, Andrei Lupu, Alexander D. Goldie, Mattie Fellows, Shimon Whiteson, Jakob Foerster

    Abstract: In reinforcement learning (RL), it is common to apply techniques used broadly in machine learning such as neural network function approximators and momentum-based optimizers. However, such tools were largely developed for supervised learning rather than nonstationary RL, leading practitioners to adopt target networks, clipped policy updates, and other RL-specific implementation tricks to combat th…

    Submitted 22 December, 2024; originally announced December 2024.

  23. arXiv:2412.09810  [pdf, other]

    cs.LG

    The Complexity Dynamics of Grokking

    Authors: Branton DeMoss, Silvia Sapora, Jakob Foerster, Nick Hawes, Ingmar Posner

    Abstract: We investigate the phenomenon of generalization through the lens of compression. In particular, we study the complexity dynamics of neural networks to explain grokking, where networks suddenly transition from memorizing to generalizing solutions long after over-fitting the training data. To this end we introduce a new measure of intrinsic complexity for neural networks based on the theory of Kolmo…

    Submitted 12 December, 2024; originally announced December 2024.

  24. arXiv:2411.13543  [pdf, other]

    cs.AI

    BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

    Authors: Davide Paglieri, Bartłomiej Cupiał, Samuel Coward, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pignatelli, Łukasz Kuciński, Lerrel Pinto, Rob Fergus, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel

    Abstract: Large Language Models (LLMs) and Vision Language Models (VLMs) possess extensive knowledge and exhibit promising reasoning abilities; however, they still struggle to perform well in complex, dynamic environments. Real-world tasks require handling intricate interactions, advanced spatial reasoning, long-term planning, and continuous exploration of new strategies, areas in which we lack effective met…

    Submitted 1 April, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: Published as a conference paper at ICLR 2025

  25. arXiv:2411.06568  [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-Learning Objectives for Preference Optimization

    Authors: Carlo Alfano, Silvia Sapora, Jakob Nicolaus Foerster, Patrick Rebeschini, Yee Whye Teh

    Abstract: Evaluating preference optimization (PO) algorithms on LLM alignment is a challenging task that presents prohibitive costs, noise, and several variables like model size and hyper-parameters. In this work, we show that it is possible to gain insights on the efficacy of PO algorithms on much simpler benchmarks. We design a diagnostic suite of MuJoCo tasks and datasets, which we use to systematically e…

    Submitted 4 February, 2025; v1 submitted 10 November, 2024; originally announced November 2024.

  26. arXiv:2411.04976  [pdf, other]

    cs.LG

    Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games

    Authors: Usman Anwar, Ashish Pandian, Jia Wan, David Krueger, Jakob Foerster

    Abstract: Zero-shot coordination (ZSC) is a popular setting for studying the ability of reinforcement learning (RL) agents to coordinate with novel partners. Prior ZSC formulations assume the $\textit{problem setting}$ is common knowledge: each agent knows the underlying Dec-POMDP, knows others have this knowledge, and so on ad infinitum. However, this assumption rarely holds in complex real-world settings,…

    Submitted 7 November, 2024; originally announced November 2024.

  27. arXiv:2411.03069  [pdf, ps, other]

    cs.LO

    Conformance Games for Graded Semantics

    Authors: Jonas Forster, Lutz Schröder, Paul Wild

    Abstract: Game-theoretic characterizations of process equivalences traditionally form a central topic in concurrency; for example, most equivalences on the classical linear-time / branching-time spectrum come with such characterizations. Recent work on so-called graded semantics has led to a generic behavioural equivalence game that covers the mentioned games on the linear-time / branching-time spectrum and…

    Submitted 27 January, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

    MSC Class: 68Q85 (Primary) ACM Class: F.3.1; F.3.2; F.4.1

  28. arXiv:2411.00666  [pdf, other]

    cs.LG cs.AI

    Beyond the Boundaries of Proximal Policy Optimization

    Authors: Charlie B. Tan, Edan Toledo, Benjamin Ellis, Jakob N. Foerster, Ferenc Huszár

    Abstract: Proximal policy optimization (PPO) is a widely-used algorithm for on-policy reinforcement learning. This work offers an alternative perspective of PPO, in which it is decomposed into the inner-loop estimation of update vectors, and the outer-loop application of updates using gradient ascent with unity learning rate. Using this insight we propose outer proximal policy optimization (outer-PPO); a fr…

    Submitted 1 November, 2024; originally announced November 2024.

  29. arXiv:2410.23208  [pdf, other]

    cs.LG cs.AI

    Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

    Authors: Michael Matthews, Michael Beukman, Chris Lu, Jakob Foerster

    Abstract: While large models trained with self-supervised learning on offline datasets have shown remarkable capabilities in text and image domains, achieving the same generalisation for agents that act in sequential decision problems remains an open challenge. In this work, we take a step towards this goal by procedurally generating tens of millions of 2D physics-based tasks and using these to train a gene…

    Submitted 3 March, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 Oral. The first two authors contributed equally. Project page located at: https://kinetix-env.github.io/

  30. arXiv:2410.21159  [pdf, other]

    cs.HC cs.AI

    CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants

    Authors: Lize Alberts, Benjamin Ellis, Andrei Lupu, Jakob Foerster

    Abstract: We introduce a multi-turn benchmark for evaluating personalised alignment in LLM-based AI assistants, focusing on their ability to handle user-provided safety-critical contexts. Our assessment of ten leading models across five scenarios (with 337 use cases each) reveals systematic inconsistencies in maintaining user-specific consideration, with even top-rated "harmless" models making recommendatio…

    Submitted 29 January, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

    MSC Class: 68T05 ACM Class: I.2.0; I.2.7; K.4.2; H.5.2; I.2.6

  31. Reinforcement Learning Controllers for Soft Robots using Learned Environments

    Authors: Uljad Berdica, Matthew Jackson, Niccolò Enrico Veronese, Jakob Foerster, Perla Maiolino

    Abstract: Soft robotic manipulators offer operational advantages due to their compliant and deformable structures. However, their inherently nonlinear dynamics present substantial challenges. Traditional analytical methods often depend on simplifying assumptions, while learning-based techniques can be computationally demanding and limit the control policies to existing data. This paper introduces a novel ap…

    Submitted 25 October, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: soft manipulator, reinforcement learning, learned controllers

    Journal ref: 2024 IEEE 7th International Conference on Soft Robotics (RoboSoft), San Diego, CA, USA, 2024, pp. 933-939

  32. arXiv:2410.03608  [pdf, other]

    cs.AI cs.CL cs.HC cs.LG

    TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation

    Authors: Jonathan Cook, Tim Rocktäschel, Jakob Foerster, Dennis Aumiller, Alex Wang

    Abstract: Given the widespread adoption and usage of Large Language Models (LLMs), it is crucial to have flexible and interpretable evaluations of their instruction-following ability. Preference judgments between model outputs have become the de facto evaluation standard, despite distilling complex, multi-faceted preferences into a single ranking. Furthermore, as human annotation is slow and costly, LLMs ar…

    Submitted 4 October, 2024; originally announced October 2024.

  33. arXiv:2409.10588  [pdf, ps, other]

    q-bio.PE cs.AI cs.GT cs.MA

    ADIOS: Antibody Development via Opponent Shaping

    Authors: Sebastian Towers, Aleksandra Kalisz, Philippe A. Robert, Alicia Higueruelo, Francesca Vianello, Ming-Han Chloe Tsai, Harrison Steel, Jakob N. Foerster

    Abstract: Anti-viral therapies are typically designed to target only the current strains of a virus, a myopic response. However, therapy-induced selective pressures drive the emergence of new viral strains, against which the original myopic therapies are no longer effective. This evolutionary response presents an opportunity: our therapies could both defend against and actively influence viral evolution. Th…

    Submitted 6 June, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: Accepted at ICML 2025

    MSC Class: 92-08 ACM Class: I.2.1; J.3

  34. arXiv:2409.08239  [pdf, other]

    cs.CL cs.AI

    Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

    Authors: Alisia Lupidi, Carlos Gemmell, Nicola Cancedda, Jane Dwivedi-Yu, Jason Weston, Jakob Foerster, Roberta Raileanu, Maria Lomeli

    Abstract: Large Language Models still struggle in challenging scenarios that leverage structured data, complex reasoning, or tool usage. In this paper, we propose Source2Synth: a new method that can be used for teaching LLMs new skills without relying on costly human annotations. Source2Synth takes as input a custom data source and produces synthetic data points with intermediate reasoning steps grounded in…

    Submitted 12 September, 2024; originally announced September 2024.

  35. arXiv:2409.00853  [pdf, other]

    cs.AI cs.NE

    JaxLife: An Open-Ended Agentic Simulator

    Authors: Chris Lu, Michael Beukman, Michael Matthews, Jakob Foerster

    Abstract: Human intelligence emerged through the process of natural selection and evolution on Earth. We investigate what it would take to re-create this process in silico. While past work has often focused on low-level processes (such as simulating physics or chemistry), we instead take a more targeted approach, aiming to evolve agents that can accumulate open-ended culture and technologies across generati…

    Submitted 1 September, 2024; originally announced September 2024.

  36. arXiv:2408.15099  [pdf, other]

    cs.LG cs.AI cs.RO

    No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

    Authors: Alexander Rutherford, Michael Beukman, Timon Willi, Bruno Lacerda, Nick Hawes, Jakob Foerster

    Abstract: What data or environments to use for training to improve downstream performance is a longstanding and very topical question in reinforcement learning. In particular, Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks. This work investigates how existing UED methods select…

    Submitted 29 October, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

  37. arXiv:2408.08274  [pdf, other]

    cs.LG

    BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

    Authors: Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Ustun, Acyr Locatelli

    Abstract: The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using experts' feed…

    Submitted 16 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  38. arXiv:2408.06292  [pdf, other]

    cs.AI cs.CL cs.LG

    The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

    Authors: Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, David Ha

    Abstract: One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehen…

    Submitted 31 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  39. arXiv:2407.07082  [pdf, other]

    cs.LG cs.AI

    Can Learned Optimization Make Reinforcement Learning Less Difficult?

    Authors: Alexander David Goldie, Chris Lu, Matthew Thomas Jackson, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees of plasticity loss; and requires exploration to prevent premature convergence to local optima and maximize return. In this paper, we consider whet…

    Submitted 15 April, 2025; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Added Metadata for Neurips 2024

    Journal ref: Advances in Neural Information Processing Systems 37 (2024) 5454-5497

  40. arXiv:2407.04811  [pdf, other]

    cs.LG

    Simplifying Deep Temporal Difference Learning

    Authors: Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, Mario Martin

    Abstract: Q-learning played a foundational role in the field of reinforcement learning (RL). However, TD algorithms with off-policy data, such as Q-learning, or nonlinear function approximation like deep neural networks require several additional tricks to stabilise training, primarily a large replay buffer and target networks. Unfortunately, the delayed updating of frozen network parameters in the target netw…

    Submitted 21 April, 2025; v1 submitted 5 July, 2024; originally announced July 2024.

  41. arXiv:2406.18420  [pdf, other]

    cs.LG cs.AI

    Mixture of Experts in a Mixture of RL settings

    Authors: Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro

    Abstract: Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's lea…

    Submitted 26 June, 2024; originally announced June 2024.

  42. arXiv:2406.15042  [pdf, other]

    cs.LG cs.AI

    Behaviour Distillation

    Authors: Andrei Lupu, Chris Lu, Jarek Liesen, Robert Tjarko Lange, Jakob Foerster

    Abstract: Dataset distillation aims to condense large datasets into a small number of synthetic examples that can be used as drop-in replacements when training new models. It has applications to interpretability, neural architecture search, privacy, and continual learning. Despite strong successes in supervised domains, such methods have not yet been extended to reinforcement learning, where the lack of a f…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Published as a conference paper at ICLR 2024

  43. arXiv:2406.12589  [pdf, other]

    cs.LG

    Discovering Minimal Reinforcement Learning Environments

    Authors: Jarek Liesen, Chris Lu, Andrei Lupu, Jakob N. Foerster, Henning Sprekeler, Robert T. Lange

    Abstract: Reinforcement learning (RL) agents are commonly trained and evaluated in the same environment. In contrast, humans often train in a specialized environment before being evaluated, such as studying a book before taking an exam. The potential of such specialized training environments is still vastly underexplored, despite their capacity to dramatically speed up training. The framework of synthetic…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures

  44. arXiv:2406.11905  [pdf, other]

    cs.NE cs.LG

    EvIL: Evolution Strategies for Generalisable Imitation Learning

    Authors: Silvia Sapora, Gokul Swamy, Chris Lu, Yee Whye Teh, Jakob Nicolaus Foerster

    Abstract: Oftentimes in imitation learning (IL), the environment we collect expert demonstrations in and the environment we want to deploy our learned policy in aren't exactly the same (e.g. demonstrations collected in simulation but deployment in the real world). Compared to policy-centric approaches to IL like behavioural cloning, reward-centric approaches like inverse reinforcement learning (IRL) often…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, ICML 2024

  45. arXiv:2406.08414  [pdf, other]

    cs.LG

    Discovering Preference Optimization Algorithms with and for Large Language Models

    Authors: Chris Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, Mihaela van der Schaar, Robert Tjarko Lange

    Abstract: Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. Typically, preference optimization is approached as an offline supervised learning task using manually-crafted convex loss functions. While these methods are based on theoretical insights, they are inherently constrained by human creativity, so the large search space of…

    Submitted 2 November, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  46. arXiv:2406.03428  [pdf, other]

    cs.LG

    HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits

    Authors: Tim Franzmeyer, Aleksandar Shtedritski, Samuel Albanie, Philip Torr, João F. Henriques, Jakob N. Foerster

    Abstract: Benchmarks have been essential for driving progress in machine learning. A better understanding of LLM capabilities on real world tasks is vital for safe development. Designing adequate LLM benchmarks is challenging: Data from real-world tasks is hard to collect, public availability of static evaluation data results in test data contamination and benchmark overfitting, and periodically generating…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  47. arXiv:2406.00392  [pdf, other]

    cs.AI

    Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning

    Authors: Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster

    Abstract: Cultural accumulation drives the open-ended and diverse progress in capabilities spanning human history. It builds an expanding body of knowledge and skills by combining individual exploration with inter-generational information transmission. Despite its widespread success among humans, the capacity for artificial learning agents to accumulate culture remains under-explored. In particular, approac…

    Submitted 28 October, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  48. arXiv:2405.19540  [pdf, other]

    cs.IT cs.CR

    Computing Low-Entropy Couplings for Large-Support Distributions

    Authors: Samuel Sokota, Dylan Sam, Christian Schroeder de Witt, Spencer Compton, Jakob Foerster, J. Zico Kolter

    Abstract: Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limita…

    Submitted 29 May, 2024; originally announced May 2024.

  49. arXiv:2405.16137  [pdf, other]

    cs.RO

    Comparison between Behavior Trees and Finite State Machines

    Authors: Matteo Iovino, Julian Förster, Pietro Falco, Jen Jen Chung, Roland Siegwart, Christian Smith

    Abstract: Behavior Trees (BTs) were first conceived in the computer games industry as a tool to model agent behavior, but they have also received interest in the robotics community as an alternative policy design to Finite State Machines (FSMs). The advantages of BTs over FSMs have been highlighted in many works, but there is no thorough practical comparison of the two designs. Such a comparison is particularly r…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Submitted to IEEE Transactions on Robotics (T-RO). arXiv admin note: text overlap with arXiv:2209.07392

  50. arXiv:2405.08597  [pdf, other]

    cs.LG

    Risks and Opportunities of Open-Source Generative AI

    Authors: Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Aaron Purewal, Csaba Botos, Fabro Steibel, Fazel Keshtkar, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Imperial, Juan Arturo Nolazco, Lori Landay, Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Yong Lee, Jakob Foerster

    Abstract: Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This reg…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Extension of arXiv:2404.17047