Showing 1–27 of 27 results for author: Schmidhuber, J

Searching in archive stat.

  1. arXiv:2506.23068  [pdf, ps, other]

    cs.LG cs.AI stat.AP

    Curious Causality-Seeking Agents Learn Meta Causal World

    Authors: Zhiyu Zhao, Haoxuan Li, Haifeng Zhang, Jun Wang, Francesco Faccio, Jürgen Schmidhuber, Mengyue Yang

    Abstract: When building a world model, a common assumption is that the environment has a single, unchanging underlying causal rule, like applying Newton's laws to every situation. In reality, what appears as a drifting causal mechanism is often the manifestation of a fixed underlying mechanism seen through a narrow observational window. This brings about a problem that, when building a world model, even sub…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: 33 pages

  2. arXiv:2502.05672  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE eess.SY

    On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers

    Authors: Miroslav Štrupl, Oleg Szehr, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber

    Abstract: This article provides a rigorous analysis of convergence and stability of Episodic Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning and Online Decision Transformers. These algorithms performed competitively across various benchmarks, from games to robotic tasks, but their theoretical understanding is limited to specific environmental conditions. This work initiates a theore…

    Submitted 8 February, 2025; originally announced February 2025.

    Comments: 85 pages in main text + 4 pages of references + 26 pages of appendices, 12 figures in main text + 2 figures in appendices; source code available at https://github.com/struplm/eUDRL-GCSL-ODT-Convergence-public

    MSC Class: 68T07 ACM Class: I.2.6; I.5.1

  3. arXiv:2412.03624  [pdf, other]

    cs.AI cs.CL cs.LG cs.MA stat.ML

    How to Correctly do Semantic Backpropagation on Language-based Agentic Systems

    Authors: Wenyi Wang, Hisham A. Alyahya, Dylan R. Ashley, Oleg Serikov, Dmitrii Khizbullin, Francesco Faccio, Jürgen Schmidhuber

    Abstract: Language-based agentic systems have shown great promise in recent years, transitioning from solving small-scale research problems to being deployed in challenging real-world tasks. However, optimizing these systems often requires substantial manual labor. Recent studies have demonstrated that these systems can be represented as computational graphs, enabling automatic optimization. Despite these a…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: 11 pages in main text + 2 pages of references + 15 pages of appendices, 2 figures in main text + 17 figures in appendices, 2 tables in main text + 1 table in appendices, 2 algorithms in main text; source code available at https://github.com/HishamAlyahya/semantic_backprop

    MSC Class: 68T07 ACM Class: I.2.6; I.2.11

  4. arXiv:2212.14392  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Eliminating Meta Optimization Through Self-Referential Meta Learning

    Authors: Louis Kirsch, Jürgen Schmidhuber

    Abstract: Meta Learning automates the search for learning algorithms. At the same time, it creates a dependency on human engineering on the meta-level, where meta learning algorithms need to be designed. In this paper, we investigate self-referential meta learning systems that modify themselves without the need for explicit meta optimization. We discuss the relationship of such systems to in-context and mem…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: The first version appeared at ICML 2022, DARL Workshop

  5. arXiv:2207.01570  [pdf, other]

    cs.LG stat.ML

    Goal-Conditioned Generators of Deep Policies

    Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

    Abstract: Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Preprint. Under Review

  6. arXiv:2207.01566  [pdf, other]

    cs.LG stat.ML

    General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

    Authors: Francesco Faccio, Aditya Ramesh, Vincent Herrmann, Jean Harb, Jürgen Schmidhuber

    Abstract: Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value function for many policies. Here we combine the actor-critic architecture of Parameter-Based Value Functions and the policy embedding of Policy Evaluation Netw…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Preprint. Under review

  7. arXiv:2205.06595  [pdf, other]

    stat.ML cs.AI cs.LG

    Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets

    Authors: Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh Kumar Srivastava

    Abstract: Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time. Ghosh et al. proved that Goal-Conditional Supervised Learning (GCSL) -- which can be viewed as a simplified version of UDRL -- optimizes a lower bound on goal-reaching perfo…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: presented at the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making; 5 pages in main text + 1 page of references + 3 pages of appendices, 1 figure in main text; source code available at https://github.com/struplm/UDRL-GCSL-counterexample.git

    MSC Class: 68T05 ACM Class: I.2.6

  8. arXiv:2107.09088  [pdf, other]

    stat.ML cs.AI cs.LG

    Reward-Weighted Regression Converges to a Global Optimum

    Authors: Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber

    Abstract: Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic im…

    Submitted 23 February, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: 7 pages in main text + 2 pages of references + 6 pages of appendices, 1 figure in main text + 1 figure in appendices; source code available at https://github.com/dylanashley/reward-weighted-regression

    MSC Class: 68T05 ACM Class: I.2.6

  9. arXiv:2012.14905  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Meta Learning Backpropagation And Improving It

    Authors: Louis Kirsch, Jürgen Schmidhuber

    Abstract: Many concepts have been proposed for meta learning with neural networks (NNs), e.g., NNs that learn to reprogram fast weights, Hebbian plasticity, learned learning rules, and meta recurrent NNs. Our Variable Shared Meta Learning (VSML) unifies the above and demonstrates that simple weight-sharing and sparsity in an NN is sufficient to express powerful learning algorithms (LAs) in a reusable fashio…

    Submitted 13 March, 2022; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: Updated to the NeurIPS 2021 camera ready; fixed typo in eq 4

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  10. arXiv:2010.03635  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical Relational Inference

    Authors: Aleksandar Stanić, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Common-sense physical reasoning in the real world requires learning about the interactions of objects and their dynamics. The notion of an abstract object, however, encompasses a wide variety of physical objects that differ greatly in terms of the complex behaviors they support. To address this, we propose a novel approach to physical reasoning that models objects as hierarchies of parts that may…

    Submitted 14 December, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted to AAAI 2021

    ACM Class: I.2.6

  11. Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits

    Authors: Aditya Ramesh, Paulo Rauber, Michelangelo Conserva, Jürgen Schmidhuber

    Abstract: An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical…

    Submitted 3 November, 2023; v1 submitted 9 July, 2020; originally announced July 2020.

    Journal ref: Neural Computation. 2022 Oct 7;34(11):2232-72

  12. arXiv:2006.09226  [pdf, other]

    cs.LG cs.AI stat.ML

    Parameter-Based Value Functions

    Authors: Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

    Abstract: Traditional off-policy actor-critic Reinforcement Learning (RL) algorithms learn value functions of a single target policy. However, when value functions are updated to track the learned policy, they forget potentially useful information about old policies. We introduce a class of value functions called Parameter-Based Value Functions (PBVFs) whose inputs include the policy parameters. They can ge…

    Submitted 13 August, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Published as a conference paper at ICLR 2021

  13. arXiv:1910.06611  [pdf, other]

    cs.LG stat.ML

    Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving

    Authors: Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao

    Abstract: We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently-introduced Mathematics Dataset containing 56 categories of free-form math word-problems. The essential component of the model is a novel attention mechanism, cal…

    Submitted 4 November, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

  14. arXiv:1910.05231  [pdf, other]

    cs.LG stat.ML

    R-SQAIR: Relational Sequential Attend, Infer, Repeat

    Authors: Aleksandar Stanić, Jürgen Schmidhuber

    Abstract: Traditional sequential multi-object attention models rely on a recurrent mechanism to infer object relations. We propose a relational extension (R-SQAIR) of one such attention model (SQAIR) by endowing it with a module with strong relational inductive bias that computes in parallel pairwise interactions between inferred objects. Two recently proposed relational modules are studied on tasks of unsu…

    Submitted 11 October, 2019; originally announced October 2019.

    Comments: 4 page workshop paper accepted at the NeurIPS 2019 Workshop on Perception as Generative Reasoning: Structure, Causality, Probability

    ACM Class: I.2.6

  15. arXiv:1910.04098  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Improving Generalization in Meta Reinforcement Learning using Learned Objectives

    Authors: Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Biological evolution has distilled the experiences of many learners into the general learning algorithms of humans. Our novel meta reinforcement learning algorithm MetaGenRL is inspired by this process. MetaGenRL distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that decides how future individuals will learn. Unlike recent meta-RL algorithms,…

    Submitted 14 February, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

    Comments: Accepted to ICLR 2020

    ACM Class: I.2.6

  16. arXiv:1906.05915  [pdf, other]

    cs.LG stat.ML

    Recurrent Neural Processes

    Authors: Timon Willi, Jonathan Masci, Jürgen Schmidhuber, Christian Osendorfer

    Abstract: We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models. RNPs model the state space with Neural Processes. Given time series observed on fast real-world time scales but containing slow long-term variabilities, RNPs may derive appropriate slow latent time scales. They do so in an efficient manner by establishing conditional indep…

    Submitted 5 November, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

  17. arXiv:1906.01035  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    A Perspective on Objects and Systematic Generalization in Model-Based RL

    Authors: Sjoerd van Steenkiste, Klaus Greff, Jürgen Schmidhuber

    Abstract: In order to meet the diverse challenges in solving many real-world problems, an intelligent agent has to be able to dynamically construct a model of its environment. Objects facilitate the modular reuse of prior knowledge and the combinatorial construction of such models. In this work, we argue that dynamically bound features (objects) do not simply emerge in connectionist models of the world. We…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: Accepted to the ICML 2019 Workshop on Generative Modeling and Model-Based Reasoning for Robotics and AI

    ACM Class: I.2.6

  18. arXiv:1905.12506  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Are Disentangled Representations Helpful for Abstract Visual Reasoning?

    Authors: Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, Olivier Bachem

    Abstract: A disentangled representation encodes information about the salient factors of variation in the data independently. Although it is often argued that this representational format is useful in learning to solve many real-world down-stream tasks, there is little empirical evidence that supports this claim. In this paper, we conduct a large-scale study that investigates whether disentangled representa…

    Submitted 7 January, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Accepted to NeurIPS 2019

    MSC Class: I.2.6 ACM Class: I.2.6

  19. arXiv:1811.12143  [pdf, other]

    cs.LG cs.NE stat.ML

    Learning to Reason with Third-Order Tensor Products

    Authors: Imanol Schlag, Jürgen Schmidhuber

    Abstract: We combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data. This improves symbolic interpretation and systematic generalisation. Our architecture is trained end-to-end through gradient descent on a variety of simple natural language reasoning tasks, significantly outperforming the latest state-of-the-art models in single-task…

    Submitted 8 January, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

  20. arXiv:1809.01999  [pdf, other]

    cs.LG stat.ML

    Recurrent World Models Facilitate Policy Evolution

    Authors: David Ha, Jürgen Schmidhuber

    Abstract: A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state of the art results in various environments. We also train our agent entirely inside of an enviro…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: To appear at NIPS 2018, selected for an oral presentation. arXiv admin note: substantial text overlap with arXiv:1803.10122

  21. World Models

    Authors: David Ha, Jürgen Schmidhuber

    Abstract: We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We c…

    Submitted 9 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

  22. arXiv:1708.03498  [pdf, other]

    cs.LG cs.NE stat.ML

    Neural Expectation Maximization

    Authors: Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Many real world tasks such as reasoning and physical interaction require identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed symbol-like representations. In this paper, we explicitly formalize this problem as inference in a spatial mixture model where each component is parametrized by a neural network. Based on…

    Submitted 4 November, 2017; v1 submitted 11 August, 2017; originally announced August 2017.

    Comments: Accepted to NIPS 2017

    ACM Class: I.2.6

  23. arXiv:1305.0423  [pdf, other]

    cs.LG cs.AI stat.ML

    Testing Hypotheses by Regularized Maximum Mean Discrepancy

    Authors: Somayeh Danafar, Paola M. V. Rancoita, Tobias Glasmachers, Kevin Whittingstall, Juergen Schmidhuber

    Abstract: Do two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare distributions by the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis tes…

    Submitted 2 May, 2013; originally announced May 2013.

  24. arXiv:1209.6048  [pdf, other]

    stat.ME

    Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices

    Authors: Yi Sun, Faustino Gomez, Juergen Schmidhuber

    Abstract: We present a new way of converting a reversible finite Markov chain into a non-reversible one, with a theoretical guarantee that the asymptotic variance of the MCMC estimator based on the non-reversible chain is reduced. The method is applicable to any reversible chain whose states are not connected through a tree, and can be interpreted graphically as inserting vortices into the state transition…

    Submitted 26 September, 2012; originally announced September 2012.

    Comments: Published in NIPS 2010

  25. arXiv:1206.4623  [pdf]

    cs.LG stat.ML

    On the Size of the Online Kernel Sparsification Dictionary

    Authors: Yi Sun, Faustino Gomez, Juergen Schmidhuber

    Abstract: We analyze the size of the dictionary constructed from online kernel sparsification, using a novel formula that expresses the expected determinant of the kernel Gram matrix in terms of the eigenvalues of the covariance operator. Using this formula, we are able to connect the cardinality of the dictionary with the eigen-decay of the covariance operator. In particular, we show that under certain tec…

    Submitted 18 June, 2012; originally announced June 2012.

    Comments: ICML2012

  26. arXiv:1106.4487  [pdf, ps, other]

    stat.ML cs.NE

    Natural Evolution Strategies

    Authors: Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jürgen Schmidhuber

    Abstract: This paper presents Natural Evolution Strategies (NES), a recent family of algorithms that constitute a more principled approach to black-box optimization than established evolutionary algorithms. NES maintains a parameterized distribution on the set of solution candidates, and the natural gradient is used to update the distribution's parameters in the direction of higher expected fitness. We intr…

    Submitted 22 June, 2011; originally announced June 2011.

  27. arXiv:1103.5708  [pdf, other]

    cs.AI stat.ML

    Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments

    Authors: Yi Sun, Faustino Gomez, Juergen Schmidhuber

    Abstract: To maximize its success, an AGI typically needs to explore its initially unknown world. Is there an optimal way of doing so? Here we derive an affirmative answer for a broad class of environments.

    Submitted 29 March, 2011; originally announced March 2011.