-
On the Structure of Information
Authors:
Sebastian Gottwald,
Daniel A. Braun
Abstract:
Shannon information and Shannon entropy are undoubtedly the most commonly used quantitative measures of information, cropping up in the literature across a broad variety of disciplines, often in contexts unrelated to coding theory. Here, we generalize the original idea behind Shannon entropy as the cost of encoding a sample of a random variable in terms of the required codeword length, to arbitrar…
▽ More
Shannon information and Shannon entropy are undoubtedly the most commonly used quantitative measures of information, cropping up in the literature across a broad variety of disciplines, often in contexts unrelated to coding theory. Here, we generalize the original idea behind Shannon entropy as the cost of encoding a sample of a random variable in terms of the required codeword length, to arbitrary loss functions by considering the optimally achievable loss given a certain level of knowledge about the random variable. By formalizing knowledge in terms of the measure-theoretic notion of sub-$σ$-algebras, we arrive at a general notion of uncertainty reduction that includes entropy and information as special cases: entropy is the reduction of uncertainty from no (or partial) knowledge to full knowledge about a random variable, whereas information is uncertainty reduction from no (or partial) knowledge to partial knowledge. As examples, we get Shannon information and entropy when measuring loss in terms of message length, variance for square error loss, and more generally, for the Bregman loss, we get Bregman information. Moreover, we show that appealing properties of Shannon entropy and information extend to the general case, including well-known relations involving the KL divergence, which are extended to divergences of proper scoring rules.
△ Less
Submitted 30 September, 2024;
originally announced September 2024.
-
Automatic Bat Call Classification using Transformer Networks
Authors:
Frank Fundel,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
Automatically identifying bat species from their echolocation calls is a difficult but important task for monitoring bats and the ecosystem they live in. Major challenges in automatic bat call identification are high call variability, similarities between species, interfering calls and lack of annotated data. Many currently available models suffer from relatively poor performance on real-life data…
▽ More
Automatically identifying bat species from their echolocation calls is a difficult but important task for monitoring bats and the ecosystem they live in. Major challenges in automatic bat call identification are high call variability, similarities between species, interfering calls and lack of annotated data. Many currently available models suffer from relatively poor performance on real-life data due to being trained on single call datasets and, moreover, are often too slow for real-time classification. Here, we propose a Transformer architecture for multi-label classification with potential applications in real-time classification scenarios. We train our model on synthetically generated multi-species recordings by merging multiple bats calls into a single recording with multiple simultaneous calls. Our approach achieves a single species accuracy of 88.92% (F1-score of 84.23%) and a multi species macro F1-score of 74.40% on our test set. In comparison to three other tools on the independent and publicly available dataset ChiroVox, our model achieves at least 25.82% better accuracy for single species classification and at least 6.9% better macro F1-score for multi species classification.
△ Less
Submitted 20 September, 2023;
originally announced September 2023.
-
Hierarchically Structured Task-Agnostic Continual Learning
Authors:
Heinke Hihn,
Daniel A. Braun
Abstract:
One notable weakness of current machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learnin…
▽ More
One notable weakness of current machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We derive this principle from a Bayesian perspective and show its connections to previous approaches to continual learning. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network which is governed by a gating policy. Equipped with a diverse and specialized set of parameters, each path can be regarded as a distinct sub-network that learns to solve tasks. To improve expert allocation, we introduce diversity objectives, which we evaluate in additional ablation studies. Importantly, our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms. Due to the general formulation based on generic utility functions, we can apply this optimality principle to a large variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. We demonstrate the competitive performance of our method on continual reinforcement learning and variants of the MNIST, CIFAR-10, and CIFAR-100 datasets.
△ Less
Submitted 14 November, 2022;
originally announced November 2022.
-
Countability constraints in order-theoretic approaches to computability
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
Computability on uncountable sets has no standard formalization, unlike that on countable sets, which is given by Turing machines. Some of the approaches to define computability in these sets rely on order-theoretic structures to translate such notions from Turing machines to uncountable spaces. Since these machines are used as a baseline for computability in these approaches, countability restric…
▽ More
Computability on uncountable sets has no standard formalization, unlike that on countable sets, which is given by Turing machines. Some of the approaches to define computability in these sets rely on order-theoretic structures to translate such notions from Turing machines to uncountable spaces. Since these machines are used as a baseline for computability in these approaches, countability restrictions on the ordered structures are fundamental. Here, we show several relations between the usual countability restrictions in order-theoretic theories of computability and some more common order-theoretic countability constraints, like order density properties and functional characterizations of the order structure in terms of multi-utilities. As a result, we show how computability can be introduced in some order structures via countability order density and multi-utility constraints.
△ Less
Submitted 28 May, 2024; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Computation as uncertainty reduction: a simplified order-theoretic framework
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
Although there is a somewhat standard formalization of computability on countable sets given by Turing machines, the same cannot be said about uncountable sets. Among the approaches to define computability in these sets, order-theoretic structures have proven to be useful. Here, we discuss the mathematical structure needed to define computability using order-theoretic concepts. In particular, we i…
▽ More
Although there is a somewhat standard formalization of computability on countable sets given by Turing machines, the same cannot be said about uncountable sets. Among the approaches to define computability in these sets, order-theoretic structures have proven to be useful. Here, we discuss the mathematical structure needed to define computability using order-theoretic concepts. In particular, we introduce a more general framework and discuss its limitations compared to the previous one in domain theory. We expose four features in which the stronger requirements in the domain-theoretic structure allow to improve upon the more general framework: computable elements, computable functions, model dependence of computability and complexity theory. Crucially, we show computability of elements in uncountable spaces can be defined in this new setup, and argue why this is not the case for computable functions. Moreover, we show the stronger setup diminishes the dependence of computability on the chosen order-theoretic structure and that, although a suitable complexity theory can be defined in the stronger framework and the more general one posesses a notion of computable elements, there appears to be no proper notion of element complexity in the latter.
△ Less
Submitted 6 September, 2022; v1 submitted 28 June, 2022;
originally announced June 2022.
-
On a geometrical notion of dimension for partially ordered sets
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
The well-known notion of dimension for partial orders by Dushnik and Miller allows to quantify the degree of incomparability and, thus, is regarded as a measure of complexity for partial orders. However, despite its usefulness, its definition is somewhat disconnected from the geometrical idea of dimension, where, essentially, the number of dimensions indicates how many real lines are required to r…
▽ More
The well-known notion of dimension for partial orders by Dushnik and Miller allows to quantify the degree of incomparability and, thus, is regarded as a measure of complexity for partial orders. However, despite its usefulness, its definition is somewhat disconnected from the geometrical idea of dimension, where, essentially, the number of dimensions indicates how many real lines are required to represent the underlying partially ordered set.
Here, we introduce a variation of the Dushnik-Miller notion of dimension that is closer to geometry, the Debreu dimension, and show the following main results: (i) how to construct its building blocks under some countability restrictions, (ii) its relation to other notions of dimension in the literature, and (iii), as an application of the above, we improve on the classification of preordered spaces through real-valued monotones.
△ Less
Submitted 2 September, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
The classification of preordered spaces in terms of monotones: complexity and optimization
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
The study of complexity and optimization in decision theory involves both partial and complete characterizations of preferences over decision spaces in terms of real-valued monotones. With this motivation, and following the recent introduction of new classes of monotones, like injective monotones or strict monotone multi-utilities, we present the classification of preordered spaces in terms of bot…
▽ More
The study of complexity and optimization in decision theory involves both partial and complete characterizations of preferences over decision spaces in terms of real-valued monotones. With this motivation, and following the recent introduction of new classes of monotones, like injective monotones or strict monotone multi-utilities, we present the classification of preordered spaces in terms of both the existence and cardinality of real-valued monotones and the cardinality of the quotient space. In particular, we take advantage of a characterization of real-valued monotones in terms of separating families of increasing sets in order to obtain a more complete classification consisting of classes that are strictly different from each other. As a result, we gain new insight into both complexity and optimization, and clarify their interplay in preordered spaces.
△ Less
Submitted 14 August, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
-
Mixture-of-Variational-Experts for Continual Learning
Authors:
Heinke Hihn,
Daniel A. Braun
Abstract:
One weakness of machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning (CL) paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and devel…
▽ More
One weakness of machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning (CL) paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We discuss this principle from a Bayesian perspective and show its connections to previous approaches to CL. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network which is governed by a gating policy. Due to the general formulation based on generic utility functions, we can apply this optimality principle to a large variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. We demonstrate the competitive performance of our method in continual supervised learning and in continual reinforcement learning.
△ Less
Submitted 1 March, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Representing preorders with injective monotones
Authors:
Pedro Hack,
Daniel A. Braun,
Sebastian Gottwald
Abstract:
We introduce a new class of real-valued monotones in preordered spaces, injective monotones. We show that the class of preorders for which they exist lies in between the class of preorders with strict monotones and preorders with countable multi-utilities, improving upon the known classification of preordered spaces through real-valued monotones. We extend several well-known results for strict mon…
▽ More
We introduce a new class of real-valued monotones in preordered spaces, injective monotones. We show that the class of preorders for which they exist lies in between the class of preorders with strict monotones and preorders with countable multi-utilities, improving upon the known classification of preordered spaces through real-valued monotones. We extend several well-known results for strict monotones (Richter-Peleg functions) to injective monotones, we provide a construction of injective monotones from countable multi-utilities, and relate injective monotones to classic results concerning Debreu denseness and order separability. Along the way, we connect our results to Shannon entropy and the uncertainty preorder, obtaining new insights into how they are related. In particular, we show how injective montones can be used to generalize some appealing properties of Jaynes' maximum entropy principle, which is considered a basis for statistical inference and serves as a justification for many regularization techniques that appear throughout machine learning and decision theory.
△ Less
Submitted 24 November, 2021; v1 submitted 30 July, 2021;
originally announced July 2021.
-
Binary Classification: Counterbalancing Class Imbalance by Applying Regression Models in Combination with One-Sided Label Shifts
Authors:
Peter Bellmann,
Heinke Hihn,
Daniel A. Braun,
Friedhelm Schwenker
Abstract:
In many real-world pattern recognition scenarios, such as in medical applications, the corresponding classification tasks can be of an imbalanced nature. In the current study, we focus on binary, imbalanced classification tasks, i.e.~binary classification tasks in which one of the two classes is under-represented (minority class) in comparison to the other class (majority class). In the literature…
▽ More
In many real-world pattern recognition scenarios, such as in medical applications, the corresponding classification tasks can be of an imbalanced nature. In the current study, we focus on binary, imbalanced classification tasks, i.e.~binary classification tasks in which one of the two classes is under-represented (minority class) in comparison to the other class (majority class). In the literature, many different approaches have been proposed, such as under- or oversampling, to counter class imbalance. In the current work, we introduce a novel method, which addresses the issues of class imbalance. To this end, we first transfer the binary classification task to an equivalent regression task. Subsequently, we generate a set of negative and positive target labels, such that the corresponding regression task becomes balanced, with respect to the redefined target label set. We evaluate our approach on a number of publicly available data sets in combination with Support Vector Machines. Moreover, we compare our proposed method to one of the most popular oversampling techniques (SMOTE). Based on the detailed discussion of the presented outcomes of our experimental evaluation, we provide promising ideas for future research directions.
△ Less
Submitted 30 November, 2020;
originally announced November 2020.
-
Specialization in Hierarchical Learning Systems
Authors:
Heinke Hihn,
Daniel A. Braun
Abstract:
Joining multiple decision-makers together is a powerful way to obtain more sophisticated decision-making systems, but requires to address the questions of division of labor and specialization. We investigate in how far information constraints in hierarchies of experts not only provide a principled method for regularization but also to enforce specialization. In particular, we devise an information…
▽ More
Joining multiple decision-makers together is a powerful way to obtain more sophisticated decision-making systems, but requires to address the questions of division of labor and specialization. We investigate in how far information constraints in hierarchies of experts not only provide a principled method for regularization but also to enforce specialization. In particular, we devise an information-theoretically motivated on-line learning rule that allows partitioning of the problem space into multiple sub-problems that can be solved by the individual experts. We demonstrate two different ways to apply our method: (i) partitioning problems based on individual data samples and (ii) based on sets of data samples representing tasks. Approach (i) equips the system with the ability to solve complex decision-making problems by finding an optimal combination of local expert decision-makers. Approach (ii) leads to decision-makers specialized in solving families of tasks, which equips the system with the ability to solve meta-learning problems. We show the broad applicability of our approach on a range of problems including classification, regression, density estimation, and reinforcement learning problems, both in the standard machine learning setup and in a meta-learning setting.
△ Less
Submitted 3 November, 2020;
originally announced November 2020.
-
The Two Kinds of Free Energy and the Bayesian Revolution
Authors:
Sebastian Gottwald,
Daniel A. Braun
Abstract:
The concept of free energy has its origins in 19th century thermodynamics, but has recently found its way into the behavioral and neural sciences, where it has been promoted for its wide applicability and has even been suggested as a fundamental principle of understanding intelligent behavior and brain function. We argue that there are essentially two different notions of free energy in current mo…
▽ More
The concept of free energy has its origins in 19th century thermodynamics, but has recently found its way into the behavioral and neural sciences, where it has been promoted for its wide applicability and has even been suggested as a fundamental principle of understanding intelligent behavior and brain function. We argue that there are essentially two different notions of free energy in current models of intelligent agency, that can both be considered as applications of Bayesian inference to the problem of action selection: one that appears when trading off accuracy and uncertainty based on a general maximum entropy principle, and one that formulates action selection in terms of minimizing an error measure that quantifies deviations of beliefs and policies from given reference models. The first approach provides a normative rule for action selection in the face of model uncertainty or when information processing capabilities are limited. The second approach directly aims to formulate the action selection problem as an inference problem in the context of Bayesian brain theories, also known as Active Inference in the literature. We elucidate the main ideas and discuss critical technical and conceptual issues revolving around these two notions of free energy that both claim to apply at all levels of decision-making, from the high-level deliberation of reasoning down to the low-level information processing of perception.
△ Less
Submitted 6 December, 2020; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Hierarchical Expert Networks for Meta-Learning
Authors:
Heinke Hihn,
Daniel A. Braun
Abstract:
The goal of meta-learning is to train a model on a variety of learning tasks, such that it can adapt to new problems within only a few iterations. Here we propose a principled information-theoretic model that optimally partitions the underlying problem space such that specialized expert decision-makers solve the resulting sub-problems. To drive this specialization we impose the same kind of inform…
▽ More
The goal of meta-learning is to train a model on a variety of learning tasks, such that it can adapt to new problems within only a few iterations. Here we propose a principled information-theoretic model that optimally partitions the underlying problem space such that specialized expert decision-makers solve the resulting sub-problems. To drive this specialization we impose the same kind of information processing constraints both on the partitioning and the expert decision-makers. We argue that this specialization leads to efficient adaptation to new tasks. To demonstrate the generality of our approach we evaluate three meta-learning domains: image classification, regression, and reinforcement learning.
△ Less
Submitted 9 September, 2020; v1 submitted 31 October, 2019;
originally announced November 2019.
-
An Information-theoretic On-line Learning Principle for Specialization in Hierarchical Decision-Making Systems
Authors:
Heinke Hihn,
Sebastian Gottwald,
Daniel A. Braun
Abstract:
Information-theoretic bounded rationality describes utility-optimizing decision-makers whose limited information-processing capabilities are formalized by information constraints. One of the consequences of bounded rationality is that resource-limited decision-makers can join together to solve decision-making problems that are beyond the capabilities of each individual. Here, we study an informati…
▽ More
Information-theoretic bounded rationality describes utility-optimizing decision-makers whose limited information-processing capabilities are formalized by information constraints. One of the consequences of bounded rationality is that resource-limited decision-makers can join together to solve decision-making problems that are beyond the capabilities of each individual. Here, we study an information-theoretic principle that drives division of labor and specialization when decision-makers with information constraints are joined together. We devise an on-line learning rule of this principle that learns a partitioning of the problem space such that it can be solved by specialized linear policies. We demonstrate the approach for decision-making problems whose complexity exceeds the capabilities of individual decision-makers, but can be solved by combining the decision-makers optimally. The strength of the model is that it is abstract and principled, yet has direct applications in classification, regression, reinforcement learning and adaptive control.
△ Less
Submitted 5 December, 2019; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Bounded rational decision-making from elementary computations that reduce uncertainty
Authors:
Sebastian Gottwald,
Daniel A. Braun
Abstract:
In its most basic form, decision-making can be viewed as a computational process that progressively eliminates alternatives, thereby reducing uncertainty. Such processes are generally costly, meaning that the amount of uncertainty that can be reduced is limited by the amount of available computational resources. Here, we introduce the notion of elementary computation based on a fundamental princip…
▽ More
In its most basic form, decision-making can be viewed as a computational process that progressively eliminates alternatives, thereby reducing uncertainty. Such processes are generally costly, meaning that the amount of uncertainty that can be reduced is limited by the amount of available computational resources. Here, we introduce the notion of elementary computation based on a fundamental principle for probability transfers that reduce uncertainty. Elementary computations can be considered as the inverse of Pigou-Dalton transfers applied to probability distributions, closely related to the concepts of majorization, T-transforms, and generalized entropies that induce a preorder on the space of probability distributions. As a consequence we can define resource cost functions that are order-preserving and therefore monotonic with respect to the uncertainty reduction. This leads to a comprehensive notion of decision-making processes with limited resources. Along the way, we prove several new results on majorization theory, as well as on entropy and divergence measures.
△ Less
Submitted 8 April, 2019;
originally announced April 2019.
-
Systems of bounded rational agents with information-theoretic constraints
Authors:
Sebastian Gottwald,
Daniel A. Braun
Abstract:
Specialization and hierarchical organization are important features of efficient collaboration in economical, artificial, and biological systems. Here, we investigate the hypothesis that both features can be explained by the fact that each entity of such a system is limited in a certain way. We propose an information-theoretic approach based on a Free Energy principle, in order to computationally…
▽ More
Specialization and hierarchical organization are important features of efficient collaboration in economical, artificial, and biological systems. Here, we investigate the hypothesis that both features can be explained by the fact that each entity of such a system is limited in a certain way. We propose an information-theoretic approach based on a Free Energy principle, in order to computationally analyze systems of bounded rational agents that deal with such limitations optimally. We find that specialization allows to focus on fewer tasks, thus leading to a more efficient execution, but in turn requires coordination in hierarchical structures of specialized experts and coordinating units. Our results suggest that hierarchical architectures of specialized units at lower levels that are coordinated by units at higher levels are optimal, given that each unit's information-processing capability is limited and conforms to constraints on complexity costs.
△ Less
Submitted 16 September, 2018;
originally announced September 2018.
-
Bounded Rational Decision-Making with Adaptive Neural Network Priors
Authors:
Heinke Hihn,
Sebastian Gottwald,
Daniel A. Braun
Abstract:
Bounded rationality investigates utility-optimizing decision-makers with limited information-processing power. In particular, information theoretic bounded rationality models formalize resource constraints abstractly in terms of relative Shannon information, namely the Kullback-Leibler Divergence between the agents' prior and posterior policy. Between prior and posterior lies an anytime deliberati…
▽ More
Bounded rationality investigates utility-optimizing decision-makers with limited information-processing power. In particular, information theoretic bounded rationality models formalize resource constraints abstractly in terms of relative Shannon information, namely the Kullback-Leibler Divergence between the agents' prior and posterior policy. Between prior and posterior lies an anytime deliberation process that can be instantiated by sample-based evaluations of the utility function through Markov Chain Monte Carlo (MCMC) optimization. The most simple model assumes a fixed prior and can relate abstract information-theoretic processing costs to the number of sample evaluations. However, more advanced models would also address the question of learning, that is how the prior is adapted over time such that generated prior proposals become more efficient. In this work we investigate generative neural networks as priors that are optimized concurrently with anytime sample-based decision-making processes such as MCMC. We evaluate this approach on toy examples.
△ Less
Submitted 4 September, 2018;
originally announced September 2018.
-
An information-theoretic on-line update principle for perception-action coupling
Authors:
Zhen Peng,
Tim Genewein,
Felix Leibfried,
Daniel A. Braun
Abstract:
Inspired by findings of sensorimotor coupling in humans and animals, there has recently been a growing interest in the interaction between action and perception in robotic systems [Bogh et al., 2016]. Here we consider perception and action as two serial information channels with limited information-processing capacity. We follow [Genewein et al., 2015] and formulate a constrained optimization prob…
▽ More
Inspired by findings of sensorimotor coupling in humans and animals, there has recently been a growing interest in the interaction between action and perception in robotic systems [Bogh et al., 2016]. Here we consider perception and action as two serial information channels with limited information-processing capacity. We follow [Genewein et al., 2015] and formulate a constrained optimization problem that maximizes utility under limited information-processing capacity in the two channels. As a solution we obtain an optimal perceptual channel and an optimal action channel that are coupled such that perceptual information is optimized with respect to downstream processing in the action module. The main novelty of this study is that we propose an online optimization procedure to find bounded-optimal perception and action channels in parameterized serial perception-action systems. In particular, we implement the perceptual channel as a multi-layer neural network and the action channel as a multinomial distribution. We illustrate our method in a NAO robot simulator with a simplified cup lifting task.
△ Less
Submitted 16 April, 2018;
originally announced April 2018.
-
Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
Authors:
Jordi Grau-Moya,
Felix Leibfried,
Tim Genewein,
Daniel A. Braun
Abstract:
Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we c…
▽ More
Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.
△ Less
Submitted 7 April, 2016;
originally announced April 2016.
-
Bounded Rational Decision-Making in Feedforward Neural Networks
Authors:
Felix Leibfried,
Daniel Alexander Braun
Abstract:
Bounded rational decision-makers transform sensory input into motor output under limited computational resources. Mathematically, such decision-makers can be modeled as information-theoretic channels with limited transmission rate. Here, we apply this formalism for the first time to multilayer feedforward neural networks. We derive synaptic weight update rules for two scenarios, where either each…
▽ More
Bounded rational decision-makers transform sensory input into motor output under limited computational resources. Mathematically, such decision-makers can be modeled as information-theoretic channels with limited transmission rate. Here, we apply this formalism for the first time to multilayer feedforward neural networks. We derive synaptic weight update rules for two scenarios, where either each neuron is considered as a bounded rational decision-maker or the network as a whole. In the update rules, bounded rationality translates into information-theoretically motivated types of regularization in weight space. In experiments on the MNIST benchmark classification task for handwritten digits, we show that such information-theoretic regularization successfully prevents overfitting across different architectures and attains results that are competitive with other recent techniques like dropout, dropconnect and Bayes by backprop, for both ordinary and convolutional neural networks.
△ Less
Submitted 23 May, 2016; v1 submitted 26 February, 2016;
originally announced February 2016.
-
Information-Theoretic Bounded Rationality
Authors:
Pedro A. Ortega,
Daniel A. Braun,
Justin Dyer,
Kee-Eung Kim,
Naftali Tishby
Abstract:
Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the…
▽ More
Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning.
△ Less
Submitted 21 December, 2015;
originally announced December 2015.
-
Adaptive information-theoretic bounded rational decision-making with parametric priors
Authors:
Jordi Grau-Moya,
Daniel A. Braun
Abstract:
Deviations from rational decision-making due to limited computational resources have been studied in the field of bounded rationality, originally proposed by Herbert Simon. There have been a number of different approaches to model bounded rationality ranging from optimality principles to heuristics. Here we take an information-theoretic approach to bounded rationality, where information-processing…
▽ More
Deviations from rational decision-making due to limited computational resources have been studied in the field of bounded rationality, originally proposed by Herbert Simon. There have been a number of different approaches to model bounded rationality ranging from optimality principles to heuristics. Here we take an information-theoretic approach to bounded rationality, where information-processing costs are measured by the relative entropy between a posterior decision strategy and a given fixed prior strategy. In the case of multiple environments, it can be shown that there is an optimal prior rendering the bounded rationality problem equivalent to the rate distortion problem for lossy compression in information theory. Accordingly, the optimal prior and posterior strategies can be computed by the well-known Blahut-Arimoto algorithm which requires the computation of partition sums over all possible outcomes and cannot be applied straightforwardly to continuous problems. Here we derive a sampling-based alternative update rule for the adaptation of prior behaviors of decision-makers and we show convergence to the optimal prior predicted by rate distortion theory. Importantly, the update rule avoids typical infeasible operations such as the computation of partition sums. We show in simulations a proof of concept for discrete action and environment domains. This approach is not only interesting as a generic computational method, but might also provide a more realistic model of human decision-making processes occurring on a fast and a slow time scale.
△ Less
Submitted 5 November, 2015;
originally announced November 2015.
-
Bounded Rational Decision-Making in Changing Environments
Authors:
Jordi Grau-Moya,
Daniel A. Braun
Abstract:
A perfectly rational decision-maker chooses the best action with the highest utility gain from a set of possible actions. The optimality principles that describe such decision processes do not take into account the computational costs of finding the optimal action. Bounded rational decision-making addresses this problem by specifically trading off information-processing costs and expected utility.…
▽ More
A perfectly rational decision-maker chooses the best action with the highest utility gain from a set of possible actions. The optimality principles that describe such decision processes do not take into account the computational costs of finding the optimal action. Bounded rational decision-making addresses this problem by specifically trading off information-processing costs and expected utility. Interestingly, a similar trade-off between energy and entropy arises when describing changes in thermodynamic systems. This similarity has been recently used to describe bounded rational agents. Crucially, this framework assumes that the environment does not change while the decision-maker is computing the optimal policy. When this requirement is not fulfilled, the decision-maker will suffer inefficiencies in utility, that arise because the current policy is optimal for an environment in the past. Here we borrow concepts from non-equilibrium thermodynamics to quantify these inefficiencies and illustrate with simulations its relationship with computational resources.
△ Less
Submitted 23 December, 2013;
originally announced December 2013.
-
Abstraction in decision-makers with limited information processing capabilities
Authors:
Tim Genewein,
Daniel A. Braun
Abstract:
A distinctive property of human and animal intelligence is the ability to form abstractions by neglecting irrelevant information which allows to separate structure from noise. From an information theoretic point of view abstractions are desirable because they allow for very efficient information processing. In artificial systems abstractions are often implemented through computationally costly for…
▽ More
A distinctive property of human and animal intelligence is the ability to form abstractions by neglecting irrelevant information which allows to separate structure from noise. From an information theoretic point of view abstractions are desirable because they allow for very efficient information processing. In artificial systems abstractions are often implemented through computationally costly formations of groups or clusters. In this work we establish the relation between the free-energy framework for decision making and rate-distortion theory and demonstrate how the application of rate-distortion for decision-making leads to the emergence of abstractions. We argue that abstractions are induced due to a limit in information processing capacity.
△ Less
Submitted 19 December, 2013; v1 submitted 16 December, 2013;
originally announced December 2013.
-
Generalized Thompson Sampling for Sequential Decision-Making and Causal Inference
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Recently, it has been shown how sampling actions from the predictive distribution over the optimal action-sometimes called Thompson sampling-can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior p…
▽ More
Recently, it has been shown how sampling actions from the predictive distribution over the optimal action-sometimes called Thompson sampling-can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior probability that is updated by Bayesian inference and causal calculus. Here we discuss three important features of this approach. First, we discuss in how far such Thompson sampling can be regarded as a natural consequence of the Bayesian modeling of policy uncertainty. Second, we show how Thompson sampling can be used to study interactions between multiple adaptive agents, thus, opening up an avenue of game-theoretic analysis. Third, we show how Thompson sampling can be applied to infer causal relationships when interacting with an environment in a sequential fashion. In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.
△ Less
Submitted 18 March, 2013;
originally announced March 2013.
-
A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
Authors:
Pedro A. Ortega,
Jordi Grau-Moya,
Tim Genewein,
David Balduzzi,
Daniel A. Braun
Abstract:
We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly…
▽ More
We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior based on a kernel regressor. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function. We illustrate the effectiveness of our model by optimizing a noisy, high-dimensional, non-convex objective function.
△ Less
Submitted 10 November, 2012; v1 submitted 8 June, 2012;
originally announced June 2012.
-
Free Energy and the Generalized Optimality Equations for Sequential Decision Making
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized se…
▽ More
The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized sequential optimality equations that not only include the Bellman optimality equations as a limit case, but also lead to well-known decision-rules such as Expectimax, Minimax and Expectiminimax. We show how these decision-rules can be derived from a single free energy principle that assigns a resource parameter to each node in the decision tree. These resource parameters express a concrete computational cost that can be measured as the amount of samples that are needed from the distribution that belongs to each node. The free energy principle therefore provides the normative basis for generalized optimality equations that account for both adversarial and stochastic environments.
△ Less
Submitted 17 May, 2012;
originally announced May 2012.
-
Information, Utility & Bounded Rationality
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational "free utility" principle akin to thermodynamical free energy that trades…
▽ More
Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational "free utility" principle akin to thermodynamical free energy that trades off utility and information costs. We show that bounded optimal control solutions can be derived from this variational principle, which leads in general to stochastic policies. Furthermore, we show that risk-sensitive and robust (minimax) control schemes fall out naturally from this framework if the environment is considered as a bounded rational and perfectly rational opponent, respectively. When resource costs are ignored, the maximum expected utility principle is recovered.
△ Less
Submitted 28 July, 2011;
originally announced July 2011.
-
An axiomatic formalization of bounded rationality based on a utility-information equivalence
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Classic decision-theory is based on the maximum expected utility (MEU) principle, but crucially ignores the resource costs incurred when determining optimal decisions. Here we propose an axiomatic framework for bounded decision-making that considers resource costs. Agents are formalized as probability measures over input-output streams. We postulate that any such probability measure can be assigne…
▽ More
Classic decision-theory is based on the maximum expected utility (MEU) principle, but crucially ignores the resource costs incurred when determining optimal decisions. Here we propose an axiomatic framework for bounded decision-making that considers resource costs. Agents are formalized as probability measures over input-output streams. We postulate that any such probability measure can be assigned a corresponding conjugate utility function based on three axioms: utilities should be real-valued, additive and monotonic mappings of probabilities. We show that these axioms enforce a unique conversion law between utility and probability (and thereby, information). Moreover, we show that this relation can be characterized as a variational principle: given a utility function, its conjugate probability measure maximizes a free utility functional. Transformations of probability measures can then be formalized as a change in free utility due to the addition of new constraints expressed by a target utility function. Accordingly, one obtains a criterion to choose a probability measure that trades off the maximization of a target utility function and the cost of the deviation from a reference distribution. We show that optimal control, adaptive estimation and adaptive control problems can be solved this way in a resource-efficient way. When resource costs are ignored, the MEU principle is recovered. Our formalization might thus provide a principled approach to bounded rationality that establishes a close link to information theory.
△ Less
Submitted 6 July, 2010;
originally announced July 2010.
-
Convergence of Bayesian Control Rule
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Recently, new approaches to adaptive control have sought to reformulate the problem as a minimization of a relative entropy criterion to obtain tractable solutions. In particular, it has been shown that minimizing the expected deviation from the causal input-output dependencies of the true plant leads to a new promising stochastic control rule called the Bayesian control rule. This work proves t…
▽ More
Recently, new approaches to adaptive control have sought to reformulate the problem as a minimization of a relative entropy criterion to obtain tractable solutions. In particular, it has been shown that minimizing the expected deviation from the causal input-output dependencies of the true plant leads to a new promising stochastic control rule called the Bayesian control rule. This work proves the convergence of the Bayesian control rule under two sufficient assumptions: boundedness, which is an ergodicity condition; and consistency, which is an instantiation of the sure-thing principle.
△ Less
Submitted 16 February, 2010;
originally announced February 2010.
-
A Minimum Relative Entropy Controller for Undiscounted Markov Decision Processes
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Adaptive control problems are notoriously difficult to solve even in the presence of plant-specific controllers. One way to by-pass the intractable computation of the optimal policy is to restate the adaptive control as the minimization of the relative entropy of a controller that ignores the true plant dynamics from an informed controller. The solution is given by the Bayesian control rule-a se…
▽ More
Adaptive control problems are notoriously difficult to solve even in the presence of plant-specific controllers. One way to by-pass the intractable computation of the optimal policy is to restate the adaptive control as the minimization of the relative entropy of a controller that ignores the true plant dynamics from an informed controller. The solution is given by the Bayesian control rule-a set of equations characterizing a stochastic adaptive controller for the class of possible plant dynamics. Here, the Bayesian control rule is applied to derive BCR-MDP, a controller to solve undiscounted Markov decision processes with finite state and action spaces and unknown dynamics. In particular, we derive a non-parametric conjugate prior distribution over the policy space that encapsulates the agent's whole relevant history and we present a Gibbs sampler to draw random policies from this distribution. Preliminary results show that BCR-MDP successfully avoids sub-optimal limit cycles due to its built-in mechanism to balance exploration versus exploitation.
△ Less
Submitted 7 February, 2010;
originally announced February 2010.
-
A conversion between utility and information
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Rewards typically express desirabilities or preferences over a set of alternatives. Here we propose that rewards can be defined for any probability distribution based on three desiderata, namely that rewards should be real-valued, additive and order-preserving, where the latter implies that more probable events should also be more desirable. Our main result states that rewards are then uniquely…
▽ More
Rewards typically express desirabilities or preferences over a set of alternatives. Here we propose that rewards can be defined for any probability distribution based on three desiderata, namely that rewards should be real-valued, additive and order-preserving, where the latter implies that more probable events should also be more desirable. Our main result states that rewards are then uniquely determined by the negative information content. To analyze stochastic processes, we define the utility of a realization as its reward rate. Under this interpretation, we show that the expected utility of a stochastic process is its negative entropy rate. Furthermore, we apply our results to analyze agent-environment interactions. We show that the expected utility that will actually be achieved by the agent is given by the negative cross-entropy from the input-output (I/O) distribution of the coupled interaction system and the agent's I/O distribution. Thus, our results allow for an information-theoretic interpretation of the notion of utility and the characterization of agent-environment interactions in terms of entropy dynamics.
△ Less
Submitted 30 December, 2009; v1 submitted 26 November, 2009;
originally announced November 2009.
-
A Bayesian Rule for Adaptive Control based on Causal Interventions
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
Explaining adaptive behavior is a central problem in artificial intelligence research. Here we formalize adaptive agents as mixture distributions over sequences of inputs and outputs (I/O). Each distribution of the mixture constitutes a `possible world', but the agent does not know which of the possible worlds it is actually facing. The problem is to adapt the I/O stream in a way that is compati…
▽ More
Explaining adaptive behavior is a central problem in artificial intelligence research. Here we formalize adaptive agents as mixture distributions over sequences of inputs and outputs (I/O). Each distribution of the mixture constitutes a `possible world', but the agent does not know which of the possible worlds it is actually facing. The problem is to adapt the I/O stream in a way that is compatible with the true world. A natural measure of adaptation can be obtained by the Kullback-Leibler (KL) divergence between the I/O distribution of the true world and the I/O distribution expected by the agent that is uncertain about possible worlds. In the case of pure input streams, the Bayesian mixture provides a well-known solution for this problem. We show, however, that in the case of I/O streams this solution breaks down, because outputs are issued by the agent itself and require a different probabilistic syntax as provided by intervention calculus. Based on this calculus, we obtain a Bayesian control rule that allows modeling adaptive behavior with mixture distributions over I/O streams. This rule might allow for a novel approach to adaptive control based on a minimum KL-principle.
△ Less
Submitted 30 December, 2009; v1 submitted 26 November, 2009;
originally announced November 2009.
-
A Minimum Relative Entropy Principle for Learning and Acting
Authors:
Pedro A. Ortega,
Daniel A. Braun
Abstract:
This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is an agent that has been designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. I…
▽ More
This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is an agent that has been designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.
△ Less
Submitted 10 April, 2010; v1 submitted 20 October, 2008;
originally announced October 2008.