-
Factored space models: Towards causality between levels of abstraction
Authors:
Scott Garrabrant,
Matthias Georg Mayer,
Magdalena Wache,
Leon Lang,
Sam Eisenstat,
Holger Dell
Abstract:
Causality plays an important role in understanding intelligent behavior, and there is a wealth of literature on mathematical models for causality, most of which is focused on causal graphs. Causal graphs are a powerful tool for a wide range of applications, in particular when the relevant variables are known and at the same level of abstraction. However, the given variables can also be unstructured data, like pixels of an image. Meanwhile, the causal variables, such as the positions of objects in the image, can be arbitrary deterministic functions of the given variables. Moreover, the causal variables may form a hierarchy of abstractions, in which the macro-level variables are deterministic functions of the micro-level variables. Causal graphs are limited when it comes to modeling this kind of situation. In the presence of deterministic relationships there is generally no causal graph that satisfies both the Markov condition and the faithfulness condition. We introduce factored space models as an alternative to causal graphs which naturally represent both probabilistic and deterministic relationships at all levels of abstraction. Moreover, we introduce structural independence and establish that it is equivalent to statistical independence in every distribution that factorizes over the factored space. This theorem generalizes the classical soundness and completeness theorem for d-separation.
Submitted 20 December, 2024; v1 submitted 3 December, 2024;
originally announced December 2024.
-
Temporal Inference with Finite Factored Sets
Authors:
Scott Garrabrant
Abstract:
We propose a new approach to temporal inference, inspired by the Pearlian causal inference paradigm - though quite different from Pearl's approach formally. Rather than using directed acyclic graphs, we make use of factored sets, which are sets expressed as Cartesian products. We show that finite factored sets are powerful tools for inferring temporal relations. We introduce an analog of d-separation for factored sets, conditional orthogonality, and we demonstrate that this notion is equivalent to conditional independence in all probability distributions on a finite factored set.
Submitted 23 September, 2021;
originally announced September 2021.
-
Cartesian Frames
Authors:
Scott Garrabrant,
Daniel A. Herrmann,
Josiah Lopez-Wild
Abstract:
We introduce a novel framework, the theory of Cartesian frames (CF), that gives powerful tools for manipulating sets of acts. The CF framework takes as its most fundamental building block that an agent can freely choose from a set of available actions. The framework uses the mathematics of Chu spaces to develop a calculus of those sets of actions, how those actions change at various levels of description, and how different agents' actions can combine when agents work in concert. We discuss how this framework might provide an illuminating perspective on issues in decision theory and formal epistemology.
Submitted 22 September, 2021;
originally announced September 2021.
-
Risks from Learned Optimization in Advanced Machine Learning Systems
Authors:
Evan Hubinger,
Chris van Merwijk,
Vladimir Mikulik,
Joar Skalse,
Scott Garrabrant
Abstract:
We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer - a situation we refer to as mesa-optimization, a neologism we introduce in this paper. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be - how will it differ from the loss function it was trained under - and how can it be aligned? In this paper, we provide an in-depth analysis of these two primary questions and provide an overview of topics for future research.
Submitted 1 December, 2021; v1 submitted 5 June, 2019;
originally announced June 2019.
-
Embedded Agency
Authors:
Abram Demski,
Scott Garrabrant
Abstract:
Traditional models of rational action treat the agent as though it is cleanly separated from its environment, and can act on that environment from the outside. Such agents have a known functional relationship with their environment, can model their environment in every detail, and do not need to reason about themselves or their internal parts.
We provide an informal survey of obstacles to formalizing good reasoning for agents embedded in their environment. Such agents must optimize an environment that is not of type "function"; they must rely on models that fit within the modeled environment; and they must reason about themselves as just another physical system, made of parts that can be modified and that can work at cross purposes.
Submitted 6 October, 2020; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Categorizing Variants of Goodhart's Law
Authors:
David Manheim,
Scott Garrabrant
Abstract:
There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart's Law. This class of failure is often poorly understood, partly because terminology for discussing it is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are "(at least) four different mechanisms" that relate to Goodhart's Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field.
Submitted 24 February, 2019; v1 submitted 12 March, 2018;
originally announced March 2018.
-
A Formal Approach to the Problem of Logical Non-Omniscience
Authors:
Scott Garrabrant,
Tsvi Benson-Tilsen,
Andrew Critch,
Nate Soares,
Jessica Taylor
Abstract:
We present the logical induction criterion for computable algorithms that assign probabilities to every logical statement in a given formal language, and refine those probabilities over time. The criterion is motivated by a series of stock trading analogies. Roughly speaking, each logical sentence phi is associated with a stock that is worth $1 per share if phi is true and nothing otherwise, and we interpret the belief-state of a logically uncertain reasoner as a set of market prices, where P_N(phi) = 50% means that on day N, shares of phi may be bought or sold from the reasoner for 50 cents. A market is then called a logical inductor if (very roughly) there is no polynomial-time computable trading strategy with finite risk tolerance that earns unbounded profits in that market over time. We then describe how this single criterion implies a number of desirable properties of bounded reasoners; for example, logical inductors outpace their underlying deductive process, perform universal empirical induction given enough time to think, and place strong trust in their own reasoning process.
Submitted 27 July, 2017;
originally announced July 2017.
-
Logical Induction
Authors:
Scott Garrabrant,
Tsvi Benson-Tilsen,
Andrew Critch,
Nate Soares,
Jessica Taylor
Abstract:
We present a computable algorithm that assigns probabilities to every logical statement in a given formal language, and refines those probabilities over time. For instance, if the language is Peano arithmetic, it assigns probabilities to all arithmetical statements, including claims about the twin prime conjecture, the outputs of long-running computations, and its own probabilities. We show that our algorithm, an instance of what we call a logical inductor, satisfies a number of intuitive desiderata, including: (1) it learns to predict patterns of truth and falsehood in logical statements, often long before having the resources to evaluate the statements, so long as the patterns can be written down in polynomial time; (2) it learns to use appropriate statistical summaries to predict sequences of statements whose truth values appear pseudorandom; and (3) it learns to have accurate beliefs about its own current beliefs, in a manner that avoids the standard paradoxes of self-reference. For example, if a given computer program only ever produces outputs in a certain range, a logical inductor learns this fact in a timely manner; and if late digits in the decimal expansion of $π$ are difficult to predict, then a logical inductor learns to assign $\approx 10\%$ probability to "the $n$th digit of $π$ is a 7" for large $n$. Logical inductors also learn to trust their future beliefs more than their current beliefs, and their beliefs are coherent in the limit (whenever $φ\implies ψ$, $\mathbb{P}_\infty(φ) \le \mathbb{P}_\infty(ψ)$, and so on); and logical inductors strictly dominate the universal semimeasure in the limit.
These properties and many others all follow from a single logical induction criterion, which is motivated by a series of stock trading analogies. Roughly speaking, each logical sentence $φ$ is associated with a stock that is worth \$1 per share if [...]
Submitted 7 December, 2020; v1 submitted 12 September, 2016;
originally announced September 2016.
-
Inductive Coherence
Authors:
Scott Garrabrant,
Benya Fallenstein,
Abram Demski,
Nate Soares
Abstract:
While probability theory is normally applied to external environments, there has been some recent interest in probabilistic modeling of the outputs of computations that are too expensive to run. Since mathematical logic is a powerful tool for reasoning about computer programs, we consider this problem from the perspective of integrating probability and logic. Recent work on assigning probabilities to mathematical statements has used the concept of coherent distributions, which satisfy logical constraints such as the probability of a sentence and its negation summing to one. Although there are algorithms which converge to a coherent probability distribution in the limit, this yields only weak guarantees about finite approximations of these distributions. In our setting, this is a significant limitation: Coherent distributions assign probability one to all statements provable in a specific logical theory, such as Peano Arithmetic, which can prove what the output of any terminating computation is; thus, a coherent distribution must assign probability one to the output of any terminating computation. To model uncertainty about computations, we propose to work with approximations to coherent distributions. We introduce inductive coherence, a strengthening of coherence that provides appropriate constraints on finite approximations, and propose an algorithm which satisfies this criterion.
Submitted 7 October, 2016; v1 submitted 18 April, 2016;
originally announced April 2016.
-
Asymptotic Convergence in Online Learning with Unbounded Delays
Authors:
Scott Garrabrant,
Nate Soares,
Jessica Taylor
Abstract:
We study the problem of predicting the results of computations that are too expensive to run, via the observation of the results of smaller computations. We model this as an online learning problem with delayed feedback, where the length of the delay is unbounded, which we study mainly in a stochastic setting. We show that in this setting, consistency is not possible in general, and that optimal forecasters might not have average regret going to zero. However, it is still possible to give algorithms that converge asymptotically to Bayes-optimal predictions, by evaluating forecasters on specific sparse independent subsequences of their predictions. We give an algorithm that does this, which converges asymptotically on good behavior, and give very weak bounds on how long it takes to converge. We then relate our results back to the problem of predicting large computations in a deterministic setting.
Submitted 7 September, 2016; v1 submitted 18 April, 2016;
originally announced April 2016.
-
Asymptotic Logical Uncertainty and The Benford Test
Authors:
Scott Garrabrant,
Siddharth Bhaskar,
Abram Demski,
Joanna Garrabrant,
George Koleszarik,
Evan Lloyd
Abstract:
We give an algorithm A which assigns probabilities to logical sentences. For any simple infinite sequence of sentences whose truth-values appear indistinguishable from a biased coin that outputs "true" with probability p, we have that the sequence of probabilities that A assigns to these sentences converges to p.
Submitted 12 October, 2015;
originally announced October 2015.
-
Pattern avoidance is not P-recursive
Authors:
Scott Garrabrant,
Igor Pak
Abstract:
Let $F \subset S_k$ be a finite set of permutations and let $C_n(F)$ denote the number of permutations $σ$ in $S_n$ avoiding the set of patterns $F$. The Noonan-Zeilberger conjecture states that the sequence $\{C_n(F)\}$ is P-recursive. We use Computability Theory to disprove this conjecture.
Submitted 24 May, 2015;
originally announced May 2015.
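To make the quantity $C_n(F)$ in the abstract above concrete, here is a minimal brute-force sketch in Python. It is only an illustration of the definition, not the method of the paper; the function names (`pattern_of`, `avoids`, `count_avoiders`) are hypothetical, and the enumeration is exponential in $n$, so it is only usable for small cases.

```python
from itertools import combinations, permutations


def pattern_of(values):
    """Return the relative order of `values` as a tuple over 0..k-1."""
    ranks = {v: r for r, v in enumerate(sorted(values))}
    return tuple(ranks[v] for v in values)


def avoids(perm, patterns):
    """True if no subsequence of `perm` is order-isomorphic to a pattern."""
    for pat_len in {len(p) for p in patterns}:
        for idx in combinations(range(len(perm)), pat_len):
            if pattern_of([perm[i] for i in idx]) in patterns:
                return False
    return True


def count_avoiders(n, patterns):
    """Count permutations of size n avoiding every pattern (brute force)."""
    normalized = {pattern_of(p) for p in patterns}
    return sum(avoids(p, normalized) for p in permutations(range(n)))


# Permutations avoiding the single pattern 123 are counted by Catalan numbers.
print([count_avoiders(n, [(1, 2, 3)]) for n in range(1, 6)])
```

For the single pattern 123 the counts are the Catalan numbers, a classical fact that serves as a sanity check for the sketch.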
-
Words in Linear Groups, Random Walks, Automata and P-Recursiveness
Authors:
Scott Garrabrant,
Igor Pak
Abstract:
Fix a finite set $S \subset {GL}(k,\mathbb{Z})$. Denote by $a_n$ the number of products of matrices in $S$ of length $n$ that are equal to 1. We show that the sequence $\{a_n\}$ is not always P-recursive. This answers a question of Kontsevich.
Submitted 23 February, 2015;
originally announced February 2015.
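The sequence $a_n$ in the abstract above can be computed directly for small examples. The following Python sketch assumes that "equal to 1" means equal to the identity matrix and that products are ordered words of length $n$ over $S$; the names `matmul` and `count_identity_words` and the example generators are hypothetical, chosen only for illustration.

```python
from collections import Counter


def matmul(a, b):
    """Multiply two square integer matrices stored as tuples of tuples."""
    k = len(a)
    return tuple(
        tuple(sum(a[i][t] * b[t][j] for t in range(k)) for j in range(k))
        for i in range(k)
    )


def count_identity_words(gens, n):
    """Number of length-n words over `gens` whose product is the identity."""
    k = len(gens[0])
    ident = tuple(tuple(int(i == j) for j in range(k)) for i in range(k))
    # reach[m] = number of words of the current length whose product is m
    reach = Counter({ident: 1})
    for _ in range(n):
        nxt = Counter()
        for mat, ways in reach.items():
            for g in gens:
                nxt[matmul(mat, g)] += ways
        reach = nxt
    return reach[ident]


# Illustrative generators: a unipotent matrix and its inverse.
A = ((1, 1), (0, 1))
A_INV = ((1, -1), (0, 1))
print([count_identity_words([A, A_INV], n) for n in range(7)])
```

With these two generators the products behave like a simple random walk on the integers, so $a_n$ is the central binomial coefficient for even $n$ and zero for odd $n$. This particular sequence is P-recursive; the abstract's claim is that this fails for some choices of $S$.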
-
Counting With Irrational Tiles
Authors:
Scott Garrabrant,
Igor Pak
Abstract:
We introduce and study the number of tilings of unit height rectangles with irrational tiles. We prove that the class of sequences of these numbers coincides with the class of diagonals of N-rational generating functions and a class of certain binomial multisums. We then give asymptotic applications and establish connections to hypergeometric functions and Catalan numbers.
Submitted 30 July, 2014;
originally announced July 2014.
-
Cofinite Induced Subgraphs of Impartial Combinatorial Games: An Analysis of CIS-Nim
Authors:
Scott M. Garrabrant,
Eric J. Friedman,
Adam Scott Landsberg
Abstract:
Given an impartial combinatorial game G, we create a class of related games (CIS-G) by specifying a finite set of positions in G and forbidding players from moving to those positions (leaving all other game rules unchanged). Such modifications amount to taking cofinite induced subgraphs (CIS) of the original game graph. Some recent numerical/heuristic work has suggested that the underlying structure and behavior of such "CIS-games" can shed new light on, and bears interesting relationships with, the original games from which they are derived. In this paper we present an analytical treatment of the cofinite induced subgraphs associated with the game of (three-heap) Nim. This constitutes one of the simplest nontrivial cases of a CIS game. Our main finding is that although the structure of the winning strategies in games of CIS-Nim can differ greatly from that of Nim, CIS-Nim games inherit a type of period-two scale invariance from the original game of Nim.
Submitted 1 January, 2012;
originally announced January 2012.
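The CIS construction described in the abstract above can be sketched for three-heap Nim as follows. This is a minimal illustration under stated assumptions, not the analysis from the paper: positions are ordered triples of heap sizes, play follows the normal-play convention (a player with no legal move loses), and the function name `p_positions` and its parameters are hypothetical.

```python
from functools import lru_cache


def p_positions(limit, forbidden=()):
    """P-positions of three-heap Nim with heap sizes below `limit`,
    where moving to any position in `forbidden` is not allowed."""
    forbidden = frozenset(forbidden)

    @lru_cache(maxsize=None)
    def is_p(pos):
        # A position is a P-position when the player to move has no
        # legal move that leads to another P-position.
        for i, heap in enumerate(pos):
            for smaller in range(heap):
                nxt = pos[:i] + (smaller,) + pos[i + 1:]
                if nxt not in forbidden and is_p(nxt):
                    return False
        return True

    return {
        (a, b, c)
        for a in range(limit)
        for b in range(limit)
        for c in range(limit)
        if (a, b, c) not in forbidden and is_p((a, b, c))
    }


# Ordinary Nim: P-positions are exactly those with zero XOR of heap sizes.
plain = p_positions(8)
print(all((a ^ b ^ c == 0) for a, b, c in plain))

# Forbidding a single position changes which positions are losing.
modified = p_positions(8, forbidden={(0, 0, 0)})
print(sorted(modified - plain)[:5])
```

With an empty forbidden set the sketch reproduces ordinary Nim, which gives a simple check before experimenting with different forbidden sets.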
-
Using TPA to count linear extensions
Authors:
Jacqueline Banks,
Scott Garrabrant,
Mark L. Huber,
Anne Perizzolo
Abstract:
A linear extension of a poset $P$ is a permutation of the elements of the set that respects the partial order. Let $L(P)$ denote the number of linear extensions. It is a #P-complete problem to determine $L(P)$ exactly for an arbitrary poset, and so randomized approximation algorithms that draw randomly from the set of linear extensions are used. In this work, the set of linear extensions is embedded in a larger state space with a continuous parameter. The introduction of a continuous parameter allows for the use of a more efficient method for approximating $L(P)$ called TPA. Our primary result is that it is possible to sample from this continuous embedding in time that is as fast as or faster than the best known methods for sampling uniformly from linear extensions. For a poset containing $n$ elements, this means we can approximate $L(P)$ to within a factor of $1 + ε$ with probability at least $1 - δ$ using an expected number of random bits and comparisons in the poset which is at most $O(n^3 (\ln n)(\ln L(P)) ε^{-2} \ln δ^{-1})$.
Submitted 30 June, 2017; v1 submitted 24 October, 2010;
originally announced October 2010.
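As an illustration of the quantity $L(P)$ defined in the abstract above, the Python sketch below counts linear extensions by exhaustive enumeration. It is not TPA and not the sampling method of the paper; it takes factorial time and only makes the definition concrete for tiny posets. The function name `count_linear_extensions` and the encoding of the poset as pairs `(a, b)` meaning `a < b` on elements `0..n-1` are assumptions made for this example.

```python
from itertools import permutations


def count_linear_extensions(n, relations):
    """Count orderings of 0..n-1 that respect every relation (a, b), a < b.

    Brute force over all n! orderings; for illustration only.
    """
    count = 0
    for order in permutations(range(n)):
        position = {elem: i for i, elem in enumerate(order)}
        if all(position[a] < position[b] for a, b in relations):
            count += 1
    return count


# An antichain on 3 elements has 3! linear extensions; a chain has one.
print(count_linear_extensions(3, []))
print(count_linear_extensions(3, [(0, 1), (1, 2)]))
# Two disjoint 2-element chains: 4! / (2 * 2) orderings.
print(count_linear_extensions(4, [(0, 1), (2, 3)]))
```

The factorial cost of this enumeration is the practical motivation for the randomized approximation schemes the abstract refers to.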
-
Upper bounds in the Ohtsuki-Riley-Sakuma partial order on 2-bridge knots
Authors:
Scott M. Garrabrant,
Jim Hoste,
Patrick D. Shanahan
Abstract:
In this paper we use continued fractions to study a partial order on the set of 2-bridge knots derived from the work of Ohtsuki, Riley, and Sakuma. We establish necessary and sufficient conditions for any set of 2-bridge knots to have an upper bound with respect to the partial order. Moreover, given any 2-bridge knot K we characterize all other 2-bridge knots J such that {K, J} has an upper bound. As an application we answer a question of Suzuki, showing that there is no upper bound for the set consisting of the trefoil and figure-eight knots.
Submitted 20 January, 2011; v1 submitted 19 July, 2010;
originally announced July 2010.