-
Stochastic Mathematical Systems
Authors:
David H. Wolpert,
David B. Kinney
Abstract:
We introduce a framework that can be used to model both mathematics and human reasoning about mathematics. This framework involves stochastic mathematical systems (SMSs), which are stochastic processes that generate pairs of questions and associated answers (with no explicit referents). We use the SMS framework to define normative conditions for mathematical reasoning, by defining a "calibration" relation between a pair of SMSs. The first SMS is the human reasoner, and the second is an "oracle" SMS that can be interpreted as deciding whether the question-answer pairs of the reasoner SMS are valid. To ground thinking, we understand the answers to questions given by this oracle to be the answers that would be given by an SMS representing the entire mathematical community in the infinite long run of the process of asking and answering questions. We then introduce a slight extension of SMSs to allow us to model both the physical universe and human reasoning about the physical universe. We then define a slightly different calibration relation appropriate for the case of scientific reasoning. In this case the first SMS represents a human scientist predicting the outcome of future experiments, while the second SMS represents the physical universe in which the scientist is embedded, with the question-answer pairs of that SMS being specifications of the experiments that will occur and the outcome of those experiments, respectively. Next we derive conditions justifying two important patterns of inference in both mathematical and scientific reasoning: i) the practice of increasing one's degree of belief in a claim as one observes increasingly many lines of evidence for that claim, and ii) abduction, the practice of inferring a claim's probability of being correct from its explanatory power with respect to some other claim that is already taken to hold for independent reasons.
Submitted 14 March, 2023; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Noisy Deductive Reasoning: How Humans Construct Math, and How Math Constructs Universes
Authors:
David H. Wolpert,
David Kinney
Abstract:
We present a computational model of mathematical reasoning according to which mathematics is a fundamentally stochastic process. That is, on our model, whether or not a given formula is deemed a theorem in some axiomatic system is not a matter of certainty, but is instead governed by a probability distribution. We then show that this framework gives a compelling account of several aspects of mathematical practice. These include: 1) the way in which mathematicians generate research programs, 2) the applicability of Bayesian models of mathematical heuristics, 3) the role of abductive reasoning in mathematics, 4) the way in which multiple proofs of a proposition can strengthen our degree of belief in that proposition, and 5) the nature of the hypothesis that there are multiple formal systems that are isomorphic to physically possible universes. Thus, by embracing a model of mathematics as not perfectly predictable, we generate a new and fruitful perspective on the epistemology and practice of mathematics.
Submitted 28 October, 2020;
originally announced December 2020.
-
Estimating Functions of Distributions Defined over Spaces of Unknown Size
Authors:
David H. Wolpert,
Simon DeDeo
Abstract:
We consider Bayesian estimation of information-theoretic quantities from data, using a Dirichlet prior. Acknowledging the uncertainty of the event space size $m$ and the Dirichlet prior's concentration parameter $c$, we treat both as random variables set by a hyperprior. We show that the associated hyperprior, $P(c, m)$, obeys a simple "Irrelevance of Unseen Variables" (IUV) desideratum iff $P(c, m) = P(c) P(m)$. Thus, requiring IUV greatly reduces the number of degrees of freedom of the hyperprior. Some information-theoretic quantities can be expressed multiple ways, in terms of different event spaces, e.g., mutual information. With all hyperpriors (implicitly) used in earlier work, different choices of this event space lead to different posterior expected values of these information-theoretic quantities. We show that there is no such dependence on the choice of event space for a hyperprior that obeys IUV. We also derive a result that allows us to exploit IUV to greatly simplify calculations, like the posterior expected mutual information or posterior expected multi-information. We also use computer experiments to favorably compare an IUV-based estimator of entropy to three alternative methods in common use. We end by discussing how seemingly innocuous changes to the formalization of an estimation problem can substantially affect the resultant estimates of posterior expectations.
Submitted 18 November, 2013;
originally announced November 2013.
-
What does Newcomb's paradox teach us?
Authors:
David H. Wolpert,
Gregory Benford
Abstract:
In Newcomb's paradox you choose to receive either the contents of a particular closed box, or the contents of both that closed box and another one. Before you choose, a prediction algorithm deduces your choice, and fills the two boxes based on that deduction. Newcomb's paradox is that game theory appears to provide two conflicting recommendations for what choice you should make in this scenario. We analyze Newcomb's paradox using a recent extension of game theory in which the players set conditional probability distributions in a Bayes net. We show that the two game theory recommendations in Newcomb's scenario have different presumptions for what Bayes net relates your choice and the algorithm's prediction. We resolve the paradox by proving that these two Bayes nets are incompatible. We also show that the accuracy of the algorithm's prediction, the focus of much previous work, is irrelevant. In addition we show that Newcomb's scenario only provides a contradiction between game theory's expected utility and dominance principles if one is sloppy in specifying the underlying Bayes net. We also show that Newcomb's paradox is time-reversal invariant; both the paradox and its resolution are unchanged if the algorithm makes its 'prediction' after you make your choice rather than before.
Submitted 5 March, 2010;
originally announced March 2010.
-
A Predictive Theory of Games
Authors:
David H. Wolpert
Abstract:
Conventional noncooperative game theory hypothesizes that the joint strategy of a set of players in a game must satisfy an "equilibrium concept". All other joint strategies are considered impossible; the only issue is what equilibrium concept is "correct". This hypothesis violates the desiderata underlying probability theory. Indeed, probability theory renders moot the problem of what equilibrium concept is correct - every joint strategy can arise with non-zero probability. Rather than a first-principles derivation of an equilibrium concept, game theory requires a first-principles derivation of a distribution over joint (mixed) strategies. This paper shows how information theory can provide such a distribution over joint strategies. If a scientist external to the game wants to distill such a distribution to a point prediction, that prediction should be set by decision theory, using their (!) loss function. So the predicted joint strategy - the "equilibrium concept" - varies with the external scientist's loss function. It is shown here that in many games, having a probability distribution with support restricted to Nash equilibria - as stipulated by conventional game theory - is impossible. It is also shown how to: i) Derive an information-theoretic quantification of a player's degree of rationality; ii) Derive bounded rationality as a cost of computation; iii) Elaborate the close formal relationship between game theory and statistical physics; iv) Use this relationship to extend game theory to allow stochastically varying numbers of players.
Submitted 7 December, 2005;
originally announced December 2005.
-
Metrics for more than two points at once
Authors:
David H. Wolpert
Abstract:
The conventional definition of a topological metric over a space specifies properties that must be obeyed by any measure of "how separated" two points in that space are. Here it is shown how to extend that definition, and in particular the triangle inequality, to concern arbitrary numbers of points. Such a measure of how separated the points within a collection are can be bootstrapped, to measure "how separated" from each other are two (or more) collections. The measure presented here also allows fractional membership of an element in a collection. This means it directly concerns measures of "how spread out" a probability distribution over a space is. When such a measure is bootstrapped to compare two collections, it allows us to measure how separated two probability distributions are, or more generally, how separated a distribution of distributions is.
Submitted 15 April, 2004;
originally announced April 2004.
-
On the computational capabilities of physical systems part II: relationship with conventional computer science
Authors:
David H. Wolpert
Abstract:
In the first of this pair of papers, it was proven that no physical computer can correctly carry out all computational tasks that can be posed to it. The generality of this result follows from its use of a novel definition of computation, "physical computation". This second paper of the pair elaborates the mathematical structure and impossibility results associated with physical computation. Analogues of Chomsky hierarchy results concerning universal Turing Machines and the Halting theorem are derived, as are results concerning the (im)possibility of certain kinds of error-correcting codes. In addition, an analogue of algorithmic information complexity, "prediction complexity", is elaborated. A task-independent bound is derived on how much the prediction complexity of a computational task can differ for two different universal physical computers used to solve that task, a bound similar to the "encoding" bound governing how much the algorithmic information complexity of a Turing machine calculation can differ for two universal Turing machines. Finally, it is proven that either the Hamiltonian of our universe proscribes a certain type of computation, or prediction complexity is unique (unlike algorithmic information complexity).
Submitted 22 May, 2000;
originally announced May 2000.