-
Learning non-Gaussian graphical models via Hessian scores and triangular transport
Authors:
Ricardo Baptista,
Youssef Marzouk,
Rebecca E. Morrison,
Olivier Zahm
Abstract:
Undirected probabilistic graphical models represent the conditional dependencies, or Markov properties, of a collection of random variables. Knowing the sparsity of such a graphical model is valuable for modeling multivariate distributions and for efficiently performing inference. While the problem of learning graph structure from data has been studied extensively for certain parametric families of distributions, most existing methods fail to consistently recover the graph structure for non-Gaussian data. Here we propose an algorithm for learning the Markov structure of continuous and non-Gaussian distributions. To characterize conditional independence, we introduce a score based on integrated Hessian information from the joint log-density, and we prove that this score upper bounds the conditional mutual information for a general class of distributions. To compute the score, our algorithm SING estimates the density using a deterministic coupling, induced by a triangular transport map, and iteratively exploits sparse structure in the map to reveal sparsity in the graph. For certain non-Gaussian datasets, we show that our algorithm recovers the graph structure even with a biased approximation to the density. Among other examples, we apply SING to learn the dependencies between the states of a chaotic dynamical system with local interactions.
Submitted 25 February, 2023; v1 submitted 8 January, 2021;
originally announced January 2021.
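The abstract above describes a score built from integrated Hessian information of the joint log-density. The following is a minimal sketch of that idea, not the SING algorithm itself: it assumes the score for a pair (i, j) is the sample average of the squared mixed second derivative of the log-density (the abstract does not give the formula), uses a known Gaussian log-density in place of the transport-map density estimate, and approximates the Hessian by finite differences. The names `log_density`, `hessian_fd`, and `hessian_score` are hypothetical.

```python
import numpy as np


def log_density(x, precision):
    """Unnormalized log-density of a zero-mean Gaussian with the given precision matrix."""
    return -0.5 * x @ precision @ x


def hessian_fd(f, x, h=1e-3):
    """Central finite-difference approximation of the Hessian of f at x."""
    d = x.size
    eye = np.eye(d)
    hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            hess[i, j] = (
                f(x + h * eye[i] + h * eye[j])
                - f(x + h * eye[i] - h * eye[j])
                - f(x - h * eye[i] + h * eye[j])
                + f(x - h * eye[i] - h * eye[j])
            ) / (4.0 * h * h)
    return hess


def hessian_score(f, samples):
    """Assumed score: average over samples of the squared Hessian entries of f."""
    return np.mean([hessian_fd(f, x) ** 2 for x in samples], axis=0)


# Chain graph 0 - 1 - 2: variables 0 and 2 are conditionally independent given 1,
# which shows up as a zero in the (0, 2) entry of the precision matrix.
precision = np.array([[2.0, -1.0, 0.0],
                      [-1.0, 2.0, -1.0],
                      [0.0, -1.0, 2.0]])

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(3), np.linalg.inv(precision), size=50)

score = hessian_score(lambda x: log_density(x, precision), samples)
print(np.round(score, 4))
```

In this Gaussian case the Hessian of the log-density is the negative precision matrix at every point, so the (0, 2) entry of the score is numerically zero while the (0, 1) and (1, 2) entries are not. For non-Gaussian densities the Hessian varies with the sample point, which is why an averaged quantity is needed.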
-
Embedded model discrepancy: A case study of Zika modeling
Authors:
Rebecca E. Morrison,
Americo Cunha Jr
Abstract:
Mathematical models of epidemiological systems enable investigation of and predictions about potential disease outbreaks. However, commonly used models are often highly simplified representations of incredibly complex systems. Because of these simplifications, the model output (say, the number of new cases of a disease over time, or the time at which an epidemic will occur) may be inconsistent with available data. In this case, we must improve the model, especially if we plan to make decisions based on it that could affect human health and safety, but direct improvements are often beyond our reach. In this work, we explore this problem through a case study of the Zika outbreak in Brazil in 2016. We propose an embedded discrepancy operator: a modification to the model equations that requires modest information about the system and is calibrated by all relevant data. We show that the new enriched model demonstrates greatly increased consistency with real data. Moreover, the method is general enough to easily apply to many other mathematical models in epidemiology.
Submitted 13 April, 2020;
originally announced April 2020.
-
Embedded discrepancy operators in reduced models of interacting species
Authors:
Rebecca E Morrison
Abstract:
In many applications of interacting systems, we are only interested in the dynamic behavior of a subset of all possible active species. For example, this is true in combustion models (many transient chemical species are not of interest in a given reaction) and in epidemiological models (only certain critical populations are truly consequential). Thus it is common to use greatly reduced models, in which only the interactions among the species of interest are retained. However, reduction introduces a model error, or discrepancy, which typically is not well characterized. In this work, we explore the use of an embedded and statistically calibrated discrepancy operator to represent model error. The operator is embedded within the differential equations of the model, which allows the action of the operator to be interpretable. Moreover, it is constrained by available physical information, and calibrated over many scenarios. These qualities of the discrepancy model (interpretability, physical consistency, and robustness to different scenarios) are intended to support reliable predictions under extrapolative conditions.
Submitted 17 October, 2019;
originally announced October 2019.
-
Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting
Authors:
Rebecca E. Morrison,
Ricardo Baptista,
Youssef Marzouk
Abstract:
We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another.
Submitted 6 November, 2017; v1 submitted 2 November, 2017;
originally announced November 2017.
-
Representing model inadequacy: A stochastic operator approach
Authors:
Rebecca E Morrison,
Todd A Oliver,
Robert D Moser
Abstract:
Mathematical models of physical systems are subject to many uncertainties such as measurement errors and uncertain initial and boundary conditions. After accounting for these uncertainties, it is often revealed that discrepancies between the model output and the observations remain; if so, the model is said to be inadequate. In practice, the inadequate model may be the best that is available or tractable, and so despite its inadequacy the model may be used to make predictions of unobserved quantities. In this case, a representation of the inadequacy is necessary, so the impact of the observed discrepancy can be determined. We investigate this problem in the context of chemical kinetics and propose a new technique to account for model inadequacy that is both probabilistic and physically meaningful. A stochastic inadequacy operator $\mathcal{S}$ is introduced which is embedded in the ODEs describing the evolution of chemical species concentrations and which respects certain physical constraints such as conservation laws. The parameters of $\mathcal{S}$ are governed by probability distributions, which in turn are characterized by a set of hyperparameters. The model parameters and hyperparameters are calibrated using high-dimensional hierarchical Bayesian inference. We apply the method to a typical problem in chemical kinetics---the reaction mechanism of hydrogen combustion.
Submitted 22 May, 2018; v1 submitted 6 April, 2016;
originally announced April 2016.