Search | arXiv e-print repository

Probability trees and the value of a single intervention

Abstract: The most fundamental problem in statistical causality is determining causal relationships from limited data. Probability trees, which combine prior causal structures with Bayesian updates, have been suggested as a possible solution. In this work, we quantify the information gain from a single intervention and show that both the anticipated information gain, prior to making an intervention, and the… ▽ More The most fundamental problem in statistical causality is determining causal relationships from limited data. Probability trees, which combine prior causal structures with Bayesian updates, have been suggested as a possible solution. In this work, we quantify the information gain from a single intervention and show that both the anticipated information gain, prior to making an intervention, and the expected gain from an intervention have simple expressions. This results in an active-learning method that simply selects the intervention with the highest anticipated gain, which we illustrate through several examples. Our work demonstrates how probability trees, and Bayesian estimation of their parameters, offer a simple yet viable approach to fast causal induction. △ Less

Submitted 18 May, 2022; originally announced May 2022.

Comments: As presented at the proceedings of the AAAI Workshop on Information Theoretic Causal Inference and Discovery (ITCI'22), 2022

MSC Class: 68T37 ACM Class: I.5.1

arXiv:2205.08178 [pdf, other]

Active learning of causal probability trees

Authors: Tue Herlau

Abstract: The past two decades have seen a growing interest in combining causal information, commonly represented using causal graphs, with machine learning models. Probability trees provide a simple yet powerful alternative representation of causal information. They enable both computation of intervention and counterfactuals, and are strictly more general, since they allow context-dependent causal dependen… ▽ More The past two decades have seen a growing interest in combining causal information, commonly represented using causal graphs, with machine learning models. Probability trees provide a simple yet powerful alternative representation of causal information. They enable both computation of intervention and counterfactuals, and are strictly more general, since they allow context-dependent causal dependencies. Here we present a Bayesian method for learning probability trees from a combination of interventional and observational data. The method quantifies the expected information gain from an intervention, and selects the interventions with the largest gain. We demonstrate the efficiency of the method on simulated and real data. An effective method for learning probability trees on a limited interventional budget will greatly expand their applicability. △ Less

Submitted 17 May, 2022; originally announced May 2022.

MSC Class: 68T37 ACM Class: I.2.4

arXiv:1508.02905 [pdf, other]

doi 10.1016/j.procs.2022.03.105

Bayesian Dropout

Authors: Tue Herlau, Morten Mørup, Mikkel N. Schmidt

Abstract: Dropout has recently emerged as a powerful and simple method for training neural networks preventing co-adaptation by stochastically omitting neurons. Dropout is currently not grounded in explicit modelling assumptions which so far has precluded its adoption in Bayesian modelling. Using Bayesian entropic reasoning we show that dropout can be interpreted as optimal inference under constraints. We d… ▽ More Dropout has recently emerged as a powerful and simple method for training neural networks preventing co-adaptation by stochastically omitting neurons. Dropout is currently not grounded in explicit modelling assumptions which so far has precluded its adoption in Bayesian modelling. Using Bayesian entropic reasoning we show that dropout can be interpreted as optimal inference under constraints. We demonstrate this on an analytically tractable regression model providing a Bayesian interpretation of its mechanism for regularizing and preventing co-adaptation as well as its connection to other Bayesian techniques. We also discuss two general approximate techniques for applying Bayesian dropout for general models, one based on an analytical approximation and the other on stochastic variational techniques. These techniques are then applied to a Baysian logistic regression problem and are shown to improve performance as the model become more misspecified. Our framework roots dropout as a theoretically justified and practical tool for statistical modelling allowing Bayesians to tap into the benefits of dropout training. △ Less

Submitted 17 May, 2022; v1 submitted 12 August, 2015; originally announced August 2015.

Comments: 21 pages, 3 figures. Manuscript prepared 2014 and awaiting submission

MSC Class: 68T37 ACM Class: I.5.1

Journal ref: Procedia Computer Science 201 (2022) 771-776

arXiv:1507.02925 [pdf, other]

Completely random measures for modelling block-structured networks

Authors: Tue Herlau, Mikkel N. Schmidt, Morten Mørup

Abstract: Many statistical methods for network data parameterize the edge-probability by attributing latent traits to the vertices such as block structure and assume exchangeability in the sense of the Aldous-Hoover representation theorem. Empirical studies of networks indicate that many real-world networks have a power-law distribution of the vertices which in turn implies the number of edges scale slower… ▽ More Many statistical methods for network data parameterize the edge-probability by attributing latent traits to the vertices such as block structure and assume exchangeability in the sense of the Aldous-Hoover representation theorem. Empirical studies of networks indicate that many real-world networks have a power-law distribution of the vertices which in turn implies the number of edges scale slower than quadratically in the number of vertices. These assumptions are fundamentally irreconcilable as the Aldous-Hoover theorem implies quadratic scaling of the number of edges. Recently Caron and Fox (2014) proposed the use of a different notion of exchangeability due to Kallenberg (2009) and obtained a network model which admits power-law behaviour while retaining desirable statistical properties, however this model does not capture latent vertex traits such as block-structure. In this work we re-introduce the use of block-structure for network models obeying Kallenberg's notion of exchangeability and thereby obtain a model which admits the inference of block-structure and edge inhomogeneity. We derive a simple expression for the likelihood and an efficient sampling method. The obtained model is not significantly more difficult to implement than existing approaches to block-modelling and performs well on real network datasets. △ Less

Submitted 4 December, 2015; v1 submitted 10 July, 2015; originally announced July 2015.

arXiv:1411.7864 [pdf, other]

Efficient inference of overlapping communities in complex networks

Authors: Bjarne Ørum Fruergaard, Tue Herlau

Abstract: We discuss two views on extending existing methods for complex network modeling which we dub the communities first and the networks first view, respectively. Inspired by the networks first view that we attribute to White, Boorman, and Breiger (1976)[1], we formulate the multiple-networks stochastic blockmodel (MNSBM), which seeks to separate the observed network into subnetworks of different types… ▽ More We discuss two views on extending existing methods for complex network modeling which we dub the communities first and the networks first view, respectively. Inspired by the networks first view that we attribute to White, Boorman, and Breiger (1976)[1], we formulate the multiple-networks stochastic blockmodel (MNSBM), which seeks to separate the observed network into subnetworks of different types and where the problem of inferring structure in each subnetwork becomes easier. We show how this model is specified in a generative Bayesian framework where parameters can be inferred efficiently using Gibbs sampling. The result is an effective multiple-membership model without the drawbacks of introducing complex definitions of "groups" and how they interact. We demonstrate results on the recovery of planted structure in synthetic networks and show very encouraging results on link prediction performances using multiple-networks models on a number of real-world network data sets. △ Less

Submitted 28 November, 2014; originally announced November 2014.

arXiv:1406.0071 [pdf, other]

Adaptive Reconfiguration Moves for Dirichlet Mixtures

Authors: Tue Herlau, Morten Mørup, Yee Whye Teh, Mikkel N. Schmidt

Abstract: Bayesian mixture models are widely applied for unsupervised learning and exploratory data analysis. Markov chain Monte Carlo based on Gibbs sampling and split-merge moves are widely used for inference in these models. However, both methods are restricted to limited types of transitions and suffer from torpid mixing and low accept rates even for problems of modest size. We propose a method that con… ▽ More Bayesian mixture models are widely applied for unsupervised learning and exploratory data analysis. Markov chain Monte Carlo based on Gibbs sampling and split-merge moves are widely used for inference in these models. However, both methods are restricted to limited types of transitions and suffer from torpid mixing and low accept rates even for problems of modest size. We propose a method that considers a broader range of transitions that are close to equilibrium by exploiting multiple chains in parallel and using the past states adaptively to inform the proposal distribution. The method significantly improves on Gibbs and split-merge sampling as quantified using convergence diagnostics and acceptance rates. Adaptive MCMC methods which use past states to inform the proposal distribution has given rise to many ingenious sampling schemes for continuous problems and the present work can be seen as an important first step in bringing these benefits to partition-based problems △ Less

Submitted 31 May, 2014; originally announced June 2014.

Comments: Manuscript is in preparation, 26 pages including figures

arXiv:1311.2520 [pdf, other]

doi 10.1103/PhysRevE.90.032819

The Infinite Degree Corrected Stochastic Block Model

Authors: Tue Herlau, Mikkel N. Schmidt, Morten Mørup

Abstract: In Stochastic blockmodels, which are among the most prominent statistical models for cluster analysis of complex networks, clusters are defined as groups of nodes with statistically similar link probabilities within and between groups. A recent extension by Karrer and Newman incorporates a node degree correction to model degree heterogeneity within each group. Although this demonstrably leads to b… ▽ More In Stochastic blockmodels, which are among the most prominent statistical models for cluster analysis of complex networks, clusters are defined as groups of nodes with statistically similar link probabilities within and between groups. A recent extension by Karrer and Newman incorporates a node degree correction to model degree heterogeneity within each group. Although this demonstrably leads to better performance on several networks it is not obvious whether modelling node degree is always appropriate or necessary. We formulate the degree corrected stochastic blockmodel as a non-parametric Bayesian model, incorporating a parameter to control the amount of degree correction which can then be inferred from data. Additionally, our formulation yields principled ways of inferring the number of groups as well as predicting missing links in the network which can be used to quantify the model's predictive performance. On synthetic data we demonstrate that including the degree correction yields better performance both on recovering the true group structure and predicting missing links when degree heterogeneity is present, whereas performance is on par for data with no degree heterogeneity within clusters. On seven real networks (with no ground truth group structure available) we show that predictive performance is about equal whether or not degree correction is included; however, for some networks significantly fewer clusters are discovered when correcting for degree indicating that the data can be more compactly explained by clusters of heterogenous degree nodes. △ Less

Submitted 30 May, 2014; v1 submitted 11 November, 2013; originally announced November 2013.

Comments: Originally presented at the Complex Networks workshop NIPS 2013

Journal ref: Phys. Rev. E 90, 032819 (2014)

arXiv:1311.1033 [pdf, other]

Nonparametric Bayesian models of hierarchical structure in complex networks

Authors: Mikkel N. Schmidt, Tue Herlau, Morten Mørup

Abstract: Analyzing and understanding the structure of complex relational data is important in many applications including analysis of the connectivity in the human brain. Such networks can have prominent patterns on different scales, calling for a hierarchically structured model. We propose two non-parametric Bayesian hierarchical network models based on Gibbs fragmentation tree priors, and demonstrate the… ▽ More Analyzing and understanding the structure of complex relational data is important in many applications including analysis of the connectivity in the human brain. Such networks can have prominent patterns on different scales, calling for a hierarchically structured model. We propose two non-parametric Bayesian hierarchical network models based on Gibbs fragmentation tree priors, and demonstrate their ability to capture nested patterns in simulated networks. On real networks we demonstrate detection of hierarchical structure and show predictive performance on par with the state of the art. We envision that our methods can be employed in exploratory analysis of large scale complex networks for example to model human brain connectivity. △ Less

Submitted 21 November, 2013; v1 submitted 5 November, 2013; originally announced November 2013.

Showing 1–8 of 8 results for author: Herlau, T