-
Continuous Mixtures of Tractable Probabilistic Models
Authors:
Alvaro H. C. Correia,
Gennaro Gala,
Erik Quaeghebeur,
Cassio de Campos,
Robert Peharz
Abstract:
Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented…
▽ More
Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, and thus are capable of performing exact inference efficiently but often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points the approximation becomes de-facto exact. Moreover, for a finite set of integration points, the integration method effectively compiles the continuous mixture into a standard PC. In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set new state of the art for tractable models on many standard density estimation benchmarks.
△ Less
Submitted 24 March, 2023; v1 submitted 21 September, 2022;
originally announced September 2022.
-
Hypothesis tests for multiple responses regression: effect of probiotics on addiction and binge eating disorder
Authors:
Lineu Alberto Cavazani de Freitas,
Ligia de Oliveira Carlos,
Antônio Carlos Ligocki Campos,
Wagner Hugo Bonat
Abstract:
Clinical trials are common in medical research where multiple non-Gaussian responses and time-dependent observations are frequent. The analysis of data from these studies requires statistical modeling techniques that take these characteristics into account. We propose a general strategy based on the Wald statistics to perform hypothesis tests like ANOVAs, MANOVAs and multiple comparison tests on r…
▽ More
Clinical trials are common in medical research where multiple non-Gaussian responses and time-dependent observations are frequent. The analysis of data from these studies requires statistical modeling techniques that take these characteristics into account. We propose a general strategy based on the Wald statistics to perform hypothesis tests like ANOVAs, MANOVAs and multiple comparison tests on regression and dispersion parameters of multivariate covariance generalized linear models (McGLMs). McGLMs provide a general statistical modeling framework for normal and non-normal multivariate data analysis along with a wide range of correlation structures. We design different simulation scenarios to verify the properties of the proposed tests. The results are promising showing that the proposed tests present the levels of confidence close to the specified one for all simulation study scenarios. Complementary to the proposal, we developed implementations in the R language to carry out the tests presented, the codes are available in the supplementary material. The proposal is motivated by the analysis of a clinical trial that aims to evaluate the effect of the use of probiotics in the control of addiction and binge eating disorder in patients undergoing bariatric surgery. The subjects were separated into two groups (placebo and treatment) and evaluated at three different times. The results indicate that addiction and binge eating disorder reduce over time, but there is no difference between groups at each time point.
△ Less
Submitted 29 July, 2022;
originally announced August 2022.
-
Bayesian Modelling of Multivalued Power Curves from an Operational Wind Farm
Authors:
L. A. Bull,
P. A. Gardner,
T. J. Rogers,
N. Dervilis,
E. J. Cross,
E. Papatheou,
A. E. Maguire,
C. Campos,
K. Worden
Abstract:
Power curves capture the relationship between wind speed and output power for a specific wind turbine. Accurate regression models of this function prove useful in monitoring, maintenance, design, and planning. In practice, however, the measurements do not always correspond to the ideal curve: power curtailments will appear as (additional) functional components. Such multivalued relationships canno…
▽ More
Power curves capture the relationship between wind speed and output power for a specific wind turbine. Accurate regression models of this function prove useful in monitoring, maintenance, design, and planning. In practice, however, the measurements do not always correspond to the ideal curve: power curtailments will appear as (additional) functional components. Such multivalued relationships cannot be modelled by conventional regression, and the associated data are usually removed during pre-processing. The current work suggests an alternative method to infer multivalued relationships in curtailed power data. Using a population-based approach, an overlapping mixture of probabilistic regression models is applied to signals recorded from turbines within an operational wind farm. The model is shown to provide an accurate representation of practical power data across the population.
△ Less
Submitted 30 November, 2021;
originally announced November 2021.
-
Bayesian Kernelised Test of (In)dependence with Mixed-type Variables
Authors:
Alessio Benavoli,
Cassio de Campos
Abstract:
A fundamental task in AI is to assess (in)dependence between mixed-type variables (text, image, sound). We propose a Bayesian kernelised correlation test of (in)dependence using a Dirichlet process model. The new measure of (in)dependence allows us to answer some fundamental questions: Based on data, are (mixed-type) variables independent? How likely is dependence/independence to hold? How high is…
▽ More
A fundamental task in AI is to assess (in)dependence between mixed-type variables (text, image, sound). We propose a Bayesian kernelised correlation test of (in)dependence using a Dirichlet process model. The new measure of (in)dependence allows us to answer some fundamental questions: Based on data, are (mixed-type) variables independent? How likely is dependence/independence to hold? How high is the probability that two mixed-type variables are more than just weakly dependent? We theoretically show the properties of the approach, as well as algorithms for fast computation with it. We empirically demonstrate the effectiveness of the proposed method by analysing its performance and by comparing it with other frequentist and Bayesian approaches on a range of datasets and tasks with mixed-type variables.
△ Less
Submitted 9 May, 2021;
originally announced May 2021.
-
Towards Robust Classification with Deep Generative Forests
Authors:
Alvaro H. C. Correia,
Robert Peharz,
Cassio de Campos
Abstract:
Decision Trees and Random Forests are among the most widely used machine learning models, and often achieve state-of-the-art performance in tabular, domain-agnostic datasets. Nonetheless, being primarily discriminative models they lack principled methods to manipulate the uncertainty of predictions. In this paper, we exploit Generative Forests (GeFs), a recent class of deep probabilistic models th…
▽ More
Decision Trees and Random Forests are among the most widely used machine learning models, and often achieve state-of-the-art performance in tabular, domain-agnostic datasets. Nonetheless, being primarily discriminative models they lack principled methods to manipulate the uncertainty of predictions. In this paper, we exploit Generative Forests (GeFs), a recent class of deep probabilistic models that addresses these issues by extending Random Forests to generative models representing the full joint distribution over the feature space. We demonstrate that GeFs are uncertainty-aware classifiers, capable of measuring the robustness of each prediction as well as detecting out-of-distribution samples.
△ Less
Submitted 11 July, 2020;
originally announced July 2020.
-
Joints in Random Forests
Authors:
Alvaro H. C. Correia,
Robert Peharz,
Cassio de Campos
Abstract:
Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative mo…
▽ More
Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation. Under certain assumptions, frequently made for Bayes consistency results, we show that consistency in GeDTs and GeFs extend to any pattern of missing input features, if missing at random. Empirically, we show that our models often outperform common routines to treat missing data, such as K-nearest neighbour imputation, and moreover, that our models can naturally detect outliers by monitoring the marginal probability of input features.
△ Less
Submitted 19 November, 2020; v1 submitted 25 June, 2020;
originally announced June 2020.
-
On Pruning for Score-Based Bayesian Network Structure Learning
Authors:
Alvaro H. C. Correia,
James Cussens,
Cassio de Campos
Abstract:
Many algorithms for score-based Bayesian network structure learning (BNSL), in particular exact ones, take as input a collection of potentially optimal parent sets for each variable in the data. Constructing such collections naively is computationally intensive since the number of parent sets grows exponentially with the number of variables. Thus, pruning techniques are not only desirable but esse…
▽ More
Many algorithms for score-based Bayesian network structure learning (BNSL), in particular exact ones, take as input a collection of potentially optimal parent sets for each variable in the data. Constructing such collections naively is computationally intensive since the number of parent sets grows exponentially with the number of variables. Thus, pruning techniques are not only desirable but essential. While good pruning rules exist for the Bayesian Information Criterion (BIC), current results for the Bayesian Dirichlet equivalent uniform (BDeu) score reduce the search space very modestly, hampering the use of the (often preferred) BDeu. We derive new non-trivial theoretical upper bounds for the BDeu score that considerably improve on the state-of-the-art. Since the new bounds are mathematically proven to be tighter than previous ones and at little extra computational cost, they are a promising addition to BNSL methods.
△ Less
Submitted 2 August, 2020; v1 submitted 23 May, 2019;
originally announced May 2019.
-
Entropy-based Pruning for Learning Bayesian Networks using BIC
Authors:
Cassio P. de Campos,
Mauro Scanagatta,
Giorgio Corani,
Marco Zaffalon
Abstract:
For decomposable score-based structure learning of Bayesian networks, existing approaches first compute a collection of candidate parent sets for each variable and then optimize over this collection by choosing one parent set for each variable without creating directed cycles while maximizing the total score. We target the task of constructing the collection of candidate parent sets when the score…
▽ More
For decomposable score-based structure learning of Bayesian networks, existing approaches first compute a collection of candidate parent sets for each variable and then optimize over this collection by choosing one parent set for each variable without creating directed cycles while maximizing the total score. We target the task of constructing the collection of candidate parent sets when the score of choice is the Bayesian Information Criterion (BIC). We provide new non-trivial results that can be used to prune the search space of candidate parent sets of each node. We analyze how these new results relate to previous ideas in the literature both theoretically and empirically. We show in experiments with UCI data sets that gains can be significant. Since the new pruning rules are easy to implement and have low computational costs, they can be promptly integrated into all state-of-the-art methods for structure learning of Bayesian networks.
△ Less
Submitted 19 July, 2017;
originally announced July 2017.
-
Learning Bayesian Networks with Incomplete Data by Augmentation
Authors:
Tameem Adel,
Cassio P. de Campos
Abstract:
We present new algorithms for learning Bayesian networks from data with missing values using a data augmentation approach. An exact Bayesian network learning algorithm is obtained by recasting the problem into a standard Bayesian network learning problem without missing data. To the best of our knowledge, this is the first exact algorithm for this problem. As expected, the exact algorithm does not…
▽ More
We present new algorithms for learning Bayesian networks from data with missing values using a data augmentation approach. An exact Bayesian network learning algorithm is obtained by recasting the problem into a standard Bayesian network learning problem without missing data. To the best of our knowledge, this is the first exact algorithm for this problem. As expected, the exact algorithm does not scale to large domains. We build on the exact method to create an approximate algorithm using a hill-climbing technique. This algorithm scales to large domains so long as a suitable standard structure learning method for complete data is available. We perform a wide range of experiments to demonstrate the benefits of learning Bayesian networks with such new approach.
△ Less
Submitted 8 October, 2016; v1 submitted 27 August, 2016;
originally announced August 2016.
-
Advances in Learning Bayesian Networks of Bounded Treewidth
Authors:
Siqi Nie,
Denis Deratani Maua,
Cassio Polpo de Campos,
Qiang Ji
Abstract:
This work presents novel algorithms for learning Bayesian network structures with bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed-integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in uniformly sampling $k$-trees (maximal graphs of treewidth $k$), and subsequently selecting,…
▽ More
This work presents novel algorithms for learning Bayesian network structures with bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed-integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in uniformly sampling $k$-trees (maximal graphs of treewidth $k$), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that $k$-tree. Some properties of these methods are discussed and proven. The approaches are empirically compared to each other and to a state-of-the-art method for learning bounded treewidth structures on a collection of public data sets with up to 100 variables. The experiments show that our exact algorithm outperforms the state of the art, and that the approximate approach is fairly accurate.
△ Less
Submitted 6 June, 2014; v1 submitted 5 June, 2014;
originally announced June 2014.
-
Confidence Statements for Ordering Quantiles
Authors:
Carlos A. de B. Pereira,
Cassio P. de Campos,
Adriano Polpo
Abstract:
This work proposes Quor, a simple yet effective nonparametric method to compare independent samples with respect to corresponding quantiles of their populations. The method is solely based on the order statistics of the samples, and independence is its only requirement. All computations are performed using exact distributions with no need for any asymptotic considerations, and yet can be run using…
▽ More
This work proposes Quor, a simple yet effective nonparametric method to compare independent samples with respect to corresponding quantiles of their populations. The method is solely based on the order statistics of the samples, and independence is its only requirement. All computations are performed using exact distributions with no need for any asymptotic considerations, and yet can be run using a fast quadratic-time dynamic programming idea. Computational performance is essential in high-dimensional domains, such as gene expression data. We describe the approach and discuss on the most important assumptions, building a parallel with assumptions and properties of widely used techniques for the same problem. Experiments using real data from biomedical studies are performed to empirically compare Quor and other methods in a classification task over a selection of high-dimensional data sets.
△ Less
Submitted 17 July, 2014; v1 submitted 21 December, 2012;
originally announced December 2012.
-
Inference in Polytrees with Sets of Probabilities
Authors:
Jose Carlos Ferreira da Rocha,
Fabio Gagliardi Cozman,
Cassio Polpo de Campos
Abstract:
Inferences in directed acyclic graphs associated with probability sets and probability intervals are NP-hard, even for polytrees. In this paper we focus on such inferences, and propose: 1) a substantial improvement on Tessems A / R algorithm FOR polytrees WITH probability intervals; 2) a new algorithm FOR direction - based local search(IN sets OF probability) that improves ON e…
▽ More
Inferences in directed acyclic graphs associated with probability sets and probability intervals are NP-hard, even for polytrees. In this paper we focus on such inferences, and propose: 1) a substantial improvement on Tessems A / R algorithm FOR polytrees WITH probability intervals; 2) a new algorithm FOR direction - based local search(IN sets OF probability) that improves ON existing methods; 3) a collection OF branch - AND - bound algorithms that combine the previous techniques.The first two techniques lead TO approximate solutions, WHILE branch - AND - bound procedures can produce either exact OR approximate solutions.We report ON dramatic improvements ON existing techniques FOR inference WITH probability sets AND intervals, IN SOME cases reducing the computational effort BY many orders OF magnitude.
△ Less
Submitted 19 October, 2012;
originally announced December 2012.
-
Belief Updating and Learning in Semi-Qualitative Probabilistic Networks
Authors:
Cassio Polpo de Campos,
Fabio Gagliardi Cozman
Abstract:
This paper explores semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information. We first show that exact inferences with SQPNs are NPPP-Complete. We then show that existing qualitative relations in SQPNs (plus probabilistic logic and imprecise assessments) can be dealt effectively through multilinear programming. We then discuss learning: we consider a maximum…
▽ More
This paper explores semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information. We first show that exact inferences with SQPNs are NPPP-Complete. We then show that existing qualitative relations in SQPNs (plus probabilistic logic and imprecise assessments) can be dealt effectively through multilinear programming. We then discuss learning: we consider a maximum likelihood method that generates point estimates given a SQPN and empirical data, and we describe a Bayesian-minded method that employs the Imprecise Dirichlet Model to generate set-valued estimates.
△ Less
Submitted 4 July, 2012;
originally announced July 2012.
-
Anytime Marginal MAP Inference
Authors:
Denis Maua,
Cassio De Campos
Abstract:
This paper presents a new anytime algorithm for the marginal MAP problem in graphical models. The algorithm is described in detail, its complexity and convergence rate are studied, and relations to previous theoretical results for the problem are discussed. It is shown that the algorithm runs in polynomial-time if the underlying graph of the model has bounded tree-width, and that it provides guara…
▽ More
This paper presents a new anytime algorithm for the marginal MAP problem in graphical models. The algorithm is described in detail, its complexity and convergence rate are studied, and relations to previous theoretical results for the problem are discussed. It is shown that the algorithm runs in polynomial-time if the underlying graph of the model has bounded tree-width, and that it provides guarantees to the lower and upper bounds obtained within a fixed amount of computational resources. Experiments with both real and synthetic generated models highlight its main characteristics and show that it compares favorably against Park and Darwiche's systematic search, particularly in the case of problems with many MAP variables and moderate tree-width.
△ Less
Submitted 27 June, 2012;
originally announced June 2012.
-
Improving parameter learning of Bayesian nets from incomplete data
Authors:
Giorgio Corani,
Cassio P. De Campos
Abstract:
This paper addresses the estimation of parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe…
▽ More
This paper addresses the estimation of parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe drawbacks, being affected both by overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can be easily applied on top of EM, while the entropy idea can be also implemented in a more sophisticated way, through a dedicated non-linear solver. A vast set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as optimization engine, the model averaging approach is the best performing one; its performance is matched by the entropy approach when implemented using the non-linear solver. The results suggest that the applicability of these ideas is immediate (they are easy to implement and to integrate in currently available inference engines) and that they constitute a better way to learn Bayesian network parameters.
△ Less
Submitted 12 October, 2011;
originally announced October 2011.
-
Solving Limited Memory Influence Diagrams
Authors:
Denis Deratani Mauá,
Cassio Polpo de Campos,
Marco Zaffalon
Abstract:
We present a new algorithm for exactly solving decision making problems represented as influence diagrams. We do not require the usual assumptions of no forgetting and regularity; this allows us to solve problems with simultaneous decisions and limited information. The algorithm is empirically shown to outperform a state-of-the-art algorithm on randomly generated problems of up to 150 variables an…
▽ More
We present a new algorithm for exactly solving decision making problems represented as influence diagrams. We do not require the usual assumptions of no forgetting and regularity; this allows us to solve problems with simultaneous decisions and limited information. The algorithm is empirically shown to outperform a state-of-the-art algorithm on randomly generated problems of up to 150 variables and $10^{64}$ solutions. We show that the problem is NP-hard even if the underlying graph structure of the problem has small treewidth and the variables take on a bounded number of states, but that a fully polynomial time approximation scheme exists for these cases. Moreover, we show that the bound on the number of states is a necessary condition for any efficient approximation scheme.
△ Less
Submitted 9 September, 2011; v1 submitted 8 September, 2011;
originally announced September 2011.
-
New Results for the MAP Problem in Bayesian Networks
Authors:
Cassio P. de Campos
Abstract:
This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. First, it is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes stru…
▽ More
This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. First, it is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure). Such proofs extend previous complexity results for the problem. Inapproximability results are also derived in the case of trees if the number of states per variable is not bounded. Although the problem is shown to be hard and inapproximable even in very simple scenarios, a new exact algorithm is described that is empirically fast in networks of bounded treewidth and bounded number of states per variable. The same algorithm is used as basis of a Fully Polynomial Time Approximation Scheme for MAP under such assumptions. Approximation schemes were generally thought to be impossible for this problem, but we show otherwise for classes of networks that are important in practice. The algorithms are extensively tested using some well-known networks as well as random generated cases to show their effectiveness.
△ Less
Submitted 29 July, 2010; v1 submitted 22 July, 2010;
originally announced July 2010.