-
Cooperative Sheaf Neural Networks
Authors:
André Ribeiro,
Ana Luiza Tenório,
Juan Belieni,
Amauri H. Souza,
Diego Mesquita
Abstract:
Sheaf diffusion has recently emerged as a promising design pattern for graph representation learning due to its inherent ability to handle heterophilic data and avoid oversmoothing. Meanwhile, cooperative message passing has also been proposed as a way to enhance the flexibility of information diffusion by allowing nodes to independently choose whether to propagate/gather information from/to neigh…
▽ More
Sheaf diffusion has recently emerged as a promising design pattern for graph representation learning due to its inherent ability to handle heterophilic data and avoid oversmoothing. Meanwhile, cooperative message passing has also been proposed as a way to enhance the flexibility of information diffusion by allowing nodes to independently choose whether to propagate/gather information from/to neighbors. A natural question ensues: is sheaf diffusion capable of exhibiting this cooperative behavior? Here, we provide a negative answer to this question. In particular, we show that existing sheaf diffusion methods fail to achieve cooperative behavior due to the lack of message directionality. To circumvent this limitation, we introduce the notion of cellular sheaves over directed graphs and characterize their in- and out-degree Laplacians. We leverage our construction to propose Cooperative Sheaf Neural Networks (CSNNs). Theoretically, we characterize the receptive field of CSNN and show it allows nodes to selectively attend (listen) to arbitrarily far nodes while ignoring all others in their path, potentially mitigating oversquashing. Our experiments show that CSNN presents overall better performance compared to prior art on sheaf diffusion as well as cooperative graph neural networks.
△ Less
Submitted 1 July, 2025;
originally announced July 2025.
-
Differentially Private Selection using Smooth Sensitivity
Authors:
Iago Chaves,
Victor Farias,
Amanda Perez,
Diego Mesquita,
Javam Machado
Abstract:
Differentially private selection mechanisms offer strong privacy guarantees for queries aiming to identify the top-scoring element r from a finite set R, based on a dataset-dependent utility function. While selection queries are fundamental in data science, few mechanisms effectively ensure their privacy. Furthermore, most approaches rely on global sensitivity to achieve differential privacy (DP),…
▽ More
Differentially private selection mechanisms offer strong privacy guarantees for queries aiming to identify the top-scoring element r from a finite set R, based on a dataset-dependent utility function. While selection queries are fundamental in data science, few mechanisms effectively ensure their privacy. Furthermore, most approaches rely on global sensitivity to achieve differential privacy (DP), which can introduce excessive noise and impair downstream inferences. To address this limitation, we propose the Smooth Noisy Max (SNM) mechanism, which leverages smooth sensitivity to yield provably tighter (upper bounds on) expected errors compared to global sensitivity-based methods. Empirical results demonstrate that SNM is more accurate than state-of-the-art differentially private selection methods in three applications: percentile selection, greedy decision trees, and random forests.
△ Less
Submitted 10 April, 2025;
originally announced April 2025.
-
Streaming Bayes GFlowNets
Authors:
Tiago da Silva,
Daniel Augusto de Souza,
Diego Mesquita
Abstract:
Bayes' rule naturally allows for inference refinement in a streaming fashion, without the need to recompute posteriors from scratch whenever new data arrives. In principle, Bayesian streaming is straightforward: we update our prior with the available data and use the resulting posterior as a prior when processing the next data chunk. In practice, however, this recipe entails i) approximating an in…
▽ More
Bayes' rule naturally allows for inference refinement in a streaming fashion, without the need to recompute posteriors from scratch whenever new data arrives. In principle, Bayesian streaming is straightforward: we update our prior with the available data and use the resulting posterior as a prior when processing the next data chunk. In practice, however, this recipe entails i) approximating an intractable posterior at each time step; and ii) encapsulating results appropriately to allow for posterior propagation. For continuous state spaces, variational inference (VI) is particularly convenient due to its scalability and the tractability of variational posteriors. For discrete state spaces, however, state-of-the-art VI results in analytically intractable approximations that are ill-suited for streaming settings. To enable streaming Bayesian inference over discrete parameter spaces, we propose streaming Bayes GFlowNets (abbreviated as SB-GFlowNets) by leveraging the recently proposed GFlowNets -- a powerful class of amortized samplers for discrete compositional objects. Notably, SB-GFlowNet approximates the initial posterior using a standard GFlowNet and subsequently updates it using a tailored procedure that requires only the newly observed data. Our case studies in linear preference learning and phylogenetic inference showcase the effectiveness of SB-GFlowNets in sampling from an unnormalized posterior in a streaming setting. As expected, we also observe that SB-GFlowNets is significantly faster than repeatedly training a GFlowNet from scratch to sample from the full posterior.
△ Less
Submitted 8 November, 2024;
originally announced November 2024.
-
On Divergence Measures for Training GFlowNets
Authors:
Tiago da Silva,
Eliezer de Souza da Silva,
Diego Mesquita
Abstract:
Generative Flow Networks (GFlowNets) are amortized inference models designed to sample from unnormalized distributions over composable objects, with applications in generative modeling for tasks in fields such as causal discovery, NLP, and drug discovery. Traditionally, the training procedure for GFlowNets seeks to minimize the expected log-squared difference between a proposal (forward policy) an…
▽ More
Generative Flow Networks (GFlowNets) are amortized inference models designed to sample from unnormalized distributions over composable objects, with applications in generative modeling for tasks in fields such as causal discovery, NLP, and drug discovery. Traditionally, the training procedure for GFlowNets seeks to minimize the expected log-squared difference between a proposal (forward policy) and a target (backward policy) distribution, which enforces certain flow-matching conditions. While this training procedure is closely related to variational inference (VI), directly attempting standard Kullback-Leibler (KL) divergence minimization can lead to proven biased and potentially high-variance estimators. Therefore, we first review four divergence measures, namely, Renyi-$α$'s, Tsallis-$α$'s, reverse and forward KL's, and design statistically efficient estimators for their stochastic gradients in the context of training GFlowNets. Then, we verify that properly minimizing these divergences yields a provably correct and empirically effective training scheme, often leading to significantly faster convergence than previously proposed optimization. To achieve this, we design control variates based on the REINFORCE leave-one-out and score-matching estimators to reduce the variance of the learning objectives' gradients. Our work contributes by narrowing the gap between GFlowNets training and generalized variational approximations, paving the way for algorithmic ideas informed by the divergence minimization viewpoint.
△ Less
Submitted 21 October, 2024; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Embarrassingly Parallel GFlowNets
Authors:
Tiago da Silva,
Luiz Max Carvalho,
Amauri Souza,
Samuel Kaski,
Diego Mesquita
Abstract:
GFlowNets are a promising alternative to MCMC sampling for discrete compositional random variables. Training GFlowNets requires repeated evaluations of the unnormalized target distribution or reward function. However, for large-scale posterior sampling, this may be prohibitive since it incurs traversing the data several times. Moreover, if the data are distributed across clients, employing standar…
▽ More
GFlowNets are a promising alternative to MCMC sampling for discrete compositional random variables. Training GFlowNets requires repeated evaluations of the unnormalized target distribution or reward function. However, for large-scale posterior sampling, this may be prohibitive since it incurs traversing the data several times. Moreover, if the data are distributed across clients, employing standard GFlowNets leads to intensive client-server communication. To alleviate both these issues, we propose embarrassingly parallel GFlowNet (EP-GFlowNet). EP-GFlowNet is a provably correct divide-and-conquer method to sample from product distributions of the form $R(\cdot) \propto R_1(\cdot) ... R_N(\cdot)$ -- e.g., in parallel or federated Bayes, where each $R_n$ is a local posterior defined on a data partition. First, in parallel, we train a local GFlowNet targeting each $R_n$ and send the resulting models to the server. Then, the server learns a global GFlowNet by enforcing our newly proposed \emph{aggregating balance} condition, requiring a single communication step. Importantly, EP-GFlowNets can also be applied to multi-objective optimization and model reuse. Our experiments illustrate the EP-GFlowNets's effectiveness on many tasks, including parallel Bayesian phylogenetics, multi-objective multiset, sequence generation, and federated Bayesian structure learning.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
In-n-Out: Calibrating Graph Neural Networks for Link Prediction
Authors:
Erik Nascimento,
Diego Mesquita,
Samuel Kaski,
Amauri H Souza
Abstract:
Deep neural networks are notoriously miscalibrated, i.e., their outputs do not reflect the true probability of the event we aim to predict. While networks for tabular or image data are usually overconfident, recent works have shown that graph neural networks (GNNs) show the opposite behavior for node-level classification. But what happens when we are predicting links? We show that, in this case, G…
▽ More
Deep neural networks are notoriously miscalibrated, i.e., their outputs do not reflect the true probability of the event we aim to predict. While networks for tabular or image data are usually overconfident, recent works have shown that graph neural networks (GNNs) show the opposite behavior for node-level classification. But what happens when we are predicting links? We show that, in this case, GNNs often exhibit a mixed behavior. More specifically, they may be overconfident in negative predictions while being underconfident in positive ones. Based on this observation, we propose IN-N-OUT, the first-ever method to calibrate GNNs for link prediction. IN-N-OUT is based on two simple intuitions: i) attributing true/false labels to an edge while respecting a GNNs prediction should cause but small fluctuations in that edge's embedding; and, conversely, ii) if we label that same edge contradicting our GNN, embeddings should change more substantially. An extensive experimental campaign shows that IN-N-OUT significantly improves the calibration of GNNs in link prediction, consistently outperforming the baselines available -- which are not designed for this specific task.
△ Less
Submitted 8 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Thin and Deep Gaussian Processes
Authors:
Daniel Augusto de Souza,
Alexander Nikitin,
ST John,
Magnus Ross,
Mauricio A. Álvarez,
Marc Peter Deisenroth,
João P. P. Gomes,
Diego Mesquita,
César Lincoln C. Mattos
Abstract:
Gaussian processes (GPs) can provide a principled approach to uncertainty quantification with easy-to-interpret kernel hyperparameters, such as the lengthscale, which controls the correlation distance of function values. However, selecting an appropriate kernel can be challenging. Deep GPs avoid manual kernel engineering by successively parameterizing kernels with GP layers, allowing them to learn…
▽ More
Gaussian processes (GPs) can provide a principled approach to uncertainty quantification with easy-to-interpret kernel hyperparameters, such as the lengthscale, which controls the correlation distance of function values. However, selecting an appropriate kernel can be challenging. Deep GPs avoid manual kernel engineering by successively parameterizing kernels with GP layers, allowing them to learn low-dimensional embeddings of the inputs that explain the output data. Following the architecture of deep neural networks, the most common deep GPs warp the input space layer-by-layer but lose all the interpretability of shallow GPs. An alternative construction is to successively parameterize the lengthscale of a kernel, improving the interpretability but ultimately giving away the notion of learning lower-dimensional embeddings. Unfortunately, both methods are susceptible to particular pathologies which may hinder fitting and limit their interpretability. This work proposes a novel synthesis of both previous approaches: Thin and Deep GP (TDGP). Each TDGP layer defines locally linear transformations of the original input data maintaining the concept of latent embeddings while also retaining the interpretation of lengthscales of a kernel. Moreover, unlike the prior solutions, TDGP induces non-pathological manifolds that admit learning lower-dimensional representations. We show with theoretical and experimental results that i) TDGP is, unlike previous models, tailored to specifically discover lower-dimensional manifolds in the input data, ii) TDGP behaves well when increasing the number of layers, and iii) TDGP performs well in standard benchmark datasets.
△ Less
Submitted 17 October, 2023;
originally announced October 2023.
-
Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets
Authors:
Tiago da Silva,
Eliezer Silva,
António Góis,
Dominik Heider,
Samuel Kaski,
Diego Mesquita,
Adèle Ribeiro
Abstract:
Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inf…
▽ More
Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.
△ Less
Submitted 1 November, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition
Authors:
Yuling Yao,
Luiz Max Carvalho,
Diego Mesquita,
Yann McLatchie
Abstract:
Combining predictions from different models is a central problem in Bayesian inference and machine learning more broadly. Currently, these predictive distributions are almost exclusively combined using linear mixtures such as Bayesian model averaging, Bayesian stacking, and mixture of experts. Such linear mixtures impose idiosyncrasies that might be undesirable for some applications, such as multi…
▽ More
Combining predictions from different models is a central problem in Bayesian inference and machine learning more broadly. Currently, these predictive distributions are almost exclusively combined using linear mixtures such as Bayesian model averaging, Bayesian stacking, and mixture of experts. Such linear mixtures impose idiosyncrasies that might be undesirable for some applications, such as multi-modality. While there exist alternative strategies (e.g. geometric bridge or superposition), optimising their parameters usually involves computing an intractable normalising constant repeatedly. We present two novel Bayesian model combination tools. These are generalisations of model stacking, but combine posterior densities by log-linear pooling (locking) and quantum superposition (quacking). To optimise model weights while avoiding the burden of normalising constants, we investigate the Hyvarinen score of the combined posterior predictions. We demonstrate locking with an illustrative example and discuss its practical application with importance sampling.
△ Less
Submitted 12 May, 2023;
originally announced May 2023.
-
Distill n' Explain: explaining graph neural networks using simple surrogates
Authors:
Tamara Pereira,
Erik Nascimento,
Lucas E. Resck,
Diego Mesquita,
Amauri Souza
Abstract:
Explaining node predictions in graph neural networks (GNNs) often boils down to finding graph substructures that preserve predictions. Finding these structures usually implies back-propagating through the GNN, bonding the complexity (e.g., number of layers) of the GNN to the cost of explaining it. This naturally begs the question: Can we break this bond by explaining a simpler surrogate GNN? To an…
▽ More
Explaining node predictions in graph neural networks (GNNs) often boils down to finding graph substructures that preserve predictions. Finding these structures usually implies back-propagating through the GNN, bonding the complexity (e.g., number of layers) of the GNN to the cost of explaining it. This naturally begs the question: Can we break this bond by explaining a simpler surrogate GNN? To answer the question, we propose Distill n' Explain (DnX). First, DnX learns a surrogate GNN via knowledge distillation. Then, DnX extracts node or edge-level explanations by solving a simple convex program. We also propose FastDnX, a faster version of DnX that leverages the linear decomposition of our surrogate model. Experiments show that DnX and FastDnX often outperform state-of-the-art GNN explainers while being orders of magnitude faster. Additionally, we support our empirical findings with theoretical results linking the quality of the surrogate model (i.e., distillation error) to the faithfulness of explanations.
△ Less
Submitted 8 March, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Provably expressive temporal graph networks
Authors:
Amauri H. Souza,
Diego Mesquita,
Samuel Kaski,
Vikas Garg
Abstract:
Temporal graph networks (TGNs) have gained prominence as models for embedding dynamic interactions, but little is known about their theoretical underpinnings. We establish fundamental results about the representational power and limits of the two main categories of TGNs: those that aggregate temporal walks (WA-TGNs), and those that augment local message passing with recurrent memory modules (MP-TG…
▽ More
Temporal graph networks (TGNs) have gained prominence as models for embedding dynamic interactions, but little is known about their theoretical underpinnings. We establish fundamental results about the representational power and limits of the two main categories of TGNs: those that aggregate temporal walks (WA-TGNs), and those that augment local message passing with recurrent memory modules (MP-TGNs). Specifically, novel constructions reveal the inadequacy of MP-TGNs and WA-TGNs, proving that neither category subsumes the other. We extend the 1-WL (Weisfeiler-Leman) test to temporal graphs, and show that the most powerful MP-TGNs should use injective updates, as in this case they become as expressive as the temporal WL. Also, we show that sufficiently deep MP-TGNs cannot benefit from memory, and MP/WA-TGNs fail to compute graph properties such as girth.
These theoretical insights lead us to PINT -- a novel architecture that leverages injective temporal message passing and relative positional features. Importantly, PINT is provably more expressive than both MP-TGNs and WA-TGNs. PINT significantly outperforms existing TGNs on several real-world benchmarks.
△ Less
Submitted 29 September, 2022;
originally announced September 2022.
-
Parallel MCMC Without Embarrassing Failures
Authors:
Daniel Augusto de Souza,
Diego Mesquita,
Samuel Kaski,
Luigi Acerbi
Abstract:
Embarrassingly parallel Markov Chain Monte Carlo (MCMC) exploits parallel computing to scale Bayesian inference to large datasets by using a two-step approach. First, MCMC is run in parallel on (sub)posteriors defined on data partitions. Then, a server combines local results. While efficient, this framework is very sensitive to the quality of subposterior sampling. Common sampling problems such as…
▽ More
Embarrassingly parallel Markov Chain Monte Carlo (MCMC) exploits parallel computing to scale Bayesian inference to large datasets by using a two-step approach. First, MCMC is run in parallel on (sub)posteriors defined on data partitions. Then, a server combines local results. While efficient, this framework is very sensitive to the quality of subposterior sampling. Common sampling problems such as missing modes or misrepresentation of low-density regions are amplified -- instead of being corrected -- in the combination phase, leading to catastrophic failures. In this work, we propose a novel combination strategy to mitigate this issue. Our strategy, Parallel Active Inference (PAI), leverages Gaussian Process (GP) surrogate modeling and active learning. After fitting GPs to subposteriors, PAI (i) shares information between GP surrogates to cover missing modes; and (ii) uses active sampling to individually refine subposterior approximations. We validate PAI in challenging benchmarks, including heavy-tailed and multi-modal posteriors and a real-world application to computational neuroscience. Empirical results show that PAI succeeds where previous methods catastrophically fail, with a small communication overhead.
△ Less
Submitted 29 March, 2022; v1 submitted 22 February, 2022;
originally announced February 2022.
-
Rethinking pooling in graph neural networks
Authors:
Diego Mesquita,
Amauri H. Souza,
Samuel Kaski
Abstract:
Graph pooling is a central component of a myriad of graph neural network (GNN) architectures. As an inheritance from traditional CNNs, most approaches formulate graph pooling as a cluster assignment problem, extending the idea of local patches in regular grids to graphs. Despite the wide adherence to this design choice, no work has rigorously evaluated its influence on the success of GNNs. In this…
▽ More
Graph pooling is a central component of a myriad of graph neural network (GNN) architectures. As an inheritance from traditional CNNs, most approaches formulate graph pooling as a cluster assignment problem, extending the idea of local patches in regular grids to graphs. Despite the wide adherence to this design choice, no work has rigorously evaluated its influence on the success of GNNs. In this paper, we build upon representative GNNs and introduce variants that challenge the need for locality-preserving representations, either using randomization or clustering on the complement graph. Strikingly, our experiments demonstrate that using these variants does not result in any decrease in performance. To understand this phenomenon, we study the interplay between convolutional layers and the subsequent pooling ones. We show that the convolutions play a leading role in the learned representations. In contrast to the common belief, local pooling is not responsible for the success of GNNs on relevant and widely-used benchmarks.
△ Less
Submitted 21 October, 2020;
originally announced October 2020.
-
Federated Stochastic Gradient Langevin Dynamics
Authors:
Khaoula El Mekkaoui,
Diego Mesquita,
Paul Blomstedt,
Samuel Kaski
Abstract:
Stochastic gradient MCMC methods, such as stochastic gradient Langevin dynamics (SGLD), employ fast but noisy gradient estimates to enable large-scale posterior sampling. Although we can easily extend SGLD to distributed settings, it suffers from two issues when applied to federated non-IID data. First, the variance of these estimates increases significantly. Second, delaying communication causes…
▽ More
Stochastic gradient MCMC methods, such as stochastic gradient Langevin dynamics (SGLD), employ fast but noisy gradient estimates to enable large-scale posterior sampling. Although we can easily extend SGLD to distributed settings, it suffers from two issues when applied to federated non-IID data. First, the variance of these estimates increases significantly. Second, delaying communication causes the Markov chains to diverge from the true posterior even for very simple models. To alleviate both these problems, we propose conducive gradients, a simple mechanism that combines local likelihood approximations to correct gradient updates. Notably, conducive gradients are easy to compute, and since we only calculate the approximations once, they incur negligible overhead. We apply conducive gradients to distributed stochastic gradient Langevin dynamics (DSGLD) and call the resulting method federated stochastic gradient Langevin dynamics (FSGLD). We demonstrate that our approach can handle delayed communication rounds, converging to the target posterior in cases where DSGLD fails. We also show that FSGLD outperforms DSGLD for non-IID federated data with experiments on metric learning and neural networks.
△ Less
Submitted 14 June, 2021; v1 submitted 23 April, 2020;
originally announced April 2020.
-
Learning GPLVM with arbitrary kernels using the unscented transformation
Authors:
Daniel Augusto R. M. A. de Souza,
Diego Mesquita,
César Lincoln C. Mattos,
João Paulo P. Gomes
Abstract:
Gaussian Process Latent Variable Model (GPLVM) is a flexible framework to handle uncertain inputs in Gaussian Processes (GPs) and incorporate GPs as components of larger graphical models. Nonetheless, the standard GPLVM variational inference approach is tractable only for a narrow family of kernel functions. The most popular implementations of GPLVM circumvent this limitation using quadrature meth…
▽ More
Gaussian Process Latent Variable Model (GPLVM) is a flexible framework to handle uncertain inputs in Gaussian Processes (GPs) and incorporate GPs as components of larger graphical models. Nonetheless, the standard GPLVM variational inference approach is tractable only for a narrow family of kernel functions. The most popular implementations of GPLVM circumvent this limitation using quadrature methods, which may become a computational bottleneck even for relatively low dimensions. For instance, the widely employed Gauss-Hermite quadrature has exponential complexity on the number of dimensions. In this work, we propose using the unscented transformation instead. Overall, this method presents comparable, if not better, performance than offthe-shelf solutions to GPLVM and its computational complexity scales only linearly on dimension. In contrast to Monte Carlo methods, our approach is deterministic and works well with quasi-Newton methods, such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. We illustrate the applicability of our method with experiments on dimensionality reduction and multistep-ahead prediction with uncertainty propagation.
△ Less
Submitted 10 November, 2020; v1 submitted 3 July, 2019;
originally announced July 2019.
-
LS-SVR as a Bayesian RBF network
Authors:
Diego P. P. Mesquita,
Luis A. Freitas,
João P. P. Gomes,
César L. C. Mattos
Abstract:
We show theoretical similarities between the Least Squares Support Vector Regression (LS-SVR) model with a Radial Basis Functions (RBF) kernel and maximum a posteriori (MAP) inference on Bayesian RBF networks with a specific Gaussian prior on the regression weights. Although previous works have pointed out similar expressions between those learning approaches, we explicit and formally state the ex…
▽ More
We show theoretical similarities between the Least Squares Support Vector Regression (LS-SVR) model with a Radial Basis Functions (RBF) kernel and maximum a posteriori (MAP) inference on Bayesian RBF networks with a specific Gaussian prior on the regression weights. Although previous works have pointed out similar expressions between those learning approaches, we explicit and formally state the existing correspondences. We empirically demonstrate our result by performing computational experiments with standard regression benchmarks. Our findings open a range of possibilities to improve LS-SVR by borrowing strength from well-established developments in Bayesian methodology.
△ Less
Submitted 2 August, 2019; v1 submitted 1 May, 2019;
originally announced May 2019.
-
Embarrassingly parallel MCMC using deep invertible transformations
Authors:
Diego Mesquita,
Paul Blomstedt,
Samuel Kaski
Abstract:
While MCMC methods have become a main work-horse for Bayesian inference, scaling them to large distributed datasets is still a challenge. Embarrassingly parallel MCMC strategies take a divide-and-conquer stance to achieve this by writing the target posterior as a product of subposteriors, running MCMC for each of them in parallel and subsequently combining the results. The challenge then lies in d…
▽ More
While MCMC methods have become a main work-horse for Bayesian inference, scaling them to large distributed datasets is still a challenge. Embarrassingly parallel MCMC strategies take a divide-and-conquer stance to achieve this by writing the target posterior as a product of subposteriors, running MCMC for each of them in parallel and subsequently combining the results. The challenge then lies in devising efficient aggregation strategies. Current strategies trade-off between approximation quality, and costs of communication and computation. In this work, we introduce a novel method that addresses these issues simultaneously. Our key insight is to introduce a deep invertible transformation to approximate each of the subposteriors. These approximations can be made accurate even for complex distributions and serve as intermediate representations, keeping the total communication cost limited. Moreover, they enable us to sample from the product of the subposteriors using an efficient and stable importance sampling scheme. We demonstrate the approach outperforms available state-of-the-art methods in a range of challenging scenarios, including high-dimensional and heterogeneous subposteriors.
△ Less
Submitted 15 June, 2021; v1 submitted 11 March, 2019;
originally announced March 2019.