-
Understanding uncertainty in Bayesian cluster analysis
Authors:
Cecilia Balocchi,
Sara Wade
Abstract:
The Bayesian approach to clustering is often appreciated for its ability to provide uncertainty in the partition structure. However, summarizing the posterior distribution over the clustering structure can be challenging, due to the discrete, unordered nature and massive dimension of the space. While recent advancements provide a single clustering estimate to represent the posterior, this ignores uncertainty and may even be unrepresentative in instances where the posterior is multimodal. To enhance our understanding of uncertainty, we propose a WASserstein Approximation for Bayesian clusterIng (WASABI), which summarizes the posterior samples with not one, but multiple clustering estimates, each corresponding to a different part of the space of partitions that receives substantial posterior mass. Specifically, we find such clustering estimates by approximating the posterior distribution in a Wasserstein distance sense, equipped with a suitable metric on the partition space. An interesting byproduct is that a locally optimal solution to this problem can be found using a k-medoids-like algorithm on the partition space to divide the posterior samples into different groups, each represented by one of the clustering estimates. Using both synthetic and real datasets, we show that our proposal helps to improve the understanding of uncertainty, particularly when the data clusters are not well separated or when the employed model is misspecified.
Submitted 19 June, 2025;
originally announced June 2025.
-
Understanding the Trade-offs in Accuracy and Uncertainty Quantification: Architecture and Inference Choices in Bayesian Neural Networks
Authors:
Alisa Sheinkman,
Sara Wade
Abstract:
As modern neural networks get more complex, specifying a model with high predictive performance and sound uncertainty quantification becomes a more challenging task. Despite some promising theoretical results on the true posterior predictive distribution of Bayesian neural networks, the properties of even the most commonly used posterior approximations are often questioned. Computational burdens and intractable posteriors expose miscalibrated Bayesian neural networks to poor accuracy and unreliable uncertainty estimates. Approximate Bayesian inference aims to replace unknown and intractable posterior distributions with some simpler but feasible distributions. The dimensions of modern deep models, coupled with the lack of identifiability, make Markov chain Monte Carlo (MCMC) tremendously expensive and unable to fully explore the multimodal posterior. On the other hand, variational inference benefits from improved computational complexity but lacks the asymptotic guarantees of sampling-based inference and tends to concentrate around a single mode. The performance of both approaches heavily depends on architectural choices; this paper aims to shed some light on this by considering the computational costs, accuracy and uncertainty quantification in different scenarios including large width and out-of-sample data. To improve posterior exploration, different model averaging and ensembling techniques are studied, along with their benefits on predictive performance. In our experiments, variational inference overall provided better uncertainty quantification than MCMC; further, stacking and ensembles of variational approximations provided comparable accuracy to MCMC at a much-reduced cost.
Submitted 17 June, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Variational Bayesian Bow tie Neural Networks with Shrinkage
Authors:
Alisa Sheinkman,
Sara Wade
Abstract:
Despite the dominant role of deep models in machine learning, limitations persist, including overconfident predictions, susceptibility to adversarial attacks, and underestimation of variability in predictions. The Bayesian paradigm provides a natural framework to overcome such issues and has become the gold standard for uncertainty estimation with deep models, also providing improved accuracy and a framework for tuning critical hyperparameters. However, exact Bayesian inference is challenging, typically involving variational algorithms that impose strong independence and distributional assumptions. Moreover, existing methods are sensitive to the architectural choice of the network. We address these issues by focusing on a stochastic relaxation of the standard feed-forward rectified neural network and using sparsity-promoting priors on the weights of the neural network for increased robustness to architectural design. Thanks to Polya-Gamma data augmentation tricks, which render a conditionally linear and Gaussian model, we derive a fast, approximate variational inference algorithm that avoids distributional assumptions and independence across layers. Suitable strategies to further improve scalability and account for multimodality are considered.
Submitted 17 June, 2025; v1 submitted 17 November, 2024;
originally announced November 2024.
-
Covariate-dependent hierarchical Dirichlet processes
Authors:
Huizi Zhang,
Sara Wade,
Natalia Bochkina
Abstract:
Bayesian hierarchical modelling is a natural framework to effectively integrate data and borrow information across groups. In this paper, we address problems related to density estimation and identifying clusters across related groups, by proposing a hierarchical Bayesian approach that incorporates additional covariate information. To achieve flexibility, our approach builds on ideas from Bayesian nonparametrics, combining the hierarchical Dirichlet process with dependent Dirichlet processes. The proposed model is widely applicable, accommodating multiple and mixed covariate types through appropriate kernel functions as well as different output types through suitable likelihoods. This extends our ability to discern the relationship between covariates and clusters, while also effectively borrowing information and quantifying differences across groups. By employing a data augmentation trick, we are able to tackle the intractable normalized weights and construct a Markov chain Monte Carlo algorithm for posterior inference. The proposed method is illustrated on simulated data and two real data sets on single-cell RNA sequencing (scRNA-seq) and calcium imaging. For scRNA-seq data, we show that the incorporation of cell dynamics facilitates the discovery of additional cell subgroups. On calcium imaging data, our method identifies interpretable clusters of time frames with similar neural activity, aligning with the observed behavior of the animal.
Submitted 17 April, 2025; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Bayesian dependent mixture models: A predictive comparison and survey
Authors:
Sara Wade,
Vanda Inacio,
Sonia Petrone
Abstract:
For exchangeable data, mixture models are an extremely useful tool for density estimation due to their attractive balance between smoothness and flexibility. When additional covariate information is present, mixture models can be extended for flexible regression by modeling the mixture parameters, namely the weights and atoms, as functions of the covariates. These types of models are interpretable and highly flexible, allowing not only the mean but the whole density of the response to change with the covariates, which is also known as density regression. This article reviews Bayesian covariate-dependent mixture models and highlights which data types can be accommodated by the different models along with the methodological and applied areas where they have been used. In addition to being highly flexible, these models are also numerous; we focus on nonparametric constructions and broadly organize them into three categories: 1) joint models of the responses and covariates, 2) conditional models with single-weights and covariate-dependent atoms, and 3) conditional models with covariate-dependent weights. The diversity and variety of the available models in the literature raises the question of how to choose among them for the application at hand. We attempt to shed light on this question through a careful analysis of the predictive equations for the conditional mean and density function as well as predictive comparisons in three simulated data examples.
Submitted 30 July, 2023;
originally announced July 2023.
-
Shared Differential Clustering across Single-cell RNA Sequencing Datasets with the Hierarchical Dirichlet Process
Authors:
Jinlu Liu,
Sara Wade,
Natalia Bochkina
Abstract:
Single-cell RNA sequencing (scRNA-seq) is a powerful technology that allows researchers to understand gene expression patterns at the single-cell level. However, analysing scRNA-seq data is challenging due to issues and biases in data collection. In this work, we construct an integrated Bayesian model that simultaneously addresses normalization, imputation and batch effects and also nonparametrically clusters cells into groups across multiple datasets. A Gibbs sampler based on a finite-dimensional approximation of the HDP is developed for posterior inference.
Submitted 13 December, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Leveraging variational autoencoders for multiple data imputation
Authors:
Breeshey Roskams-Hieter,
Jude Wells,
Sara Wade
Abstract:
Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations, particularly for more extreme missing data values. To overcome this, we employ $β$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of $β$ is critical for uncertainty calibration and we demonstrate how this can be achieved using cross-validation. In downstream tasks, we show how multiple imputation with $β$-VAEs can avoid false discoveries that arise as artefacts of imputation.
Submitted 30 September, 2022;
originally announced September 2022.
-
Mixtures of Gaussian Process Experts with SMC$^2$
Authors:
Teemu Härkönen,
Sara Wade,
Kody Law,
Lassi Roininen
Abstract:
Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory constraints due to the need to invert and store a full covariance matrix. To circumvent this, mixtures of Gaussian process experts have been considered where data points are assigned to independent experts, reducing the complexity by allowing inference based on smaller, local covariance matrices. Moreover, mixtures of Gaussian process experts substantially enrich the model's flexibility, allowing for behaviors such as non-stationarity, heteroscedasticity, and discontinuities. In this work, we construct a novel inference approach based on nested sequential Monte Carlo samplers to simultaneously infer both the gating network and Gaussian process expert parameters. This greatly improves inference compared to importance sampling, particularly in settings where a stationary Gaussian process is inappropriate, while still being thoroughly parallelizable.
Submitted 6 July, 2025; v1 submitted 26 August, 2022;
originally announced August 2022.
-
Bayesian nonparametric scalar-on-image regression via Potts-Gibbs random partition models
Authors:
Mica Teo Shu Xian,
Sara Wade
Abstract:
Scalar-on-image regression aims to investigate changes in a scalar response of interest based on high-dimensional imaging data. We propose a novel Bayesian nonparametric scalar-on-image regression model that utilises the spatial coordinates of the voxels to group voxels with similar effects on the response to have a common coefficient. We employ the Potts-Gibbs random partition model as the prior for the random partition in which the partition process is spatially dependent, thereby encouraging groups representing spatially contiguous regions. In addition, Bayesian shrinkage priors are utilised to identify the covariates and regions that are most relevant for the prediction. The proposed model is illustrated using simulated data sets.
Submitted 22 June, 2022;
originally announced June 2022.
-
Bayesian calibration of simulation models: A tutorial and an Australian smoking behaviour model
Authors:
Stephen Wade,
Marianne F Weber,
Peter Sarich,
Pavla Vaneckova,
Silvia Behar-Harpaz,
Preston J Ngo,
Sonya Cressman,
Coral E Gartner,
John M Murray,
Tony A Blakely,
Emily Banks,
Martin C Tammemagi,
Karen Canfell,
Michael Caruana
Abstract:
Simulation models of epidemiological, biological, ecological, and environmental processes are increasingly being calibrated using Bayesian statistics. The Bayesian approach provides simple rules to synthesise multiple data sources and to calculate uncertainty in model output due to uncertainty in the calibration data. As the number of tutorials and studies published grows, the solutions to common difficulties in Bayesian calibration across these fields have become more apparent, and a step-by-step process for successful calibration across all these fields is emerging. We provide a statement of the key steps in a Bayesian calibration, and we outline analyses and approaches to each step that have emerged from one or more of these applied sciences. Thus we present a synthesis of Bayesian calibration methodologies that cut across a number of scientific disciplines.
To demonstrate these steps and to provide further detail on the computations involved in Bayesian calibration, we calibrated a compartmental model of tobacco smoking behaviour in Australia. We found that the proportion of a birth cohort estimated to take up smoking before they reach age 20 years in 2016 was at its lowest value since the early 20th century, and that quit rates were at their highest. As a novel outcome, we quantified the rate at which ex-smokers switched to reporting as a 'never smoker' when surveyed later in life; a phenomenon that, to our knowledge, has never been quantified using cross-sectional survey data.
Submitted 7 March, 2022; v1 submitted 6 February, 2022;
originally announced February 2022.
-
Non-stationary Gaussian process discriminant analysis with variable selection for high-dimensional functional data
Authors:
W Yu,
S Wade,
H D Bondell,
L Azizi
Abstract:
High-dimensional classification and feature selection tasks are ubiquitous with the recent advancement in data acquisition technology. In several application areas such as biology, genomics and proteomics, the data are often functional in their nature and exhibit a degree of roughness and non-stationarity. These structures pose additional challenges to commonly used methods that rely mainly on a two-stage approach performing variable selection and classification separately. We propose in this work a novel Gaussian process discriminant analysis (GPDA) that combines these steps in a unified framework. Our model is a two-layer non-stationary Gaussian process coupled with an Ising prior to identify differentially-distributed locations. Scalable inference is achieved via developing a variational scheme that exploits advances in the use of sparse inverse covariance matrices. We demonstrate the performance of our methodology on simulated datasets and two proteomics datasets: breast cancer and SARS-CoV-2. Our approach distinguishes itself by offering explainability as well as uncertainty quantification in addition to low computational cost, which are crucial to increase trust and social acceptance of data-driven tools.
Submitted 28 September, 2021;
originally announced September 2021.
-
On MCMC for variationally sparse Gaussian processes: A pseudo-marginal approach
Authors:
Karla Monterrubio-Gómez,
Sara Wade
Abstract:
Gaussian processes (GPs) are frequently used in machine learning and statistics to construct powerful models. However, when employing GPs in practice, important considerations must be made, regarding the high computational burden, approximation of the posterior, choice of the covariance function and inference of its hyperparameters. To address these issues, Hensman et al. (2015) combine variationally sparse GPs with Markov chain Monte Carlo (MCMC) to derive a scalable, flexible and general framework for GP models. Nevertheless, the resulting approach requires intractable likelihood evaluations for many observation models. To bypass this problem, we propose a pseudo-marginal (PM) scheme that offers asymptotically exact inference as well as computational gains through doubly stochastic estimators for the intractable likelihood and large datasets. In complex models, the advantages of the PM scheme are particularly evident, and we demonstrate this on a two-level GP regression model with a nonparametric covariance function to capture non-stationarity.
Submitted 4 March, 2021;
originally announced March 2021.
-
Fast Deep Mixtures of Gaussian Process Experts
Authors:
Clement Etienam,
Kody Law,
Sara Wade,
Vitaly Zankin
Abstract:
Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, allowing not only the mean function but the entire density of the output to change with the inputs. Sparse Gaussian processes (GP) have shown promise as a leading candidate for the experts in such models, and in this article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). Furthermore, a fast one-pass algorithm called Cluster-Classify-Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly. This powerful combination of model and algorithm together delivers a novel method which is flexible, robust, and extremely efficient. In particular, the method is able to outperform competing methods in terms of accuracy and uncertainty quantification. The cost is competitive on low-dimensional and small data sets, but is significantly lower for higher-dimensional and big data sets. Iteratively maximizing the distribution of experts given allocations and allocations given experts does not provide significant improvement, which indicates that the algorithm achieves a good approximation to the local MAP estimator very fast. This insight can also be useful in the context of other mixture of experts models.
Submitted 30 November, 2023; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Enriched Mixtures of Gaussian Process Experts
Authors:
Charles W. L. Gadd,
Sara Wade,
Alexis Boukouvalas
Abstract:
Mixtures of experts probabilistically divide the input space into regions, where the assumptions of each expert, or conditional model, need only hold locally. Combined with Gaussian process (GP) experts, this results in a powerful and highly flexible model. We focus on alternative mixtures of GP experts, which model the joint distribution of the inputs and targets explicitly. We highlight issues of this approach in multi-dimensional input spaces, namely, poor scalability and the need for an unnecessarily large number of experts, degrading the predictive performance and increasing uncertainty. We construct a novel model to address these issues through a nested partitioning scheme that automatically infers the number of components at both levels. Multiple response types are accommodated through a generalised GP framework, while multiple input types are included through a factorised exponential family structure. We show the effectiveness of our approach in estimating a parsimonious probabilistic description of both synthetic data of increasing dimension and an Alzheimer's challenge dataset.
Submitted 30 May, 2019;
originally announced May 2019.
-
Colombian Women's Life Patterns: A Multivariate Density Regression Approach
Authors:
Sara Wade,
Raffaella Piccarreta,
Andrea Cremaschi,
Isadora Antoniano-Villalobos
Abstract:
Women in Colombia face difficulties related to the patriarchal traits of their societies and the well-known conflict afflicting the country since 1948. In this critical context, our aim is to study the relationship between baseline socio-demographic factors and variables associated with fertility, partnership patterns, and work activity. To best exploit the explanatory structure, we propose a Bayesian multivariate density regression model, which can accommodate mixed responses with censored, constrained, and binary traits. The flexible nature of the models allows for nonlinear regression functions and non-standard features in the errors, such as asymmetry or multi-modality. The model has interpretable covariate-dependent weights constructed through normalization, allowing for combinations of categorical and continuous covariates. Computational difficulties for inference are overcome through an adaptive truncation algorithm combining adaptive Metropolis-Hastings and sequential Monte Carlo to create a sequence of automatically truncated posterior mixtures. For our study on Colombian women's life patterns, a variety of quantities are visualised and described, and in particular, our findings highlight the detrimental impact of family violence on women's choices and behaviors.
Submitted 20 January, 2021; v1 submitted 17 May, 2019;
originally announced May 2019.
-
Posterior Inference for Sparse Hierarchical Non-stationary Models
Authors:
Karla Monterrubio-Gómez,
Lassi Roininen,
Sara Wade,
Theo Damoulas,
Mark Girolami
Abstract:
Gaussian processes are valuable tools for non-parametric modelling, where typically an assumption of stationarity is employed. While removing this assumption can improve prediction, fitting such models is challenging. In this work, hierarchical models are constructed based on Gaussian Markov random fields with stochastic spatially varying parameters. Importantly, this allows for non-stationarity while also addressing the computational burden through a sparse banded representation of the precision matrix. In this setting, efficient Markov chain Monte Carlo (MCMC) sampling is challenging due to the strong coupling a posteriori of the parameters and hyperparameters. We develop and compare three adaptive MCMC schemes and make use of banded matrix operations for faster inference. Furthermore, a novel extension to multi-dimensional settings is proposed through an additive structure that retains the flexibility and scalability of the model, while also inheriting interpretability from the additive approach. A thorough assessment of the efficiency and accuracy of the methods in nonstationary settings is presented for both simulated experiments and a computer emulation problem.
Submitted 1 May, 2019; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Pseudo-marginal Bayesian inference for supervised Gaussian process latent variable models
Authors:
Charles Gadd,
Sara Wade,
Akeel Shah,
Dimitris Grammatopoulos
Abstract:
We introduce a Bayesian framework for inference with a supervised version of the Gaussian process latent variable model. The framework overcomes the high correlations between latent variables and hyperparameters by using an unbiased pseudo estimate for the marginal likelihood that approximately integrates over the latent variables. This is used to construct a Markov Chain to explore the posterior of the hyperparameters. We demonstrate the procedure on simulated and real examples, showing its ability to capture uncertainty and multimodality of the hyperparameters and improved uncertainty quantification in predictions when compared with variational inference.
Submitted 28 March, 2018;
originally announced March 2018.
-
Forward Thinking: Building and Training Neural Networks One Layer at a Time
Authors:
Chris Hettinger,
Tanner Christensen,
Ben Ehlert,
Jeffrey Humpherys,
Tyler Jarvis,
Sean Wade
Abstract:
We present a general framework for training deep neural networks without backpropagation. This substantially decreases training time and also allows for construction of deep networks with many sorts of learners, including networks whose layers are defined by functions that are not easily differentiated, like decision trees. The main idea is that layers can be trained one at a time, and once they are trained, the input data are mapped forward through the layer to create a new learning problem. The process is repeated, transforming the data through multiple layers, one at a time, rendering a new data set, which is expected to be better behaved, and on which a final output layer can achieve good performance. We call this forward thinking and demonstrate a proof of concept by achieving state-of-the-art accuracy on the MNIST dataset for convolutional neural networks. We also provide a general mathematical formulation of forward thinking that allows for other types of deep learning problems to be considered.
Submitted 8 June, 2017;
originally announced June 2017.
-
Bayesian cluster analysis: Point estimation and credible balls
Authors:
Sara Wade,
Zoubin Ghahramani
Abstract:
Clustering is widely studied in statistics and machine learning, with applications in a variety of fields. As opposed to classical algorithms which return a single clustering solution, Bayesian nonparametric models provide a posterior over the entire space of partitions, allowing one to assess statistical properties, such as uncertainty on the number of clusters. However, an important problem is how to summarize the posterior; the huge dimension of partition space and difficulties in visualizing it add to this problem. In a Bayesian analysis, the posterior of a real-valued parameter of interest is often summarized by reporting a point estimate such as the posterior mean along with 95% credible intervals to characterize uncertainty. In this paper, we extend these ideas to develop appropriate point estimates and credible sets to summarize the posterior of clustering structure based on decision and information theoretic techniques.
Submitted 8 February, 2019; v1 submitted 13 May, 2015;
originally announced May 2015.