-
Occam Factor for Random Graphs: Erdös-Rényi, Independent Edge, and Rank-1 Stochastic Blockmodel
Authors:
Tianyu Wang,
Zachary M. Pisano,
Carey E. Priebe
Abstract:
We investigate the evidence/flexibility (i.e., "Occam") paradigm and demonstrate the theoretical and empirical consistency of Bayesian evidence for the task of determining an appropriate generative model for network data. This model selection framework involves determining a collection of candidate models, equipping each of these models' parameters with prior distributions derived via the encompassing priors method, and computing or approximating each model's evidence. We demonstrate how such a criterion may be used to select the most suitable model among the Erdös-Rényi (ER) model, independent edge (IE) model, and rank-1 stochastic blockmodel (SBM). The Erdös-Rényi model may be considered as linearly nested within IE, a fact which permits exponential family results. The rank-1 SBM is not so ideal, so we propose a numerical method to approximate its evidence. We apply this paradigm to brain connectome data. Future work necessitates deriving and equipping additional candidate random graph models with appropriate priors so they may be included in the paradigm.
Submitted 7 May, 2024; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Conjugate Bayesian analysis of compound-symmetric Gaussian models
Authors:
Zachary M. Pisano
Abstract:
We discuss Bayesian inference for a known-mean Gaussian model with a compound symmetric variance-covariance matrix. Since the space of such matrices is a linear subspace of that of positive definite matrices, we utilize the methods of Pisano (2022) to decompose the usual Wishart conjugate prior and derive a closed-form, three-parameter, bivariate conjugate prior distribution for the compound-symmetric half-precision matrix. The off-diagonal entry is found to have a non-central Kummer-Beta distribution conditioned on the diagonal, which is shown to have a gamma distribution generalized with Gauss's hypergeometric function. Such considerations yield a treatment of maximum a posteriori estimation for such matrices in Gaussian settings, including the Bayesian evidence and flexibility penalty attributable to Rougier and Priebe (2019). We also demonstrate how the prior may be utilized to naturally test for the positivity of a common within-class correlation in a random-intercept model using two data-driven examples.
Submitted 16 March, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Occam Factor for Gaussian Models With Unknown Variance Structure
Authors:
Zachary M. Pisano,
Daniel Q. Naiman,
Carey E. Priebe
Abstract:
We discuss model selection to determine whether the variance-covariance matrix of a multivariate Gaussian model with known mean should be considered to be a constant diagonal, a non-constant diagonal, or an arbitrary positive definite matrix. Of particular interest is the relationship between Bayesian evidence and the flexibility penalty due to Priebe and Rougier. For the case of an exponential family in canonical form equipped with a conjugate prior for the canonical parameter, flexibility may be exactly decomposed into the usual BIC likelihood penalty and an $O_p(1)$ term, the latter of which we explicitly compute. We also investigate the asymptotics of Bayes factors for linearly nested canonical exponential families equipped with conjugate priors; in particular, we find the exact rates at which Bayes factors correctly diverge in favor of the correct model: linearly and logarithmically in the number of observations when the full and nested models are true, respectively. Such theoretical considerations for the general case permit us to fully express the asymptotic behavior of flexibility and Bayes factors for the variance-covariance structure selection problem when we assume that the prior for the model precision is a member of the gamma/Wishart family of distributions or is uninformative. Simulations demonstrate evidence's immediate and superior performance in model selection compared to approximate criteria such as the BIC. We extend the framework to the multivariate Gaussian linear model with three data-driven examples.
Submitted 4 May, 2021;
originally announced May 2021.
-
Spectral graph clustering via the Expectation-Solution algorithm
Authors:
Zachary M. Pisano,
Joshua S. Agterberg,
Carey E. Priebe,
Daniel Q. Naiman
Abstract:
The stochastic blockmodel (SBM) models the connectivity within and between disjoint subsets of nodes in networks. Prior work demonstrated that the rows of an SBM's adjacency spectral embedding (ASE) and Laplacian spectral embedding (LSE) both converge in law to Gaussian mixtures where the components are curved exponential families. Maximum likelihood estimation via the Expectation-Maximization (EM) algorithm for a full Gaussian mixture model (GMM) can then perform the task of clustering graph nodes, albeit without appealing to the components' curvature. Noting that EM is a special case of the Expectation-Solution (ES) algorithm, we propose two ES algorithms that allow us to take full advantage of these curved structures. After presenting the ES algorithm for the general curved-Gaussian mixture, we develop those corresponding to the ASE and LSE limiting distributions. Simulating from artificial SBMs and a brain connectome SBM reveals that clustering graph nodes via our ES algorithms can improve upon that of EM for a full GMM for a wide range of settings.
Submitted 3 May, 2022; v1 submitted 30 March, 2020;
originally announced March 2020.