Occam Factor for Random Graphs: Erdös-Rényi, Independent Edge, and Rank-1 Stochastic Blockmodel

Authors: Tianyu Wang, Zachary M. Pisano, Carey E. Priebe

Abstract: We investigate the evidence/flexibility (i.e., "Occam") paradigm and demonstrate the theoretical and empirical consistency of Bayesian evidence for the task of determining an appropriate generative model for network data. This model selection framework involves determining a collection of candidate models, equipping each of these models' parameters with prior distributions derived via the encompassing priors method, and computing or approximating each model's evidence. We demonstrate how such a criterion may be used to select the most suitable model among the Erdös-Rényi (ER) model, independent edge (IE) model, and rank-1 stochastic blockmodel (SBM). The Erdös-Rényi model may be considered as being linearly nested within IE, a fact which permits exponential family results. The rank-1 SBM is not so ideal, so we propose a numerical method to approximate its evidence. We apply this paradigm to brain connectome data. Future work necessitates deriving and equipping additional candidate random graph models with appropriate priors so they may be included in the paradigm.

Submitted 7 May, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

arXiv:2212.13612 [pdf, other]

Conjugate Bayesian analysis of compound-symmetric Gaussian models

Authors: Zachary M. Pisano

Abstract: We discuss Bayesian inference for a known-mean Gaussian model with a compound symmetric variance-covariance matrix. Since the space of such matrices is a linear subspace of that of positive definite matrices, we utilize the methods of Pisano (2022) to decompose the usual Wishart conjugate prior and derive a closed-form, three-parameter, bivariate conjugate prior distribution for the compound-symmetric half-precision matrix. The off-diagonal entry is found to have a non-central Kummer-Beta distribution conditioned on the diagonal, which is shown to have a gamma distribution generalized with Gauss's hypergeometric function. Such considerations yield a treatment of maximum a posteriori estimation for such matrices in Gaussian settings, including the Bayesian evidence and flexibility penalty attributable to Rougier and Priebe (2019). We also demonstrate how the prior may be utilized to naturally test for the positivity of a common within-class correlation in a random-intercept model using two data-driven examples.

Submitted 16 March, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: Version submitted for review to Electronic Journal of Statistics in March 2023

arXiv:2105.01566 [pdf, other]

Occam Factor for Gaussian Models With Unknown Variance Structure

Authors: Zachary M. Pisano, Daniel Q. Naiman, Carey E. Priebe

Abstract: We discuss model selection to determine whether the variance-covariance matrix of a multivariate Gaussian model with known mean should be considered to be a constant diagonal, a non-constant diagonal, or an arbitrary positive definite matrix. Of particular interest is the relationship between Bayesian evidence and the flexibility penalty due to Priebe and Rougier. For the case of an exponential family in canonical form equipped with a conjugate prior for the canonical parameter, flexibility may be exactly decomposed into the usual BIC likelihood penalty and an $O_p(1)$ term, the latter of which we explicitly compute. We also investigate the asymptotics of Bayes factors for linearly nested canonical exponential families equipped with conjugate priors; in particular, we find the exact rates at which Bayes factors correctly diverge in favor of the correct model: linearly and logarithmically in the number of observations when the full and nested models are true, respectively. Such theoretical considerations for the general case permit us to fully express the asymptotic behavior of flexibility and Bayes factors for the variance-covariance structure selection problem when we assume that the prior for the model precision is a member of the gamma/Wishart family of distributions or is uninformative. Simulations demonstrate evidence's immediate and superior performance in model selection compared to approximate criteria such as the BIC. We extend the framework to the multivariate Gaussian linear model with three data-driven examples.

Submitted 4 May, 2021; originally announced May 2021.

Comments: 46 pages, 1 figure

MSC Class: Primary 62F07; secondary 62F10

arXiv:2003.13462 [pdf, other]

Spectral graph clustering via the Expectation-Solution algorithm

Authors: Zachary M. Pisano, Joshua S. Agterberg, Carey E. Priebe, Daniel Q. Naiman

Abstract: The stochastic blockmodel (SBM) models the connectivity within and between disjoint subsets of nodes in networks. Prior work demonstrated that the rows of an SBM's adjacency spectral embedding (ASE) and Laplacian spectral embedding (LSE) both converge in law to Gaussian mixtures where the components are curved exponential families. Maximum likelihood estimation via the Expectation-Maximization (EM) algorithm for a full Gaussian mixture model (GMM) can then perform the task of clustering graph nodes, albeit without appealing to the components' curvature. Noting that EM is a special case of the Expectation-Solution (ES) algorithm, we propose two ES algorithms that allow us to take full advantage of these curved structures. After presenting the ES algorithm for the general curved-Gaussian mixture, we develop those corresponding to the ASE and LSE limiting distributions. Simulating from artificial SBMs and a brain connectome SBM reveals that clustering graph nodes via our ES algorithms can improve upon that of EM for a full GMM for a wide range of settings.

Submitted 3 May, 2022; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: 45 pages, version accepted by Electronic Journal of Statistics

MSC Class: 62-08 (Primary) 62P15; 62P10 (Secondary)

Showing 1–4 of 4 results for author: Pisano, Z M