-
Improving variable selection properties by leveraging external data
Authors:
Paul Rognon-Vael,
David Rossell,
Piotr Zwiernik
Abstract:
Sparse high-dimensional signal recovery is only possible under certain conditions on the number of parameters, sample size, signal strength and underlying sparsity. We show that leveraging external information, as possible with data integration or transfer learning, makes it possible to push these mathematical limits. Specifically, we consider external information that allows splitting parameters into blocks, first in a simplified case, the Gaussian sequence model, and then in the general linear regression setting. We show how external-information-dependent, block-based $\ell_0$ penalties attain model selection consistency under milder conditions than standard $\ell_0$ penalties, and they also attain faster model recovery rates. We first provide results for oracle-based $\ell_0$ penalties that have access to perfect sparsity and signal strength information. Subsequently, we propose an empirical Bayes data analysis method that does not require oracle information and for which efficient computation is possible via standard MCMC techniques. Our results provide a mathematical basis to justify the use of data integration methods in high-dimensional structural learning.
Submitted 29 March, 2025; v1 submitted 21 February, 2025;
originally announced February 2025.
-
Universality of Benign Overfitting in Binary Linear Classification
Authors:
Ichiro Hashimoto,
Stanislav Volgushev,
Piotr Zwiernik
Abstract:
The practical success of deep learning has led to the discovery of several surprising phenomena. One of these phenomena, which has spurred intense theoretical research, is ``benign overfitting'': deep neural networks seem to generalize well in the over-parametrized regime even though the networks show a perfect fit to noisy training data. It is now known that benign overfitting also occurs in various classical statistical models. For linear maximum margin classifiers, benign overfitting has been established theoretically in a class of mixture models with very strong assumptions on the covariate distribution. However, even in this simple setting, many questions remain open. For instance, most of the existing literature focuses on the noiseless case where all true class labels are observed without errors, whereas the more interesting noisy case remains poorly understood. We provide a comprehensive study of benign overfitting for linear maximum margin classifiers. We discover a phase transition in test error bounds for the noisy model which was previously unknown and provide some geometric intuition behind it. We further considerably relax the required covariate assumptions in both the noisy and noiseless cases. Our results demonstrate that benign overfitting of maximum margin classifiers holds in a much wider range of scenarios than was previously known and provide new insights into the underlying mechanisms.
Submitted 17 January, 2025;
originally announced January 2025.
-
Entropic covariance models
Authors:
Piotr Zwiernik
Abstract:
In covariance matrix estimation, one of the challenges lies in finding a suitable model and an efficient estimation method. Two commonly used modelling approaches in the literature involve imposing linear restrictions on the covariance matrix or its inverse. Another approach considers linear restrictions on the matrix logarithm of the covariance matrix. In this paper, we present a general framework for linear restrictions on different transformations of the covariance matrix, including the mentioned examples. Our proposed estimation method solves a convex problem and yields an $M$-estimator, allowing for relatively straightforward asymptotic (in general) and finite sample analysis (in the Gaussian case). In particular, we recover standard $\sqrt{n/d}$ rates, where $d$ is the dimension of the underlying model. Our geometric insights allow us to extend various recent results in covariance matrix modelling. This includes providing unrestricted parametrizations of the space of correlation matrices, which is an alternative to a recent result utilizing the matrix logarithm.
Submitted 7 May, 2024; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Graphical model inference with external network data
Authors:
Jack Jewson,
Li Li,
Laura Battaglia,
Stephen Hansen,
David Rossell,
Piotr Zwiernik
Abstract:
We consider two applications where we study how dependence structure between many variables is linked to external network data. We first study the interplay between social media connectedness and the co-evolution of the COVID-19 pandemic across USA counties. We next study how the dependence between stock market returns across firms relates to similarities in economic and policy indicators from text regulatory filings. Both applications are modelled via Gaussian graphical models where one has external network data. We develop spike-and-slab and graphical LASSO frameworks to integrate the network data, both facilitating the interpretation of the graphical model and improving inference. The goal is to detect when the network data relates to the graphical model and, if so, explain how. We found that counties strongly connected on Facebook are more likely to have similar COVID-19 evolution (positive partial correlations), accounting for various factors driving the mean. We also found that the association in stock market returns depends in a stronger fashion on economic than on policy indicators. The examples show that data integration can improve interpretation, statistical accuracy, and out-of-sample prediction, in some instances using significantly sparser graphical models.
Submitted 13 November, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Non-Independent Components Analysis
Authors:
Geert Mesters,
Piotr Zwiernik
Abstract:
A seminal result in the ICA literature states that for $AY = \varepsilon$, if the components of $\varepsilon$ are independent and at most one is Gaussian, then $A$ is identified up to sign and permutation of its rows (Comon, 1994). In this paper we study to what extent the independence assumption can be relaxed by replacing it with restrictions on higher order moment or cumulant tensors of $\varepsilon$. We document new conditions that establish identification for several non-independent component models, e.g. common variance models, and propose efficient estimation methods based on the identification results. We show that in situations where independence cannot be assumed the efficiency gains can be significant relative to methods that rely on independence.
Submitted 19 March, 2024; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Total positivity in multivariate extremes
Authors:
Frank Röttger,
Sebastian Engelke,
Piotr Zwiernik
Abstract:
Positive dependence is present in many real world data sets and has appealing stochastic properties that can be exploited in statistical modeling and in estimation. In particular, the notion of multivariate total positivity of order 2 ($ \mathrm{MTP}_{2} $) is a convex constraint and acts as an implicit regularizer in the Gaussian case. We study positive dependence in multivariate extremes and introduce $ \mathrm{EMTP}_{2} $, an extremal version of $ \mathrm{MTP}_{2} $. This notion turns out to appear prominently in extremes, and in fact, it is satisfied by many classical models. For a Hüsler--Reiss distribution, the analogue of a Gaussian distribution in extremes, we show that it is $ \mathrm{EMTP}_{2} $ if and only if its precision matrix is a Laplacian of a connected graph. We propose an estimator for the parameters of the Hüsler--Reiss distribution under $ \mathrm{EMTP}_{2} $ as the solution of a convex optimization problem with Laplacian constraint. We prove that this estimator is consistent and typically yields a sparse model with possibly nondecomposable extremal graphical structure. Applying our methods to a data set of Danube River flows, we illustrate this regularization and the superior performance compared to existing methods.
Submitted 14 June, 2023; v1 submitted 29 December, 2021;
originally announced December 2021.
-
Maximum Likelihood Estimation for Brownian Motion Tree Models Based on One Sample
Authors:
Michael Truell,
Jan-Christian Hütter,
Chandler Squires,
Piotr Zwiernik,
Caroline Uhler
Abstract:
We study the problem of maximum likelihood estimation given one data sample ($n=1$) over Brownian Motion Tree Models (BMTMs), a class of Gaussian models on trees. BMTMs are often used as a null model in phylogenetics, where the one-sample regime is common. Specifically, we show that, almost surely, the one-sample BMTM maximum likelihood estimator (MLE) exists, is unique, and corresponds to a fully observed tree. Moreover, we provide a polynomial time algorithm for its exact computation. We also consider the MLE over all possible BMTM tree structures in the one-sample case and show that it exists almost surely, that it coincides with the MLE over diagonally dominant M-matrices, and that it admits a unique closed-form solution that corresponds to a path graph. Finally, we explore statistical properties of the one-sample BMTM MLE through numerical experiments.
Submitted 24 November, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Robust estimation of tree structured models
Authors:
Marta Casanellas,
Marina Garrote-López,
Piotr Zwiernik
Abstract:
Consider the problem of learning undirected graphical models on trees from corrupted data. Recently Katiyar et al. showed that it is possible to recover trees from noisy binary data up to a small equivalence class of possible trees. Their other paper on the Gaussian case follows a similar pattern. By framing this as a special phylogenetic recovery problem we largely generalize these two settings. Using the framework of linear latent tree models we discuss tree identifiability for binary data under a continuous corruption model. For the Ising and the Gaussian tree model we also provide a characterisation of when the Chow-Liu algorithm consistently learns the underlying tree from the noisy data.
Submitted 10 February, 2021;
originally announced February 2021.
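The Chow-Liu step mentioned in the abstract above can be sketched for the Gaussian case as follows. This is only an illustration of the standard algorithm, not the paper's characterisation of when it is consistent under noise: it builds a maximum-weight spanning tree with absolute correlations as edge weights, which gives the same tree as mutual-information weights for Gaussian variables because the mutual information is increasing in the absolute correlation. The function name `chow_liu_gaussian` and the toy chain are assumptions made for this example.

```python
def chow_liu_gaussian(corr):
    """Edges of a maximum-weight spanning tree with |correlation| weights.

    `corr` is a square correlation matrix given as a list of lists.
    Hypothetical helper for illustration (Kruskal's algorithm).
    """
    d = len(corr)
    # Candidate edges, strongest absolute correlation first.
    edges = sorted(
        ((abs(corr[i][j]), i, j) for i in range(d) for j in range(i + 1, d)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest over the d nodes

    def find(u):
        # Follow parent pointers to the root, compressing the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return sorted(tree)


# Noise-free population correlations of a chain 0 - 1 - 2 - 3
# with correlation 0.8 along each edge (toy input, assumed).
corr = [[0.8 ** abs(i - j) for j in range(4)] for i in range(4)]
tree = chow_liu_gaussian(corr)
print(tree)
```

With exact chain correlations the strongest pairs are the chain's own edges, so the sketch returns the chain itself. With corrupted data the input matrix would be a noisy estimate, which is the situation the abstract addresses.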
-
Locally associated graphical models and mixed convex exponential families
Authors:
Steffen Lauritzen,
Piotr Zwiernik
Abstract:
The notion of multivariate total positivity has proved to be useful in finance and psychology but may be too restrictive in other applications. In this paper we propose a concept of local association, where highly connected components in a graphical model are positively associated and study its properties. Our main motivation comes from gene expression data, where graphical models have become a popular exploratory tool. The models are instances of what we term mixed convex exponential families and we show that a mixed dual likelihood estimator has simple exact properties for such families as well as asymptotic properties similar to the maximum likelihood estimator. We further relax the positivity assumption by penalizing negative partial correlations in what we term the positive graphical lasso. Finally, we develop a GOLAZO algorithm based on block-coordinate descent that applies to a number of optimization procedures that arise in the context of graphical models, including the estimation problems described above. We derive results on existence of the optimum for such problems.
Submitted 9 February, 2022; v1 submitted 11 August, 2020;
originally announced August 2020.
-
Estimating linear covariance models with numerical nonlinear algebra
Authors:
Bernd Sturmfels,
Sascha Timme,
Piotr Zwiernik
Abstract:
Numerical nonlinear algebra is applied to maximum likelihood estimation for Gaussian models defined by linear constraints on the covariance matrix. We examine the generic case as well as special models (e.g. Toeplitz, sparse, trees) that are of interest in statistics. We study the maximum likelihood degree and its dual analogue, and we introduce a new software package LinearCovarianceModels.jl for solving the score equations. All local maxima can thus be computed reliably. In addition we identify several scenarios for which the estimator is a rational function.
Submitted 2 September, 2019;
originally announced September 2019.
-
Learning partial correlation graphs and graphical models by covariance queries
Authors:
Gábor Lugosi,
Jakub Truszkowski,
Vasiliki Velona,
Piotr Zwiernik
Abstract:
We study the problem of recovering the structure underlying large Gaussian graphical models or, more generally, partial correlation graphs. In high-dimensional problems it is often too costly to store the entire sample covariance matrix. We propose a new input model in which one can query single entries of the covariance matrix. We prove that it is possible to recover the support of the inverse covariance matrix with low query and computational complexity. Our algorithms work in a regime when this support is represented by tree-like graphs and, more generally, for graphs of small treewidth. Our results demonstrate that for large classes of graphs, the structure of the corresponding partial correlation graphs can be determined much faster than even computing the empirical covariance matrix.
Submitted 12 October, 2021; v1 submitted 22 June, 2019;
originally announced June 2019.
-
Total positivity in exponential families with application to binary variables
Authors:
Steffen Lauritzen,
Caroline Uhler,
Piotr Zwiernik
Abstract:
We study exponential families of distributions that are multivariate totally positive of order 2 (MTP2), show that these are convex exponential families, and derive conditions for existence of the MLE. Quadratic exponential families of MTP2 distributions contain attractive Gaussian graphical models and ferromagnetic Ising models as special examples. We show that these are defined by intersecting the space of canonical parameters with a polyhedral cone whose faces correspond to conditional independence relations. Hence MTP2 serves as an implicit regularizer for quadratic exponential families and leads to sparsity in the estimated graphical model. We prove that the maximum likelihood estimator (MLE) in an MTP2 binary exponential family exists if and only if both of the sign patterns $(1,-1)$ and $(-1,1)$ are represented in the sample for every pair of variables; in particular, this implies that the MLE may exist with $n=d$ observations, in stark contrast to unrestricted binary exponential families where $2^d$ observations are required. Finally, we provide a novel and globally convergent algorithm for computing the MLE for MTP2 Ising models similar to iterative proportional scaling and apply it to the analysis of data from two psychological disorders.
Submitted 26 July, 2020; v1 submitted 1 May, 2019;
originally announced May 2019.
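The existence condition stated in the abstract above (both sign patterns $(1,-1)$ and $(-1,1)$ appear in the sample for every pair of variables) is easy to check directly. The following is a minimal sketch of that check only, not of the paper's estimation algorithm. The function name `mtp2_binary_mle_exists` and the toy samples are assumptions made for illustration, with observations coded in $\{-1, +1\}$.

```python
from itertools import combinations


def mtp2_binary_mle_exists(samples):
    """Check the pairwise sign-pattern condition quoted in the abstract.

    `samples` is a list of equal-length tuples with entries in {-1, +1}.
    Hypothetical helper for illustration.
    """
    if not samples:
        return False
    d = len(samples[0])
    for i, j in combinations(range(d), 2):
        # Sign patterns observed for the pair of variables (i, j).
        patterns = {(x[i], x[j]) for x in samples}
        if (1, -1) not in patterns or (-1, 1) not in patterns:
            return False
    return True


# n = d = 3 observations that satisfy the condition for every pair.
ok_sample = [(1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
# Variables 0 and 1 always agree here, so the condition fails.
bad_sample = [(1, 1, -1), (-1, -1, 1)]

print(mtp2_binary_mle_exists(ok_sample))
print(mtp2_binary_mle_exists(bad_sample))
```

The first toy sample illustrates the abstract's remark that $n=d$ observations can already be enough.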
-
Latent tree models
Authors:
Piotr Zwiernik
Abstract:
Latent tree models are graphical models defined on trees, in which only a subset of variables is observed. They were first discussed by Judea Pearl as tree-decomposable distributions to generalise star-decomposable distributions such as the latent class model. Latent tree models, or their submodels, are widely used in: phylogenetic analysis, network tomography, computer vision, causal modeling, and data clustering. They also contain other well-known classes of models like hidden Markov models, Brownian motion tree model, the Ising model on a tree, and many popular models used in phylogenetics. This article offers a concise introduction to the theory of latent tree models. We emphasise the role of tree metrics in the structural description of this model class, in designing learning algorithms, and in understanding fundamental limits of what and when can be learned.
Submitted 2 August, 2017;
originally announced August 2017.
-
Maximum likelihood estimation in Gaussian models under total positivity
Authors:
Steffen Lauritzen,
Caroline Uhler,
Piotr Zwiernik
Abstract:
We analyze the problem of maximum likelihood estimation for Gaussian distributions that are multivariate totally positive of order two (MTP2). By exploiting connections to phylogenetics and single-linkage clustering, we give a simple proof that the maximum likelihood estimator (MLE) for such distributions exists based on at least 2 observations, irrespective of the underlying dimension. Slawski and Hein, who first proved this result, also provided empirical evidence showing that the MTP2 constraint serves as an implicit regularizer and leads to sparsity in the estimated inverse covariance matrix, determining what we name the ML graph. We show that we can find an upper bound for the ML graph by adding edges corresponding to correlations in excess of those explained by the maximum weight spanning forest of the correlation matrix. Moreover, we provide globally convergent coordinate descent algorithms for calculating the MLE under the MTP2 constraint which are structurally similar to iterative proportional scaling. We conclude the paper with a discussion of signed MTP2 distributions.
Submitted 26 May, 2018; v1 submitted 13 February, 2017;
originally announced February 2017.
-
The correlation space of Gaussian latent tree models and model selection without fitting
Authors:
Nathaniel Shiers,
Piotr Zwiernik,
John A. D. Aston,
Jim Q. Smith
Abstract:
We provide a complete description of possible covariance matrices consistent with a Gaussian latent tree model for any tree. We then present techniques for utilising these constraints to assess whether observed data is compatible with that Gaussian latent tree model. Our method does not require us first to fit such a tree. We demonstrate the usefulness of the inverse-Wishart distribution for performing preliminary assessments of tree-compatibility using semialgebraic constraints. Using results from Drton et al. (2008) we then provide the appropriate moments required for test statistics for assessing adherence to these equality constraints. These are shown to be effective even for small sample sizes and can be easily adjusted to test either the entire model or only certain macrostructures hypothesized within the tree. We illustrate our exploratory tetrad analysis using a linguistic application and our confirmatory tetrad analysis using a biological application.
Submitted 11 April, 2016; v1 submitted 3 August, 2015;
originally announced August 2015.
-
Marginal likelihood and model selection for Gaussian latent tree and forest models
Authors:
Mathias Drton,
Shaowei Lin,
Luca Weihs,
Piotr Zwiernik
Abstract:
Gaussian latent tree models, or more generally, Gaussian latent forest models have Fisher-information matrices that become singular along interesting submodels, namely, models that correspond to subforests. For these singularities, we compute the real log-canonical thresholds (also known as stochastic complexities or learning coefficients) that quantify the large-sample behavior of the marginal likelihood in Bayesian inference. This provides the information needed for a recently introduced generalization of the Bayesian information criterion. Our mathematical developments treat the general setting of Laplace integrals whose phase functions are sums of squared differences between monomials and constants. We clarify how in this case real log-canonical thresholds can be computed using polyhedral geometry, and we show how to apply the general theory to the Laplace integrals associated with Gaussian latent tree and forest models. In simulations and a data example, we demonstrate how the mathematical knowledge can be applied in model selection.
Submitted 22 December, 2015; v1 submitted 29 December, 2014;
originally announced December 2014.
-
Binary distributions of concentric rings
Authors:
N. Wermuth,
G. M. Marchetti,
P. Zwiernik
Abstract:
We introduce families of jointly symmetric, binary distributions that are generated over directed star graphs whose nodes represent variables and whose edges indicate positive dependences. The families are parametrized in terms of a single parameter. It is an outstanding feature of these distributions that joint probabilities relate to evenly-spaced concentric rings. Kronecker product characterizations make them computationally attractive for a large number of variables. We study the behaviour of different measures of dependence and derive maximum likelihood estimates when all nodes are observed and when the inner node is hidden.
Submitted 29 July, 2014; v1 submitted 22 November, 2013;
originally announced November 2013.
-
The Dependence of Routine Bayesian Model Selection Methods on Irrelevant Alternatives
Authors:
Piotr Zwiernik,
Jim Q. Smith
Abstract:
Bayesian methods - either based on Bayes Factors or BIC - are now widely used for model selection. One property that might reasonably be demanded of any model selection method is that if a model ${M}_{1}$ is preferred to a model ${M}_{0}$, when these two models are expressed as members of one model class $\mathbb{M}$, this preference is preserved when they are embedded in a different class $\mathbb{M}'$. However, we illustrate in this paper that with the usual implementation of these common Bayesian procedures this property does not hold true even approximately. We therefore contend that to use these methods it is first necessary for there to exist a "natural" embedding class. We argue that in any context like the one illustrated in our running example of Bayesian model selection of binary phylogenetic trees there is no such embedding.
Submitted 17 August, 2012;
originally announced August 2012.