-
Learning minimal volume uncertainty ellipsoids
Authors:
Itai Alon,
David Arnon,
Ami Wiesel
Abstract:
We consider the problem of learning uncertainty regions for parameter estimation problems. The regions are ellipsoids that minimize the average volumes subject to a prescribed coverage probability. As expected, under the assumption of jointly Gaussian data, we prove that the optimal ellipsoid is centered around the conditional mean and shaped as the conditional covariance matrix. In more practical cases, we propose a differentiable optimization approach for approximately computing the optimal ellipsoids using a neural network with proper calibration. Compared to existing methods, our network requires less storage and less computations in inference time, leading to accurate yet smaller ellipsoids. We demonstrate these advantages on four real-world localization datasets.
Submitted 3 May, 2024;
originally announced May 2024.
-
On the Optimization Landscape of Maximum Mean Discrepancy
Authors:
Itai Alon,
Amir Globerson,
Ami Wiesel
Abstract:
Generative models have been successfully used for generating realistic signals. Because the likelihood function is typically intractable in most of these models, the common practice is to use "implicit" models that avoid likelihood calculation. However, it is hard to obtain theoretical guarantees for such models. In particular, it is not understood when they can globally optimize their non-convex objectives. Here we provide such an analysis for the case of Maximum Mean Discrepancy (MMD) learning of generative models. We prove several optimality results, including for a Gaussian distribution with low rank covariance (where likelihood is inapplicable) and a mixture of Gaussians. Our analysis shows that the MMD optimization landscape is benign in these cases, and therefore gradient-based methods will globally minimize the MMD objective.
Submitted 3 May, 2024; v1 submitted 26 October, 2021;
originally announced October 2021.
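For readers unfamiliar with the MMD objective mentioned in the abstract above, the following is a minimal sketch of how a squared MMD can be estimated from two sample sets. It assumes a Gaussian kernel with a hand-picked `bandwidth` and uses the biased (V-statistic) estimator; the function names and all parameter values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

# Hypothetical data: two samples from the same Gaussian and one shifted sample.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
close = rng.normal(size=(200, 2))
shifted = rng.normal(loc=3.0, size=(200, 2))
mmd_close = mmd_squared(data, close)
mmd_far = mmd_squared(data, shifted)
```

In this sketch the estimate is near zero when both samples come from the same distribution and grows when one of them is shifted, which is the quantity a generative model would be trained to minimize.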
-
Learning to Estimate Without Bias
Authors:
Tzvi Diskin,
Yonina C. Eldar,
Ami Wiesel
Abstract:
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models. In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints. The classical approach to designing non-linear MVUEs is through maximum likelihood estimation (MLE), which often involves computationally challenging optimizations. On the other hand, deep learning methods allow for non-linear estimators with fixed computational complexity. Learning-based estimators perform optimally on average with respect to their training set but may suffer from significant bias in other parameters. To avoid this, we propose to add a simple bias constraint to the loss function, resulting in an estimator we refer to as the Bias Constrained Estimator (BCE). We prove that this yields asymptotic MVUEs that behave similarly to the classical MLEs and asymptotically attain the Cramér-Rao bound. We demonstrate the advantages of our approach in the context of signal-to-noise ratio estimation as well as covariance estimation. A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance. Examples include distributed sensor networks and test-time data augmentation. In such applications, we show that BCE leads to asymptotically consistent estimators.
Submitted 29 November, 2023; v1 submitted 24 October, 2021;
originally announced October 2021.
-
Maximin Optimization for Binary Regression
Authors:
Nisan Chiprut,
Amir Globerson,
Ami Wiesel
Abstract:
We consider regression problems with binary weights. Such optimization problems are ubiquitous in quantized learning models and digital communication systems. A natural approach is to optimize the corresponding Lagrangian using variants of the gradient ascent-descent method. Such maximin techniques are still poorly understood even in the concave-convex case. The non-convex binary constraints may lead to spurious local minima. Interestingly, we prove that this approach is optimal in linear regression with low noise conditions as well as robust regression with a small number of outliers. Practically, the method also performs well in regression with cross entropy loss, as well as non-convex multi-layer neural networks. Taken together, our approach highlights the potential of saddle-point optimization for learning constrained models.
Submitted 27 November, 2020; v1 submitted 10 October, 2020;
originally announced October 2020.
-
Convex Nonparanormal Regression
Authors:
Yonatan Woodbridge,
Gal Elidan,
Ami Wiesel
Abstract:
Quantifying uncertainty in predictions or, more generally, estimating the posterior conditional distribution, is a core challenge in machine learning and statistics. We introduce Convex Nonparanormal Regression (CNR), a conditional nonparanormal approach for coping with this task. CNR involves a convex optimization of a posterior defined via a rich dictionary of pre-defined non linear transformations on Gaussians. It can fit an arbitrary conditional distribution, including multimodal and non-symmetric posteriors. For the special but powerful case of a piecewise linear dictionary, we provide a closed form of the posterior mean which can be used for point-wise predictions. Finally, we demonstrate the advantages of CNR over classical competitors using synthetic and real world data.
Submitted 4 April, 2021; v1 submitted 21 April, 2020;
originally announced April 2020.
-
Fair Principal Component Analysis and Filter Design
Authors:
Gad Zalcberg,
Ami Wiesel
Abstract:
We consider Fair Principal Component Analysis (FPCA) and search for a low dimensional subspace that spans multiple target vectors in a fair manner. FPCA is defined as a non-concave maximization of the worst projected target norm within a given set. The problem arises in filter design in signal processing, and when incorporating fairness into dimensionality reduction schemes. The state-of-the-art approach to FPCA is via semidefinite relaxation (SDR) and involves a polynomial yet computationally expensive optimization. To allow scalability, we propose to address FPCA using naive sub-gradient descent. We analyze the landscape of the underlying optimization in the case of orthogonal targets. We prove that the landscape is benign and that all local minima are globally optimal. Interestingly, the SDR approach leads to sub-optimal solutions in this simple case. Finally, we discuss the equivalence between orthogonal FPCA and the design of normalized tight frames.
Submitted 1 June, 2021; v1 submitted 16 February, 2020;
originally announced February 2020.
-
Spectral Algorithm for Low-rank Multitask Regression
Authors:
Yotam Gigi,
Ami Wiesel,
Sella Nevo,
Gal Elidan,
Avinatan Hassidim,
Yossi Matias
Abstract:
Multitask learning, i.e. taking advantage of the relatedness of individual tasks in order to improve performance on all of them, is a core challenge in the field of machine learning. We focus on matrix regression tasks where the rank of the weight matrix is constrained to reduce sample complexity. We introduce the common mechanism regression (CMR) model which assumes a shared left low-rank component across all tasks, but allows an individual per-task right low-rank component. This dramatically reduces the number of samples needed for accurate estimation. The problem of jointly recovering the common and the local components has a non-convex bi-linear structure. We overcome this hurdle and provide a provably beneficial non-iterative spectral algorithm. Appealingly, the solution has favorable behavior as a function of the number of related tasks and the small number of samples available for each one. We demonstrate the efficacy of our approach for the challenging task of remote river discharge estimation across multiple river sites, where data for each task is naturally scarce. In this scenario sharing a low-rank component between the tasks translates to a shared spectral reflection of the water, which is a true underlying physical model. We also show the benefit of the approach on the markedly different setting of image classification where the common component can be interpreted as the shared convolution filters.
Submitted 27 October, 2019;
originally announced October 2019.
-
ML for Flood Forecasting at Scale
Authors:
Sella Nevo,
Vova Anisimov,
Gal Elidan,
Ran El-Yaniv,
Pete Giencke,
Yotam Gigi,
Avinatan Hassidim,
Zach Moshe,
Mor Schlesinger,
Guy Shalev,
Ajai Tirumali,
Ami Wiesel,
Oleg Zlydenko,
Yossi Matias
Abstract:
Effective riverine flood forecasting at scale is hindered by a multitude of factors, most notably the need to rely on human calibration in current methodology, the limited amount of data for a specific location, and the computational difficulty of building continent/global level models that are sufficiently accurate. Machine learning (ML) is primed to be useful in this scenario: learned models often surpass human experts in complex high-dimensional scenarios, and the framework of transfer or multitask learning is an appealing solution for leveraging local signals to achieve improved global performance. We propose to build on these strengths and develop ML systems for timely and accurate riverine flood prediction.
Submitted 28 January, 2019;
originally announced January 2019.
-
Towards Global Remote Discharge Estimation: Using the Few to Estimate The Many
Authors:
Yotam Gigi,
Gal Elidan,
Avinatan Hassidim,
Yossi Matias,
Zach Moshe,
Sella Nevo,
Guy Shalev,
Ami Wiesel
Abstract:
Learning hydrologic models for accurate riverine flood prediction at scale is a challenge of great importance. One of the key difficulties is the need to rely on in-situ river discharge measurements, which can be quite scarce and unreliable, particularly in regions where floods cause the most damage every year. Accordingly, in this work we tackle the problem of river discharge estimation at different river locations. A core characteristic of the data at hand (e.g. satellite measurements) is that we have few measurements for many locations, all sharing the same physics that underlie the water discharge. We capture this scenario in a simple but powerful common mechanism regression (CMR) model with a local component as well as a shared one which captures the global discharge mechanism. The resulting learning objective is non-convex, but we show that we can find its global optimum by leveraging the power of joining local measurements across sites. In particular, using a spectral initialization with provable near-optimal accuracy, we can find the optimum using standard descent methods. We demonstrate the efficacy of our approach for the problem of discharge estimation using simulations.
Submitted 3 January, 2019;
originally announced January 2019.
-
Learning to Detect
Authors:
Neev Samuel,
Tzvi Diskin,
Ami Wiesel
Abstract:
In this paper we consider Multiple-Input-Multiple-Output (MIMO) detection using deep neural networks. We introduce two different deep architectures: a standard fully connected multi-layer network, and a Detection Network (DetNet) which is specifically designed for the task. The structure of DetNet is obtained by unfolding the iterations of a projected gradient descent algorithm into a network. We compare the accuracy and runtime complexity of the proposed approaches and achieve state-of-the-art performance while maintaining low computational requirements. Furthermore, we manage to train a single network to detect over an entire distribution of channels. Finally, we consider detection with soft outputs and show that the networks can easily be modified to produce soft decisions.
Submitted 19 May, 2018;
originally announced May 2018.
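The abstract above says DetNet is obtained by unfolding projected gradient descent. As a rough illustration of that classical starting point only (not of DetNet itself, which has learned parameters), here is a sketch of projected gradient descent for binary detection. It assumes a real-valued model y = Hx + noise with x in {-1, +1}^K, a relaxation of the constraint to the box [-1, 1]^K, and a fixed step size; the function name and the example dimensions are hypothetical.

```python
import numpy as np

def projected_gradient_detector(H, y, num_iters=200):
    """Projected gradient descent on 0.5 * ||y - H x||^2 over the box [-1, 1]^K.

    Assumption of this sketch: the binary constraint is relaxed to a box during
    the iterations and enforced by a final sign decision.
    """
    # Step size 1/L, where L = sigma_max(H)^2 is the gradient's Lipschitz constant.
    step = 1.0 / np.linalg.norm(H, 2) ** 2
    x = np.zeros(H.shape[1])
    for _ in range(num_iters):
        grad = H.T @ (H @ x - y)                  # gradient of the least squares cost
        x = np.clip(x - step * grad, -1.0, 1.0)   # projection onto the box
    return np.sign(x)                             # hard decision

# Hypothetical example: 30 receive dimensions, 8 binary symbols, low noise.
rng = np.random.default_rng(0)
H = rng.normal(size=(30, 8))
x_true = rng.choice([-1.0, 1.0], size=8)
y = H @ x_true + 0.01 * rng.normal(size=30)
x_hat = projected_gradient_detector(H, y)
```

Unfolding, in the sense described in the abstract, would turn each loop iteration into a network layer whose step size and other parameters are learned from data rather than fixed as they are here.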
-
Deep MIMO Detection
Authors:
Neev Samuel,
Tzvi Diskin,
Ami Wiesel
Abstract:
In this paper, we consider the use of deep neural networks in the context of Multiple-Input-Multiple-Output (MIMO) detection. We give a brief introduction to deep learning and propose a modern neural network architecture suitable for this detection task. First, we consider the case in which the MIMO channel is constant, and we learn a detector for a specific system. Next, we consider the harder case in which the parameters are known yet changing and a single detector must be learned for all multiple varying channels. We demonstrate the performance of our deep MIMO detector using numerical simulations in comparison to competing methods including approximate message passing and semidefinite relaxation. The results show that deep networks can achieve state of the art accuracy with significantly lower complexity while providing robustness against ill conditioned channels and mis-specified noise variance.
Submitted 4 June, 2017;
originally announced June 2017.
-
Simultaneous penalized M-estimation of covariance matrices using geodesically convex optimization
Authors:
Esa Ollila,
Ilya Soloveychik,
David E. Tyler,
Ami Wiesel
Abstract:
A common assumption when sampling $p$-dimensional observations from $K$ distinct groups is the equality of the covariance matrices. In this paper, we propose two penalized $M$-estimation approaches for the estimation of the covariance or scatter matrices under the broader assumption that they may simply be close to each other, and hence roughly deviate from some positive definite "center". The first approach begins by generating a pooled $M$-estimator of scatter based on all the data, followed by a penalized $M$-estimator of scatter for each group, with the penalty term chosen so that the individual scatter matrices are shrunk towards the pooled scatter matrix. In the second approach, we minimize the sum of the individual group $M$-estimation cost functions together with an additive joint penalty term which enforces some similarity between the individual scatter estimators, i.e. shrinkage towards a mutual center. In both approaches, we utilize the concept of geodesic convexity to prove the existence and uniqueness of the penalized solution under general conditions. We consider three specific penalty functions based on the Euclidean, the Riemannian, and the Kullback-Leibler distances. In the second approach, the distance based penalties are shown to lead to estimators of the mutual center that are related to the arithmetic, the Riemannian and the harmonic means of positive definite matrices, respectively. A penalty based on an ellipticity measure is also considered which is particularly useful for shape matrix estimators. Fixed point equations are derived for each penalty function and the benefits of the estimators are illustrated in a regularized discriminant analysis problem.
Submitted 29 August, 2016;
originally announced August 2016.
-
Joint Inverse Covariances Estimation with Mutual Linear Structure
Authors:
Ilya Soloveychik,
Ami Wiesel
Abstract:
We consider the problem of joint estimation of structured inverse covariance matrices. We perform the estimation using groups of measurements with different covariances of the same unknown structure. Assuming the inverse covariances to span a low dimensional linear subspace in the space of symmetric matrices, our aim is to determine this structure. It is then utilized to improve the estimation of the inverse covariances. We propose a novel optimization algorithm discovering and exploiting the underlying structure and provide its efficient implementation. Numerical simulations are presented to illustrate the performance benefits of the proposed algorithm.
Submitted 19 November, 2015;
originally announced November 2015.
-
Joint Covariance Estimation with Mutual Linear Structure
Authors:
Ilya Soloveychik,
Ami Wiesel
Abstract:
We consider the problem of joint estimation of structured covariance matrices. Assuming the structure is unknown, estimation is achieved using heterogeneous training sets. Namely, given groups of measurements coming from centered populations with different covariances, our aim is to determine the mutual structure of these covariance matrices and estimate them. Supposing that the covariances span a low dimensional affine subspace in the space of symmetric matrices, we develop a new efficient algorithm discovering the structure and using it to improve the estimation. Our technique is based on the application of principal component analysis in the matrix space. We also derive an upper performance bound of the proposed algorithm in the Gaussian scenario and compare it with the Cramer-Rao lower bound. Numerical simulations are presented to illustrate the performance benefits of the proposed method.
Submitted 1 July, 2015;
originally announced July 2015.
-
Group Symmetric Robust Covariance Estimation
Authors:
Ilya Soloveychik,
Dmitry Trushin,
Ami Wiesel
Abstract:
In this paper we consider Tyler's robust covariance M-estimator under group symmetry constraints. We assume that the covariance matrix is invariant to the conjugation action of a unitary matrix group, referred to as group symmetry. Examples of group symmetric structures include circulant, perHermitian and proper quaternion matrices. We introduce a group symmetric version of Tyler's estimator (STyler) and provide an iterative fixed point algorithm to compute it. The classical results claim that at least n=p+1 sample points in general position are necessary to ensure the existence and uniqueness of Tyler's estimator, where p is the ambient dimension. We show that the STyler requires significantly fewer samples. In some groups even two samples are enough to guarantee its existence and uniqueness. In addition, in the case of elliptical populations, we provide high probability bounds on the error of the STyler. These, too, quantify the advantage of exploiting the symmetry structure. Finally, these theoretical results are supported by numerical simulations.
Submitted 29 September, 2015; v1 submitted 7 December, 2014;
originally announced December 2014.
-
Tyler's Covariance Matrix Estimator in Elliptical Models with Convex Structure
Authors:
Ilya Soloveychik,
Ami Wiesel
Abstract:
We address structured covariance estimation in elliptical distributions by assuming that the covariance is a priori known to belong to a given convex set, e.g., the set of Toeplitz or banded matrices. We consider the General Method of Moments (GMM) optimization applied to robust Tyler's scatter M-estimator subject to these convex constraints. Unfortunately, GMM turns out to be non-convex due to the objective. Instead, we propose a new COCA estimator - a convex relaxation which can be efficiently solved. We prove that the relaxation is tight in the unconstrained case for a finite number of samples, and in the constrained case asymptotically. We then illustrate the advantages of COCA in synthetic simulations with structured compound Gaussian distributions. In these examples, COCA outperforms competing methods such as Tyler's estimator and its projection onto the structure set.
Submitted 11 September, 2014; v1 submitted 7 April, 2014;
originally announced April 2014.
-
Compressed matched filter for non-Gaussian noise
Authors:
Jakob Vovnoboy,
Ami Wiesel
Abstract:
We consider estimation of a deterministic unknown parameter vector in a linear model with non-Gaussian noise. In the Gaussian case, dimensionality reduction via a linear matched filter provides a simple low dimensional sufficient statistic which can be easily communicated and/or stored for future inference. Such a statistic is usually unknown in the general non-Gaussian case. Instead, we propose a hybrid matched filter coupled with a randomized compressed sensing procedure, which together create a low dimensional statistic. We also derive a complementary algorithm for robust reconstruction given this statistic. Our recovery method is based on the fast iterative shrinkage and thresholding algorithm which is used for outlier rejection given the compressed data. We demonstrate the advantages of the proposed framework using synthetic simulations.
Submitted 3 November, 2013;
originally announced November 2013.
-
Group Symmetry and non-Gaussian Covariance Estimation
Authors:
Ilya Soloveychik,
Ami Wiesel
Abstract:
We consider robust covariance estimation with group symmetry constraints. Non-Gaussian covariance estimation, e.g., Tyler scatter estimator and Multivariate Generalized Gaussian distribution methods, usually involve non-convex minimization problems. Recently, it was shown that the underlying principle behind their success is an extended form of convexity over the geodesics in the manifold of positive definite matrices. A modern approach to improve estimation accuracy is to exploit prior knowledge via additional constraints, e.g., restricting the attention to specific classes of covariances which adhere to prior symmetry structures. In this paper, we prove that such group symmetry constraints are also geodesically convex and can therefore be incorporated into various non-Gaussian covariance estimators. Practical examples of such sets include: circulant, persymmetric and complex/quaternion proper structures. We provide a simple numerical technique for finding maximum likelihood estimates under such constraints, and demonstrate their performance advantage using synthetic experiments.
Submitted 18 June, 2013;
originally announced June 2013.
-
Multivariate Generalized Gaussian Distribution: Convexity and Graphical Models
Authors:
Teng Zhang,
Ami Wiesel,
Maria Sabrina Grec
Abstract:
We consider covariance estimation in the multivariate generalized Gaussian distribution (MGGD) and elliptically symmetric (ES) distribution. The maximum likelihood optimization associated with this problem is non-convex, yet it has been proved that its global solution can often be computed via simple fixed point iterations. Our first contribution is a new analysis of this likelihood based on geodesic convexity that requires weaker assumptions. Our second contribution is a generalized framework for structured covariance estimation under sparsity constraints. We show that the optimizations can be formulated as convex minimization as long as the MGGD shape parameter is larger than one half and the sparsity pattern is chordal. These include, for example, maximum likelihood estimation of banded inverse covariances in multivariate Laplace distributions, which are associated with time varying autoregressive processes.
Submitted 31 August, 2013; v1 submitted 11 April, 2013;
originally announced April 2013.
-
Marginal Likelihoods for Distributed Parameter Estimation of Gaussian Graphical Models
Authors:
Zhaoshi Meng,
Dennis Wei,
Ami Wiesel,
Alfred O. Hero III
Abstract:
We consider distributed estimation of the inverse covariance matrix, also called the concentration or precision matrix, in Gaussian graphical models. Traditional centralized estimation often requires global inference of the covariance matrix, which can be computationally intensive in large dimensions. Approximate inference based on message-passing algorithms, on the other hand, can lead to unstable and biased estimation in loopy graphical models. In this paper, we propose a general framework for distributed estimation based on a maximum marginal likelihood (MML) approach. This approach computes local parameter estimates by maximizing marginal likelihoods defined with respect to data collected from local neighborhoods. Due to the non-convexity of the MML problem, we introduce and solve a convex relaxation. The local estimates are then combined into a global estimate without the need for iterative message-passing between neighborhoods. The proposed algorithm is naturally parallelizable and computationally efficient, thereby making it suitable for high-dimensional problems. In the classical regime where the number of variables $p$ is fixed and the number of samples $T$ increases to infinity, the proposed estimator is shown to be asymptotically consistent and to improve monotonically as the local neighborhood size increases. In the high-dimensional scaling regime where both $p$ and $T$ increase to infinity, the convergence rate to the true parameters is derived and is seen to be comparable to centralized maximum likelihood estimation. Extensive numerical experiments demonstrate the improved performance of the two-hop version of the proposed estimator, which suffices to almost close the gap to the centralized maximum likelihood estimator at a reduced computational cost.
Submitted 13 August, 2014; v1 submitted 19 March, 2013;
originally announced March 2013.
-
Robust Shrinkage Estimation of High-dimensional Covariance Matrices
Authors:
Yilun Chen,
Ami Wiesel,
Alfred O. Hero III
Abstract:
We address high-dimensional covariance estimation for elliptically distributed samples, which are also known as spherically invariant random vectors (SIRV) or compound-Gaussian processes. Specifically, we consider shrinkage methods that are suitable for high-dimensional problems with a small number of samples (large $p$ small $n$). We start from a classical robust covariance estimator [Tyler(1987)], which is distribution-free within the family of elliptical distributions but inapplicable when $n<p$. Using a shrinkage coefficient, we regularize Tyler's fixed point iterations. We prove that, for all $n$ and $p$, the proposed fixed point iterations converge to a unique limit regardless of the initial condition. Next, we propose a simple, closed-form and data-dependent choice for the shrinkage coefficient, which is based on a minimum mean squared error framework. Simulations demonstrate that the proposed method achieves low estimation error and is robust to heavy-tailed samples. Finally, as a real-world application, we demonstrate the performance of the proposed technique in the context of activity/intrusion detection using a wireless sensor network.
Submitted 27 September, 2010;
originally announced September 2010.
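As an illustration of the regularized fixed-point idea in the abstract above, the following is a minimal sketch, not the authors' code. It assumes one common form of the update: a Tyler-style weighted scatter matrix is blended with the identity using a shrinkage coefficient `rho`, then rescaled so its trace equals $p$. The function name `regularized_tyler`, the trace normalization, the stopping rule, and the fixed value of `rho` in the usage note are all assumptions made for illustration; the closed-form, data-dependent choice of the coefficient mentioned in the abstract is not reproduced here.

```python
import numpy as np


def regularized_tyler(X, rho, n_iter=100, tol=1e-8):
    """Shrinkage-regularized Tyler-style fixed-point iteration (illustrative sketch).

    X   : (n, p) array of zero-mean samples, one per row, none exactly zero.
    rho : shrinkage coefficient in (0, 1]; rho > 0 keeps every iterate
          invertible even when n < p.
    """
    n, p = X.shape
    sigma = np.eye(p)  # arbitrary positive-definite starting point
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        # q[i] = x_i^T sigma^{-1} x_i for every sample
        q = np.einsum("ij,jk,ik->i", X, inv, X)
        # sum_i x_i x_i^T / q[i]
        weighted = (X / q[:, None]).T @ X
        new = (1.0 - rho) * (p / n) * weighted + rho * np.eye(p)
        new *= p / np.trace(new)  # assumed normalization: trace equal to p
        converged = np.linalg.norm(new - sigma, "fro") < tol
        sigma = new
        if converged:
            break
    return sigma
```

For example, `regularized_tyler(np.random.default_rng(0).standard_normal((5, 10)), rho=0.5)` returns a positive-definite 10 x 10 matrix even though only 5 samples are available, which is the regime ($n<p$) where the unregularized iteration cannot be applied.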
-
Shrinkage Algorithms for MMSE Covariance Estimation
Authors:
Yilun Chen,
Ami Wiesel,
Yonina C. Eldar,
Alfred O. Hero III
Abstract:
We address covariance estimation in the sense of minimum mean-squared error (MMSE) for Gaussian samples. Specifically, we consider shrinkage methods which are suitable for high dimensional problems with a small number of samples (large p small n). First, we improve on the Ledoit-Wolf (LW) method by conditioning on a sufficient statistic. By the Rao-Blackwell theorem, this yields a new estimator called RBLW, whose mean-squared error dominates that of LW for Gaussian variables. Second, to further reduce the estimation error, we propose an iterative approach which approximates the clairvoyant shrinkage estimator. Convergence of this iterative method is established and a closed form expression for the limit is determined, which is referred to as the oracle approximating shrinkage (OAS) estimator. Both RBLW and OAS estimators have simple expressions and are easily implemented. Although the two methods are developed from different perspectives, their structure is identical up to specified constants. The RBLW estimator provably dominates the LW method. Numerical simulations demonstrate that the OAS approach can perform even better than RBLW, especially when n is much less than p. We also demonstrate the performance of these techniques in the context of adaptive beamforming.
Submitted 27 July, 2009;
originally announced July 2009.
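To make the "simple expressions" claim in the abstract above concrete, here is a minimal pure-Python sketch of an OAS-style shrinkage estimator. The abstract itself does not state the formula; the coefficient below is the closed form commonly quoted for OAS, and should be checked against the paper before use. The sketch assumes zero-mean samples (so the sample covariance is formed without mean subtraction) and shrinks toward a scaled identity target. The function names `sample_covariance` and `oas_shrinkage` are hypothetical.

```python
def sample_covariance(samples):
    """Sample covariance of zero-mean samples (assumption: the mean is known to be 0)."""
    n, p = len(samples), len(samples[0])
    return [[sum(x[i] * x[j] for x in samples) / n for j in range(p)]
            for i in range(p)]


def oas_shrinkage(samples):
    """Shrink the sample covariance S toward (tr(S)/p) * I.

    Assumed coefficient (commonly quoted OAS closed form):
        rho = min(1, ((1 - 2/p) tr(S^2) + tr(S)^2)
                     / ((n + 1 - 2/p) (tr(S^2) - tr(S)^2 / p)))
    Returns (shrunk covariance as a list of lists, rho).
    """
    n, p = len(samples), len(samples[0])
    S = sample_covariance(samples)
    tr_S = sum(S[i][i] for i in range(p))
    tr_S2 = sum(S[i][j] * S[j][i] for i in range(p) for j in range(p))
    num = (1.0 - 2.0 / p) * tr_S2 + tr_S ** 2
    den = (n + 1.0 - 2.0 / p) * (tr_S2 - tr_S ** 2 / p)
    # den is zero when S is already a multiple of the identity:
    # shrink fully in that case instead of dividing by zero.
    rho = 1.0 if den <= 0.0 else min(num / den, 1.0)
    target = tr_S / p
    sigma = [[(1.0 - rho) * S[i][j] + (rho * target if i == j else 0.0)
              for j in range(p)] for i in range(p)]
    return sigma, rho
```

Because the target has the same trace as the sample covariance, the shrunk estimate preserves the trace for any value of `rho`; only the spread of the eigenvalues is reduced.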
-
Decomposable Principal Component Analysis
Authors:
Ami Wiesel,
Alfred O. Hero III
Abstract:
We consider principal component analysis (PCA) in decomposable Gaussian graphical models. We exploit the prior information in these models in order to distribute its computation. For this purpose, we reformulate the problem in the sparse inverse covariance (concentration) domain and solve the global eigenvalue problem using a sequence of local eigenvalue problems in each of the cliques of the decomposable graph. We demonstrate the application of our methodology in the context of decentralized anomaly detection in the Abilene backbone network. Based on the topology of the network, we propose an approximate statistical graphical model and distribute the computation of PCA.
Submitted 18 August, 2008;
originally announced August 2008.
-
A greedy approach to sparse canonical correlation analysis
Authors:
Ami Wiesel,
Mark Kliger,
Alfred O. Hero III
Abstract:
We consider the problem of sparse canonical correlation analysis (CCA), i.e., the search for two linear combinations, one for each multivariate, that yield maximum correlation using a specified number of variables. We propose an efficient numerical approximation based on a direct greedy approach which bounds the correlation at each stage. The method is specifically designed to cope with large data sets and its computational complexity depends only on the sparsity levels. We analyze the algorithm's performance through the tradeoff between correlation and parsimony. The results of numerical simulation suggest that a significant portion of the correlation may be captured using a relatively small number of variables. In addition, we examine the use of sparse CCA as a regularization method when the number of available samples is small compared to the dimensions of the multivariates.
Submitted 17 January, 2008;
originally announced January 2008.