-
Learning minimal volume uncertainty ellipsoids
Authors:
Itai Alon,
David Arnon,
Ami Wiesel
Abstract:
We consider the problem of learning uncertainty regions for parameter estimation problems. The regions are ellipsoids that minimize the average volumes subject to a prescribed coverage probability. As expected, under the assumption of jointly Gaussian data, we prove that the optimal ellipsoid is centered around the conditional mean and shaped as the conditional covariance matrix. In more practical cases, we propose a differentiable optimization approach for approximately computing the optimal ellipsoids using a neural network with proper calibration. Compared to existing methods, our network requires less storage and fewer computations at inference time, leading to accurate yet smaller ellipsoids. We demonstrate these advantages on four real-world localization datasets.
Submitted 3 May, 2024;
originally announced May 2024.
-
Self-Supervised Learning for Covariance Estimation
Authors:
Tzvi Diskin,
Ami Wiesel
Abstract:
We consider the use of deep learning for covariance estimation. We propose to globally learn a neural network that will then be applied locally at inference time. Leveraging recent advancements in self-supervised foundational models, we train the network without any labeling by simply masking different samples and learning to predict their covariance given their surrounding neighbors. The architecture is based on the popular attention mechanism. Its main advantage over classical methods is the automatic exploitation of global characteristics without any distributional assumptions or regularization. It can be pre-trained as a foundation model and then be repurposed for various downstream tasks, e.g., adaptive target detection in radar or hyperspectral imagery.
Submitted 13 March, 2024;
originally announced March 2024.
-
CFARnet: deep learning for target detection with constant false alarm rate
Authors:
Tzvi Diskin,
Yiftach Beer,
Uri Okun,
Ami Wiesel
Abstract:
We consider the problem of target detection with a constant false alarm rate (CFAR). This constraint is crucial in many practical applications and is a standard requirement in classical composite hypothesis testing. In settings where classical approaches are computationally expensive or where only data samples are given, machine learning methodologies are advantageous. CFAR is less understood in these settings. To close this gap, we introduce a framework of CFAR constrained detectors. Theoretically, we prove that a CFAR constrained Bayes optimal detector is asymptotically equivalent to the classical generalized likelihood ratio test (GLRT). Practically, we develop a deep learning framework for fitting neural networks that approximate it. Experiments on target detection in different settings demonstrate that the proposed CFARnet allows a flexible tradeoff between CFAR and accuracy.
Submitted 15 November, 2023; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Learning to Detect with Constant False Alarm Rate
Authors:
Tzvi Diskin,
Uri Okun,
Ami Wiesel
Abstract:
We consider the use of machine learning for hypothesis testing with an emphasis on target detection. Classical model-based solutions rely on comparing likelihoods. These are sensitive to imperfect models and are often computationally expensive. In contrast, data-driven machine learning is often more robust and yields classifiers with fixed computational complexity. Learned detectors usually provide high accuracy with low complexity but do not have a constant false alarm rate (CFAR) as required in many applications. To close this gap, we propose to add a term to the loss function that promotes similar distributions of the detector under any null hypothesis scenario. Experiments show that our approach leads to near CFAR detectors with similar accuracy as their competitors.
Submitted 12 June, 2022;
originally announced June 2022.
-
On the Optimization Landscape of Maximum Mean Discrepancy
Authors:
Itai Alon,
Amir Globerson,
Ami Wiesel
Abstract:
Generative models have been successfully used for generating realistic signals. Because the likelihood function is typically intractable in most of these models, the common practice is to use "implicit" models that avoid likelihood calculation. However, it is hard to obtain theoretical guarantees for such models. In particular, it is not understood when they can globally optimize their non-convex objectives. Here we provide such an analysis for the case of Maximum Mean Discrepancy (MMD) learning of generative models. We prove several optimality results, including for a Gaussian distribution with low rank covariance (where likelihood is inapplicable) and a mixture of Gaussians. Our analysis shows that the MMD optimization landscape is benign in these cases, and therefore gradient-based methods will globally minimize the MMD objective.
Submitted 3 May, 2024; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Learning to Estimate Without Bias
Authors:
Tzvi Diskin,
Yonina C. Eldar,
Ami Wiesel
Abstract:
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models. In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints. The classical approach to designing non-linear MVUEs is through maximum likelihood estimation (MLE), which often involves computationally challenging optimizations. On the other hand, deep learning methods allow for non-linear estimators with fixed computational complexity. Learning based estimators perform optimally on average with respect to their training set but may suffer from significant bias in other parameters. To avoid this, we propose to add a simple bias constraint to the loss function, resulting in an estimator we refer to as Bias Constrained Estimator (BCE). We prove that this yields asymptotic MVUEs that behave similarly to the classical MLEs and asymptotically attain the Cramér-Rao bound. We demonstrate the advantages of our approach in the context of signal to noise ratio estimation as well as covariance estimation. A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance. Examples include distributed sensor networks and data augmentation at test time. In such applications, we show that BCE leads to asymptotically consistent estimators.
Submitted 29 November, 2023; v1 submitted 24 October, 2021;
originally announced October 2021.
-
Conditional Frechet Inception Distance
Authors:
Michael Soloveitchik,
Tzvi Diskin,
Efrat Morin,
Ami Wiesel
Abstract:
We consider distance functions between conditional distributions. We focus on the Wasserstein metric and its Gaussian case known as the Frechet Inception Distance (FID). We develop conditional versions of these metrics, analyze their relations and provide a closed form solution to the conditional FID (CFID) metric. We numerically compare the metrics in the context of performance evaluation of modern conditional generative models. Our results show the advantages of CFID compared to the classical FID and mean squared error (MSE) measures. In contrast to FID, CFID is useful in identifying failures where realistic outputs which are not related to their inputs are generated. On the other hand, compared to MSE, CFID is useful in identifying failures where a single realistic output is generated even though there is a diverse set of equally probable outputs.
Submitted 28 February, 2022; v1 submitted 21 March, 2021;
originally announced March 2021.
-
Maximin Optimization for Binary Regression
Authors:
Nisan Chiprut,
Amir Globerson,
Ami Wiesel
Abstract:
We consider regression problems with binary weights. Such optimization problems are ubiquitous in quantized learning models and digital communication systems. A natural approach is to optimize the corresponding Lagrangian using variants of the gradient ascent-descent method. Such maximin techniques are still poorly understood even in the concave-convex case. The non-convex binary constraints may lead to spurious local minima. Interestingly, we prove that this approach is optimal in linear regression with low noise conditions as well as robust regression with a small number of outliers. Practically, the method also performs well in regression with cross entropy loss, as well as non-convex multi-layer neural networks. Taken together, our approach highlights the potential of saddle-point optimization for learning constrained models.
Submitted 27 November, 2020; v1 submitted 10 October, 2020;
originally announced October 2020.
-
Convex Nonparanormal Regression
Authors:
Yonatan Woodbridge,
Gal Elidan,
Ami Wiesel
Abstract:
Quantifying uncertainty in predictions or, more generally, estimating the posterior conditional distribution, is a core challenge in machine learning and statistics. We introduce Convex Nonparanormal Regression (CNR), a conditional nonparanormal approach for coping with this task. CNR involves a convex optimization of a posterior defined via a rich dictionary of pre-defined non-linear transformations on Gaussians. It can fit an arbitrary conditional distribution, including multimodal and non-symmetric posteriors. For the special but powerful case of a piecewise linear dictionary, we provide a closed form of the posterior mean which can be used for point-wise predictions. Finally, we demonstrate the advantages of CNR over classical competitors using synthetic and real world data.
Submitted 4 April, 2021; v1 submitted 21 April, 2020;
originally announced April 2020.
-
PnP-Net: A hybrid Perspective-n-Point Network
Authors:
Roy Sheffer,
Ami Wiesel
Abstract:
We consider the robust Perspective-n-Point (PnP) problem using a hybrid approach that combines deep learning with model based algorithms. PnP is the problem of estimating the pose of a calibrated camera given a set of 3D points in the world and their corresponding 2D projections in the image. In its more challenging robust version, some of the correspondences may be mismatched and must be efficiently discarded. Classical solutions address PnP via iterative robust non-linear least squares methods that exploit the problem's geometry but are either inaccurate or computationally intensive. In contrast, we propose to combine a deep learning initial phase with a model-based fine tuning phase. This hybrid approach, denoted by PnP-Net, succeeds in estimating the unknown pose parameters under correspondence errors and noise, with low and fixed computational complexity requirements. We demonstrate its advantages on both synthetic data and real world data.
Submitted 10 March, 2020;
originally announced March 2020.
-
Fair Principal Component Analysis and Filter Design
Authors:
Gad Zalcberg,
Ami Wiesel
Abstract:
We consider Fair Principal Component Analysis (FPCA) and search for a low dimensional subspace that spans multiple target vectors in a fair manner. FPCA is defined as a non-concave maximization of the worst projected target norm within a given set. The problem arises in filter design in signal processing, and when incorporating fairness into dimensionality reduction schemes. The state of the art approach to FPCA is via semidefinite relaxation (SDR) and involves a polynomial yet computationally expensive optimization. To allow scalability, we propose to address FPCA using naive sub-gradient descent. We analyze the landscape of the underlying optimization in the case of orthogonal targets. We prove that the landscape is benign and that all local minima are globally optimal. Interestingly, the SDR approach leads to sub-optimal solutions in this simple case. Finally, we discuss the equivalence between orthogonal FPCA and the design of normalized tight frames.
Submitted 1 June, 2021; v1 submitted 16 February, 2020;
originally announced February 2020.
-
Spectral Algorithm for Low-rank Multitask Regression
Authors:
Yotam Gigi,
Ami Wiesel,
Sella Nevo,
Gal Elidan,
Avinatan Hassidim,
Yossi Matias
Abstract:
Multitask learning, i.e. taking advantage of the relatedness of individual tasks in order to improve performance on all of them, is a core challenge in the field of machine learning. We focus on matrix regression tasks where the rank of the weight matrix is constrained to reduce sample complexity. We introduce the common mechanism regression (CMR) model which assumes a shared left low-rank component across all tasks, but allows an individual per-task right low-rank component. This dramatically reduces the number of samples needed for accurate estimation. The problem of jointly recovering the common and the local components has a non-convex bi-linear structure. We overcome this hurdle and provide a provably beneficial non-iterative spectral algorithm. Appealingly, the solution has favorable behavior as a function of the number of related tasks and the small number of samples available for each one. We demonstrate the efficacy of our approach for the challenging task of remote river discharge estimation across multiple river sites, where data for each task is naturally scarce. In this scenario sharing a low-rank component between the tasks translates to a shared spectral reflection of the water, which is a true underlying physical model. We also show the benefit of the approach on the markedly different setting of image classification where the common component can be interpreted as the shared convolution filters.
Submitted 27 October, 2019;
originally announced October 2019.
-
ML for Flood Forecasting at Scale
Authors:
Sella Nevo,
Vova Anisimov,
Gal Elidan,
Ran El-Yaniv,
Pete Giencke,
Yotam Gigi,
Avinatan Hassidim,
Zach Moshe,
Mor Schlesinger,
Guy Shalev,
Ajai Tirumali,
Ami Wiesel,
Oleg Zlydenko,
Yossi Matias
Abstract:
Effective riverine flood forecasting at scale is hindered by a multitude of factors, most notably the need to rely on human calibration in current methodology, the limited amount of data for a specific location, and the computational difficulty of building continent/global level models that are sufficiently accurate. Machine learning (ML) is primed to be useful in this scenario: learned models often surpass human experts in complex high-dimensional scenarios, and the framework of transfer or multitask learning is an appealing solution for leveraging local signals to achieve improved global performance. We propose to build on these strengths and develop ML systems for timely and accurate riverine flood prediction.
Submitted 28 January, 2019;
originally announced January 2019.
-
Towards Global Remote Discharge Estimation: Using the Few to Estimate The Many
Authors:
Yotam Gigi,
Gal Elidan,
Avinatan Hassidim,
Yossi Matias,
Zach Moshe,
Sella Nevo,
Guy Shalev,
Ami Wiesel
Abstract:
Learning hydrologic models for accurate riverine flood prediction at scale is a challenge of great importance. One of the key difficulties is the need to rely on in-situ river discharge measurements, which can be quite scarce and unreliable, particularly in regions where floods cause the most damage every year. Accordingly, in this work we tackle the problem of river discharge estimation at different river locations. A core characteristic of the data at hand (e.g. satellite measurements) is that we have few measurements for many locations, all sharing the same physics that underlie the water discharge. We capture this scenario in a simple but powerful common mechanism regression (CMR) model with a local component as well as a shared one which captures the global discharge mechanism. The resulting learning objective is non-convex, but we show that we can find its global optimum by leveraging the power of joining local measurements across sites. In particular, using a spectral initialization with provable near-optimal accuracy, we can find the optimum using standard descent methods. We demonstrate the efficacy of our approach for the problem of discharge estimation using simulations.
Submitted 3 January, 2019;
originally announced January 2019.
-
Learning to Detect
Authors:
Neev Samuel,
Tzvi Diskin,
Ami Wiesel
Abstract:
In this paper we consider Multiple-Input-Multiple-Output (MIMO) detection using deep neural networks. We introduce two different deep architectures: a standard fully connected multi-layer network, and a Detection Network (DetNet) which is specifically designed for the task. The structure of DetNet is obtained by unfolding the iterations of a projected gradient descent algorithm into a network. We compare the accuracy and runtime complexity of the proposed approaches and achieve state-of-the-art performance while maintaining low computational requirements. Furthermore, we manage to train a single network to detect over an entire distribution of channels. Finally, we consider detection with soft outputs and show that the networks can easily be modified to produce soft decisions.
Submitted 19 May, 2018;
originally announced May 2018.
-
Deep MIMO Detection
Authors:
Neev Samuel,
Tzvi Diskin,
Ami Wiesel
Abstract:
In this paper, we consider the use of deep neural networks in the context of Multiple-Input-Multiple-Output (MIMO) detection. We give a brief introduction to deep learning and propose a modern neural network architecture suitable for this detection task. First, we consider the case in which the MIMO channel is constant, and we learn a detector for a specific system. Next, we consider the harder case in which the parameters are known yet changing and a single detector must be learned for all of the varying channels. We demonstrate the performance of our deep MIMO detector using numerical simulations in comparison to competing methods including approximate message passing and semidefinite relaxation. The results show that deep networks can achieve state of the art accuracy with significantly lower complexity while providing robustness against ill-conditioned channels and mis-specified noise variance.
Submitted 4 June, 2017;
originally announced June 2017.
-
Marginal Likelihoods for Distributed Parameter Estimation of Gaussian Graphical Models
Authors:
Zhaoshi Meng,
Dennis Wei,
Ami Wiesel,
Alfred O. Hero III
Abstract:
We consider distributed estimation of the inverse covariance matrix, also called the concentration or precision matrix, in Gaussian graphical models. Traditional centralized estimation often requires global inference of the covariance matrix, which can be computationally intensive in large dimensions. Approximate inference based on message-passing algorithms, on the other hand, can lead to unstable and biased estimation in loopy graphical models. In this paper, we propose a general framework for distributed estimation based on a maximum marginal likelihood (MML) approach. This approach computes local parameter estimates by maximizing marginal likelihoods defined with respect to data collected from local neighborhoods. Due to the non-convexity of the MML problem, we introduce and solve a convex relaxation. The local estimates are then combined into a global estimate without the need for iterative message-passing between neighborhoods. The proposed algorithm is naturally parallelizable and computationally efficient, thereby making it suitable for high-dimensional problems. In the classical regime where the number of variables $p$ is fixed and the number of samples $T$ increases to infinity, the proposed estimator is shown to be asymptotically consistent and to improve monotonically as the local neighborhood size increases. In the high-dimensional scaling regime where both $p$ and $T$ increase to infinity, the convergence rate to the true parameters is derived and is seen to be comparable to centralized maximum likelihood estimation. Extensive numerical experiments demonstrate the improved performance of the two-hop version of the proposed estimator, which suffices to almost close the gap to the centralized maximum likelihood estimator at a reduced computational cost.
Submitted 13 August, 2014; v1 submitted 19 March, 2013;
originally announced March 2013.