-
Permuted and Unlinked Monotone Regression in $\mathbb{R}^d$: an approach based on mixture modeling and optimal transport
Authors:
Martin Slawski,
Bodhisattva Sen
Abstract:
Suppose that we have a regression problem with response variable Y in $\mathbb{R}^d$ and predictor X in $\mathbb{R}^d$, for $d \geq 1$. In permuted or unlinked regression we have access to separate unordered data on X and Y, as opposed to data on (X,Y)-pairs in usual regression. So far in the literature the case $d=1$ has received attention, see e.g., the recent papers by Rigollet and Weed [Inform…
▽ More
Suppose that we have a regression problem with response variable Y in $\mathbb{R}^d$ and predictor X in $\mathbb{R}^d$, for $d \geq 1$. In permuted or unlinked regression we have access to separate unordered data on X and Y, as opposed to data on (X,Y)-pairs in usual regression. So far in the literature the case $d=1$ has received attention, see e.g., the recent papers by Rigollet and Weed [Information & Inference, 8, 619--717] and Balabdaoui et al. [J. Mach. Learn. Res., 22(172), 1--60]. In this paper, we consider the general multivariate setting with $d \geq 1$. We show that the notion of cyclical monotonicity of the regression function is sufficient for identification and estimation in the permuted/unlinked regression model. We study permutation recovery in the permuted regression setting and develop a computationally efficient and easy-to-use algorithm for denoising based on the Kiefer-Wolfowitz [Ann. Math. Statist., 27, 887--906] nonparametric maximum likelihood estimator and techniques from the theory of optimal transport. We provide explicit upper bounds on the associated mean squared denoising error for Gaussian noise. As in previous work on the case $d = 1$, the permuted/unlinked setting involves slow (logarithmic) rates of convergence rooting in the underlying deconvolution problem. Numerical studies corroborate our theoretical analysis and show that the proposed approach performs at least on par with the methods in the aforementioned prior work in the case $d = 1$ while achieving substantial reductions in terms of computational complexity.
△ Less
Submitted 10 January, 2022;
originally announced January 2022.
-
Linear Regression with Sparsely Permuted Data
Authors:
Martin Slawski,
Emanuel Ben-David
Abstract:
In regression analysis of multivariate data, it is tacitly assumed that response and predictor variables in each observed response-predictor pair correspond to the same entity or unit. In this paper, we consider the situation of "permuted data" in which this basic correspondence has been lost. Several recent papers have considered this situation without further assumptions on the underlying permut…
▽ More
In regression analysis of multivariate data, it is tacitly assumed that response and predictor variables in each observed response-predictor pair correspond to the same entity or unit. In this paper, we consider the situation of "permuted data" in which this basic correspondence has been lost. Several recent papers have considered this situation without further assumptions on the underlying permutation. In applications, the latter is often to known to have additional structure that can be leveraged. Specifically, we herein consider the common scenario of "sparsely permuted data" in which only a small fraction of the data is affected by a mismatch between response and predictors. However, an adverse effect already observed for sparsely permuted data is that the least squares estimator as well as other estimators not accounting for such partial mismatch are inconsistent. One approach studied in detail herein is to treat permuted data as outliers which motivates the use of robust regression formulations to estimate the regression parameter. The resulting estimate can subsequently be used to recover the permutation. A notable benefit of the proposed approach is its computational simplicity given the general lack of procedures for the above problem that are both statistically sound and computationally appealing.
△ Less
Submitted 15 November, 2017; v1 submitted 16 October, 2017;
originally announced October 2017.
-
On Principal Components Regression, Random Projections, and Column Subsampling
Authors:
Martin Slawski
Abstract:
Principal Components Regression (PCR) is a traditional tool for dimension reduction in linear regression that has been both criticized and defended. One concern about PCR is that obtaining the leading principal components tends to be computationally demanding for large data sets. While random projections do not possess the optimality properties of the leading principal subspace, they are computati…
▽ More
Principal Components Regression (PCR) is a traditional tool for dimension reduction in linear regression that has been both criticized and defended. One concern about PCR is that obtaining the leading principal components tends to be computationally demanding for large data sets. While random projections do not possess the optimality properties of the leading principal subspace, they are computationally appealing and hence have become increasingly popular in recent years. In this paper, we present an analysis showing that for random projections satisfying a Johnson-Lindenstrauss embedding property, the prediction error in subsequent regression is close to that of PCR, at the expense of requiring a slightly large number of random projections than principal components. Column sub-sampling constitutes an even cheaper way of randomized dimension reduction outside the class of Johnson-Lindenstrauss transforms. We provide numerical results based on synthetic and real data as well as basic theory revealing differences and commonalities in terms of statistical performance.
△ Less
Submitted 7 October, 2017; v1 submitted 23 September, 2017;
originally announced September 2017.
-
Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov Random fields
Authors:
Martin Slawski,
Matthias Hein
Abstract:
Consider a random vector with finite second moments. If its precision matrix is an M-matrix, then all partial correlations are non-negative. If that random vector is additionally Gaussian, the corresponding Markov random field (GMRF) is called attractive. We study estimation of M-matrices taking the role of inverse second moment or precision matrices using sign-constrained log-determinant divergen…
▽ More
Consider a random vector with finite second moments. If its precision matrix is an M-matrix, then all partial correlations are non-negative. If that random vector is additionally Gaussian, the corresponding Markov random field (GMRF) is called attractive. We study estimation of M-matrices taking the role of inverse second moment or precision matrices using sign-constrained log-determinant divergence minimization. We also treat the high-dimensional case with the number of variables exceeding the sample size. The additional sign-constraints turn out to greatly simplify the estimation problem: we provide evidence that explicit regularization is no longer required. To solve the resulting convex optimization problem, we propose an algorithm based on block coordinate descent, in which each sub-problem can be recast as non-negative least squares problem. Illustrations on both simulated and real world data are provided.
△ Less
Submitted 26 April, 2014;
originally announced April 2014.
-
Non-negative least squares for high-dimensional linear models: consistency and sparse recovery without regularization
Authors:
Martin Slawski,
Matthias Hein
Abstract:
Least squares fitting is in general not useful for high-dimensional linear models, in which the number of predictors is of the same or even larger order of magnitude than the number of samples. Theory developed in recent years has coined a paradigm according to which sparsity-promoting regularization is regarded as a necessity in such setting. Deviating from this paradigm, we show that non-negativ…
▽ More
Least squares fitting is in general not useful for high-dimensional linear models, in which the number of predictors is of the same or even larger order of magnitude than the number of samples. Theory developed in recent years has coined a paradigm according to which sparsity-promoting regularization is regarded as a necessity in such setting. Deviating from this paradigm, we show that non-negativity constraints on the regression coefficients may be similarly effective as explicit regularization if the design matrix has additional properties, which are met in several applications of non-negative least squares (NNLS). We show that for these designs, the performance of NNLS with regard to prediction and estimation is comparable to that of the lasso. We argue further that in specific cases, NNLS may have a better $\ell_{\infty}$-rate in estimation and hence also advantages with respect to support recovery when combined with thresholding. From a practical point of view, NNLS does not depend on a regularization parameter and is hence easier to use.
△ Less
Submitted 12 February, 2014; v1 submitted 4 May, 2012;
originally announced May 2012.