-
Sufficient dimension reduction for regression with spatially correlated errors: application to prediction
Authors:
Liliana Forzani,
Rodrigo García Arancibia,
Antonella Gieco,
Pamela Llop,
Anne Yao
Abstract:
In this paper, we address the problem of predicting a response variable when the data are both spatially correlated and high-dimensional. To reduce the dimensionality of the predictor variables, we apply the sufficient dimension reduction (SDR) paradigm, which reduces the predictor space while retaining relevant information about the response. To achieve this, we impose two different spatial models on the inverse regression: the separable spatial covariance model (SSCM) and the spatial autoregressive error model (SEM). For these models, we derive maximum likelihood estimators for the reduction and use them to predict the response via nonparametric rules for forward regression. Through simulations and real data applications, we demonstrate the effectiveness of our approach for spatial data prediction.
Submitted 4 February, 2025;
originally announced February 2025.
-
Sufficient reductions in regression with mixed predictors
Authors:
Efstathia Bura,
Liliana Forzani,
Rodrigo García Arancibia,
Pamela Llop,
Diego Tomassi
Abstract:
Most data sets comprise measurements on continuous and categorical variables. In the statistics literature on regression and classification, modeling high-dimensional mixed predictors has received limited attention. In this paper we study the general regression problem of inference on a variable of interest based on high-dimensional mixed continuous and binary predictors. The aim is to find a lower-dimensional function of the mixed predictor vector that contains all the modeling information in the mixed predictors for the response, which can be either continuous or categorical. The approach we propose identifies sufficient reductions by reversing the regression and modeling the mixed predictors conditional on the response. We derive the maximum likelihood estimator of the sufficient reductions, asymptotic tests for dimension, and a regularized estimator, which simultaneously achieves variable (feature) selection and dimension reduction (feature extraction). We study the performance of the proposed method and compare it with other approaches through simulations and real data examples.
Submitted 25 October, 2021;
originally announced October 2021.
-
Envelopes for multivariate linear regression with linearly constrained coefficients
Authors:
Dennis Cook,
Liliana Forzani,
Lan Liu
Abstract:
A constrained multivariate linear model is a multivariate linear model with the columns of its coefficient matrix constrained to lie in a known subspace. This class of models includes those typically used to study growth curves and longitudinal data. Envelope methods have been proposed to improve estimation efficiency in the class of unconstrained multivariate linear models, but have not yet been developed for constrained models; we develop them in this article. We first compare the standard envelope estimator based on an unconstrained multivariate model with the standard estimator arising from a constrained multivariate model in terms of bias and efficiency. Then, to further improve efficiency, we propose a novel envelope estimator based on a constrained multivariate model. Novel envelope-based testing methods are also proposed. We provide support for our proposals by simulations and by studying the classical dental data, data from the China Health and Nutrition Survey, and a study of probiotic capacity to reduce Salmonella infection.
Submitted 2 January, 2021;
originally announced January 2021.
-
Fundamentals of path analysis in the social sciences
Authors:
R. Dennis Cook,
Liliana Forzani
Abstract:
Motivated by a recent series of diametrically opposed articles on the relative value of statistical methods for the analysis of path diagrams in the social sciences, we discuss from a primarily theoretical perspective selected fundamental aspects of path modeling and analysis based on a common reflexive setting. Since there is a paucity of technical support evident in the debate, our aim is to connect it to mainline statistics literature and to address selected foundational issues that may help move the discourse. We do not intend to advocate for or against a particular method or analysis philosophy.
Submitted 12 November, 2020;
originally announced November 2020.
-
Asymptotic theory for maximum likelihood estimates in reduced-rank multivariate generalised linear models
Authors:
Efstathia Bura,
Sabrina Duarte,
Liliana Forzani,
Ezequiel Smucler,
Mariela Sued
Abstract:
Reduced-rank regression is a dimensionality reduction method with many applications. The asymptotic theory for reduced-rank estimators of parameter matrices in multivariate linear models has been studied extensively. In contrast, few theoretical results are available for reduced-rank multivariate generalised linear models. We develop M-estimation theory for concave criterion functions that are maximised over parameter spaces that are neither convex nor closed. These results are used to derive the consistency and asymptotic distribution of maximum likelihood estimators in reduced-rank multivariate generalised linear models, when the response and predictor vectors have a joint distribution. We illustrate our results in a real data classification problem with binary covariates.
Submitted 11 October, 2017;
originally announced October 2017.
-
On the classification problem for Poisson Point Processes
Authors:
Alejandro Cholaquidis,
Liliana Forzani,
Pamela Llop,
Leonardo Moreno
Abstract:
We study the binary classification problem for Poisson point processes, which are allowed to take values in a general metric space. The problem is tackled in two different ways: estimating nonparametrically the intensity functions of the processes (which are then plugged into a deterministic formula that expresses the regression function in terms of the intensities), and performing the classical $k$-nearest neighbor rule by introducing a suitable distance between patterns of points. In the first approach we prove the consistency of the estimated intensity, so that the rule turns out to be consistent as well. For the $k$-NN classifier, we prove that the regression function fulfils the so-called "Besicovitch condition", usually required for the consistency of the classical classification rules. The theoretical findings are illustrated on simulated data, where in one case the $k$-NN rule outperforms the first approach.
Submitted 30 June, 2016; v1 submitted 21 December, 2015;
originally announced December 2015.
-
Supervised dimension reduction for ordinal predictors
Authors:
Liliana Forzani,
Rodrigo García Arancibia,
Pamela Llop,
Diego Tomassi
Abstract:
In applications involving ordinal predictors, common approaches to reduce dimensionality are either extensions of unsupervised techniques such as principal component analysis, or variable selection procedures that rely on modeling the regression function. In this paper, a supervised dimension reduction method tailored to ordered categorical predictors is introduced. It uses a model-based dimension reduction approach, inspired by extending sufficient dimension reductions to the context of latent Gaussian variables. The reduction is chosen without modeling the response as a function of the predictors and does not impose any distributional assumption on the response or on the response given the predictors. A likelihood-based estimator of the reduction is derived and an iterative expectation-maximization type algorithm is proposed to alleviate the computational load and thus make the method more practical. A regularized estimator, which simultaneously achieves variable selection and dimension reduction, is also presented. Performance of the proposed method is evaluated through simulations and a real data example for socioeconomic index construction, comparing favorably to widely used techniques.
Submitted 12 October, 2017; v1 submitted 17 November, 2015;
originally announced November 2015.
-
Estimating sufficient reductions of the predictors in abundant high-dimensional regressions
Authors:
R. Dennis Cook,
Liliana Forzani,
Adam J. Rothman
Abstract:
We study the asymptotic behavior of a class of methods for sufficient dimension reduction in high-dimensional regressions, as the sample size and number of predictors grow in various alignments. It is demonstrated that these methods are consistent in a variety of settings, particularly in abundant regressions where most predictors contribute some information on the response, and oracle rates are possible. Simulation results are presented to support the theoretical conclusions.
Submitted 30 May, 2012;
originally announced May 2012.
-
On convex regression estimators
Authors:
Néstor E. Aguilera,
Liliana Forzani,
Pedro Morin
Abstract:
A new nonparametric estimator of a convex regression function in any dimension is proposed and its convergence properties are studied. We start by using any estimator of the regression function and we \emph{convexify} it by taking the convex envelope of a sample of the approximation obtained. We prove that the uniform rate of convergence of the estimator is maintained after the convexification is applied. The finite sample properties of the new estimator are investigated by means of a simulation study and the application of the new method is demonstrated in examples.
Submitted 14 June, 2010;
originally announced June 2010.
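The convexification step described in this abstract can be illustrated in one dimension. The following is only a minimal sketch, not the authors' estimator: it assumes a single predictor with distinct values, treats `y` as the values of some preliminary regression estimate evaluated at the sample points `x`, and computes the lower convex envelope (greatest convex minorant) of those points. The function name `lower_convex_envelope` is hypothetical.

```python
import numpy as np

def lower_convex_envelope(x, y):
    """Greatest convex minorant of the points (x, y), evaluated at x.

    Assumption: x is one-dimensional with distinct values; y holds the
    values of a preliminary (possibly non-convex) regression estimate.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]

    hull = []  # indices (into xs, ys) of the lower-hull vertices
    for i in range(len(xs)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # Cross product of (p1 - p0) and (p_i - p0); a value <= 0 means
            # p1 lies on or above the chord from p0 to p_i, so it is dropped.
            cross = (xs[i1] - xs[i0]) * (ys[i] - ys[i0]) \
                  - (ys[i1] - ys[i0]) * (xs[i] - xs[i0])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)

    # Piecewise-linear interpolation between hull vertices gives the envelope.
    env_sorted = np.interp(xs, xs[hull], ys[hull])
    env = np.empty_like(env_sorted)
    env[order] = env_sorted  # restore the original ordering of x
    return env
```

Calling `lower_convex_envelope(x, fitted)` with fitted values from any smoother returns values that lie on or below `fitted` and are convex in `x`. The paper treats any dimension, which this one-dimensional sketch does not attempt.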
-
On the maximal function for the generalized Ornstein-Uhlenbeck semigroup
Authors:
Jorge Betancor,
Liliana Forzani,
Roberto Scotto,
Wilfredo Urbina
Abstract:
In this note we consider the maximal function for the generalized Ornstein-Uhlenbeck semigroup in $\mathbb{R}$ associated with the generalized Hermite polynomials $\{H_n^μ\}$ and prove that it is of weak type (1,1) with respect to $dλ_μ(x) = |x|^{2μ}e^{-|x|^2} dx$, for $μ>-1/2$, as well as bounded on $L^p(dλ_μ)$ for $p>1$.
Submitted 30 September, 2006;
originally announced October 2006.