-
Generalized Ridge Regression: Applications to Nonorthogonal Linear Regression Models
Authors:
Román Salmerón Gómez,
Catalina García García,
Guillermo Hortal Reina
Abstract:
This paper analyzes the possibilities of using the generalized ridge regression to mitigate multicollinearity in a multiple linear regression model. For this purpose, we obtain the expressions for the estimated variance, the coefficient of variation, the coefficient of correlation, the variance inflation factor and the condition number. The results obtained are illustrated with two numerical examples.
Submitted 8 April, 2025;
originally announced April 2025.
-
Stepwise regression revisited
Authors:
Román Salmerón Gómez,
Catalina García García
Abstract:
This paper shows that the degree of approximate multicollinearity in a linear regression model increases simply by including independent variables, even if these are not highly linearly related. Now that it is relatively easy to find linear models with a large number of independent variables, it is shown that this issue can lead to the erroneous conclusion that there is a worrying problem of approximate multicollinearity. To avoid this situation, an adjusted variance inflation factor is proposed to compensate for the presence of a large number of independent variables in the multiple linear regression model. It is shown that this proposal has a direct impact on variable selection models based on influence relationships, which translates into a new decision criterion for the individual significance test to be considered in stepwise regression models or even directly in a multiple linear regression model.
Submitted 6 March, 2025;
originally announced March 2025.
-
Unraveling Residualization: enhancing its application and exposing its relationship with the FWL theorem
Authors:
Catalina García García,
Román Salmerón Gómez,
Claudia García García
Abstract:
The residualization procedure has been applied in many different fields to estimate models with multicollinearity. However, this methodology is poorly understood and some authors discourage its use. This paper aims to contribute to a better understanding of the residualization procedure and to promote its adequate application and interpretation in statistics and data science. We highlight its potential application, not only to mitigate multicollinearity but also when the study is oriented to the analysis of the isolated effect of independent variables. The relation between the residualization methodology and the Frisch-Waugh-Lovell (FWL) theorem is also analyzed, concluding that, although both provide the same estimates, the interpretation of the estimated coefficients is different. These different interpretations justify the application of the residualization methodology regardless of the FWL theorem. A real data example is presented to illustrate the contribution of this paper.
Submitted 23 October, 2024;
originally announced October 2024.
-
Generalized Ridge Regression: Biased Estimation for Multiple Linear Regression Models
Authors:
Román Salmerón Gómez,
Catalina García García,
Guillermo Hortal Reina
Abstract:
When the regressors of an econometric linear model are nonorthogonal, it is well known that their estimation by ordinary least squares can present various problems that discourage the use of this model. Ridge regression is the most commonly used alternative; however, its generalized version has hardly been analyzed. The present work addresses the estimation of this generalized version, as well as the calculation of its mean squared error, goodness of fit and bootstrap inference.
Submitted 2 July, 2024;
originally announced July 2024.
-
Enlarging of the sample to address multicollinearity
Authors:
Román Salmerón Gómez,
Catalina García García,
Ainara Rodríguez Sánchez
Abstract:
The paper analyzes how enlarging the sample affects the mitigation of collinearity, concluding that it may mitigate the consequences of collinearity related to statistical analysis but not necessarily the numerical instability. The problem addressed is important in the teaching of social sciences, since it concerns one of the solutions proposed almost unanimously to solve the problem of multicollinearity. To aid understanding and illustrate the contribution of this paper, two empirical examples are presented and highly technical developments are avoided.
Submitted 1 July, 2024;
originally announced July 2024.
-
Estimation of ill-conditioned models using penalized sums of squares of the residuals
Authors:
Román Salmerón Gómez,
Catalina B. García García
Abstract:
This paper analyzes the estimation of econometric models by penalizing the sum of squares of the residuals with a factor that makes the model estimates approximate those that would be obtained when considering the possible simple regressions between the dependent variable of the econometric model and each of its independent variables. It is shown that the ridge estimator is a particular case of the penalized estimator obtained, which, upon analysis of its main characteristics, presents better properties than the ridge estimator, especially with regard to the individual bootstrap inference of the coefficients of the model and the numerical stability of the estimates obtained. This improvement is due to the fact that, instead of shrinking towards zero, the estimator shrinks towards the estimates of the coefficients of the simple regressions discussed above.
Submitted 9 May, 2024;
originally announced May 2024.
-
Modelling Global Fossil CO2 Emissions with a Lognormal Distribution: A Climate Policy Tool
Authors:
Faustino Prieto,
Catalina B. García-García,
Román Salmerón Gómez
Abstract:
Carbon dioxide (CO2) emissions have emerged as a critical issue with profound impacts on the environment, human health, and the global economy. The steady increase in atmospheric CO2 levels, largely due to human activities such as burning fossil fuels and deforestation, has become a major contributor to climate change and its associated catastrophic effects. To tackle this pressing challenge, a coordinated global effort is needed, which necessitates a deep understanding of emissions patterns and trends. In this paper, we explore the use of statistical modelling, specifically the lognormal distribution, as a framework for comprehending and predicting CO2 emissions. We build on prior research that suggests a complex distribution of emissions and seek to test the hypothesis that a simpler distribution can still offer meaningful insights for policy-makers. We utilize data from three comprehensive databases and analyse six candidate distributions (exponential, Fisk, gamma, lognormal, Lomax, Weibull) to identify a suitable model for global fossil CO2 emissions. Our findings highlight the adequacy of the lognormal distribution in characterizing emissions across all countries and years studied. Furthermore, to provide additional support for this distribution, we provide statistical evidence supporting the applicability of Gibrat's law to those CO2 emissions. Finally, we employ the lognormal model to predict emission parameters for the coming years and propose two policies for reducing total fossil CO2 emissions. Our research aims to provide policy-makers with accurate and detailed information to support effective climate change mitigation strategies.
Submitted 1 March, 2024;
originally announced March 2024.
-
The Raise Regression: Justification, properties and application
Authors:
Román Salmerón Gómez,
Catalina García García,
José García Pérez
Abstract:
Multicollinearity produces an inflation in the variance of the Ordinary Least Squares estimators due to the correlation between two or more independent variables (including the constant term). A widely applied solution is to estimate with penalized estimators (such as the ridge estimator, the Liu estimator, etc.) which exchange the mean square error by the bias. Although the variance diminishes with these procedures, everything seems to indicate that the inference is lost, along with the goodness of fit. Alternatively, the raise regression (\cite{Garcia2011} and \cite{Salmeron2017}) allows the mitigation of the problems generated by multicollinearity without losing the inference and while keeping the coefficient of determination. This paper completely formalizes the raise estimator, summarizing all the previous contributions: its mean square error, the variance inflation factor, the condition number, the adequate selection of the variable to be raised, the successive raising and the relation between the raise and the ridge estimator. As a novelty, the estimation method and the relation between the raise and the residualization are also presented, and the norm of the estimator, the behaviour of the individual and joint significance tests, and the behaviour of the mean square error and the coefficient of variation are analyzed. The usefulness of the raise regression as an alternative for mitigating multicollinearity is illustrated with two empirical applications.
Submitted 29 April, 2021;
originally announced April 2021.
-
Centered and non-centered variance inflation factor
Authors:
Román Salmerón Gómez,
Catalina García García,
José García Pérez
Abstract:
This paper analyzes the diagnosis of near multicollinearity in a multiple linear regression from centered (with intercept) and non-centered (without intercept) auxiliary regressions. From these auxiliary regressions, the centered and non-centered Variance Inflation Factors are calculated, respectively. An expression relating the two is also presented.
Submitted 29 May, 2019;
originally announced May 2019.