-
High-dimensional censored MIDAS logistic regression for corporate survival forecasting
Authors:
Wei Miao,
Jad Beyhum,
Jonas Striaukas,
Ingrid Van Keilegom
Abstract:
This paper addresses the challenge of forecasting corporate distress, a problem marked by three key statistical hurdles: (i) right censoring, (ii) high-dimensional predictors, and (iii) mixed-frequency data. To overcome these complexities, we introduce a novel high-dimensional censored MIDAS (Mixed Data Sampling) logistic regression. Our approach handles censoring through inverse probability weighting and achieves accurate estimation with numerous mixed-frequency predictors by employing a sparse-group penalty. We establish finite-sample bounds for the estimation error, accounting for censoring, the MIDAS approximation error, and heavy tails. The superior performance of the method is demonstrated through Monte Carlo simulations. Finally, we present an extensive application of our methodology to predict the financial distress of Chinese-listed firms. Our novel procedure is implemented in the R package 'Survivalml'.
Submitted 13 February, 2025;
originally announced February 2025.
-
Econometrics of Machine Learning Methods in Economic Forecasting
Authors:
Andrii Babii,
Eric Ghysels,
Jonas Striaukas
Abstract:
This paper surveys recent advances in machine learning methods for economic forecasting. The survey covers the following topics: nowcasting, textual data, panel and tensor data, high-dimensional Granger causality tests, time series cross-validation, and classification with economic losses.
Submitted 21 August, 2023;
originally announced August 2023.
-
Testing for sparse idiosyncratic components in factor-augmented regression models
Authors:
Jad Beyhum,
Jonas Striaukas
Abstract:
We propose a novel bootstrap test of a dense model, namely factor regression, against a sparse plus dense alternative augmenting model with sparse idiosyncratic components. The asymptotic properties of the test are established under time series dependence and polynomial tails. We outline a data-driven rule to select the tuning parameter and prove its theoretical validity. In simulation experiments, our procedure exhibits high power against sparse alternatives and low power against dense deviations from the null. Moreover, we apply our test to various datasets in macroeconomics and finance and often reject the null. This suggests the presence of sparsity -- on top of a dense model -- in commonly studied economic applications. The R package FAS implements our approach.
Submitted 10 July, 2024; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Panel Data Nowcasting: The Case of Price-Earnings Ratios
Authors:
Andrii Babii,
Ryan T. Ball,
Eric Ghysels,
Jonas Striaukas
Abstract:
The paper uses structured machine learning regressions for nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization which can take advantage of the mixed frequency time series panel data structures. Our empirical results show the superior performance of our machine learning panel data regression models over analysts' predictions, forecast combinations, firm-specific time series regression models, and standard machine learning methods.
Submitted 5 July, 2023;
originally announced July 2023.
-
Factor-augmented sparse MIDAS regressions with an application to nowcasting
Authors:
Jad Beyhum,
Jonas Striaukas
Abstract:
This article investigates factor-augmented sparse MIDAS (Mixed Data Sampling) regressions for high-dimensional time series data, which may be observed at different frequencies. Our novel approach integrates sparse and dense dimensionality reduction techniques. We derive the convergence rate of our estimator under misspecification, $τ$-mixing dependence, and polynomial tails. Our method's finite sample performance is assessed via Monte Carlo simulations. We apply the methodology to nowcasting U.S. GDP growth and demonstrate that it outperforms both sparse regression and standard factor-augmented regression during the COVID-19 pandemic. To ensure the robustness of these results, we also implement factor-augmented sparse logistic regression, which further confirms the superior accuracy of our nowcast probabilities during recessions. These findings indicate that recessions are influenced by both idiosyncratic (sparse) and common (dense) shocks.
Submitted 12 November, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
Machine Learning Panel Data Regressions with Heavy-tailed Dependent Data: Theory and Application
Authors:
Andrii Babii,
Ryan T. Ball,
Eric Ghysels,
Jonas Striaukas
Abstract:
The paper introduces structured machine learning regressions for heavy-tailed dependent panel data potentially sampled at different frequencies. We focus on the sparse-group LASSO regularization. This type of regularization can take advantage of the mixed frequency time series panel data structures and improve the quality of the estimates. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data can have fat tails. To that end, we leverage a new Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed $τ$-mixing processes.
Submitted 22 November, 2021; v1 submitted 8 August, 2020;
originally announced August 2020.
-
Machine Learning Time Series Regressions with an Application to Nowcasting
Authors:
Andrii Babii,
Eric Ghysels,
Jonas Striaukas
Abstract:
This paper introduces structured machine learning regressions for high-dimensional time series data potentially sampled at different frequencies. The sparse-group LASSO estimator can take advantage of such time series data structures and outperforms the unstructured LASSO. We establish oracle inequalities for the sparse-group LASSO estimator within a framework that allows for the mixing processes and recognizes that the financial and the macroeconomic data may have heavier than exponential tails. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that text data can be a useful addition to more traditional numerical data.
Submitted 12 December, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
High-Dimensional Granger Causality Tests with an Application to VIX and News
Authors:
Andrii Babii,
Eric Ghysels,
Jonas Striaukas
Abstract:
We study Granger causality testing for high-dimensional time series using regularized regressions. To perform proper inference, we rely on heteroskedasticity and autocorrelation consistent (HAC) estimation of the asymptotic variance and develop the inferential theory in the high-dimensional setting. To recognize the time series data structures we focus on the sparse-group LASSO estimator, which includes the LASSO and the group LASSO as special cases. We establish the debiased central limit theorem for low dimensional groups of regression coefficients and study the HAC estimator of the long-run variance based on the sparse-group LASSO residuals. This leads to valid time series inference for individual regression coefficients as well as groups, including Granger causality tests. The treatment relies on a new Fuk-Nagaev inequality for a class of $τ$-mixing processes with heavier than Gaussian tails, which is of independent interest. In an empirical application, we study the Granger causal relationship between the VIX and financial news.
Submitted 1 February, 2021; v1 submitted 12 December, 2019;
originally announced December 2019.