-
Econometrics of Machine Learning Methods in Economic Forecasting
Authors:
Andrii Babii,
Eric Ghysels,
Jonas Striaukas
Abstract:
This paper surveys recent advances in machine learning methods for economic forecasting. The survey covers the following topics: nowcasting, textual data, panel and tensor data, high-dimensional Granger causality tests, time series cross-validation, and classification with economic losses.
Submitted 21 August, 2023;
originally announced August 2023.
-
Panel Data Nowcasting: The Case of Price-Earnings Ratios
Authors:
Andrii Babii,
Ryan T. Ball,
Eric Ghysels,
Jonas Striaukas
Abstract:
The paper uses structured machine learning regressions for nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization which can take advantage of the mixed frequency time series panel data structures. Our empirical results show the superior performance of our machine learning panel data regression models over analysts' predictions, forecast combinations, firm-specific time series regression models, and standard machine learning methods.
Submitted 5 July, 2023;
originally announced July 2023.
-
Tensor PCA for Factor Models
Authors:
Andrii Babii,
Eric Ghysels,
Junsu Pan
Abstract:
Modern empirical analysis often relies on high-dimensional panel datasets with non-negligible cross-sectional and time-series correlations. Factor models are natural for capturing such dependencies. A tensor factor model describes the $d$-dimensional panel as a sum of a reduced rank component and an idiosyncratic noise, generalizing traditional factor models for two-dimensional panels. We consider a tensor factor model corresponding to the notion of a reduced multilinear rank of a tensor. We show that for a strong factor model, a simple tensor principal component analysis algorithm is optimal for estimating factors and loadings. When the factors are weak, the convergence rate of simple TPCA can be improved with alternating least-squares iterations. We also provide inferential results for factors and loadings and propose the first test to select the number of factors. The new tools are applied to the problem of imputing missing values in a multidimensional panel of firm characteristics.
Submitted 6 March, 2025; v1 submitted 25 December, 2022;
originally announced December 2022.
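The abstract above describes tensor PCA only in words. As a rough illustration, the sketch below estimates one loading matrix per tensor mode from the leading left singular vectors of each mode unfolding. This HOSVD-style construction, the function names `unfold` and `mode_loadings`, and the noiseless rank-(1,1,1) example are all assumptions made for illustration, not the authors' algorithm.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode unfolding: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_loadings(tensor, ranks):
    """Leading left singular vectors of each unfolding (HOSVD-style, an assumption)."""
    loadings = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        loadings.append(u[:, :rank])
    return loadings

# Hypothetical noiseless example: a rank-(1,1,1) three-dimensional panel.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=5), rng.normal(size=4), rng.normal(size=6)
signal = np.einsum("i,j,k->ijk", a, b, c)

est = mode_loadings(signal, ranks=(1, 1, 1))
# Each estimated loading should match the true direction up to sign.
alignment = abs(est[0][:, 0] @ (a / np.linalg.norm(a)))
```

In this noiseless case `alignment` should be numerically equal to one; with idiosyncratic noise the recovery would only be approximate.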
-
Binary Choice with Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Racial Justice
Authors:
Andrii Babii,
Xi Chen,
Eric Ghysels,
Rohit Kumar
Abstract:
We study the binary choice problem in a data-rich environment with asymmetric loss functions. The econometrics literature covers nonparametric binary choice problems but does not offer computationally attractive solutions in data-rich environments. The machine learning literature has many algorithms but is focused mostly on loss functions that are independent of covariates. We show that theoretically valid decisions on binary outcomes with general loss functions can be achieved via a very simple loss-based reweighting of the logistic regression or state-of-the-art machine learning techniques. We apply our analysis to racial justice in pretrial detention.
Submitted 6 November, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
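The loss-based reweighting of logistic regression mentioned in the abstract above can be sketched as a weighted logistic fit in which each observation's weight reflects the cost of misclassifying it. The version below is a minimal sketch under stated assumptions: the costs `cost_fn` and `cost_fp`, the simulated data, and the gradient-descent fitter `fit_weighted_logit` are hypothetical, and the paper's exact weighting scheme (which may depend on covariates) is not reproduced here.

```python
import numpy as np

def fit_weighted_logit(X, y, weights, lr=0.1, n_iter=2000):
    """Minimize the weighted logistic loss by plain gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))            # fitted probabilities
        grad = Xb.T @ (weights * (p - y)) / weights.sum()
        beta -= lr * grad
    return beta

# Hypothetical simulated data from a logistic model with one covariate.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)

# Assumed asymmetric costs: a false negative is five times as costly.
cost_fn, cost_fp = 5.0, 1.0
weights = np.where(y == 1, cost_fn, cost_fp)

beta_weighted = fit_weighted_logit(X, y, weights)
beta_unweighted = fit_weighted_logit(X, y, np.ones_like(y))
```

Upweighting the positive class raises the fitted intercept, so the weighted classifier predicts the costly-to-miss outcome more readily than the unweighted one.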
-
Machine Learning Panel Data Regressions with Heavy-tailed Dependent Data: Theory and Application
Authors:
Andrii Babii,
Ryan T. Ball,
Eric Ghysels,
Jonas Striaukas
Abstract:
The paper introduces structured machine learning regressions for heavy-tailed dependent panel data potentially sampled at different frequencies. We focus on the sparse-group LASSO regularization. This type of regularization can take advantage of the mixed-frequency time series panel data structures and improve the quality of the estimates. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data can have fat tails. To that end, we leverage a new Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed $τ$-mixing processes.
Submitted 22 November, 2021; v1 submitted 8 August, 2020;
originally announced August 2020.
-
Machine Learning Time Series Regressions with an Application to Nowcasting
Authors:
Andrii Babii,
Eric Ghysels,
Jonas Striaukas
Abstract:
This paper introduces structured machine learning regressions for high-dimensional time series data potentially sampled at different frequencies. The sparse-group LASSO estimator can take advantage of such time series data structures and outperforms the unstructured LASSO. We establish oracle inequalities for the sparse-group LASSO estimator within a framework that allows for mixing processes and recognizes that financial and macroeconomic data may have heavier-than-exponential tails. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that text data can be a useful addition to more traditional numerical data.
Submitted 12 December, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
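To make the contrast between the sparse-group LASSO and the unstructured LASSO in the abstract above concrete, the sketch below evaluates the standard sparse-group LASSO penalty, a convex combination of the L1 norm and the sum of group-wise Euclidean norms. The function name `sg_lasso_penalty`, the mixing parameter name `alpha`, and the toy grouping (for example, all lags of one covariate forming a group) are assumptions for illustration.

```python
import numpy as np

def sg_lasso_penalty(beta, groups, alpha):
    """alpha * ||beta||_1 + (1 - alpha) * sum over groups of ||beta_group||_2."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    l1_part = np.abs(beta).sum()
    group_part = sum(np.linalg.norm(beta[groups == g]) for g in np.unique(groups))
    return alpha * l1_part + (1.0 - alpha) * group_part

# Hypothetical coefficients: two groups of two coefficients each.
beta = np.array([3.0, 4.0, 0.0, 0.0])
groups = np.array([0, 0, 1, 1])

lasso_value = sg_lasso_penalty(beta, groups, alpha=1.0)        # pure LASSO
group_lasso_value = sg_lasso_penalty(beta, groups, alpha=0.0)  # pure group LASSO
mixed_value = sg_lasso_penalty(beta, groups, alpha=0.5)
```

Setting `alpha` to one or zero recovers the LASSO and the group LASSO penalties respectively; intermediate values encourage sparsity both across groups and within them.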
-
High-Dimensional Granger Causality Tests with an Application to VIX and News
Authors:
Andrii Babii,
Eric Ghysels,
Jonas Striaukas
Abstract:
We study Granger causality testing for high-dimensional time series using regularized regressions. To perform proper inference, we rely on heteroskedasticity and autocorrelation consistent (HAC) estimation of the asymptotic variance and develop the inferential theory in the high-dimensional setting. To recognize the time series data structures, we focus on the sparse-group LASSO estimator, which includes the LASSO and the group LASSO as special cases. We establish the debiased central limit theorem for low-dimensional groups of regression coefficients and study the HAC estimator of the long-run variance based on the sparse-group LASSO residuals. This leads to valid time series inference for individual regression coefficients as well as groups, including Granger causality tests. The treatment relies on a new Fuk-Nagaev inequality for a class of $τ$-mixing processes with heavier than Gaussian tails, which is of independent interest. In an empirical application, we study the Granger causal relationship between the VIX and financial news.
Submitted 1 February, 2021; v1 submitted 12 December, 2019;
originally announced December 2019.
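The HAC estimation of the long-run variance referred to in the abstract above can be illustrated for a scalar series with a Bartlett-kernel (Newey-West) estimator. This is a minimal sketch: the choice of the Bartlett kernel, the function name `newey_west_lrv`, and the toy input are assumptions, and the paper's estimator, which is built on sparse-group LASSO residuals, is not reproduced here.

```python
import numpy as np

def newey_west_lrv(x, bandwidth):
    """Bartlett-kernel estimate of the long-run variance of a scalar series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # demean before computing autocovariances
    n = len(x)
    lrv = x @ x / n                       # lag-0 autocovariance
    for lag in range(1, bandwidth + 1):
        weight = 1.0 - lag / (bandwidth + 1)   # Bartlett weights decline linearly
        gamma = x[lag:] @ x[:-lag] / n         # lag-`lag` autocovariance
        lrv += 2.0 * weight * gamma
    return lrv

# Hypothetical alternating series with strong negative autocorrelation.
series = np.array([1.0, -1.0, 1.0, -1.0])
variance_only = newey_west_lrv(series, bandwidth=0)
with_one_lag = newey_west_lrv(series, bandwidth=1)
```

With a bandwidth of zero the estimator reduces to the sample variance; adding lags lowers the estimate here because the series is negatively autocorrelated.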
-
Artificial Intelligence Alter Egos: Who benefits from Robo-investing?
Authors:
Catherine D'Hondt,
Rudy De Winne,
Eric Ghysels,
Steve Raymond
Abstract:
Artificial intelligence, or AI, enhancements are increasingly shaping our daily lives. Financial decision-making is no exception to this. We introduce the notion of AI Alter Egos, which are shadow robo-investors, and use a unique data set covering brokerage accounts for a large cross-section of investors over a sample from January 2003 to March 2012, which includes the 2008 financial crisis, to assess the benefits of robo-investing. We have detailed investor characteristics and records of all trades. Our data set consists of investors typically targeted for robo-advising. We explore robo-investing strategies commonly used in the industry, including some involving advanced machine learning methods. The man versus machine comparison allows us to shed light on potential benefits the emerging robo-advising industry may provide to certain segments of the population, such as low-income and/or highly risk-averse investors.
Submitted 7 July, 2019;
originally announced July 2019.