-
Dynamic allocation: extremes, tail dependence, and regime shifts
Authors:
Yin Luo,
Sheng Wang,
Javed Jussa
Abstract:
By capturing outliers, volatility clustering, and tail dependence in the asset return distribution, we build a sophisticated model to predict the downside risk of the global financial market. We further develop a dynamic regime switching model that can forecast real-time risk regime of the market. Our GARCH-DCC-Copula risk model can significantly improve both risk- and alpha-based global tactical asset allocation strategies. Our risk regime has strong predictive power of quantitative equity factor performance, which can help equity investors to build better factor models and asset allocation managers to construct more efficient risk premia portfolios.
Submitted 14 June, 2025;
originally announced June 2025.
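The volatility clustering mentioned in the abstract above is what GARCH-type models are designed to capture. As a minimal sketch only, the standard GARCH(1,1) conditional-variance recursion is shown below. It is the textbook univariate building block, not the authors' GARCH-DCC-Copula model; the function name `garch11_variance` and any parameter values passed to it are illustrative assumptions.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances from the textbook GARCH(1,1) recursion.

    sigma2[t+1] = omega + alpha * returns[t]**2 + beta * sigma2[t]

    The path starts at the unconditional variance omega / (1 - alpha - beta),
    so the result has len(returns) + 1 entries; the last entry is the
    one-step-ahead variance forecast. All names here are hypothetical.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")

    # Start from the long-run (unconditional) variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        # A large squared return raises next period's variance (clustering),
        # and beta makes that increase decay only gradually.
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2
```

A large shock in `returns` raises the following variance through `alpha`, and the `beta` term lets the elevated level persist, which is the clustering behaviour the abstract refers to.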
-
Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks
Authors:
Qiang Chen,
Tianyang Han,
Jin Li,
Ye Luo,
Yuxiao Wu,
Xiaowei Zhang,
Tuo Zhou
Abstract:
Can AI effectively perform complex econometric analysis traditionally requiring human expertise? This paper evaluates AI agents' capability to master econometrics, focusing on empirical analysis performance. We develop an "Econometrics AI Agent" built on the open-source MetaGPT framework. This agent exhibits outstanding performance in: (1) planning econometric tasks strategically, (2) generating and executing code, (3) employing error-based reflection for improved robustness, and (4) allowing iterative refinement through multi-round conversations. We construct two datasets from academic coursework materials and published research papers to evaluate performance against real-world challenges. Comparative testing shows our domain-specialized AI agent significantly outperforms both benchmark large language models (LLMs) and general-purpose AI agents. This work establishes a testbed for exploring AI's impact on social science research and enables cost-effective integration of domain expertise, making advanced econometric methods accessible to users with minimal coding skills. Furthermore, our AI agent enhances research reproducibility and offers promising pedagogical applications for econometrics teaching.
Submitted 13 June, 2025; v1 submitted 1 June, 2025;
originally announced June 2025.
-
Can We Validate Counterfactual Estimations in the Presence of General Network Interference?
Authors:
Sadegh Shirani,
Yuwei Luo,
William Overman,
Ruoxuan Xiong,
Mohsen Bayati
Abstract:
In experimental settings with network interference, a unit's treatment can influence outcomes of other units, challenging both causal effect estimation and its validation. Classic validation approaches fail as outcomes are only observable under one treatment scenario and exhibit complex correlation patterns due to interference. To address these challenges, we introduce a new framework enabling cross-validation for counterfactual estimation. At its core is our distribution-preserving network bootstrap method -- a theoretically-grounded approach inspired by approximate message passing. This method creates multiple subpopulations while preserving the underlying distribution of network effects. We extend recent causal message-passing developments by incorporating heterogeneous unit-level characteristics and varying local interactions, ensuring reliable finite-sample performance through non-asymptotic analysis. We also develop and publicly release a comprehensive benchmark toolbox with diverse experimental environments, from networks of interacting AI agents to opinion formation in real-world communities and ride-sharing applications. These environments provide known ground truth values while maintaining realistic complexities, enabling systematic examination of causal inference methods. Extensive evaluation across these environments demonstrates our method's robustness to diverse forms of network interference. Our work provides researchers with both a practical estimation framework and a standardized platform for testing future methodological developments.
Submitted 3 February, 2025;
originally announced February 2025.
-
Demand Analysis under Price Rigidity and Endogenous Assortment: An Application to China's Tobacco Industry
Authors:
Hui Liu,
Yao Luo
Abstract:
We observe nominal price rigidity in tobacco markets across China. The monopolistic seller responds by adjusting product assortments, which remain unobserved by the analyst. We develop and estimate a logit demand model that incorporates assortment discrimination and nominal price rigidity. We find that consumers are significantly more responsive to price changes than conventional models predict. Simulated tax increases reveal that neglecting the role of endogenous assortments results in underestimations of the decline in higher-tier product sales, incorrect directional predictions of lower-tier product sales, and overestimation of tax revenue by more than 50%. Finally, we extend our methodology to settings with competition and random coefficient models.
Submitted 28 January, 2025;
originally announced January 2025.
-
Seesaw Experimentation: A/B Tests with Spillovers
Authors:
Jin Li,
Ye Luo,
Xiaowei Zhang
Abstract:
This paper examines how spillover effects in A/B testing can impede organizational progress and develops strategies for mitigating these challenges. We identify a phenomenon termed "seesaw experimentation", where a firm's overall performance paradoxically deteriorates despite achieving continuous improvements in measured A/B testing metrics. Seesaw experimentation arises when successful innovations in primary metrics generate negative externalities in secondary, unmeasured dimensions. To address this problem, we propose implementing a positive hurdle rate for A/B test approval. We derive the optimal hurdle rate, offering a simple solution that preserves decentralized experimentation while mitigating negative spillovers.
Submitted 15 January, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
Higher-Order Causal Message Passing for Experimentation with Complex Interference
Authors:
Mohsen Bayati,
Yuwei Luo,
William Overman,
Sadegh Shirani,
Ruoxuan Xiong
Abstract:
Accurate estimation of treatment effects is essential for decision-making across various scientific fields. This task, however, becomes challenging in areas like social sciences and online marketplaces, where treating one experimental unit can influence outcomes for others through direct or indirect interactions. Such interference can lead to biased treatment effect estimates, particularly when the structure of these interactions is unknown. We address this challenge by introducing a new class of estimators based on causal message-passing, specifically designed for settings with pervasive, unknown interference. Our estimator draws on information from the sample mean and variance of unit outcomes and treatments over time, enabling efficient use of observed data to estimate the evolution of the system state. Concretely, we construct non-linear features from the moments of unit outcomes and treatments and then learn a function that maps these features to future mean and variance of unit outcomes. This allows for the estimation of the treatment effect over time. Extensive simulations across multiple domains, using synthetic and real network data, demonstrate the efficacy of our approach in estimating total treatment effect dynamics, even in cases where interference exhibits non-monotonic behavior in the probability of treatment.
Submitted 3 February, 2025; v1 submitted 1 November, 2024;
originally announced November 2024.
-
Tests of thermal macroeconomic theory on simulated micro-economies
Authors:
Yihang Luo,
R. S. MacKay,
Nick Chater
Abstract:
In this paper, we test predictions of a new theory of macroeconomics, called "thermal macroeconomics." The theory aims to apply the mathematical structure of classical thermodynamics, including analogues of temperature and entropy, to predict aspects of the aggregate behaviour of populations of economic agents without analyzing their detailed interactions. We test the theory by comparing its predictions with the behaviour of a variety of simulated micro-economies in which goods and money can be exchanged between agents, confirming the predictions of the theory. The paper serves also to illustrate and make more tangible the predictions of thermal macroeconomics.
Submitted 27 October, 2024;
originally announced October 2024.
-
AI Persuasion, Bayesian Attribution, and Career Concerns of Doctors
Authors:
Hanzhe Li,
Jin Li,
Ye Luo,
Xiaowei Zhang
Abstract:
This paper examines how AI persuades doctors when their diagnoses differ. Disagreements arise from two sources: attention differences, which are objective and play a complementary role to the doctor, and comprehension differences, which are subjective and act as substitutes. AI's interpretability influences how doctors attribute these sources and their willingness to change their minds. Surprisingly, uninterpretable AI can be more persuasive by allowing doctors to partially attribute disagreements to attention differences. This effect is stronger when doctors have low abnormality detection skills. Additionally, uninterpretable AI can improve diagnostic accuracy when doctors have career concerns.
Submitted 1 October, 2024;
originally announced October 2024.
-
Did Harold Zuercher Have Time-Separable Preferences?
Authors:
Jay Lu,
Yao Luo,
Kota Saito,
Yi Xin
Abstract:
This paper proposes an empirical model of dynamic discrete choice to allow for non-separable time preferences, generalizing the well-known Rust (1987) model. Under weak conditions, we show the existence of value functions and hence well-defined optimal choices. We construct a contraction mapping of the value function and propose an estimation method similar to Rust's nested fixed point algorithm. Finally, we apply the framework to the bus engine replacement data. We improve the fit of the data with our general model and reject the null hypothesis that Harold Zuercher has separable time preferences. Misspecifying an agent's preference as time-separable when it is not leads to biased inferences about structure parameters (such as the agent's risk attitudes) and misleading policy recommendations.
Submitted 11 June, 2024;
originally announced June 2024.
-
Deconvolution from two order statistics
Authors:
JoonHwan Cho,
Yao Luo,
Ruli Xiao
Abstract:
Economic data are often contaminated by measurement errors and truncated by ranking. This paper shows that the classical measurement error model with independent and additive measurement errors is identified nonparametrically using only two order statistics of repeated measurements. The identification result confirms a hypothesis by Athey and Haile (2002) for a symmetric ascending auction model with unobserved heterogeneity. Extensions allow for heterogeneous measurement errors, broadening the applicability to additional empirical settings, including asymmetric auctions and wage offer models. We adapt an existing simulated sieve estimator and illustrate its performance in finite samples.
Submitted 26 March, 2024;
originally announced March 2024.
-
Robust Estimation of Realized Correlation: New Insight about Intraday Fluctuations in Market Betas
Authors:
Peter Reinhard Hansen,
Yiyao Luo
Abstract:
Time-varying volatility is an inherent feature of most economic time-series, which causes standard correlation estimators to be inconsistent. The quadrant correlation estimator is consistent but very inefficient. We propose a novel subsampled quadrant estimator that improves efficiency while preserving consistency and robustness. This estimator is particularly well-suited for high-frequency financial data and we apply it to a large panel of US stocks. Our empirical analysis sheds new light on intra-day fluctuations in market betas by decomposing them into time-varying correlations and relative volatility changes. Our results show that intraday variation in betas is primarily driven by intraday variation in correlations.
Submitted 30 October, 2023;
originally announced October 2023.
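The quadrant correlation estimator named in the abstract above can be illustrated with a short sketch. This shows only the classical quadrant estimator, not the authors' subsampled version, whose construction is not given in the abstract. The mapping `sin(pi * q / 2)` back to the correlation scale assumes bivariate normal (more generally, elliptical) data, and the simulation helper with its volatility break is a hypothetical setup chosen for illustration.

```python
import math
import random
import statistics


def _sign(v):
    # Returns -1, 0 or 1.
    return (v > 0) - (v < 0)


def quadrant_correlation(xs, ys):
    """Classical quadrant estimate of correlation.

    q is the average product of signs of median-centred observations.
    For bivariate normal data, q = (2 / pi) * asin(rho), hence the
    inversion rho = sin(pi * q / 2).
    """
    mx = statistics.median(xs)
    my = statistics.median(ys)
    q = sum(_sign(x - mx) * _sign(y - my) for x, y in zip(xs, ys)) / len(xs)
    return math.sin(math.pi * q / 2.0)


def simulate_pairs(n, rho, seed=0):
    """Hypothetical data: correlated normals whose common volatility
    jumps from 1 to 5 halfway through the sample."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(n):
        vol = 1.0 if t < n // 2 else 5.0
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(vol * z1)
        ys.append(vol * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return xs, ys
```

Because only the signs of the centred observations enter, rescaling both series by a common time-varying volatility leaves the estimate essentially unchanged; the price is that most of the information in the magnitudes is discarded, which is the inefficiency the abstract mentions.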
-
Blockchain-based Decentralized Co-governance: Innovations and Solutions for Sustainable Crowdfunding
Authors:
Bingyou Chen,
Yu Luo,
Jieni Li,
Yujian Li,
Ying Liu,
Fan Yang,
Junge Bo,
Yanan Qiao
Abstract:
This thesis provides an in-depth exploration of the Decentralized Co-governance Crowdfunding (DCC) Ecosystem, a novel solution addressing prevailing challenges in conventional crowdfunding methods faced by MSMEs and innovative projects. Among the problems it seeks to mitigate are high transaction costs, lack of transparency, fraud, and inefficient resource allocation. Leveraging a comprehensive review of the existing literature on crowdfunding economic activities and blockchain's impact on organizational governance, we propose a transformative socio-economic model based on digital tokens and decentralized co-governance. This ecosystem is marked by a tripartite community structure - the Labor, Capital, and Governance communities - each contributing uniquely to the ecosystem's operation. Our research unfolds the evolution of the DCC ecosystem through distinct phases, offering a novel understanding of socioeconomic dynamics in a decentralized digital world. It also delves into the intricate governance mechanism of the ecosystem, ensuring integrity, fairness, and a balanced distribution of value and wealth.
Submitted 2 June, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Under-Identification of Structural Models Based on Timing and Information Set Assumptions
Authors:
Daniel Ackerberg,
Garth Frazer,
Kyoo il Kim,
Yao Luo,
Yingjun Su
Abstract:
We revisit identification based on timing and information set assumptions in structural models, which have been used in the context of production functions, demand equations, and hedonic pricing models (e.g. Olley and Pakes (1996), Blundell and Bond (2000)). First, we demonstrate a general under-identification problem using these assumptions in a simple version of the Blundell-Bond dynamic panel model. In particular, the basic moment conditions can yield multiple discrete solutions: one at the persistence parameter in the main equation and another at the persistence parameter governing the regressor. We then show that the problem can persist in a broader set of models but disappears in models under stronger timing assumptions. We then propose possible solutions in the simple setting by enforcing an assumed sign restriction and conclude by using lessons from our basic identification approach to propose more general practical advice for empirical researchers.
Submitted 27 March, 2023;
originally announced March 2023.
-
On the Empirical Association between Trade Network Complexity and Global Gross Domestic Product
Authors:
Mayank Kejriwal,
Yuesheng Luo
Abstract:
In recent decades, trade between nations has constituted an important component of global Gross Domestic Product (GDP), with official estimates showing that it likely accounted for a quarter of total global production. While evidence of association already exists in macro-economic data between trade volume and GDP growth, there is considerably less work on whether, at the level of individual granular sectors (such as vehicles or minerals), associations exist between the complexity of trading networks and global GDP. In this paper, we explore this question by using publicly available data from the Atlas of Economic Complexity project to rigorously construct global trade networks between nations across multiple sectors, and studying the correlation between network-theoretic measures computed on these networks (such as average clustering coefficient and density) and global GDP. We find that there is indeed significant association between trade networks' complexity and global GDP across almost every sector, and that network metrics also correlate with business cycle phenomena such as the Great Recession of 2007-2008. Our results show that trade volume alone cannot explain global GDP growth, and that network science may prove to be a valuable empirical avenue for studying complexity in macro-economic phenomena such as trade.
Submitted 18 November, 2022;
originally announced November 2022.
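The two network-theoretic measures named in the abstract above, density and average clustering coefficient, can be computed as in the following sketch. It assumes an undirected, unweighted network; the abstract does not say whether the authors' trade networks are directed or weighted, and the function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Share of possible undirected links that are present: 2m / (n(n-1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2.0 * m / (n * (n - 1))


def average_clustering(adj):
    """Mean local clustering coefficient.

    For each node, the fraction of pairs of its neighbours that are
    themselves linked; nodes with fewer than two neighbours count as 0.
    """
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += linked / (k * (k - 1) / 2)
    return total / len(adj)
```

For a toy network with links A-B, B-C, A-C and C-D, the density is 4 of 6 possible links (about 0.667) and the average clustering coefficient is (1 + 1 + 1/3 + 0) / 4 = 7/12.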
-
Spectral Representation Learning for Conditional Moment Models
Authors:
Ziyu Wang,
Yucen Luo,
Yueru Li,
Jun Zhu,
Bernhard Schölkopf
Abstract:
Many problems in causal inference and economics can be formulated in the framework of conditional moment models, which characterize the target function through a collection of conditional moment restrictions. For nonparametric conditional moment models, efficient estimation often relies on preimposed conditions on various measures of ill-posedness of the hypothesis space, which are hard to validate when flexible models are used. In this work, we address this issue by proposing a procedure that automatically learns representations with controlled measures of ill-posedness. Our method approximates a linear representation defined by the spectral decomposition of a conditional expectation operator, which can be used for kernelized estimators and is known to facilitate minimax optimal estimation in certain settings. We show this representation can be efficiently estimated from data, and establish L2 consistency for the resulting estimator. We evaluate the proposed method on proximal causal inference tasks, exhibiting promising performance on high-dimensional, semi-synthetic data.
Submitted 28 December, 2022; v1 submitted 29 October, 2022;
originally announced October 2022.
-
A New Method for Generating Random Correlation Matrices
Authors:
Ilya Archakov,
Peter Reinhard Hansen,
Yiyao Luo
Abstract:
We propose a new method for generating random correlation matrices that makes it simple to control both location and dispersion. The method is based on a vector parameterization, gamma = g(C), which maps any distribution on R^d, d = n(n-1)/2, to a distribution on the space of non-singular n×n correlation matrices. Correlation matrices with certain properties, such as being well-conditioned, having block structures, and having strictly positive elements, are simple to generate. We compare the new method with existing methods.
Submitted 14 October, 2022;
originally announced October 2022.
-
Order Statistics Approaches to Unobserved Heterogeneity in Auctions
Authors:
Yao Luo,
Peijun Sang,
Ruli Xiao
Abstract:
We establish nonparametric identification of auction models with continuous and nonseparable unobserved heterogeneity using three consecutive order statistics of bids. We then propose sieve maximum likelihood estimators for the joint distribution of unobserved heterogeneity and the private value, as well as their conditional and marginal distributions. Lastly, we apply our methodology to a novel dataset from judicial auctions in China. Our estimates suggest substantial gains from accounting for unobserved heterogeneity when setting reserve prices. We propose a simple scheme that achieves nearly optimal revenue by using the appraisal value as the reserve price.
Submitted 7 October, 2022;
originally announced October 2022.
-
Identification of Auction Models Using Order Statistics
Authors:
Yao Luo,
Ruli Xiao
Abstract:
Auction data often contain information on only the most competitive bids as opposed to all bids. The usual measurement error approaches to unobserved heterogeneity are inapplicable due to dependence among order statistics. We bridge this gap by providing a set of positive identification results. First, we show that symmetric auctions with discrete unobserved heterogeneity are identifiable using two consecutive order statistics and an instrument. Second, we extend the results to ascending auctions with unknown competition and unobserved heterogeneity.
Submitted 23 April, 2023; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Efficient Estimation of Structural Models via Sieves
Authors:
Yao Luo,
Peijun Sang
Abstract:
We propose a class of sieve-based efficient estimators for structural models (SEES), which approximate the solution using a linear combination of basis functions and impose equilibrium conditions as a penalty to determine the best-fitting coefficients. Our estimators avoid the need to repeatedly solve the model, apply to a broad class of models, and are consistent, asymptotically normal, and asymptotically efficient. Moreover, they solve unconstrained optimization problems with fewer unknowns and offer convenient standard error calculations. As an illustration, we apply our method to an entry game between Walmart and Kmart.
Submitted 23 February, 2025; v1 submitted 28 April, 2022;
originally announced April 2022.
-
New Solution based on Hodge Decomposition for Abstract Games
Authors:
Yihao Luo,
Jinhui Pang,
Weibin Han,
Huafei Sun
Abstract:
This paper proposes Hodge Potential Choice (HPC), a new solution for abstract games with irreflexive dominance relations. The solution is formulated by applying geometric tools such as differential forms and Hodge decomposition to abstract games. We provide a workable algorithm for the proposed solution together with a new data structure for abstract games. From a game-theoretic viewpoint, HPC overcomes several weaknesses of conventional solutions. HPC coincides with Copeland Choice in complete cases and can be extended to solve games with marginal strengths. We prove that HPC possesses the following prevalent axiomatic properties: neutrality, strong monotonicity, dominance cycles reversing independence, and sensitivity to mutual dominance. To compare HPC with Copeland Choice on large samples of games, we design numerical experiments with randomly generated abstract games of different sizes and completeness. The experimental results show the advantage of HPC in a statistical sense.
Submitted 15 July, 2024; v1 submitted 29 September, 2021;
originally announced September 2021.
-
Dynamic Selection in Algorithmic Decision-making
Authors:
Jin Li,
Ye Luo,
Xiaowei Zhang
Abstract:
This paper identifies and addresses dynamic selection problems in online learning algorithms with endogenous data. In a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the data influences the choices of decisions, affecting the distribution of future data to be collected and analyzed. We propose an instrumental-variable-based algorithm to correct for the bias. It obtains true parameter values and attains low (logarithmic-like) regret levels. We also prove a central limit theorem for statistical inference. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
Submitted 27 September, 2023; v1 submitted 27 August, 2021;
originally announced August 2021.
-
Asymptotic Theory for IV-Based Reinforcement Learning with Potential Endogeneity
Authors:
Jin Li,
Ye Luo,
Zigan Wang,
Xiaowei Zhang
Abstract:
In the standard data analysis framework, data is collected (once and for all), and then data analysis is carried out. However, with the advancement of digital technology, decision-makers constantly analyze past data and generate new data through their decisions. We model this as a Markov decision process and show that the dynamic interaction between data generation and data analysis leads to a new type of bias -- reinforcement bias -- that exacerbates the endogeneity problem in standard data analysis. We propose a class of instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for the bias and establish their theoretical properties by incorporating them into a stochastic approximation (SA) framework. Our analysis accommodates iterate-dependent Markovian structures and, therefore, can be used to study RL algorithms with policy improvement. We also provide formulas for inference on optimal policies of the IV-RL algorithms. These formulas highlight how intertemporal dependencies of the Markovian environment affect the inference.
Submitted 24 December, 2024; v1 submitted 5 March, 2021;
originally announced March 2021.
-
Adaptive Discrete Smoothing for High-Dimensional and Nonlinear Panel Data
Authors:
Xi Chen,
Ye Luo,
Martin Spindler
Abstract:
In this paper we develop a data-driven smoothing technique for high-dimensional and non-linear panel data models. We allow for individual specific (non-linear) functions and estimation with econometric or machine learning methods by using weighted observations from other individuals. The weights are determined in a data-driven way, depend on the similarity between the corresponding functions, and are measured based on initial estimates. The key feature of such a procedure is that it clusters individuals based on the distance / similarity between them, estimated in a first stage. Our estimation method can be combined with various statistical estimation procedures, in particular modern machine learning methods, which are especially fruitful in the high-dimensional case and with complex, heterogeneous data. The approach can be interpreted as a "soft clustering" in comparison to traditional "hard clustering" that assigns each individual to exactly one group. We conduct a simulation study which shows that the prediction can be greatly improved by using our estimator. Finally, we analyze a big data set from didichuxing.com, a leading company in the transportation industry, to analyze and predict the gap between supply and demand based on a large set of covariates. Our estimator clearly performs much better in out-of-sample prediction compared to existing linear panel data estimators.
Submitted 3 January, 2020; v1 submitted 30 December, 2019;
originally announced December 2019.
-
SortedEffects: Sorted Causal Effects in R
Authors:
Shuowen Chen,
Victor Chernozhukov,
Iván Fernández-Val,
Ye Luo
Abstract:
Chernozhukov et al. (2018) proposed the sorted effect method for nonlinear regression models. This method consists of reporting percentiles of the partial effects in addition to the average commonly used to summarize the heterogeneity in the partial effects. They also proposed to use the sorted effects to carry out classification analysis where the observational units are classified as most and least affected if their causal effects are above or below some tail sorted effects. The R package SortedEffects implements the estimation and inference methods therein and provides tools to visualize the results. This vignette serves as an introduction to the package and displays basic functionality of the functions within.
Submitted 6 November, 2019; v1 submitted 2 September, 2019;
originally announced September 2019.
-
Nonparametric Identification of First-Price Auction with Unobserved Competition: A Density Discontinuity Framework
Authors:
Emmanuel Guerre,
Yao Luo
Abstract:
We consider nonparametric identification of independent private value first-price auction models, in which the analyst only observes winning bids. Our benchmark model assumes an exogenous number of bidders $N$. We show that, if the bidders observe $N$, the resulting discontinuities in the winning bid density can be used to identify the distribution of $N$. The private value distribution can be nonparametrically identified in a second step. This extends, under testable identification conditions, to the case where $N$ is a number of potential buyers, who bid with some unknown probability. Identification also holds in the presence of additive unobserved heterogeneity drawn from some parametric distributions. A parametric Bayesian estimation procedure is proposed. An application to Shanghai Government IT procurements finds that the imposed three-bidder participation rule is not effective. This generates a loss as large as $10\%$ of the appraisal budget for small IT contracts.
Submitted 27 December, 2024; v1 submitted 15 August, 2019;
originally announced August 2019.
-
Shape-Enforcing Operators for Point and Interval Estimators
Authors:
Xi Chen,
Victor Chernozhukov,
Iván Fernández-Val,
Scott Kostyshak,
Ye Luo
Abstract:
A common problem in econometrics, statistics, and machine learning is to estimate and make inference on functions that satisfy shape restrictions. For example, distribution functions are nondecreasing and range between zero and one, height growth charts are nondecreasing in age, and production functions are nondecreasing and quasi-concave in input quantities. We propose a method to enforce these restrictions ex post on point and interval estimates of the target function by applying functional operators. If an operator satisfies certain properties that we make precise, the shape-enforced point estimates are closer to the target function than the original point estimates and the shape-enforced interval estimates have greater coverage and shorter length than the original interval estimates. We show that these properties hold for six different operators that cover commonly used shape restrictions in practice: range, convexity, monotonicity, monotone convexity, quasi-convexity, and monotone quasi-convexity. We illustrate the results with two empirical applications to the estimation of a height growth chart for infants in India and a production function for chemical firms in China.
Submitted 12 February, 2021; v1 submitted 4 September, 2018;
originally announced September 2018.
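As a rough illustration of what enforcing a shape restriction ex post can look like, the sketch below applies two simple operators to estimates evaluated on an increasing grid: clipping to a range, and sorting (a monotone rearrangement). The function names, the made-up numbers, and the choice of sorting as the monotonicity operator are assumptions for this example and are not taken from the paper.

```python
def enforce_range(values, lo=0.0, hi=1.0):
    """Clip each estimate into [lo, hi] (hypothetical helper)."""
    return [min(max(v, lo), hi) for v in values]


def enforce_monotone(values):
    """Monotone rearrangement on an increasing grid: sort the estimates."""
    return sorted(values)


def sup_distance(a, b):
    """Largest absolute gap between two functions on the same grid."""
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up example: a nondecreasing target in [0, 1] and a noisy estimate of it.
target = [0.0, 0.2, 0.3, 1.0]
estimate = [-0.05, 0.3, 0.2, 1.1]

shaped = enforce_monotone(enforce_range(estimate))

# The same operators applied to both endpoints of an interval estimate.
lower = enforce_monotone(enforce_range([e - 0.15 for e in estimate]))
upper = enforce_monotone(enforce_range([e + 0.15 for e in estimate]))
```

In this toy case the shaped estimate is no farther from the target than the original one, and the shaped interval endpoints remain ordered, which is the kind of behavior the abstract describes for operators satisfying its conditions.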
-
How much income inequality is fair? Nash bargaining solution and its connection to entropy
Authors:
Venkat Venkatasubramanian,
Yu Luo
Abstract:
The question about fair income inequality has been an important open question in economics and in political philosophy for over two centuries with only qualitative answers such as the ones suggested by Rawls, Nozick, and Dworkin. We provided a quantitative answer recently, for an ideal free-market society, by developing a game-theoretic framework that proved that the ideal inequality is a lognormal distribution of income at equilibrium. In this paper, we develop another approach, using the Nash Bargaining Solution (NBS) framework, which also leads to the same conclusion. Even though the conclusion is the same, the new approach, however, reveals the true nature of NBS, which has been of considerable interest for several decades. Economists have wondered about the economic meaning or purpose of the NBS. While some have alluded to its fairness property, we show more conclusively that it is all about fairness. Since the essence of entropy is also fairness, we see an interesting connection between the Nash product and entropy for a large population of rational economic agents.
Submitted 13 June, 2018;
originally announced June 2018.
-
Sufficient Statistics for Unobserved Heterogeneity in Structural Dynamic Logit Models
Authors:
Victor Aguirregabiria,
Jiaying Gu,
Yao Luo
Abstract:
We study the identification and estimation of structural parameters in dynamic panel data logit models where decisions are forward-looking and the joint distribution of unobserved heterogeneity and observable state variables is nonparametric, i.e., fixed-effects model. We consider models with two endogenous state variables: the lagged decision variable, and the time duration in the last choice. This class of models includes as particular cases important economic applications such as models of market entry-exit, occupational choice, machine replacement, inventory and investment decisions, or dynamic demand of differentiated products. The identification of structural parameters requires a sufficient statistic that controls for unobserved heterogeneity not only in current utility but also in the continuation value of the forward-looking decision problem. We obtain the minimal sufficient statistic and prove identification of some structural parameters using a conditional likelihood approach. We apply this estimator to a machine replacement model.
Submitted 10 May, 2018;
originally announced May 2018.
-
Estimation and Inference of Treatment Effects with $L_2$-Boosting in High-Dimensional Settings
Authors:
Jannis Kueck,
Ye Luo,
Martin Spindler,
Zigan Wang
Abstract:
Empirical researchers are increasingly faced with rich data sets containing many controls or instrumental variables, making it essential to choose an appropriate approach to variable selection. In this paper, we provide results for valid inference after post- or orthogonal $L_2$-Boosting is used for variable selection. We consider treatment effects after selecting among many control variables and instrumental variable models with potentially many instruments. To achieve this, we establish new results for the rate of convergence of iterated post-$L_2$-Boosting and orthogonal $L_2$-Boosting in a high-dimensional setting similar to Lasso, i.e., under approximate sparsity without assuming the beta-min condition. These results are extended to the 2SLS framework and valid inference is provided for treatment effect analysis. We give extensive simulation results for the proposed methods and compare them with Lasso. In an empirical application, we construct efficient IVs with our proposed methods to estimate the effect of pre-merger overlap of bank branch networks in the US on the post-merger stock returns of the acquirer bank.
Submitted 1 July, 2021; v1 submitted 31 December, 2017;
originally announced January 2018.
-
$L_2$Boosting for Economic Applications
Authors:
Ye Luo,
Martin Spindler
Abstract:
In recent years, more and more high-dimensional data sets, in which the number of parameters $p$ is high compared to the number of observations $n$ or even larger, have become available to applied researchers. Boosting algorithms represent one of the major advances in machine learning and statistics in recent years and are suitable for the analysis of such data sets. While Lasso has been applied very successfully for high-dimensional data sets in Economics, boosting has been underutilized in this field, although it has proven very powerful in fields like Biostatistics and Pattern Recognition. We attribute this to missing theoretical results for boosting. The goal of this paper is to fill this gap and show that boosting is a competitive method for inference of a treatment effect or instrumental variable (IV) estimation in a high-dimensional setting. First, we present the $L_2$Boosting with componentwise least squares algorithm and variants which are tailored for regression problems, which are the workhorse for most econometric problems. Then we show how $L_2$Boosting can be used for estimation of treatment effects and IV estimation. We highlight the methods and illustrate them with simulations and empirical examples. For further results and technical details we refer to Luo and Spindler (2016, 2017) and to the online supplement of the paper.
Submitted 10 February, 2017;
originally announced February 2017.
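For readers unfamiliar with the componentwise least squares idea mentioned in the abstract, here is a minimal pure-Python sketch of the generic textbook procedure: at each step, regress the current residuals on each single regressor, pick the one that reduces the residual sum of squares most, and move its coefficient a small step in that direction. The function name `l2boost`, the step size `nu`, the fixed number of steps (standing in for an early-stopping rule), and the assumption of centered data without an intercept are choices made for this illustration, not details from the paper.

```python
def l2boost(X, y, steps=50, nu=0.1):
    """Componentwise least-squares L2Boosting (illustrative sketch).

    X is a list of rows, y a list of responses; data are assumed centered.
    Returns the list of fitted coefficients.
    """
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    norms = [sum(v * v for v in col) for col in cols]
    beta = [0.0] * p
    resid = list(y)
    for _ in range(steps):
        best_j, best_coef, best_gain = None, 0.0, 0.0
        for j in range(p):
            if norms[j] == 0.0:
                continue  # skip constant-zero regressors
            inner = sum(c * r for c, r in zip(cols[j], resid))
            gain = inner * inner / norms[j]  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_coef, best_gain = j, inner / norms[j], gain
        if best_j is None:
            break  # no regressor improves the fit
        beta[best_j] += nu * best_coef  # shrunken update of one coefficient
        resid = [r - nu * best_coef * c for r, c in zip(resid, cols[best_j])]
    return beta
```

The regressors that end up with nonzero coefficients can be read as the variables selected by the procedure; stopping earlier generally selects fewer of them.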
-
High-Dimensional $L_2$Boosting: Rate of Convergence
Authors:
Ye Luo,
Martin Spindler,
Jannis Kück
Abstract:
Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of $L_2$Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called "post-Boosting". This is a post-selection estimator which applies ordinary least squares to the variables selected in the first stage by $L_2$Boosting. Another variant is "Orthogonal Boosting", where after each step an orthogonal projection is conducted. We show that both post-$L_2$Boosting and the orthogonal boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting. We show that the rate of convergence of the classical $L_2$Boosting depends on the design matrix described by a sparse eigenvalue constant. To show the latter results, we derive new approximation results for the pure greedy algorithm, based on analyzing the revisiting behavior of $L_2$Boosting. We also introduce feasible rules for early stopping, which can be easily implemented and used in applied work. Our results also allow a direct comparison between LASSO and boosting which has been missing from the literature. Finally, we present simulation studies and applications to illustrate the relevance of our theoretical results and to provide insights into the practical aspects of boosting. In these simulation studies, post-$L_2$Boosting clearly outperforms LASSO.
Submitted 21 July, 2022; v1 submitted 29 February, 2016;
originally announced February 2016.
-
The Sorted Effects Method: Discovering Heterogeneous Effects Beyond Their Averages
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Ye Luo
Abstract:
The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects -- a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize the range of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. They also serve as a basis for classification analysis, where we divide the observational units into most or least affected groups and summarize their characteristics. We provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects and related classification analysis, and provide confidence sets for the most and least affected groups. The derived statistical results rely on establishing key, new mathematical results on Hadamard differentiability of a multivariate sorting operator and a related classification operator, which are of independent interest. We apply the sorted effects method and classification analysis to demonstrate several striking patterns in the gender wage gap.
Submitted 25 May, 2018; v1 submitted 17 December, 2015;
originally announced December 2015.
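To make the idea of sorted effects concrete, the following sketch computes the partial effect of one regressor for every observation in a logit model, sorts those effects in increasing order, and reads them off at chosen percentiles alongside the usual average. It assumes the coefficients `beta` have already been estimated elsewhere; the function name, the percentile convention, and the restriction to a logit model are assumptions for this illustration, and no standard errors or confidence bands are computed.

```python
import math


def sorted_effects(X, beta, j, percentiles=(10, 25, 50, 75, 90)):
    """Average and percentile-indexed partial effects of regressor j (logit)."""
    effects = []
    for row in X:
        index = sum(b * x for b, x in zip(beta, row))
        prob = 1.0 / (1.0 + math.exp(-index))
        # Derivative of P(y = 1 | x) with respect to x_j in a logit model.
        effects.append(beta[j] * prob * (1.0 - prob))
    effects.sort()
    n = len(effects)
    by_percentile = {}
    for u in percentiles:
        k = max(0, math.ceil(u / 100 * n) - 1)  # empirical u-th percentile
        by_percentile[u] = effects[k]
    average = sum(effects) / n
    return average, by_percentile
```

Comparing the returned average with the spread of the percentile values gives a quick sense of how much heterogeneity a single average partial effect would hide.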
-
Game Theory, Statistical Mechanics and Income Inequality
Authors:
Venkat Venkatasubramanian,
Yu Luo,
Jay Sethuraman
Abstract:
The widening inequality in income distribution in recent years, and the associated excessive pay packages of CEOs in the U.S. and elsewhere, is of growing concern among policy makers as well as the common person. However, there seems to be no satisfactory answer, in conventional economic theories and models, to the fundamental question of what kind of pay distribution we ought to see, at least under ideal conditions, in a free market environment and whether this distribution is fair. We propose a game theoretic framework that addresses these questions and show that the lognormal distribution is the fairest inequality of pay in an organization comprising homogeneous agents, achieved at equilibrium, under ideal free market conditions. We also show that for a population of two different classes of agents, the final distribution is a combination of two different lognormal distributions where one of them, corresponding to the top 3-5% of the population, can be misidentified as a Pareto distribution. Our theory also shows the deep and direct connection between potential game theory and statistical mechanics through entropy, which is a measure of fairness in a distribution. This leads us to propose the fair market hypothesis, that the self-organizing dynamics of the ideal free market, i.e., Adam Smith's "invisible hand", not only promotes efficiency but also maximizes fairness under the given constraints.
Submitted 12 November, 2014; v1 submitted 25 June, 2014;
originally announced June 2014.