-
Non relativistic limit of the nonlinear Klein-Gordon equation: Uniform in time convergence of KAM solutions
Authors:
Dario Bambusi,
Andrea Belloni,
Filippo Giuliani
Abstract:
We study the non relativistic limit of the solutions of the cubic nonlinear Klein--Gordon (KG) equation on an interval with Dirichlet boundary conditions. We construct a family of quasi periodic solutions which, after a Gauge transformation, converge globally uniformly in time to quasi periodic solutions of the cubic NLS. The proof is based on KAM theory. We emphasize that, regardless of the spati…
▽ More
We study the non relativistic limit of the solutions of the cubic nonlinear Klein--Gordon (KG) equation on an interval with Dirichlet boundary conditions. We construct a family of quasi periodic solutions which, after a Gauge transformation, converge globally uniformly in time to quasi periodic solutions of the cubic NLS. The proof is based on KAM theory. We emphasize that, regardless of the spatial domain, all the previous results prove the convergence of KG solutions to NLS solutions only on compact time intervals.
△ Less
Submitted 29 January, 2025;
originally announced January 2025.
-
Joint Probability Estimation of Many Binary Outcomes via Localized Adversarial Lasso
Authors:
Alexandre Belloni,
Yan Chen,
Matthew Harding
Abstract:
In this work we consider estimating the probability of many (possibly dependent) binary outcomes which is at the core of many applications, e.g., multi-level treatments in causal inference, demands for bundle of products, etc. Without further conditions, the probability distribution of an M dimensional binary vector is characterized by exponentially in M coefficients which can lead to a high-dimen…
▽ More
In this work we consider estimating the probability of many (possibly dependent) binary outcomes which is at the core of many applications, e.g., multi-level treatments in causal inference, demands for bundle of products, etc. Without further conditions, the probability distribution of an M dimensional binary vector is characterized by exponentially in M coefficients which can lead to a high-dimensional problem even without the presence of covariates. Understanding the (in)dependence structure allows us to substantially improve the estimation as it allows for an effective factorization of the probability distribution. In order to estimate the probability distribution of a M dimensional binary vector, we leverage a Bahadur representation that connects the sparsity of its coefficients with independence across the components. We propose to use regularized and adversarial regularized estimators to obtain an adaptive estimator with respect to the dependence structure which allows for rates of convergence to depend on this intrinsic (lower) dimension. These estimators are needed to handle several challenges within this setting, including estimating nuisance parameters, estimating covariates, and nonseparable moment conditions. Our main results consider the presence of (low dimensional) covariates for which we propose a locally penalized estimator. We provide pointwise rates of convergence addressing several issues in the theoretical analyses as we strive for making a computationally tractable formulation. We apply our results in the estimation of causal effects with multiple binary treatments and show how our estimators can improve the finite sample performance when compared with non-adaptive estimators that try to estimate all the probabilities directly. We also provide simulations that are consistent with our theoretical findings.
△ Less
Submitted 30 January, 2025; v1 submitted 19 October, 2024;
originally announced October 2024.
-
Anti-Concentration Inequalities for the Difference of Maxima of Gaussian Random Vectors
Authors:
Alexandre Belloni,
Ethan X. Fang,
Shuting Shen
Abstract:
We derive novel anti-concentration bounds for the difference between the maximal values of two Gaussian random vectors across various settings. Our bounds are dimension-free, scaling with the dimension of the Gaussian vectors only through the smaller expected maximum of the Gaussian subvectors. In addition, our bounds hold under the degenerate covariance structures, which previous results do not c…
▽ More
We derive novel anti-concentration bounds for the difference between the maximal values of two Gaussian random vectors across various settings. Our bounds are dimension-free, scaling with the dimension of the Gaussian vectors only through the smaller expected maximum of the Gaussian subvectors. In addition, our bounds hold under the degenerate covariance structures, which previous results do not cover. In addition, we show that our conditions are sharp under the homogeneous component-wise variance setting, while we only impose some mild assumptions on the covariance structures under the heterogeneous variance setting. We apply the new anti-concentration bounds to derive the central limit theorem for the maximizers of discrete empirical processes. Finally, we back up our theoretical findings with comprehensive numerical studies.
△ Less
Submitted 23 August, 2024;
originally announced August 2024.
-
High Dimensional Latent Panel Quantile Regression with an Application to Asset Pricing
Authors:
Alexandre Belloni,
Mingli Chen,
Oscar Hernan Madrid Padilla,
Zixuan,
Wang
Abstract:
We propose a generalization of the linear panel quantile regression model to accommodate both \textit{sparse} and \textit{dense} parts: sparse means while the number of covariates available is large, potentially only a much smaller number of them have a nonzero impact on each conditional quantile of the response variable; while the dense part is represent by a low-rank matrix that can be approxima…
▽ More
We propose a generalization of the linear panel quantile regression model to accommodate both \textit{sparse} and \textit{dense} parts: sparse means while the number of covariates available is large, potentially only a much smaller number of them have a nonzero impact on each conditional quantile of the response variable; while the dense part is represent by a low-rank matrix that can be approximated by latent factors and their loadings. Such a structure poses problems for traditional sparse estimators, such as the $\ell_1$-penalised Quantile Regression, and for traditional latent factor estimator, such as PCA. We propose a new estimation procedure, based on the ADMM algorithm, consists of combining the quantile loss function with $\ell_1$ \textit{and} nuclear norm regularization. We show, under general conditions, that our estimator can consistently estimate both the nonzero coefficients of the covariates and the latent low-rank matrix.
Our proposed model has a "Characteristics + Latent Factors" Asset Pricing Model interpretation: we apply our model and estimator with a large-dimensional panel of financial data and find that (i) characteristics have sparser predictive power once latent factors were controlled (ii) the factors and coefficients at upper and lower quantiles are different from the median.
△ Less
Submitted 23 August, 2022; v1 submitted 4 December, 2019;
originally announced December 2019.
-
A high dimensional Central Limit Theorem for martingales, with applications to context tree models
Authors:
Alexandre Belloni,
Roberto I. Oliveira
Abstract:
We establish a central limit theorem for (a sequence of) multivariate martingales which dimension potentially grows with the length $n$ of the martingale. A consequence of the results are Gaussian couplings and a multiplier bootstrap for the maximum of a multivariate martingale whose dimensionality $d$ can be as large as $e^{n^c}$ for some $c>0$. We also develop new anti-concentration bounds for t…
▽ More
We establish a central limit theorem for (a sequence of) multivariate martingales which dimension potentially grows with the length $n$ of the martingale. A consequence of the results are Gaussian couplings and a multiplier bootstrap for the maximum of a multivariate martingale whose dimensionality $d$ can be as large as $e^{n^c}$ for some $c>0$. We also develop new anti-concentration bounds for the maximum component of a high-dimensional Gaussian vector, which we believe is of independent interest.
The results are applicable to a variety of settings. We fully develop its use to the estimation of context tree models (or variable length Markov chains) for discrete stationary time series. Specifically, we provide a bootstrap-based rule to tune several regularization parameters in a theoretically valid Lepski-type method. Such bootstrap-based approach accounts for the correlation structure and leads to potentially smaller penalty choices, which in turn improve the estimation of the transition probabilities.
△ Less
Submitted 7 September, 2018;
originally announced September 2018.
-
Latent Agents in Networks: Estimation and Targeting
Authors:
Baris Ata,
Alexandre Belloni,
Ozan Candogan
Abstract:
We consider a network of agents. Associated with each agent are her covariate and outcome. Agents influence each other's outcomes according to a certain connection/influence structure. A subset of the agents participate on a platform, and hence, are observable to it. The rest are not observable to the platform and are called the latent agents. The platform does not know the influence structure of…
▽ More
We consider a network of agents. Associated with each agent are her covariate and outcome. Agents influence each other's outcomes according to a certain connection/influence structure. A subset of the agents participate on a platform, and hence, are observable to it. The rest are not observable to the platform and are called the latent agents. The platform does not know the influence structure of the observable or the latent parts of the network. It only observes the data on past covariates and decisions of the observable agents. Observable agents influence each other both directly and indirectly through the influence they exert on the latent agents.
We investigate how the platform can estimate the dependence of the observable agents' outcomes on their covariates, taking the latent agents into account. First, we show that this relationship can be succinctly captured by a matrix and provide an algorithm for estimating it under a suitable approximate sparsity condition using historical data of covariates and outcomes for the observable agents. We also obtain convergence rates for the proposed estimator despite the high dimensionality that allows more agents than observations. Second, we show that the approximate sparsity condition holds under the standard conditions used in the literature. Hence, our results apply to a large class of networks. Finally, we apply our results to two practical settings: targeted advertising and promotional pricing. We show that by using the available historical data with our estimator, it is possible to obtain asymptotically optimal advertising/pricing decisions, despite the presence of latent agents.
△ Less
Submitted 26 January, 2022; v1 submitted 14 August, 2018;
originally announced August 2018.
-
Subvector Inference in Partially Identified Models with Many Moment Inequalities
Authors:
Alexandre Belloni,
Federico Bugni,
Victor Chernozhukov
Abstract:
This paper considers inference for a function of a parameter vector in a partially identified model with many moment inequalities. This framework allows the number of moment conditions to grow with the sample size, possibly at exponential rates. Our main motivating application is subvector inference, i.e., inference on a single component of the partially identified parameter vector associated with…
▽ More
This paper considers inference for a function of a parameter vector in a partially identified model with many moment inequalities. This framework allows the number of moment conditions to grow with the sample size, possibly at exponential rates. Our main motivating application is subvector inference, i.e., inference on a single component of the partially identified parameter vector associated with a treatment effect or a policy variable of interest.
Our inference method compares a MinMax test statistic (minimum over parameters satisfying $H_0$ and maximum over moment inequalities) against critical values that are based on bootstrap approximations or analytical bounds. We show that this method controls asymptotic size uniformly over a large class of data generating processes despite the partially identified many moment inequality setting. The finite sample analysis allows us to obtain explicit rates of convergence on the size control. Our results are based on combining non-asymptotic approximations and new high-dimensional central limit theorems for the MinMax of the components of random matrices. Unlike the previous literature on functional inference in partially identified models, our results do not rely on weak convergence results based on Donsker's class assumptions and, in fact, our test statistic may not even converge in distribution. Our bootstrap approximation requires the choice of a tuning parameter sequence that can avoid the excessive concentration of our test statistic. To this end, we propose an asymptotically valid data-driven method to select this tuning parameter sequence. This method generalizes the selection of tuning parameter sequences to problems outside the Donsker's class assumptions and may also be of independent interest. Our procedures based on self-normalized moderate deviation bounds are relatively more conservative but easier to implement.
△ Less
Submitted 29 June, 2018;
originally announced June 2018.
-
High-Dimensional Econometrics and Regularized GMM
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Denis Chetverikov,
Christian Hansen,
Kengo Kato
Abstract:
This chapter presents key concepts and theoretical results for analyzing estimation and inference in high-dimensional models. High-dimensional models are characterized by having a number of unknown parameters that is not vanishingly small relative to the sample size. We first present results in a framework where estimators of parameters of interest may be represented directly as approximate means.…
▽ More
This chapter presents key concepts and theoretical results for analyzing estimation and inference in high-dimensional models. High-dimensional models are characterized by having a number of unknown parameters that is not vanishingly small relative to the sample size. We first present results in a framework where estimators of parameters of interest may be represented directly as approximate means. Within this context, we review fundamental results including high-dimensional central limit theorems, bootstrap approximation of high-dimensional limit distributions, and moderate deviation theory. We also review key concepts underlying inference when many parameters are of interest such as multiple testing with family-wise error rate or false discovery rate control. We then turn to a general high-dimensional minimum distance framework with a special focus on generalized method of moments problems where we present results for estimation and inference about model parameters. The presented results cover a wide array of econometric applications, and we discuss several leading special cases including high-dimensional linear regression and linear instrumental variables models to illustrate the general results.
△ Less
Submitted 10 June, 2018; v1 submitted 5 June, 2018;
originally announced June 2018.
-
Confidence Bands for Coefficients in High Dimensional Linear Models with Error-in-variables
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Abhishek Kaul
Abstract:
We study high-dimensional linear models with error-in-variables. Such models are motivated by various applications in econometrics, finance and genetics. These models are challenging because of the need to account for measurement errors to avoid non-vanishing biases in addition to handle the high dimensionality of the parameters. A recent growing literature has proposed various estimators that ach…
▽ More
We study high-dimensional linear models with error-in-variables. Such models are motivated by various applications in econometrics, finance and genetics. These models are challenging because of the need to account for measurement errors to avoid non-vanishing biases in addition to handle the high dimensionality of the parameters. A recent growing literature has proposed various estimators that achieve good rates of convergence. Our main contribution complements this literature with the construction of simultaneous confidence regions for the parameters of interest in such high-dimensional linear models with error-in-variables.
These confidence regions are based on the construction of moment conditions that have an additional orthogonal property with respect to nuisance parameters. We provide a construction that requires us to estimate an additional high-dimensional linear model with error-in-variables for each component of interest. We use a multiplier bootstrap to compute critical values for simultaneous confidence intervals for a subset $S$ of the components. We show its validity despite of possible model selection mistakes, and allowing for the cardinality of $S$ to be larger than the sample size.
We apply and discuss the implications of our results to two examples and conduct Monte Carlo simulations to illustrate the performance of the proposed procedure.
△ Less
Submitted 1 March, 2017;
originally announced March 2017.
-
Quantile Graphical Models: Prediction and Conditional Independence with Applications to Systemic Risk
Authors:
Alexandre Belloni,
Mingli Chen,
Victor Chernozhukov
Abstract:
We propose two types of Quantile Graphical Models (QGMs) --- Conditional Independence Quantile Graphical Models (CIQGMs) and Prediction Quantile Graphical Models (PQGMs). CIQGMs characterize the conditional independence of distributions by evaluating the distributional dependence structure at each quantile index. As such, CIQGMs can be used for validation of the graph structure in the causal graph…
▽ More
We propose two types of Quantile Graphical Models (QGMs) --- Conditional Independence Quantile Graphical Models (CIQGMs) and Prediction Quantile Graphical Models (PQGMs). CIQGMs characterize the conditional independence of distributions by evaluating the distributional dependence structure at each quantile index. As such, CIQGMs can be used for validation of the graph structure in the causal graphical models (\cite{pearl2009causality, robins1986new, heckman2015causal}). One main advantage of these models is that we can apply them to large collections of variables driven by non-Gaussian and non-separable shocks. PQGMs characterize the statistical dependencies through the graphs of the best linear predictors under asymmetric loss functions. PQGMs make weaker assumptions than CIQGMs as they allow for misspecification. Because of QGMs' ability to handle large collections of variables and focus on specific parts of the distributions, we could apply them to quantify tail interdependence. The resulting tail risk network can be used for measuring systemic risk contributions that help make inroads in understanding international financial contagion and dependence structures of returns under downside market movements.
We develop estimation and inference methods for QGMs focusing on the high-dimensional case, where the number of variables in the graph is large compared to the number of observations. For CIQGMs, these methods and results include valid simultaneous choices of penalty functions, uniform rates of convergence, and confidence regions that are simultaneously valid. We also derive analogous results for PQGMs, which include new results for penalized quantile regressions in high-dimensional settings to handle misspecification, many controls, and a continuum of additional conditioning events.
△ Less
Submitted 28 October, 2019; v1 submitted 1 July, 2016;
originally announced July 2016.
-
Uniformly Valid Post-Regularization Confidence Regions for Many Functional Parameters in Z-Estimation Framework
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Denis Chetverikov,
Ying Wei
Abstract:
In this paper we develop procedures to construct simultaneous confidence bands for $\tilde p$ potentially infinite-dimensional parameters after model selection for general moment condition models where $\tilde p$ is potentially much larger than the sample size of available data, $n$. This allows us to cover settings with functional response data where each of the $\tilde p$ parameters is a functio…
▽ More
In this paper we develop procedures to construct simultaneous confidence bands for $\tilde p$ potentially infinite-dimensional parameters after model selection for general moment condition models where $\tilde p$ is potentially much larger than the sample size of available data, $n$. This allows us to cover settings with functional response data where each of the $\tilde p$ parameters is a function. The procedure is based on the construction of score functions that satisfy certain orthogonality condition. The proposed simultaneous confidence bands rely on uniform central limit theorems for high-dimensional vectors (and not on Donsker arguments as we allow for $\tilde p \gg n$). To construct the bands, we employ a multiplier bootstrap procedure which is computationally efficient as it only involves resampling the estimated score functions (and does not require resolving the high-dimensional optimization problems). We formally apply the general theory to inference on regression coefficient process in the distribution regression model with a logistic link, where two implementations are analyzed in detail. Simulations and an application to real data are provided to help illustrate the applicability of the results.
△ Less
Submitted 3 February, 2019; v1 submitted 23 December, 2015;
originally announced December 2015.
-
Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions
Authors:
Alexandre Belloni,
Tengyuan Liang,
Hariharan Narayanan,
Alexander Rakhlin
Abstract:
We consider the problem of optimizing an approximately convex function over a bounded convex set in $\mathbb{R}^n$ using only function evaluations. The problem is reduced to sampling from an \emph{approximately} log-concave distribution using the Hit-and-Run method, which is shown to have the same $\mathcal{O}^*$ complexity as sampling from log-concave distributions. In addition to extend the anal…
▽ More
We consider the problem of optimizing an approximately convex function over a bounded convex set in $\mathbb{R}^n$ using only function evaluations. The problem is reduced to sampling from an \emph{approximately} log-concave distribution using the Hit-and-Run method, which is shown to have the same $\mathcal{O}^*$ complexity as sampling from log-concave distributions. In addition to extend the analysis for log-concave distributions to approximate log-concave distributions, the implementation of the 1-dimensional sampler of the Hit-and-Run walk requires new methods and analysis. The algorithm then is based on simulated annealing which does not relies on first order conditions which makes it essentially immune to local minima.
We then apply the method to different motivating problems. In the context of zeroth order stochastic convex optimization, the proposed method produces an $ε$-minimizer after $\mathcal{O}^*(n^{7.5}ε^{-2})$ noisy function evaluations by inducing a $\mathcal{O}(ε/n)$-approximately log concave distribution. We also consider in detail the case when the "amount of non-convexity" decays towards the optimum of the function. Other applications of the method discussed in this work include private computation of empirical risk minimizers, two-stage stochastic programming, and approximate dynamic programming for online learning.
△ Less
Submitted 15 June, 2015; v1 submitted 28 January, 2015;
originally announced January 2015.
-
An $\{l_1,l_2,l_{\infty}\}$-Regularization Approach to High-Dimensional Errors-in-variables Models
Authors:
Alexandre Belloni,
Mathieu Rosenbaum,
Alexandre B. Tsybakov
Abstract:
Several new estimation methods have been recently proposed for the linear regression model with observation error in the design. Different assumptions on the data generating process have motivated different estimators and analysis. In particular, the literature considered (1) observation errors in the design uniformly bounded by some $\bar δ$, and (2) zero mean independent observation errors. Unde…
▽ More
Several new estimation methods have been recently proposed for the linear regression model with observation error in the design. Different assumptions on the data generating process have motivated different estimators and analysis. In particular, the literature considered (1) observation errors in the design uniformly bounded by some $\bar δ$, and (2) zero mean independent observation errors. Under the first assumption, the rates of convergence of the proposed estimators depend explicitly on $\bar δ$, while the second assumption has been applied when an estimator for the second moment of the observational error is available. This work proposes and studies two new estimators which, compared to other procedures for regression models with errors in the design, exploit an additional $l_{\infty}$-norm regularization. The first estimator is applicable when both (1) and (2) hold but does not require an estimator for the second moment of the observational error. The second estimator is applicable under (2) and requires an estimator for the second moment of the observation error. Importantly, we impose no assumption on the accuracy of this pilot estimator, in contrast to the previously known procedures. As the recent proposals, we allow the number of covariates to be much larger than the sample size. We establish the rates of convergence of the estimators and compare them with the bounds obtained for related estimators in the literature. These comparisons show interesting insights on the interplay of the assumptions and the achievable rates of convergence.
△ Less
Submitted 22 December, 2014;
originally announced December 2014.
-
Linear and Conic Programming Estimators in High-Dimensional Errors-in-variables Models
Authors:
Alexandre Belloni,
Mathieu Rosenbaum,
Alexandre Tsybakov
Abstract:
We consider the linear regression model with observation error in the design. In this setting, we allow the number of covariates to be much larger than the sample size. Several new estimation methods have been recently introduced for this model. Indeed, the standard Lasso estimator or Dantzig selector turn out to become unreliable when only noisy regressors are available, which is quite common in…
▽ More
We consider the linear regression model with observation error in the design. In this setting, we allow the number of covariates to be much larger than the sample size. Several new estimation methods have been recently introduced for this model. Indeed, the standard Lasso estimator or Dantzig selector turn out to become unreliable when only noisy regressors are available, which is quite common in practice. We show in this work that under suitable sparsity assumptions, the procedure introduced in Rosenbaum and Tsybakov (2013) is almost optimal in a minimax sense and, despite non-convexities, can be efficiently computed by a single linear programming problem. Furthermore, we provide an estimator attaining the minimax efficiency bound. This estimator is written as a second order cone programming minimisation problem which can be solved numerically in polynomial time.
△ Less
Submitted 3 July, 2016; v1 submitted 1 August, 2014;
originally announced August 2014.
-
Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Kengo Kato
Abstract:
This work proposes new inference methods for a regression coefficient of interest in a (heterogeneous) quantile regression model. We consider a high-dimensional model where the number of regressors potentially exceeds the sample size but a subset of them suffice to construct a reasonable approximation to the conditional quantile function. The proposed methods are (explicitly or implicitly) based o…
▽ More
This work proposes new inference methods for a regression coefficient of interest in a (heterogeneous) quantile regression model. We consider a high-dimensional model where the number of regressors potentially exceeds the sample size but a subset of them suffice to construct a reasonable approximation to the conditional quantile function. The proposed methods are (explicitly or implicitly) based on orthogonal score functions that protect against moderate model selection mistakes, which are often inevitable in the approximately sparse model considered in the present paper. We establish the uniform validity of the proposed confidence regions for the quantile regression coefficient. Importantly, these methods directly apply to more than one variable and a continuum of quantile indices. In addition, the performance of the proposed methods is illustrated through Monte-Carlo experiments and an empirical example, dealing with risk factors in childhood malnutrition.
△ Less
Submitted 23 June, 2016; v1 submitted 26 December, 2013;
originally announced December 2013.
-
Program Evaluation and Causal Inference with High-Dimensional Data
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Ivan Fernández-Val,
Christian Hansen
Abstract:
In this paper, we provide efficient estimators and honest confidence bands for a variety of treatment effects including local average (LATE) and local quantile treatment effects (LQTE) in data-rich environments. We can handle very many control variables, endogenous receipt of treatment, heterogeneous treatment effects, and function-valued outcomes. Our framework covers the special case of exogenou…
▽ More
In this paper, we provide efficient estimators and honest confidence bands for a variety of treatment effects including local average (LATE) and local quantile treatment effects (LQTE) in data-rich environments. We can handle very many control variables, endogenous receipt of treatment, heterogeneous treatment effects, and function-valued outcomes. Our framework covers the special case of exogenous receipt of treatment, either conditional on controls or unconditionally as in randomized control trials. In the latter case, our approach produces efficient estimators and honest bands for (functional) average treatment effects (ATE) and quantile treatment effects (QTE). To make informative inference possible, we assume that key reduced form predictive relationships are approximately sparse. This assumption allows the use of regularization and selection methods to estimate those relations, and we provide methods for post-regularization and post-selection inference that are uniformly valid (honest) across a wide-range of models. We show that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced form functional parameters. We illustrate the use of the proposed methods with an application to estimating the effect of 401(k) eligibility and participation on accumulated assets.
△ Less
Submitted 5 January, 2018; v1 submitted 11 November, 2013;
originally announced November 2013.
-
Supplementary Appendix for "Inference on Treatment Effects After Selection Amongst High-Dimensional Controls"
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Christian Hansen
Abstract:
In this supplementary appendix we provide additional results, omitted proofs and extensive simulations that complement the analysis of the main text (arXiv:1201.0224).
In this supplementary appendix we provide additional results, omitted proofs and extensive simulations that complement the analysis of the main text (arXiv:1201.0224).
△ Less
Submitted 20 June, 2013; v1 submitted 26 May, 2013;
originally announced May 2013.
-
Post-Selection Inference for Generalized Linear Models with Many Controls
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Ying Wei
Abstract:
This paper considers generalized linear models in the presence of many controls. We lay out a general methodology to estimate an effect of interest based on the construction of an instrument that immunize against model selection mistakes and apply it to the case of logistic binary choice model. More specifically we propose new methods for estimating and constructing confidence regions for a regres…
▽ More
This paper considers generalized linear models in the presence of many controls. We lay out a general methodology to estimate an effect of interest based on the construction of an instrument that immunize against model selection mistakes and apply it to the case of logistic binary choice model. More specifically we propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest $α_0$, a parameter in front of the regressor of interest, such as the treatment variable or a policy variable. These methods allow to estimate $α_0$ at the root-$n$ rate when the total number $p$ of other regressors, called controls, potentially exceed the sample size $n$ using sparsity assumptions. The sparsity assumption means that there is a subset of $s<n$ controls which suffices to accurately approximate the nuisance part of the regression function. Importantly, the estimators and these resulting confidence regions are valid uniformly over $s$-sparse models satisfying $s^2\log^2 p = o(n)$ and other technical conditions. These procedures do not rely on traditional consistent model selection arguments for their validity. In fact, they are robust with respect to moderate model selection mistakes in variable selection. Under suitable conditions, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models in this paper.
△ Less
Submitted 21 March, 2016; v1 submitted 14 April, 2013;
originally announced April 2013.
-
Uniform Post Selection Inference for LAD Regression and Other Z-estimation problems
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Kengo Kato
Abstract:
We develop uniformly valid confidence regions for regression coefficients in a high-dimensional sparse median regression model with homoscedastic errors. Our methods are based on a moment equation that is immunized against non-regular estimation of the nuisance part of the median regression function by using Neyman's orthogonalization. We establish that the resulting instrumental median regression…
▽ More
We develop uniformly valid confidence regions for regression coefficients in a high-dimensional sparse median regression model with homoscedastic errors. Our methods are based on a moment equation that is immunized against non-regular estimation of the nuisance part of the median regression function by using Neyman's orthogonalization. We establish that the resulting instrumental median regression estimator of a target regression coefficient is asymptotically normally distributed uniformly with respect to the underlying sparse model and is semi-parametrically efficient. We also generalize our method to a general non-smooth Z-estimation framework with the number of target parameters $p_1$ being possibly much larger than the sample size $n$. We extend Huber's results on asymptotic normality to this setting, demonstrating uniform asymptotic normality of the proposed estimators over $p_1$-dimensional rectangles, constructing simultaneous confidence bands on all of the $p_1$ target parameters, and establishing asymptotic validity of the bands uniformly over underlying approximately sparse models.
Keywords: Instrument; Post-selection inference; Sparsity; Neyman's Orthogonal Score test; Uniformly valid inference; Z-estimation.
△ Less
Submitted 18 October, 2020; v1 submitted 31 March, 2013;
originally announced April 2013.
-
Approximate group context tree
Authors:
Alexandre Belloni,
Roberto I. Oliveira
Abstract:
We study a variable length Markov chain model associated with a group of stationary processes that share the same context tree but each process has potentially different conditional probabilities. We propose a new model selection and estimation method which is computationally efficient. We develop oracle and adaptivity inequalities, as well as model selection properties, that hold under continuity…
▽ More
We study a variable length Markov chain model associated with a group of stationary processes that share the same context tree but each process has potentially different conditional probabilities. We propose a new model selection and estimation method which is computationally efficient. We develop oracle and adaptivity inequalities, as well as model selection properties, that hold under continuity of the transition probabilities and polynomial $β$-mixing. In particular, model misspecification is allowed.
These results are applied to interesting families of processes. For Markov processes, we obtain uniform rate of convergence for the estimation error of transition probabilities as well as perfect model selection results. For chains of infinite order with complete connections, we obtain explicit uniform rates of convergence on the estimation of conditional probabilities, which have an explicit dependence on the processes' continuity rates. Similar guarantees are also derived for renewal processes.
Our results are shown to be applicable to discrete stochastic dynamic programming problems and to dynamic discrete choice models. We also apply our estimator to a linguistic study, based on recent work, by Galves et al (2012), of the rhythmic differences between Brazilian and European Portuguese.
△ Less
Submitted 30 December, 2015; v1 submitted 1 July, 2011;
originally announced July 2011.
-
Conditional Quantile Processes based on Series or Many Regressors
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Denis Chetverikov,
Iván Fernández-Val
Abstract:
Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR-series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In…
▽ More
Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR-series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In this framework, we approximate the entire conditional quantile function by a linear combination of series terms with quantile-specific coefficients and estimate the function-valued coefficients from the data. We develop large sample theory for the QR-series coefficient process, namely we obtain uniform strong approximations to the QR-series coefficient process by conditionally pivotal and Gaussian processes. Based on these strong approximations, or couplings, we develop four resampling methods (pivotal, gradient bootstrap, Gaussian, and weighted bootstrap) that can be used for inference on the entire QR-series coefficient function.
We apply these results to obtain estimation and inference methods for linear functionals of the conditional quantile function, such as the conditional quantile function itself, its partial derivatives, average partial derivatives, and conditional average partial derivatives. Specifically, we obtain uniform rates of convergence and show how to use the four resampling methods mentioned above for inference on the functionals. All of the above results are for function-valued parameters, holding uniformly in both the quantile index and the covariate value, and covering the pointwise case as a by-product. We demonstrate the practical utility of these results with an example, where we estimate the price elasticity function and test the Slutsky condition of the individual demand for gasoline, as indexed by the individual unobserved propensity for gasoline consumption.
△ Less
Submitted 9 August, 2018; v1 submitted 30 May, 2011;
originally announced May 2011.
-
Pivotal estimation via square-root Lasso in nonparametric regression
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Lie Wang
Abstract:
We propose a self-tuning $\sqrt{\mathrm {Lasso}}$ method that simultaneously resolves three important practical problems in high-dimensional regression analysis, namely it handles the unknown scale, heteroscedasticity and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even i…
▽ More
We propose a self-tuning $\sqrt{\mathrm {Lasso}}$ method that simultaneously resolves three important practical problems in high-dimensional regression analysis, namely it handles the unknown scale, heteroscedasticity and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme cases, such as the infinite variance case and the noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds for $\sqrt{\mathrm {Lasso}}$ including prediction norm rate and sparsity. Our analysis is based on new impact factors that are tailored for bounding prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums to achieve Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least square (ols) applied to the model selected by $\sqrt{\mathrm {Lasso}}$ accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of ols post $\sqrt{\mathrm {Lasso}}$ is as good as $\sqrt{\mathrm {Lasso}}$'s rate. As an application, we consider the use of $\sqrt{\mathrm {Lasso}}$ and ols post $\sqrt{\mathrm {Lasso}}$ as estimators of nuisance parameters in a generic semiparametric problem (nonlinear moment condition or $Z$-problem), resulting in a construction of $\sqrt{n}$-consistent and asymptotically normal estimators of the main parameters.
△ Less
Submitted 26 May, 2014; v1 submitted 7 May, 2011;
originally announced May 2011.
-
LASSO Methods for Gaussian Instrumental Variables Models
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Christian Hansen
Abstract:
In this note, we propose to use sparse methods (e.g. LASSO, Post-LASSO, sqrt-LASSO, and Post-sqrt-LASSO) to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments in the canonical Gaussian case. The methods apply even when the number of instruments is much larger than the sample size. We derive asymptotic distributions for t…
▽ More
In this note, we propose to use sparse methods (e.g. LASSO, Post-LASSO, sqrt-LASSO, and Post-sqrt-LASSO) to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments in the canonical Gaussian case. The methods apply even when the number of instruments is much larger than the sample size. We derive asymptotic distributions for the resulting IV estimators and provide conditions under which these sparsity-based IV estimators are asymptotically oracle-efficient. In simulation experiments, a sparsity-based IV estimator with a data-driven penalty performs well compared to recently advocated many-instrument-robust procedures. We illustrate the procedure in an empirical example using the Angrist and Krueger (1991) schooling data.
△ Less
Submitted 23 February, 2011; v1 submitted 6 December, 2010;
originally announced December 2010.
-
Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain
Authors:
Alexandre Belloni,
Daniel Chen,
Victor Chernozhukov,
Christian Hansen
Abstract:
We develop results for the use of Lasso and Post-Lasso methods to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments, $p$. Our results apply even when $p$ is much larger than the sample size, $n$. We show that the IV estimator based on using Lasso or Post-Lasso in the first stage is root-n consistent and asymptotically n…
▽ More
We develop results for the use of Lasso and Post-Lasso methods to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments, $p$. Our results apply even when $p$ is much larger than the sample size, $n$. We show that the IV estimator based on using Lasso or Post-Lasso in the first stage is root-n consistent and asymptotically normal when the first-stage is approximately sparse; i.e. when the conditional expectation of the endogenous variables given the instruments can be well-approximated by a relatively small set of variables whose identities may be unknown. We also show the estimator is semi-parametrically efficient when the structural error is homoscedastic. Notably our results allow for imperfect model selection, and do not rely upon the unrealistic "beta-min" conditions that are widely used to establish validity of inference following model selection. In simulation experiments, the Lasso-based IV estimator with a data-driven penalty performs well compared to recently advocated many-instrument-robust procedures. In an empirical example dealing with the effect of judicial eminent domain decisions on economic outcomes, the Lasso-based IV estimator outperforms an intuitive benchmark.
In developing the IV results, we establish a series of new results for Lasso and Post-Lasso estimators of nonparametric conditional expectation functions which are of independent theoretical and practical interest. We construct a modification of Lasso designed to deal with non-Gaussian, heteroscedastic disturbances which uses a data-weighted $\ell_1$-penalty function. Using moderate deviation theory for self-normalized sums, we provide convergence rates for the resulting Lasso and Post-Lasso estimators that are as sharp as the corresponding rates in the homoscedastic Gaussian case under the condition that $\log p = o(n^{1/3})$.
△ Less
Submitted 19 April, 2015; v1 submitted 20 October, 2010;
originally announced October 2010.
-
Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Lie Wang
Abstract:
We propose a pivotal method for estimating high-dimensional sparse linear regression models, where the overall number of regressors $p$ is large, possibly much larger than $n$, but only $s$ regressors are significant. The method is a modification of the lasso, called the square-root lasso. The method is pivotal in that it neither relies on the knowledge of the standard deviation $σ$ or nor does it…
▽ More
We propose a pivotal method for estimating high-dimensional sparse linear regression models, where the overall number of regressors $p$ is large, possibly much larger than $n$, but only $s$ regressors are significant. The method is a modification of the lasso, called the square-root lasso. The method is pivotal in that it neither relies on the knowledge of the standard deviation $σ$ or nor does it need to pre-estimate $σ$. Moreover, the method does not rely on normality or sub-Gaussianity of noise. It achieves near-oracle performance, attaining the convergence rate $σ\{(s/n)\log p\}^{1/2}$ in the prediction norm, and thus matching the performance of the lasso with known $σ$. These performance results are valid for both Gaussian and non-Gaussian errors, under some mild moment restrictions. We formulate the square-root lasso as a solution to a convex conic programming problem, which allows us to implement the estimator using efficient algorithmic methods, such as interior-point and first-order methods.
△ Less
Submitted 18 December, 2011; v1 submitted 28 September, 2010;
originally announced September 2010.
-
Least squares after model selection in high-dimensional sparse models
Authors:
Alexandre Belloni,
Victor Chernozhukov
Abstract:
In this article we study post-model selection estimators that apply ordinary least squares (OLS) to the model selected by first-step penalized estimators, typically Lasso. It is well known that Lasso can estimate the nonparametric regression function at nearly the oracle rate, and is thus hard to improve upon. We show that the OLS post-Lasso estimator performs at least as well as Lasso in terms of…
▽ More
In this article we study post-model selection estimators that apply ordinary least squares (OLS) to the model selected by first-step penalized estimators, typically Lasso. It is well known that Lasso can estimate the nonparametric regression function at nearly the oracle rate, and is thus hard to improve upon. We show that the OLS post-Lasso estimator performs at least as well as Lasso in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the Lasso-based model selection "fails" in the sense of missing some components of the "true" regression model. By the "true" model, we mean the best s-dimensional approximation to the nonparametric regression function chosen by the oracle. Furthermore, OLS post-Lasso estimator can perform strictly better than Lasso, in the sense of a strictly faster rate of convergence, if the Lasso-based model selection correctly includes all components of the "true" model as a subset and also achieves sufficient sparsity. In the extreme case, when Lasso perfectly selects the "true" model, the OLS post-Lasso estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by Lasso, which guarantees that this dimension is at most of the same order as the dimension of the "true" model. Our rate results are nonasymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the Lasso estimator acting as a selector in the first step, but also applies to any other estimator, for example, various forms of thresholded Lasso, with good rates and good sparsity properties. Our analysis covers both traditional thresholding and a new practical, data-driven thresholding scheme that induces additional sparsity subject to maintaining a certain goodness of fit. The latter scheme has theoretical guarantees similar to those of Lasso or OLS post-Lasso, but it dominates those procedures as well as traditional thresholding in a wide variety of experiments.
△ Less
Submitted 20 March, 2013; v1 submitted 31 December, 2009;
originally announced January 2010.
-
On multivariate quantiles under partial orders
Authors:
Alexandre Belloni,
Robert L. Winkler
Abstract:
This paper focuses on generalizing quantiles from the ordering point of view. We propose the concept of partial quantiles, which are based on a given partial order. We establish that partial quantiles are equivariant under order-preserving transformations of the data, robust to outliers, characterize the probability distribution if the partial order is sufficiently rich, generalize the concept of…
▽ More
This paper focuses on generalizing quantiles from the ordering point of view. We propose the concept of partial quantiles, which are based on a given partial order. We establish that partial quantiles are equivariant under order-preserving transformations of the data, robust to outliers, characterize the probability distribution if the partial order is sufficiently rich, generalize the concept of efficient frontier, and can measure dispersion from the partial order perspective. We also study several statistical aspects of partial quantiles. We provide estimators, associated rates of convergence, and asymptotic distributions that hold uniformly over a continuum of quantile indices. Furthermore, we provide procedures that can restore monotonicity properties that might have been disturbed by estimation error, establish computational complexity bounds, and point out a concentration of measure phenomenon (the latter under independence and the componentwise natural order). Finally, we illustrate the concepts by discussing several theoretical examples and simulations. Empirical applications to compare intake nutrients within diets, to evaluate the performance of investment funds, and to study the impact of policies on tobacco awareness are also presented to illustrate the concepts and their use.
△ Less
Submitted 30 May, 2011; v1 submitted 29 December, 2009;
originally announced December 2009.
-
Posterior Inference in Curved Exponential Families under Increasing Dimensions
Authors:
Alexandre Belloni,
Victor Chernozhukov
Abstract:
This work studies the large sample properties of the posterior-based inference in the curved exponential family under increasing dimension. The curved structure arises from the imposition of various restrictions on the model, such as moment restrictions, and plays a fundamental role in econometrics and others branches of data analysis. We establish conditions under which the posterior distribution…
▽ More
This work studies the large sample properties of the posterior-based inference in the curved exponential family under increasing dimension. The curved structure arises from the imposition of various restrictions on the model, such as moment restrictions, and plays a fundamental role in econometrics and others branches of data analysis. We establish conditions under which the posterior distribution is approximately normal, which in turn implies various good properties of estimation and inference procedures based on the posterior. In the process we also revisit and improve upon previous results for the exponential family under increasing dimension by making use of concentration of measure. We also discuss a variety of applications to high-dimensional versions of the classical econometric models including the multinomial model with moment restrictions, seemingly unrelated regression equations, and single structural equation models. In our analysis, both the parameter dimension and the number of moments are increasing with the sample size.
△ Less
Submitted 22 April, 2014; v1 submitted 20 April, 2009;
originally announced April 2009.
-
L1-Penalized Quantile Regression in High-Dimensional Sparse Models
Authors:
Alexandre Belloni,
Victor Chernozhukov
Abstract:
We consider median regression and, more generally, a possibly infinite collection of quantile regressions in high-dimensional sparse models. In these models the overall number of regressors $p$ is very large, possibly larger than the sample size $n$, but only $s$ of these regressors have non-zero impact on the conditional quantile of the response variable, where $s$ grows slower than $n$. We consi…
▽ More
We consider median regression and, more generally, a possibly infinite collection of quantile regressions in high-dimensional sparse models. In these models the overall number of regressors $p$ is very large, possibly larger than the sample size $n$, but only $s$ of these regressors have non-zero impact on the conditional quantile of the response variable, where $s$ grows slower than $n$. We consider quantile regression penalized by the $\ell_1$-norm of coefficients ($\ell_1$-QR). First, we show that $\ell_1$-QR is consistent at the rate $\sqrt{s/n} \sqrt{\log p}$. The overall number of regressors $p$ affects the rate only through the $\log p$ factor, thus allowing nearly exponential growth in the number of zero-impact regressors. The rate result holds under relatively weak conditions, requiring that $s/n$ converges to zero at a super-logarithmic speed and that regularization parameter satisfies certain theoretical constraints. Second, we propose a pivotal, data-driven choice of the regularization parameter and show that it satisfies these theoretical constraints. Third, we show that $\ell_1$-QR correctly selects the true minimal model as a valid submodel, when the non-zero coefficients of the true model are well separated from zero. We also show that the number of non-zero coefficients in $\ell_1$-QR is of same stochastic order as $s$. Fourth, we analyze the rate of convergence of a two-step estimator that applies ordinary quantile regression to the selected model. Fifth, we evaluate the performance of $\ell_1$-QR in a Monte-Carlo experiment, and illustrate its use on an international economic growth application.
△ Less
Submitted 26 September, 2019; v1 submitted 19 April, 2009;
originally announced April 2009.
-
On the Behrens--Fisher problem: A globally convergent algorithm and a finite-sample study of the Wald, LR and LM Tests
Authors:
Alexandre Belloni,
Gustavo Didier
Abstract:
In this paper we provide a provably convergent algorithm for the multivariate Gaussian Maximum Likelihood version of the Behrens--Fisher Problem. Our work builds upon a formulation of the log-likelihood function proposed by Buot and Richards \citeBR. Instead of focusing on the first order optimality conditions, the algorithm aims directly for the maximization of the log-likelihood function itsel…
▽ More
In this paper we provide a provably convergent algorithm for the multivariate Gaussian Maximum Likelihood version of the Behrens--Fisher Problem. Our work builds upon a formulation of the log-likelihood function proposed by Buot and Richards \citeBR. Instead of focusing on the first order optimality conditions, the algorithm aims directly for the maximization of the log-likelihood function itself to achieve a global solution. Convergence proof and complexity estimates are provided for the algorithm. Computational experiments illustrate the applicability of such methods to high-dimensional data. We also discuss how to extend the proposed methodology to a broader class of problems. We establish a systematic algebraic relation between the Wald, Likelihood Ratio and Lagrangian Multiplier Test ($W\geq \mathit{LR}\geq \mathit{LM}$) in the context of the Behrens--Fisher Problem. Moreover, we use our algorithm to computationally investigate the finite-sample size and power of the Wald, Likelihood Ratio and Lagrange Multiplier Tests, which previously were only available through asymptotic results. The methods developed here are applicable to much higher dimensional settings than the ones available in the literature. This allows us to better capture the role of high dimensionality on the actual size and power of the tests for finite samples.
△ Less
Submitted 5 November, 2008;
originally announced November 2008.
-
On the Computational Complexity of MCMC-based Estimators in Large Samples
Authors:
Alexandre Belloni,
Victor Chernozhukov
Abstract:
In this paper we examine the implications of the statistical large sample theory for the computational complexity of Bayesian and quasi-Bayesian estimation carried out using Metropolis random walks. Our analysis is motivated by the Laplace-Bernstein-Von Mises central limit theorem, which states that in large samples the posterior or quasi-posterior approaches a normal density. Using the conditions…
▽ More
In this paper we examine the implications of the statistical large sample theory for the computational complexity of Bayesian and quasi-Bayesian estimation carried out using Metropolis random walks. Our analysis is motivated by the Laplace-Bernstein-Von Mises central limit theorem, which states that in large samples the posterior or quasi-posterior approaches a normal density. Using the conditions required for the central limit theorem to hold, we establish polynomial bounds on the computational complexity of general Metropolis random walks methods in large samples. Our analysis covers cases where the underlying log-likelihood or extremum criterion function is possibly non-concave, discontinuous, and with increasing parameter dimension. However, the central limit theorem restricts the deviations from continuity and log-concavity of the log-likelihood or extremum criterion function in a very specific manner.
Under minimal assumptions required for the central limit theorem to hold under the increasing parameter dimension, we show that the Metropolis algorithm is theoretically efficient even for the canonical Gaussian walk which is studied in detail. Specifically, we show that the running time of the algorithm in large samples is bounded in probability by a polynomial in the parameter dimension $d$, and, in particular, is of stochastic order $d^2$ in the leading cases after the burn-in period. We then give applications to exponential families, curved exponential families, and Z-estimation of increasing dimension.
△ Less
Submitted 24 January, 2012; v1 submitted 17 April, 2007;
originally announced April 2007.