-
Online Bootstrap Inference with Nonconvex Stochastic Gradient Descent Estimator
Authors:
Yanjie Zhong,
Todd Kuffner,
Soumendra Lahiri
Abstract:
In this paper, we investigate the theoretical properties of stochastic gradient descent (SGD) for statistical inference in the context of nonconvex optimization problems, which have been relatively unexplored compared to convex settings. Our study is the first to establish provable inferential procedures using the SGD estimator for general nonconvex objective functions, which may contain multiple local minima.
We propose two novel online inferential procedures that combine SGD and the multiplier bootstrap technique. The first procedure employs a consistent covariance matrix estimator, and we establish its error convergence rate. The second procedure approximates the limit distribution using bootstrap SGD estimators, yielding asymptotically valid bootstrap confidence intervals. We validate the effectiveness of both approaches through numerical experiments.
Furthermore, our analysis yields an intermediate result: the in-expectation error convergence rate for the original SGD estimator in nonconvex settings, which is comparable to existing results for convex problems. We believe this novel finding holds independent interest and enriches the literature on optimization and statistical inference.
Submitted 3 June, 2023;
originally announced June 2023.
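The abstract combines SGD with the multiplier bootstrap. Below is a minimal sketch of that general idea, not the authors' procedure. Alongside averaged SGD, it runs n_boot extra SGD chains in which every stochastic gradient is multiplied by a random weight W ~ Exp(1) (mean 1, variance 1). It then reads a basic-bootstrap interval from the spread of the averaged bootstrap chains. The squared-error loss for a scalar mean, the step size lr0 * t**(-alpha), and all function and parameter names are illustrative assumptions. Because the loss is convex, the sketch does not exercise the nonconvex setting the paper targets.

```python
import random


def grad(theta, x):
    # Gradient of the illustrative (assumed) loss f(theta; x) = (theta - x)**2 / 2.
    return theta - x


def online_bootstrap_sgd(stream, n_boot=100, lr0=0.5, alpha=0.6, level=0.95, seed=0):
    """Averaged SGD plus n_boot multiplier-perturbed SGD chains, all updated in
    a single pass over the stream (no data are stored)."""
    rng = random.Random(seed)
    theta, theta_bar = 0.0, 0.0
    boot = [0.0] * n_boot      # perturbed SGD iterates
    boot_bar = [0.0] * n_boot  # their running averages
    for t, x in enumerate(stream, start=1):
        lr = lr0 * t ** (-alpha)  # step size lr0 * t^(-alpha), alpha in (1/2, 1)
        theta -= lr * grad(theta, x)
        theta_bar += (theta - theta_bar) / t  # Polyak-Ruppert average
        for b in range(n_boot):
            w = rng.expovariate(1.0)  # multiplier weight: mean 1, variance 1
            boot[b] -= lr * w * grad(boot[b], x)
            boot_bar[b] += (boot[b] - boot_bar[b]) / t
    # Basic bootstrap interval from quantiles of (bootstrap average - SGD average).
    devs = sorted(bb - theta_bar for bb in boot_bar)
    lo_q = devs[int((1 - level) / 2 * n_boot)]
    hi_q = devs[int((1 + level) / 2 * n_boot) - 1]
    return theta_bar, (theta_bar - hi_q, theta_bar - lo_q)


# Usage on a simulated stream with true mean 2.0.
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(2000)]
est, (lower, upper) = online_bootstrap_sgd(data)
print(f"estimate {est:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

Each observation updates every chain as it arrives, so an interval is available at any time without storing the stream. The cost is roughly n_boot extra gradient evaluations per observation.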
-
Bayesian Inference on Volatility in the Presence of Infinite Jump Activity and Microstructure Noise
Authors:
Qi Wang,
José E. Figueroa-López,
Todd Kuffner
Abstract:
Volatility estimation based on high-frequency data is key to accurately measuring and controlling the risk of financial assets. A Lévy process with infinite jump activity and microstructure noise is considered one of the simplest, yet sufficiently accurate, models for high-frequency financial data. Utilizing this model, we propose a "purposely misspecified" posterior of the volatility obtained by ignoring the jump component of the process. The misspecified posterior is further corrected by a simple estimate of the location shift and a re-scaling of the log-likelihood. Our main result establishes a Bernstein-von Mises (BvM) theorem, which states that the proposed adjusted posterior is asymptotically Gaussian, centered at a consistent estimator, and with variance equal to the inverse of the Fisher information. In the absence of microstructure noise, our approach can be extended to inference on the integrated variance of a general Itô semimartingale. Simulations are provided to demonstrate the accuracy of the resulting credible intervals and the frequentist properties of the approximate Bayesian inference based on the adjusted posterior.
Submitted 11 September, 2019;
originally announced September 2019.
-
Block bootstrap optimality for density estimation with dependent data
Authors:
Todd A. Kuffner,
Stephen M. S. Lee,
G. Alastair Young
Abstract:
Accurate approximation of the sampling distribution of nonparametric kernel density estimators is crucial for many statistical inference problems. Since these estimators have complex asymptotic distributions, bootstrap methods are often used for this purpose. With i.i.d. observations, a large literature exists concerning optimal bootstrap methods which achieve the fastest possible convergence rate of the bootstrap estimator of the sampling distribution of the kernel density estimator. With dependent data, such an optimality theory is an important open problem. We establish a general theory of optimality of the block bootstrap for kernel density estimation under weak dependence assumptions which are satisfied by many important time series models. We propose a unified framework for a theoretical study of a rich class of bootstrap methods which include as special cases subsampling, Künsch's moving block bootstrap, Hall's under-smoothing (UNS), as well as approaches incorporating no (NBC) or explicit bias correction (EBC). Moreover, we consider their accuracy under a broad spectrum of choices of the bandwidth $h$, which include as an important special case the MSE-optimal choice, as well as other under-smoothed choices. Under each choice of $h$, we derive the optimal tuning parameters and compare optimal performances between the main subclasses (EBC, NBC, UNS) of the bootstrap methods.
Submitted 5 September, 2019;
originally announced September 2019.
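As a concrete picture of one of the special cases named above, the sketch below applies the plain moving block bootstrap (Künsch's MBB, with no bias correction) to a Gaussian kernel density estimate at a single point. It is a generic illustration under our own assumptions, not the paper's framework. The bandwidth h, the block length, the AR(1) test series, and all function names are chosen by hand for illustration. None of them comes from the optimal tuning derived in the paper.

```python
import math
import random


def gaussian_kde_at(x0, data, h):
    """Gaussian-kernel density estimate at x0 with bandwidth h."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in data)


def mbb_resample(data, block_len, rng):
    """Moving block bootstrap: concatenate randomly chosen overlapping blocks
    of length block_len, then truncate to the original length."""
    n = len(data)
    starts = range(n - block_len + 1)  # every overlapping block is eligible
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(data[s:s + block_len])
    return out[:n]


def mbb_kde_draws(data, x0, h, block_len, n_boot=200, seed=0):
    """Bootstrap draws of f*_h(x0) - f_h(x0) under the moving block bootstrap."""
    rng = random.Random(seed)
    f_hat = gaussian_kde_at(x0, data, h)
    draws = [gaussian_kde_at(x0, mbb_resample(data, block_len, rng), h) - f_hat
             for _ in range(n_boot)]
    return f_hat, draws


# Usage on a simulated AR(1) series x_t = 0.5 x_{t-1} + e_t.
rng = random.Random(42)
series = [0.0]
for _ in range(499):
    series.append(0.5 * series[-1] + rng.gauss(0.0, 1.0))
f_hat, draws = mbb_kde_draws(series, x0=0.0, h=0.3, block_len=10)
rms = math.sqrt(sum(d * d for d in draws) / len(draws))
print(f"f_hat(0) = {f_hat:.3f}, bootstrap RMS deviation = {rms:.3f}")
```

Resampling whole blocks rather than single observations keeps the short-range dependence of the series inside each block. The block length trades that fidelity against the number of distinct blocks available.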
-
On overfitting and post-selection uncertainty assessments
Authors:
Liang Hong,
Todd A. Kuffner,
Ryan Martin
Abstract:
In a regression context, when the relevant subset of explanatory variables is uncertain, it is common to use a data-driven model selection procedure. Classical linear model theory, applied naively to the selected sub-model, may not be valid because it ignores the selected sub-model's dependence on the data. We provide an explanation of this phenomenon, in terms of overfitting, for a class of model selection criteria.
Submitted 6 December, 2017;
originally announced December 2017.
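The phenomenon described in this abstract is easy to see in a small simulation. The sketch below rests on our own assumptions. The response is independent of all p candidate predictors. The "selection procedure" simply picks the predictor with the largest absolute t statistic. The selected simple regression is then tested at the nominal 5% level using the classical t statistic with the normal critical value 1.96. This selection rule is a hypothetical stand-in, not the class of model selection criteria analysed in the paper.

```python
import math
import random


def slope_t_stat(x, y):
    """Classical t statistic for the slope in a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def naive_rejection_rate(n=50, p=10, reps=200, crit=1.96, seed=0):
    """Global null (y independent of all p candidates): pick the candidate with
    the largest |t|, then test it as if it had been chosen in advance."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        candidates = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
        best = max(abs(slope_t_stat(x, y)) for x in candidates)  # data-driven choice
        if best > crit:  # naive test that ignores the selection step
            rejections += 1
    return rejections / reps


rate_selected = naive_rejection_rate(p=10)
rate_fixed = naive_rejection_rate(p=1, reps=400)
print(f"nominal 5% test: {rate_fixed:.2f} without selection, {rate_selected:.2f} after selection")
```

By construction, the chosen predictor is the one that best fits this particular sample, so the naive test rejects far more often than 5%. With p = 10 roughly independent candidates, about 1 - 0.95^10 ≈ 0.4 of null data sets are declared significant.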
-
Optimal hybrid block bootstrap for sample quantiles under weak dependence
Authors:
Todd A. Kuffner,
Stephen M. S. Lee,
G. Alastair Young
Abstract:
We establish a general theory of optimality for block bootstrap distribution estimation for sample quantiles under a mild strong mixing assumption. In contrast to existing results, we study the block bootstrap for varying numbers of blocks. This corresponds to a hybrid between the subsampling bootstrap and the moving block bootstrap (MBB), in which the number of blocks is somewhere between 1 and the ratio of sample size to block length. Our main theorem determines the optimal choice of the number of blocks and block length to achieve the best possible convergence rate for the block bootstrap distribution estimator for sample quantiles. As part of our analysis, we also prove an important lemma which gives the convergence rate of the block bootstrap distribution estimator, with implications even for the smooth function model. We propose an intuitive procedure for empirical selection of the optimal number and length of blocks. Relevant examples are presented which illustrate the benefits of optimally choosing the number of blocks.
Submitted 6 October, 2017;
originally announced October 2017.
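The hybrid scheme in this abstract resamples a variable number of blocks k, anywhere from 1 (subsampling) to n divided by the block length (the MBB). The sketch below implements only that resampling step, for a sample quantile. It does not implement the paper's optimal choice or its empirical selection procedure. Here block_len = 20 and n_blocks = 5 are fixed by hand, and the function names and AR(1) test series are illustrative assumptions. The draws of sqrt(m)(q* - q_hat), with m = k times the block length, serve as the bootstrap approximation to the law of sqrt(n)(q_hat - q).

```python
import math
import random


def sample_quantile(x, p):
    """Order-statistic sample quantile: the ceil(p * m)-th smallest of m values."""
    xs = sorted(x)
    return xs[max(math.ceil(p * len(xs)), 1) - 1]


def hybrid_block_bootstrap(data, p, block_len, n_blocks, n_boot=500, seed=0):
    """Draws of sqrt(m) * (q* - q_hat), m = n_blocks * block_len, where q* is the
    p-quantile of n_blocks overlapping blocks resampled from data.

    n_blocks = 1 is subsampling; n_blocks = n // block_len is the moving block
    bootstrap; values in between give the hybrid scheme.
    """
    n = len(data)
    if not 1 <= n_blocks <= n // block_len:
        raise ValueError("need 1 <= n_blocks <= n // block_len")
    rng = random.Random(seed)
    q_hat = sample_quantile(data, p)
    m = n_blocks * block_len
    draws = []
    for _ in range(n_boot):
        resample = []
        for _ in range(n_blocks):
            start = rng.randrange(n - block_len + 1)  # any overlapping block
            resample.extend(data[start:start + block_len])
        draws.append(math.sqrt(m) * (sample_quantile(resample, p) - q_hat))
    return q_hat, draws


# Usage on a simulated AR(1) series of length 400.
rng = random.Random(7)
series = [0.0]
for _ in range(399):
    series.append(0.5 * series[-1] + rng.gauss(0.0, 1.0))
q_hat, draws = hybrid_block_bootstrap(series, p=0.5, block_len=20, n_blocks=5)
var_hat = sum(d * d for d in draws) / len(draws)
print(f"median {q_hat:.3f}, bootstrap variance of sqrt(n)(q_hat - q): {var_hat:.3f}")
```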
-
On the validity of the formal Edgeworth expansion for posterior densities
Authors:
John E. Kolassa,
Todd A. Kuffner
Abstract:
We consider a fundamental open problem in parametric Bayesian theory, namely the validity of the formal Edgeworth expansion of the posterior density. While the study of valid asymptotic expansions for posterior distributions constitutes a rich literature, the validity of the formal Edgeworth expansion has not been rigorously established. Several authors have claimed connections of various posterior expansions with the classical Edgeworth expansion, or have simply assumed its validity. Our main result settles this open problem. We also prove a lemma concerning the order of posterior cumulants which is of independent interest in Bayesian parametric theory. The most relevant literature is synthesized and compared to the newly-derived Edgeworth expansions. Numerical investigations illustrate that our expansion has the behavior expected of an Edgeworth expansion, and that it has better performance than the other existing expansion which was previously claimed to be of Edgeworth-type.
Submitted 4 October, 2017;
originally announced October 2017.
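For reference, the sketch below evaluates the classical (frequentist) two-term formal Edgeworth expansion for the density of a standardized mean of n i.i.d. variables. It uses the textbook form φ(z){1 + ρ3 He3(z)/(6√n) + [ρ4 He4(z)/24 + ρ3² He6(z)/72]/n}, where ρ3 and ρ4 are the standardized cumulants and He_k are probabilists' Hermite polynomials. It is not the posterior expansion derived in the paper, which is not reproduced here. The exponential example is our own choice because the exact density (a rescaled Gamma) is available for comparison. For Exp(1), ρ3 = 2 and ρ4 = 6.

```python
import math


def hermite(k, z):
    """Probabilists' Hermite polynomials He_3, He_4, He_6."""
    if k == 3:
        return z**3 - 3 * z
    if k == 4:
        return z**4 - 6 * z**2 + 3
    if k == 6:
        return z**6 - 15 * z**4 + 45 * z**2 - 15
    raise ValueError("only k = 3, 4, 6 are needed here")


def edgeworth_density(z, n, rho3, rho4):
    """Two-term formal Edgeworth expansion for the density of a standardized
    mean of n i.i.d. variables with standardized cumulants rho3, rho4."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    first = rho3 / (6 * math.sqrt(n)) * hermite(3, z)
    second = (rho4 / 24 * hermite(4, z) + rho3**2 / 72 * hermite(6, z)) / n
    return phi * (1 + first + second)


def exact_density_exponential_mean(z, n):
    """Exact density of (S - n) / sqrt(n), where S ~ Gamma(n, 1) is a sum of
    n standard exponentials."""
    s = n + z * math.sqrt(n)
    if s <= 0:
        return 0.0
    return math.sqrt(n) * math.exp((n - 1) * math.log(s) - s - math.lgamma(n))


n, z = 10, 1.0
exact = exact_density_exponential_mean(z, n)
normal = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
edge = edgeworth_density(z, n, rho3=2.0, rho4=6.0)
print(f"exact {exact:.4f}, normal {normal:.4f}, Edgeworth {edge:.4f}")
```

The correction terms pull the normal approximation toward the skewed exact density. This is the kind of improvement "the behavior expected of an Edgeworth expansion" refers to in the frequentist case.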
-
Bayes factor consistency
Authors:
Siddhartha Chib,
Todd A. Kuffner
Abstract:
Good large sample performance is typically a minimum requirement of any model selection criterion. This article focuses on the consistency property of the Bayes factor, a commonly used model comparison tool, which has experienced a recent surge of attention in the literature. We thoroughly review existing results. As there exists such a wide variety of settings to be considered, e.g. parametric vs. nonparametric, nested vs. non-nested, etc., we adopt the view that a unified framework has didactic value. Using the basic marginal likelihood identity of Chib (1995), we study Bayes factor asymptotics by decomposing the natural logarithm of the ratio of marginal likelihoods into three components. These are, respectively, log ratios of likelihoods, prior densities, and posterior densities. This yields an interpretation of the log ratio of posteriors as a penalty term, and emphasizes that to understand Bayes factor consistency, the prior support conditions driving posterior consistency in each respective model under comparison should be contrasted in terms of the rates of posterior contraction they imply.
Submitted 1 July, 2016;
originally announced July 2016.
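The basic marginal likelihood identity of Chib (1995), log m(y) = log f(y | θ*) + log π(θ*) - log π(θ* | y), holds at any point θ*. Differencing it across two models gives the three-term decomposition of the log Bayes factor described in this abstract. The sketch below checks both in a conjugate normal model, y_i ~ N(θ, σ²) with prior θ ~ N(μ0, τ²), chosen because every term, including m(y), has a closed form. The model, the use of each model's posterior mean as θ*, and all function names are illustrative assumptions, not taken from the paper.

```python
import math


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def posterior(y, sigma2, mu0, tau2):
    """Conjugate posterior N(mn, vn) of theta for y_i ~ N(theta, sigma2), theta ~ N(mu0, tau2)."""
    vn = 1.0 / (1.0 / tau2 + len(y) / sigma2)
    mn = vn * (mu0 / tau2 + sum(y) / sigma2)
    return mn, vn


def chib_terms(y, theta_star, sigma2, mu0, tau2):
    """(log likelihood, log prior, log posterior) at theta_star; by Chib's
    identity, loglik + logprior - logpost equals log m(y)."""
    mn, vn = posterior(y, sigma2, mu0, tau2)
    loglik = sum(log_normal_pdf(yi, theta_star, sigma2) for yi in y)
    return loglik, log_normal_pdf(theta_star, mu0, tau2), log_normal_pdf(theta_star, mn, vn)


def log_marginal_exact(y, sigma2, mu0, tau2):
    """Closed form: y ~ N(mu0 * 1, sigma2 * I + tau2 * 1 1^T)."""
    n = len(y)
    r = [yi - mu0 for yi in y]
    log_det = n * math.log(sigma2) + math.log(1 + n * tau2 / sigma2)
    quad = (sum(ri * ri for ri in r) - tau2 * sum(r) ** 2 / (sigma2 + n * tau2)) / sigma2
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


def log_bf_decomposition(y, model1, model2):
    """(log likelihood ratio, log prior ratio, log posterior ratio), each model at
    its own posterior mean; log B_12 = first + second - third."""
    terms = []
    for sigma2, mu0, tau2 in (model1, model2):
        theta_star, _ = posterior(y, sigma2, mu0, tau2)
        terms.append(chib_terms(y, theta_star, sigma2, mu0, tau2))
    (l1, p1, q1), (l2, p2, q2) = terms
    return l1 - l2, p1 - p2, q1 - q2


y = [0.3, 1.2, -0.4, 0.8, 1.5, 0.1]
m1, m2 = (1.0, 0.0, 4.0), (2.0, 0.0, 4.0)  # (sigma2, mu0, tau2) for each model
log_lr, log_prior_ratio, log_post_ratio = log_bf_decomposition(y, m1, m2)
print(log_lr + log_prior_ratio - log_post_ratio,
      log_marginal_exact(y, *m1) - log_marginal_exact(y, *m2))
```

The two printed numbers agree to rounding error. The last term, the log ratio of posterior densities, is the piece the abstract interprets as a penalty.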
-
Quantifying nuisance parameter effects via decompositions of asymptotic refinements for likelihood-based statistics
Authors:
Thomas J. DiCiccio,
Todd A. Kuffner,
G. Alastair Young
Abstract:
Accurate inference on a scalar interest parameter in the presence of a nuisance parameter may be obtained using an adjusted version of the signed root likelihood ratio statistic, in particular Barndorff-Nielsen's $R^*$ statistic. The adjustment made by this statistic may be decomposed into a sum of two terms, interpreted as correcting respectively for the possible effect of nuisance parameters and the deviation from standard normality of the signed root likelihood ratio statistic itself. We show that the adjustment terms are determined to second order in the sample size by their means. Explicit expressions are obtained for the leading terms in asymptotic expansions of these means. These are easily calculated, allowing a simple way of quantifying and interpreting the respective effects of the two adjustments, in particular the effect of a high-dimensional nuisance parameter. Illustrations are given for a number of examples, which provide theoretical insight into the effect of nuisance parameters on parametric inference. The analysis provides a decomposition of the mean of the signed root statistic involving two terms: the first takes the same value whether there are no nuisance parameters or there is an orthogonal nuisance parameter, while the second is zero when there are no nuisance parameters. Similar decompositions are discussed for the Bartlett correction factor of the likelihood ratio statistic, and for other asymptotically standard normal pivots.
Submitted 19 March, 2015;
originally announced March 2015.
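To make the $R^*$ adjustment concrete in the simplest setting, the sketch below takes a model with no nuisance parameter: the rate λ of an i.i.d. exponential sample, a one-parameter exponential family. In that case the standard form is r* = r + log(u/r)/r, where r is the signed root likelihood ratio statistic and u is the standardized maximum likelihood estimate in the canonical parameter. With no nuisance parameter, the whole adjustment r* - r corrects the departure of r from standard normality, and the nuisance-parameter term discussed in the abstract does not arise. The example values (n = 5, λ = 1, sample total 3) and the function names are our own. Exact tail probabilities come from the Gamma-Poisson relation.

```python
import math
from statistics import NormalDist


def r_and_rstar(lam, n, s):
    """Signed root r and r* = r + log(u / r) / r for the rate lam of an
    exponential sample with size n and total s (requires lam != n / s)."""
    lam_hat = n / s

    def loglik(rate):
        return n * math.log(rate) - rate * s

    r = math.copysign(math.sqrt(2 * (loglik(lam_hat) - loglik(lam))), lam_hat - lam)
    # u: standardized MLE in the canonical parameter, using j(lam_hat) = n / lam_hat**2.
    u = (lam_hat - lam) * math.sqrt(n) / lam_hat
    return r, r + math.log(u / r) / r


def exact_cdf(lam, n, s):
    """Exact P(lam_hat <= n / s) = P(Gamma(n, rate=lam) >= s), as a Poisson sum."""
    m = lam * s
    return sum(math.exp(-m) * m ** k / math.factorial(k) for k in range(n))


n, lam, s = 5, 1.0, 3.0
r, rstar = r_and_rstar(lam, n, s)
exact = exact_cdf(lam, n, s)
phi = NormalDist().cdf
print(f"exact {exact:.4f}, Phi(r) {phi(r):.4f}, Phi(r*) {phi(rstar):.4f}")
```

Even with n = 5, Φ(r*) agrees with the exact probability to within about 0.001, whereas Φ(r) is off by roughly 0.04.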
-
Stability and uniqueness of $p$-values for likelihood-based inference
Authors:
Thomas J. DiCiccio,
Todd A. Kuffner,
G. Alastair Young,
Russell Zaretzki
Abstract:
Likelihood-based methods of statistical inference provide a useful general methodology that is appealing because a straightforward asymptotic theory can be applied in their implementation. It is important to assess the relationships between different likelihood-based inferential procedures in terms of accuracy and adherence to key principles of statistical inference, in particular those relating to conditioning on relevant ancillary statistics. An analysis is given of the stability properties of a general class of likelihood-based statistics, including those derived from forms of adjusted profile likelihood, and comparisons are made between inferences derived from different statistics. In particular, we derive a set of sufficient conditions under which inferences, specifically $p$-values, derived from different asymptotically standard normal pivots agree to $O_{p}(n^{-1})$, where $n$ is the sample size. Our analysis includes inference problems concerning a scalar or vector interest parameter, in the presence of a nuisance parameter.
Submitted 19 March, 2015;
originally announced March 2015.
-
Objective Bayes, conditional inference and the signed root likelihood ratio statistic
Authors:
Thomas J. DiCiccio,
Todd A. Kuffner,
G. Alastair Young
Abstract:
Bayesian properties of the signed root likelihood ratio statistic are analysed. Conditions for first-order probability matching are derived by the examination of the Bayesian posterior and frequentist means of this statistic. Second-order matching conditions are shown to arise from matching of the Bayesian posterior and frequentist variances of a mean-adjusted version of the signed root statistic. Conditions for conditional probability matching in ancillary statistic models are derived and discussed.
Submitted 19 March, 2015;
originally announced March 2015.