-
Model-free identification in ill-posed regression
Authors:
Gianluca Finocchio,
Tatyana Krivobokova
Abstract:
The problem of parsimonious parameter identification in possibly high-dimensional linear regression with highly correlated features is addressed. This problem is formalized as the estimation of the best, in a certain sense, linear combinations of the features that are relevant to the response variable. Importantly, the dependence between the features and the response is allowed to be arbitrary. Necessary and sufficient conditions for such parsimonious identification -- referred to as statistical interpretability -- are established for a broad class of linear dimensionality reduction algorithms. Sharp bounds on their estimation errors, with high probability, are derived. To our knowledge, this is the first formal framework that enables the definition and assessment of the interpretability of a broad class of algorithms. The results are specifically applied to methods based on sparse regression, unsupervised projection and sufficient reduction. The implications of employing such methods for prediction problems are discussed in the context of the prolific literature on overparametrized methods in the regime of benign overfitting.
Submitted 2 May, 2025;
originally announced May 2025.
-
Nonparametric spectral density estimation using interactive mechanisms under local differential privacy
Authors:
Cristina Butucea,
Karolina Klockmann,
Tatyana Krivobokova
Abstract:
We address the problem of nonparametric estimation of the spectral density for a centered stationary Gaussian time series under local differential privacy constraints. Specifically, we propose new interactive privacy mechanisms for three tasks: estimating a single covariance coefficient, estimating the spectral density at a fixed frequency, and estimating the entire spectral density function. Our approach achieves faster rates through a two-stage process: we first apply the Laplace mechanism to the truncated value and then use the previously privatized sample to gain knowledge of the dependence mechanism in the time series. For spectral densities belonging to Hölder and Sobolev smoothness classes, we demonstrate that our estimators improve upon the non-interactive mechanism of Kroll (2024) for small privacy parameter $α$, since the pointwise rates depend on $nα^2$ instead of $nα^4$. Moreover, we show that the rate $(nα^4)^{-1}$ is optimal for estimating a covariance coefficient with non-interactive mechanisms. However, the $L_2$ rate of our interactive estimator is slower than the pointwise rate. We show how to use these estimators to provide a bona fide locally differentially private covariance matrix estimator.
Submitted 1 April, 2025;
originally announced April 2025.
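The first stage mentioned in the abstract above (Laplace mechanism applied to a truncated value) can be illustrated with a minimal sketch. This is only the generic textbook building block, not the authors' interactive mechanism: the function name `privatize`, the truncation level `tau`, and the choice of noise scale `2 * tau / alpha` are assumptions made for illustration.

```python
import random


def privatize(x, tau, alpha, rng=random):
    """Release a noisy version of x: truncate to [-tau, tau], then add Laplace noise.

    Hypothetical helper for illustration; `tau` and the noise calibration
    are assumptions, not taken from the paper.
    """
    if tau <= 0 or alpha <= 0:
        raise ValueError("tau and alpha must be positive")
    # Truncation bounds the value, so two inputs differ by at most 2 * tau.
    clipped = max(-tau, min(tau, x))
    # Standard Laplace calibration: scale = sensitivity / privacy parameter.
    scale = 2.0 * tau / alpha
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a centered Laplace distribution with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped + noise
```

A smaller `alpha` means stronger privacy and a larger noise scale, which is why the achievable rates in the abstract depend on powers of `alpha`.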
-
An extended latent factor framework for ill-posed linear regression
Authors:
Gianluca Finocchio,
Tatyana Krivobokova
Abstract:
The classical latent factor model for linear regression is extended by assuming that, up to an unknown orthogonal transformation, the features consist of subsets that are relevant and irrelevant for the response. Furthermore, a joint low-dimensionality is imposed only on the relevant features vector and the response variable. This framework allows for a comprehensive study of the partial-least-squares (PLS) algorithm under random design. In particular, a novel perturbation bound for PLS solutions is proven and the high-probability $L^2$-estimation rate for the PLS estimator is obtained. This novel framework also sheds light on the performance of other regularisation methods for ill-posed linear regression that exploit sparsity or unsupervised projection. The theoretical findings are confirmed by numerical studies on both real and simulated data.
Submitted 17 July, 2023;
originally announced July 2023.
-
On Second-Order Statistics of the Log-Average Periodogram for Gaussian Processes
Authors:
Karolina Klockmann,
Tatyana Krivobokova
Abstract:
We present an approximate expression for the covariance of the log-average periodogram for a zero mean stationary Gaussian process. Our findings extend the work of [1] on the covariance of the log-periodogram by additionally taking averaging over adjacent frequencies into account. Moreover, we provide a simple expression for the non-integer moments of a non-central chi-squared distribution.
Submitted 9 October, 2024; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Efficient nonparametric estimation of Toeplitz covariance matrices
Authors:
Karolina Klockmann,
Tatyana Krivobokova
Abstract:
A new nonparametric estimator for Toeplitz covariance matrices is proposed. This estimator is based on a data transformation that translates the problem of Toeplitz covariance matrix estimation to the problem of mean estimation in an approximate Gaussian regression. The resulting Toeplitz covariance matrix estimator is positive definite by construction, fully data-driven and computationally very fast. Moreover, this estimator is shown to be minimax optimal under the spectral norm for a large class of Toeplitz matrices. These results are readily extended to estimation of inverses of Toeplitz covariance matrices. Also, an alternative version of the Whittle likelihood for the spectral density based on the Discrete Cosine Transform (DCT) is proposed. The method is implemented in the R package vstdct that accompanies the paper.
Submitted 5 January, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Uniformly Valid Inference Based on the Lasso in Linear Mixed Models
Authors:
Peter Kramlinger,
Ulrike Schneider,
Tatyana Krivobokova
Abstract:
Linear mixed models (LMMs) are suitable for clustered data and are common in biometrics, medicine, survey statistics and many other fields. In those applications, it is essential to carry out valid inference after selecting a subset of the available variables. We construct confidence sets for the fixed effects in Gaussian LMMs that are based on Lasso-type estimators. Aside from providing confidence regions, this also allows quantifying the joint uncertainty of both variable selection and parameter estimation in the procedure. To show that the resulting confidence sets for the fixed effects are uniformly valid over the parameter spaces of both the regression coefficients and the covariance parameters, we also prove a novel result on uniform Cramér consistency of the restricted maximum likelihood (REML) estimators of the covariance parameters. The superiority of the constructed confidence sets over naive post-selection procedures is validated in simulations and illustrated with a study of the acid neutralization capacity of lakes in the United States.
Submitted 16 August, 2023; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Threshold Selection in Univariate Extreme Value Analysis
Authors:
Laura Fee Schneider,
Andrea Krajina,
Tatyana Krivobokova
Abstract:
Threshold selection plays a key role in various aspects of statistical inference of rare events. Most classical approaches tackling this problem for heavy-tailed distributions crucially depend on tuning parameters or critical values to be chosen by the practitioner. To simplify the use of automated, data-driven threshold selection methods, we introduce two new procedures not requiring the manual choice of any parameters. The first method measures the deviation of the log-spacings from the exponential distribution and achieves good performance in simulations for estimating high quantiles. The second approach smoothly estimates the asymptotic mean square error of the Hill estimator and performs consistently well over a wide range of distributions. The methods are compared to existing procedures in an extensive simulation study and applied to a dataset of financial losses, where the underlying extreme value index is assumed to vary over time. This application strongly emphasizes the importance of solid automated threshold selection.
Submitted 6 March, 2019;
originally announced March 2019.
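For readers unfamiliar with the Hill estimator named in the abstract above, here is a minimal sketch of its classical form. It takes the number of upper order statistics `k` as given, so it shows exactly the quantity whose tuning the paper aims to automate; it does not implement either of the authors' selection procedures. The function name and the simulated Pareto example are assumptions made for illustration.

```python
import math
import random


def hill_estimator(sample, k):
    """Classical Hill estimate of the extreme value index from the k largest values."""
    xs = sorted(sample, reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    if xs[k] <= 0:
        raise ValueError("the threshold (k+1)-th largest value must be positive")
    # The (k+1)-th largest observation plays the role of the threshold.
    log_threshold = math.log(xs[k])
    # Average log-excess of the k largest observations over the threshold.
    return sum(math.log(x) - log_threshold for x in xs[:k]) / k


# Illustration on simulated data: a Pareto distribution with shape 2
# has extreme value index 1/2.
rng = random.Random(1)
sample = [rng.paretovariate(2.0) for _ in range(20000)]
estimate = hill_estimator(sample, k=2000)
```

Choosing `k` is equivalent to choosing the threshold: a small `k` gives a high-variance estimate, while a large `k` can introduce bias when the tail is only approximately Pareto.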
-
Marginal and Conditional Multiple Inference for Linear Mixed Model Predictors
Authors:
Peter Kramlinger,
Tatyana Krivobokova,
Stefan Sperlich
Abstract:
In spite of its high practical relevance, cluster-specific multiple inference for linear mixed model predictors has hardly been addressed so far. While marginal inference for population parameters is well understood, conditional inference for the cluster-specific predictors is more intricate. This work introduces a general framework for multiple inference in linear mixed models for cluster-specific predictors. Consistent confidence sets for multiple inference are constructed under both the marginal and the conditional law. Furthermore, it is shown that, remarkably, corresponding multiple marginal confidence sets are also asymptotically valid for conditional inference. These lend themselves to testing linear hypotheses using standard quantiles without the need for resampling techniques. All findings are validated in simulations and illustrated with a study on Covid-19 mortality in US state prisons.
Submitted 16 February, 2022; v1 submitted 21 December, 2018;
originally announced December 2018.
-
Adaptive Non-parametric Estimation of Mean and Autocovariance in Regression with Dependent Errors
Authors:
Tatyana Krivobokova,
Paulo Serra,
Francisco Rosales,
Karolina Klockmann
Abstract:
Gaussian processes that can be decomposed into a smooth mean function and a stationary autocorrelated noise process are considered and a fully automatic nonparametric method for the simultaneous estimation of the mean and auto-covariance functions of such processes is developed. Our empirical Bayes approach is data-driven, numerically efficient and allows for the construction of confidence sets for the mean function. Performance is demonstrated in simulations and real data analysis. The method is implemented in the R package eBsc that accompanies the paper.
Submitted 18 August, 2021; v1 submitted 17 December, 2018;
originally announced December 2018.
-
Asymptotic Distribution and Simultaneous Confidence Bands for Ratios of Quantile Functions
Authors:
Fabian Dunker,
Stephan Klasen,
Tatyana Krivobokova
Abstract:
The ratio of medians or other suitable quantiles of two distributions is widely used in medical research to compare treatment and control groups or in economics to compare various economic variables when repeated cross-sectional data are available. Inspired by the so-called growth incidence curves introduced in poverty research, we argue that the ratio of quantile functions is a more appropriate and informative tool to compare two distributions. We present an estimator for the ratio of quantile functions and develop corresponding simultaneous confidence bands, which allow assessing the significance of certain features of the quantile functions ratio. The derived simultaneous confidence bands rely on the asymptotic distribution of the quantile functions ratio and do not require resampling techniques. The performance of the simultaneous confidence bands is demonstrated in simulations. Analysis of the expenditure data from Uganda in the years 1999, 2002 and 2005 illustrates the relevance of our approach.
Submitted 24 October, 2017;
originally announced October 2017.
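The object studied in the abstract above, a ratio of quantile functions, can be illustrated with a plain empirical plug-in sketch. This is an assumption-laden illustration: it uses unsmoothed empirical quantiles and hypothetical function names, it may differ from the estimator the authors propose, and it does not compute confidence bands.

```python
import math


def empirical_quantile(sample, p):
    """Empirical quantile: the ceil(n * p)-th smallest observation, for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    xs = sorted(sample)
    return xs[math.ceil(len(xs) * p) - 1]


def quantile_ratio(sample_a, sample_b, probs):
    """Ratio of the empirical quantiles of sample_a to those of sample_b at each p."""
    return [
        empirical_quantile(sample_a, p) / empirical_quantile(sample_b, p)
        for p in probs
    ]
```

Evaluating `quantile_ratio` on a grid of probabilities gives a curve rather than a single number such as the ratio of medians, which is the sense in which the quantile function ratio is the more informative comparison.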
-
Kernel partial least squares for stationary data
Authors:
Marco Singer,
Tatyana Krivobokova,
Axel Munk
Abstract:
We consider the kernel partial least squares algorithm for non-parametric regression with stationary dependent data. Probabilistic convergence rates of the kernel partial least squares estimator to the true regression function are established under a source and an effective dimensionality condition. It is shown both theoretically and in simulations that long range dependence results in slower convergence rates. A protein dynamics example shows high predictive power of kernel partial least squares.
Submitted 12 June, 2017;
originally announced June 2017.
-
A unified framework for spline estimators
Authors:
Katsiaryna Schwarz,
Tatyana Krivobokova
Abstract:
This article develops a unified framework to study the asymptotic properties of all periodic spline-based estimators, that is, of regression, penalized and smoothing splines. The explicit form of the periodic Demmler-Reinsch basis in terms of exponential splines allows the derivation of an expression for the asymptotic equivalent kernel on the real line for all spline estimators simultaneously. The corresponding bandwidth, which drives the asymptotic behavior of spline estimators, is shown to be a function of the number of knots and the smoothing parameter. Strategies for the selection of the optimal bandwidth and other model parameters are discussed.
Submitted 19 February, 2016;
originally announced February 2016.
-
Partial least squares for dependent data
Authors:
Marco Singer,
Tatyana Krivobokova,
Bert L. de Groot,
Axel Munk
Abstract:
The partial least squares algorithm for dependent data realisations is considered. Consequences of ignoring the dependence for the algorithm performance are studied both theoretically and in simulations. It is shown that ignoring certain non-stationary dependence structures leads to inconsistent estimation. A simple modification of the partial least squares algorithm for dependent data is proposed and consistency of corresponding estimators is shown. A real-data example on protein dynamics illustrates the superior predictive power of the method and the practical relevance of the problem.
Submitted 3 March, 2016; v1 submitted 16 October, 2015;
originally announced October 2015.
-
Adaptive empirical Bayesian smoothing splines
Authors:
Paulo Serra,
Tatyana Krivobokova
Abstract:
In this paper we develop and study adaptive empirical Bayesian smoothing splines. These are smoothing splines with both smoothing parameter and penalty order determined via the empirical Bayes method from the marginal likelihood of the model. The selected order and smoothing parameter are used to construct adaptive credible sets with good frequentist coverage for the underlying regression function. We use these credible sets as a proxy to show the superior performance of adaptive empirical Bayesian smoothing splines compared to frequentist smoothing splines.
Submitted 17 November, 2015; v1 submitted 25 November, 2014;
originally announced November 2014.