Search | arXiv e-print repository

Model-free identification in ill-posed regression

Authors: Gianluca Finocchio, Tatyana Krivobokova

Abstract: The problem of parsimonious parameter identification in possibly high-dimensional linear regression with highly correlated features is addressed. This problem is formalized as the estimation of the best, in a certain sense, linear combinations of the features that are relevant to the response variable. Importantly, the dependence between the features and the response is allowed to be arbitrary. Necessary and sufficient conditions for such parsimonious identification -- referred to as statistical interpretability -- are established for a broad class of linear dimensionality reduction algorithms. Sharp bounds on their estimation errors, with high probability, are derived. To our knowledge, this is the first formal framework that enables the definition and assessment of the interpretability of a broad class of algorithms. The results are specifically applied to methods based on sparse regression, unsupervised projection and sufficient reduction. The implications of employing such methods for prediction problems are discussed in the context of the prolific literature on overparametrized methods in the regime of benign overfitting.

Submitted 2 May, 2025; originally announced May 2025.

Comments: 59 pages, 3 figures

MSC Class: 65F22; 65F10 (Primary) 62B05; 65F20 (Secondary)

arXiv:2504.00919 [pdf, ps, other]

Nonparametric spectral density estimation using interactive mechanisms under local differential privacy

Authors: Cristina Butucea, Karolina Klockmann, Tatyana Krivobokova

Abstract: We address the problem of nonparametric estimation of the spectral density for a centered stationary Gaussian time series under local differential privacy constraints. Specifically, we propose new interactive privacy mechanisms for three tasks: estimating a single covariance coefficient, estimating the spectral density at a fixed frequency, and estimating the entire spectral density function. Our approach achieves faster rates through a two-stage process: we first apply the Laplace mechanism to the truncated values and then use this privatized sample to gain knowledge of the dependence mechanism in the time series. For spectral densities belonging to Hölder and Sobolev smoothness classes, we demonstrate that our estimators improve upon the non-interactive mechanism of Kroll (2024) for a small privacy parameter $\alpha$, since the pointwise rates depend on $n\alpha^2$ instead of $n\alpha^4$. Moreover, we show that the rate $(n\alpha^4)^{-1}$ is optimal for estimating a covariance coefficient with non-interactive mechanisms. However, the $L_2$ rate of our interactive estimator is slower than the pointwise rate. We show how to use these estimators to provide a bona fide locally differentially private covariance matrix estimator.
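The two-stage scheme above starts by passing truncated observations through the Laplace mechanism. A minimal Python sketch of that first stage only, assuming each value is clipped to a hypothetical bound `tau` and released independently with privacy parameter `alpha`; the function name `privatize_truncated` is illustrative, and the interactive second stage and the spectral density estimators are not reproduced.

```python
import random


def privatize_truncated(xs, tau, alpha, rng):
    """Release each value through the Laplace mechanism after truncation.

    Each x is clipped to [-tau, tau], so a single release has sensitivity
    2 * tau; Laplace noise with scale 2 * tau / alpha then gives
    alpha-local differential privacy per observation.
    """
    scale = 2.0 * tau / alpha
    released = []
    for x in xs:
        clipped = max(-tau, min(tau, x))
        # Laplace(0, scale) drawn as the difference of two Exp(mean=scale) draws.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        released.append(clipped + noise)
    return released
```

For example, `privatize_truncated(series, tau=1.0, alpha=0.5, rng=random.Random(0))` returns one noisy value per observation. Smaller `alpha` (stronger privacy) means proportionally larger noise, which is why rates in the abstract are expressed through $n\alpha^2$ or $n\alpha^4$.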

Submitted 1 April, 2025; originally announced April 2025.

Comments: 47 pages

arXiv:2307.08377 [pdf, other]

An extended latent factor framework for ill-posed linear regression

Authors: Gianluca Finocchio, Tatyana Krivobokova

Abstract: The classical latent factor model for linear regression is extended by assuming that, up to an unknown orthogonal transformation, the features consist of subsets that are relevant and irrelevant for the response. Furthermore, a joint low-dimensionality is imposed only on the relevant features vector and the response variable. This framework allows for a comprehensive study of the partial-least-squares (PLS) algorithm under random design. In particular, a novel perturbation bound for PLS solutions is proven and the high-probability $L^2$-estimation rate for the PLS estimator is obtained. This novel framework also sheds light on the performance of other regularisation methods for ill-posed linear regression that exploit sparsity or unsupervised projection. The theoretical findings are confirmed by numerical studies on both real and simulated data.
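For readers unfamiliar with PLS, a small sketch of the algorithm itself (not of the paper's latent factor framework or its perturbation bound). It relies on the classical equivalence between PLS1 with k components and k iterations of conjugate gradients on the normal equations, and assumes centred data without an intercept; `pls1_via_cgls` and its helpers are hypothetical names.

```python
def matvec(X, v):
    """X v for a row-major list-of-lists matrix X."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def rmatvec(X, r):
    """X^T r for a row-major list-of-lists matrix X."""
    p = len(X[0])
    return [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(p)]


def pls1_via_cgls(X, y, k):
    """k-component PLS1 coefficients computed as the k-th CGLS iterate.

    Assumes X and y are already centred (no intercept).
    """
    p = len(X[0])
    beta = [0.0] * p
    r = list(y)                 # residual y - X beta
    s = rmatvec(X, r)           # X^T r, the normal-equation residual
    d = list(s)                 # search direction
    gamma = sum(v * v for v in s)
    for _ in range(k):
        if gamma == 0.0:
            break               # exact least-squares solution reached early
        q = matvec(X, d)
        qq = sum(v * v for v in q)
        if qq == 0.0:
            break
        step = gamma / qq
        beta = [b + step * di for b, di in zip(beta, d)]
        r = [ri - step * qi for ri, qi in zip(r, q)]
        s = rmatvec(X, r)
        gamma_new = sum(v * v for v in s)
        d = [si + (gamma_new / gamma) * di for si, di in zip(s, d)]
        gamma = gamma_new
    return beta
```

Stopping after k < p steps is what makes PLS a regularisation method: the iterate stays in the Krylov space spanned by $X^\top y, X^\top X X^\top y, \dots$, and the number of components plays the role of the tuning parameter.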

Submitted 17 July, 2023; originally announced July 2023.

Comments: 48 pages, 4 figures

MSC Class: 62H25; 65F22 (Primary) 65F10; 62J05 (Secondary)

arXiv:2306.10920 [pdf, other]

On Second-Order Statistics of the Log-Average Periodogram for Gaussian Processes

Authors: Karolina Klockmann, Tatyana Krivobokova

Abstract: We present an approximate expression for the covariance of the log-average periodogram for a zero mean stationary Gaussian process. Our findings extend the work of [1] on the covariance of the log-periodogram by additionally taking averaging over adjacent frequencies into account. Moreover, we provide a simple expression for the non-integer moments of a non-central chi-squared distribution.
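To make the quantity in the abstract concrete, a sketch of the periodogram at the Fourier frequencies and of its log-average over adjacent frequencies. The normalisation $1/(2\pi n)$, the exclusion of frequency zero and the Nyquist frequency, and the use of non-overlapping blocks of `m` frequencies are assumptions of this sketch and may differ from the paper's definitions; the function names are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """I(w_j) = |sum_t x_t exp(-i t w_j)|^2 / (2 pi n) at the Fourier
    frequencies w_j = 2 pi j / n, j = 1, ..., (n - 1) // 2."""
    n = len(x)
    values = []
    for j in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * j / n
        dft = sum(xt * cmath.exp(-1j * t * w) for t, xt in enumerate(x))
        values.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return values


def log_average_periodogram(x, m):
    """Log of the periodogram averaged over non-overlapping blocks of m
    adjacent Fourier frequencies; a trailing incomplete block is dropped."""
    pgram = periodogram(x)
    blocks = len(pgram) // m
    return [math.log(sum(pgram[b * m:(b + 1) * m]) / m) for b in range(blocks)]
```

With `m = 1` this reduces to the ordinary log-periodogram; larger `m` trades frequency resolution for lower variance, which is why the covariance structure changes once averaging is taken into account.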

Submitted 9 October, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 14 pages, 1 figure

arXiv:2303.10018 [pdf, ps, other]

Efficient nonparametric estimation of Toeplitz covariance matrices

Authors: Karolina Klockmann, Tatyana Krivobokova

Abstract: A new nonparametric estimator for Toeplitz covariance matrices is proposed. This estimator is based on a data transformation that translates the problem of Toeplitz covariance matrix estimation to the problem of mean estimation in an approximate Gaussian regression. The resulting Toeplitz covariance matrix estimator is positive definite by construction, fully data-driven and computationally very fast. Moreover, this estimator is shown to be minimax optimal under the spectral norm for a large class of Toeplitz matrices. These results are readily extended to estimation of inverses of Toeplitz covariance matrices. Also, an alternative version of the Whittle likelihood for the spectral density based on the Discrete Cosine Transform (DCT) is proposed. The method is implemented in the R package vstdct that accompanies the paper.
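As a point of reference only, here is the classical plug-in construction of a Toeplitz covariance matrix from sample autocovariances. This is not the paper's DCT-based estimator (implemented in the R package vstdct); it only illustrates the Toeplitz structure $\Sigma_{ij} = \gamma(|i - j|)$ that both approaches target. The function names are hypothetical.

```python
def sample_autocovariance(x, max_lag):
    """Biased sample autocovariances gamma(h) = (1/n) sum_t (x_t - m)(x_{t+h} - m)
    for h = 0, ..., max_lag, where m is the sample mean."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + h] for t in range(n - h)) / n
            for h in range(max_lag + 1)]


def toeplitz_from_autocovariance(gamma):
    """Symmetric Toeplitz matrix whose (i, j) entry is gamma(|i - j|)."""
    p = len(gamma)
    return [[gamma[abs(i - j)] for j in range(p)] for i in range(p)]
```

Using the divisor 1/n rather than 1/(n - h) keeps this plug-in matrix positive semidefinite; the estimator in the abstract is instead positive definite by construction and comes with minimax guarantees in spectral norm.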

Submitted 5 January, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: 58 pages, 6 figures, 9 tables

arXiv:2204.03887 [pdf, other]

doi 10.1016/j.jmva.2023.105230

Uniformly Valid Inference Based on the Lasso in Linear Mixed Models

Authors: Peter Kramlinger, Ulrike Schneider, Tatyana Krivobokova

Abstract: Linear mixed models (LMMs) are suitable for clustered data and are common in biometrics, medicine, survey statistics and many other fields. In those applications, it is essential to carry out valid inference after selecting a subset of the available variables. We construct confidence sets for the fixed effects in Gaussian LMMs that are based on Lasso-type estimators. Aside from providing confidence regions, this also allows one to quantify the joint uncertainty of both variable selection and parameter estimation in the procedure. To show that the resulting confidence sets for the fixed effects are uniformly valid over the parameter spaces of both the regression coefficients and the covariance parameters, we also prove a novel result on uniform Cramér consistency of the restricted maximum likelihood (REML) estimators of the covariance parameters. The superiority of the constructed confidence sets to naive post-selection procedures is validated in simulations and illustrated with a study of the acid neutralization capacity of lakes in the United States.
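The Lasso-type estimators mentioned above build on the ordinary Lasso. A minimal sketch of the plain Lasso for a fixed-effects-only Gaussian linear model, fitted by cyclic coordinate descent with soft-thresholding; it ignores the random effects, the REML covariance estimates and the confidence sets that are the subject of the paper, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, setting it to zero inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a row-major list of lists whose columns are assumed to be non-zero.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    r = list(y)  # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual r + x_j * b_j.
            rho = sum(X[i][j] * r[i] for i in range(n)) + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
    return b
```

Coefficients set exactly to zero correspond to deselected variables; because the selection step is data-dependent, naive confidence intervals computed afterwards ignore its uncertainty, which is the problem the abstract addresses.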

Submitted 16 August, 2023; v1 submitted 8 April, 2022; originally announced April 2022.

Comments: 25 pages, 1 figure

MSC Class: 62F25; 62J10; 62J07

Journal ref: Journal of Multivariate Analysis 198 (2023), 105230

arXiv:1903.02517 [pdf, other]

Threshold Selection in Univariate Extreme Value Analysis

Authors: Laura Fee Schneider, Andrea Krajina, Tatyana Krivobokova

Abstract: Threshold selection plays a key role in various aspects of statistical inference for rare events. Most classical approaches tackling this problem for heavy-tailed distributions crucially depend on tuning parameters or critical values to be chosen by the practitioner. To simplify the use of automated, data-driven threshold selection methods, we introduce two new procedures not requiring the manual choice of any parameters. The first method measures the deviation of the log-spacings from the exponential distribution and achieves good performance in simulations for estimating high quantiles. The second approach smoothly estimates the asymptotic mean square error of the Hill estimator and performs consistently well over a wide range of distributions. The methods are compared to existing procedures in an extensive simulation study and applied to a dataset of financial losses, where the underlying extreme value index is assumed to vary over time. This application strongly emphasizes the importance of solid automated threshold selection.
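Both procedures revolve around choosing the number k of upper order statistics, and the second one targets the Hill estimator. A sketch of the Hill estimator for a given k, assuming positive data; the automatic, parameter-free choice of k proposed in the paper is not reproduced, and `hill_estimator` and `hill_path` are hypothetical names.

```python
import math


def hill_estimator(sample, k):
    """Hill estimate of the extreme value index from the k largest observations:
    (1/k) * sum_{i=1}^{k} log X_(n-i+1) - log X_(n-k), with X_(1) <= ... <= X_(n).
    Requires positive data and 1 <= k < n."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = math.log(xs[n - k - 1])  # X_(n-k) in 1-based order-statistic notation
    return sum(math.log(xs[n - i]) for i in range(1, k + 1)) / k - threshold


def hill_path(sample, ks):
    """Hill estimates for several k: the candidates a threshold rule chooses from."""
    return {k: hill_estimator(sample, k) for k in ks}
```

Plotting `hill_path` against k gives the familiar Hill plot; its sensitivity to k (variance for small k, bias for large k) is the trade-off that an automated threshold choice has to resolve.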

Submitted 6 March, 2019; originally announced March 2019.

MSC Class: 62G32

arXiv:1812.09250 [pdf, other]

doi 10.1080/01621459.2022.2044826

Marginal and Conditional Multiple Inference for Linear Mixed Model Predictors

Authors: Peter Kramlinger, Tatyana Krivobokova, Stefan Sperlich

Abstract: In spite of its high practical relevance, cluster-specific multiple inference for linear mixed model predictors has hardly been addressed so far. While marginal inference for population parameters is well understood, conditional inference for the cluster-specific predictors is more intricate. This work introduces a general framework for multiple inference in linear mixed models for cluster-specific predictors. Consistent confidence sets for multiple inference are constructed under both the marginal and the conditional law. Furthermore, it is shown that, remarkably, corresponding multiple marginal confidence sets are also asymptotically valid for conditional inference. These lend themselves to testing linear hypotheses using standard quantiles without the need for re-sampling techniques. All findings are validated in simulations and illustrated with a study on Covid-19 mortality in US state prisons.

Submitted 16 February, 2022; v1 submitted 21 December, 2018; originally announced December 2018.

Comments: 31 pages, 4 figures

MSC Class: 62J15

arXiv:1812.06948 [pdf, other]

Adaptive Non-parametric Estimation of Mean and Autocovariance in Regression with Dependent Errors

Authors: Tatyana Krivobokova, Paulo Serra, Francisco Rosales, Karolina Klockmann

Abstract: Gaussian processes that can be decomposed into a smooth mean function and a stationary autocorrelated noise process are considered, and a fully automatic nonparametric method for the simultaneous estimation of the mean and autocovariance functions of such processes is developed. Our empirical Bayes approach is data-driven, numerically efficient and allows for the construction of confidence sets for the mean function. Performance is demonstrated in simulations and real data analysis. The method is implemented in the R package eBsc that accompanies the paper.

Submitted 18 August, 2021; v1 submitted 17 December, 2018; originally announced December 2018.

arXiv:1710.09009 [pdf, other]

Asymptotic Distribution and Simultaneous Confidence Bands for Ratios of Quantile Functions

Authors: Fabian Dunker, Stephan Klasen, Tatyana Krivobokova

Abstract: The ratio of medians or other suitable quantiles of two distributions is widely used in medical research to compare treatment and control groups or in economics to compare various economic variables when repeated cross-sectional data are available. Inspired by the so-called growth incidence curves introduced in poverty research, we argue that the ratio of quantile functions is a more appropriate and informative tool to compare two distributions. We present an estimator for the ratio of quantile functions and develop corresponding simultaneous confidence bands, which allow one to assess the significance of certain features of the quantile functions ratio. The derived simultaneous confidence bands rely on the asymptotic distribution of the quantile functions ratio and do not require re-sampling techniques. The performance of the simultaneous confidence bands is demonstrated in simulations. Analysis of the expenditure data from Uganda in the years 1999, 2002 and 2005 illustrates the relevance of our approach.
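A plug-in sketch of the ratio of two empirical quantile functions evaluated on a grid of probabilities, assuming positive data and linear-interpolation sample quantiles; the simultaneous confidence bands, which are the paper's contribution, are not reproduced, and the function names are hypothetical.

```python
def empirical_quantile(sorted_xs, p):
    """Sample quantile by linear interpolation between order statistics
    (the common 'type 7' rule); sorted_xs must be sorted ascending."""
    h = (len(sorted_xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])


def quantile_ratio_curve(before, after, probs):
    """Plug-in estimate of Q_after(p) / Q_before(p) for each p in probs.
    Assumes positive data, as for expenditures or incomes."""
    b, a = sorted(before), sorted(after)
    return [empirical_quantile(a, p) / empirical_quantile(b, p) for p in probs]
```

Subtracting 1 from each ratio gives quantile-specific growth rates, the form in which growth incidence curves are usually read; a single ratio of medians is the special case `probs=[0.5]`.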

Submitted 24 October, 2017; originally announced October 2017.

MSC Class: 62G15; 62G30

arXiv:1706.03559 [pdf, other]

Kernel partial least squares for stationary data

Authors: Marco Singer, Tatyana Krivobokova, Axel Munk

Abstract: We consider the kernel partial least squares algorithm for non-parametric regression with stationary dependent data. Probabilistic convergence rates of the kernel partial least squares estimator to the true regression function are established under a source and an effective dimensionality condition. It is shown both theoretically and in simulations that long range dependence results in slower convergence rates. A protein dynamics example shows high predictive power of kernel partial least squares.

Submitted 12 June, 2017; originally announced June 2017.

arXiv:1602.06318 [pdf, other]

A unified framework for spline estimators

Authors: Katsiaryna Schwarz, Tatyana Krivobokova

Abstract: This article develops a unified framework to study the asymptotic properties of all periodic spline-based estimators, that is, of regression, penalized and smoothing splines. The explicit form of the periodic Demmler-Reinsch basis in terms of exponential splines allows the derivation of an expression for the asymptotic equivalent kernel on the real line for all spline estimators simultaneously. The corresponding bandwidth, which drives the asymptotic behavior of spline estimators, is shown to be a function of the number of knots and the smoothing parameter. Strategies for the selection of the optimal bandwidth and other model parameters are discussed.

Submitted 19 February, 2016; originally announced February 2016.

arXiv:1510.05014 [pdf, ps, other]

Partial least squares for dependent data

Authors: Marco Singer, Tatyana Krivobokova, Bert L. de Groot, Axel Munk

Abstract: The partial least squares algorithm for dependent data realisations is considered. Consequences of ignoring the dependence for the algorithm performance are studied both theoretically and in simulations. It is shown that ignoring certain non-stationary dependence structures leads to inconsistent estimation. A simple modification of the partial least squares algorithm for dependent data is proposed and consistency of the corresponding estimators is shown. A real-data example on protein dynamics illustrates a superior predictive power of the method and the practical relevance of the problem.

Submitted 3 March, 2016; v1 submitted 16 October, 2015; originally announced October 2015.

arXiv:1411.6860 [pdf, other]

Adaptive empirical Bayesian smoothing splines

Authors: Paulo Serra, Tatyana Krivobokova

Abstract: In this paper we develop and study adaptive empirical Bayesian smoothing splines. These are smoothing splines with both smoothing parameter and penalty order determined via the empirical Bayes method from the marginal likelihood of the model. The selected order and smoothing parameter are used to construct adaptive credible sets with good frequentist coverage for the underlying regression function. We use these credible sets as a proxy to show the superior performance of adaptive empirical Bayesian smoothing splines compared to frequentist smoothing splines.

Submitted 17 November, 2015; v1 submitted 25 November, 2014; originally announced November 2014.

Showing 1–14 of 14 results for author: Krivobokova, T