-
Is model selection possible for the $\ell_p$-loss? PCO estimation for regression models
Authors:
Claire Lacour,
Pascal Massart,
Vincent Rivoirard
Abstract:
This paper addresses the problem of model selection in the sequence model $Y=θ+\varepsilonξ$, when $ξ$ is sub-Gaussian, for non-Euclidean loss functions. In this model, the Penalized Comparison to Overfitting procedure is studied for the weighted $\ell_p$-loss, $p\geq 1$. Several oracle inequalities are derived from concentration inequalities for sub-Weibull variables. Using judicious collections of models and penalty terms, minimax rates of convergence are stated for Besov bodies $\mathcal{B}_{r,\infty}^s$. These results are applied to the functional model of nonparametric regression.
Submitted 15 April, 2025;
originally announced April 2025.
-
Adaptive greedy algorithm for moderately large dimensions in kernel conditional density estimation
Authors:
Minh-Lien Jeanne Nguyen,
Claire Lacour,
Vincent Rivoirard
Abstract:
This paper studies the estimation of the conditional density $f(x,\cdot)$ of $Y_i$ given $X_i = x$, from the observation of an i.i.d. sample $(X_i, Y_i) \in \mathbb{R}^d$, $i = 1, \dots, n$. We assume that $f$ depends only on $r$ unknown components with typically $r \ll d$. We provide an adaptive fully-nonparametric strategy based on kernel rules to estimate $f$. To select the bandwidth of our kernel rule, we propose a new fast iterative algorithm inspired by the Rodeo algorithm (Wasserman and Lafferty (2006)) to detect the sparsity structure of $f$. More precisely, in the minimax setting, our pointwise estimator, which is adaptive to both the regularity and the sparsity, achieves the quasi-optimal rate of convergence. Its computational complexity is only $O(dn \log n)$.
Submitted 28 June, 2021;
originally announced June 2021.
-
Three rates of convergence or separation via U-statistics in a dependent framework
Authors:
Quentin Duchemin,
Yohann De Castro,
Claire Lacour
Abstract:
Despite the ubiquity of U-statistics in modern Probability and Statistics, their non-asymptotic analysis in a dependent framework may have been overlooked. In a recent work, a new concentration inequality for U-statistics of order two for uniformly ergodic Markov chains has been proved. In this paper, we put this theoretical breakthrough into action by pushing further the current state of knowledge in three different active fields of research. First, we establish a new exponential inequality for the estimation of spectra of trace class integral operators with MCMC methods. The novelty is that this result holds for kernels with positive and negative eigenvalues, which is new as far as we know. In addition, we investigate generalization performance of online algorithms working with pairwise loss functions and Markov chain samples. We provide an online-to-batch conversion result by showing how we can extract a low risk hypothesis from the sequence of hypotheses generated by any online learner. We finally give a non-asymptotic analysis of a goodness-of-fit test on the density of the invariant measure of a Markov chain. We identify some classes of alternatives over which our test based on the $L_2$ distance has a prescribed power.
Submitted 16 June, 2022; v1 submitted 24 June, 2021;
originally announced June 2021.
-
Semiparametric inference for mixtures of circular data
Authors:
Claire Lacour,
Thanh Mai Pham Ngoc
Abstract:
We consider $X_1, \dots, X_n$ a sample of data on the circle $\mathbb{S}^1$, whose distribution is a two-component mixture. Denoting by $R$ and $Q$ two rotations on $\mathbb{S}^1$, the density of the $X_i$'s is assumed to be $g(x) = p f(R^{-1}x) + (1-p) f(Q^{-1}x)$, where $p \in (0,1)$ and $f$ is an unknown density on the circle. In this paper we estimate both the parametric part $θ = (p, R, Q)$ and the nonparametric part $f$. The specific problems of identifiability on the circle are studied. A consistent estimator of $θ$ is introduced and its asymptotic normality is proved. We propose a Fourier-based estimator of $f$ with a penalized criterion to choose the resolution level. We show that our adaptive estimator is optimal from the oracle and minimax points of view when the density belongs to a Sobolev ball. Our method is illustrated by numerical simulations.
Submitted 31 May, 2022; v1 submitted 12 March, 2021;
originally announced March 2021.
-
Concentration inequality for U-statistics of order two for uniformly ergodic Markov chains
Authors:
Quentin Duchemin,
Yohann de Castro,
Claire Lacour
Abstract:
We prove a new concentration inequality for U-statistics of order two for uniformly ergodic Markov chains. Working with bounded and $π$-canonical kernels, we show that we can recover the convergence rate of Arcones and Giné, who proved a concentration result for U-statistics of independent random variables and canonical kernels. Our result allows for a dependence of the kernels $h_{i,j}$ on the indexes in the sums, which prevents the use of standard blocking tools. Our proof relies on an inductive analysis where we use martingale techniques, uniform ergodicity, Nummelin splitting and a Bernstein-type inequality. Assuming further that the Markov chain starts from its invariant distribution, we prove a Bernstein-type concentration inequality that provides a sharper convergence rate for small variance terms.
Submitted 18 March, 2022; v1 submitted 20 November, 2020;
originally announced November 2020.
-
Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation
Authors:
Suzanne Varet,
Claire Lacour,
Pascal Massart,
Vincent Rivoirard
Abstract:
Kernel density estimation is a well-known method involving a smoothing parameter (the bandwidth) that needs to be tuned by the user. Although this method has been widely used, bandwidth selection remains a challenging issue in terms of balancing algorithmic performance and statistical relevance. The purpose of this paper is to compare a recently developed bandwidth selection method for kernel density estimation to those which are commonly used now (at least those which are implemented in the R-package). This new method is called Penalized Comparison to Overfitting (PCO). It has been proposed by some of the authors of this paper in a previous work devoted to its statistical relevance from a purely theoretical perspective. It is compared here to other usual bandwidth selection methods for univariate and also multivariate kernel density estimation on the basis of intensive simulation studies. In particular, cross-validation and plug-in criteria are numerically investigated and compared to PCO. The take-home message is that PCO can outperform the classical methods without additional algorithmic cost.
Submitted 4 February, 2019;
originally announced February 2019.
-
Adaptive Estimation of Nonparametric Geometric Graphs
Authors:
Yohann De Castro,
Claire Lacour,
Thanh Mai Pham Ngoc
Abstract:
This article studies the recovery of graphons when they are convolution kernels on compact (symmetric) metric spaces. This case is of particular interest since it covers the situation where the probability of an edge depends only on some unknown nonparametric function of the distance between latent points, referred to as Nonparametric Geometric Graphs (NGG). In this setting, adaptive estimation of NGG is possible using a spectral procedure combined with a Goldenshluger-Lepski adaptation method. The latent spaces covered by our framework encompass (among others) compact symmetric spaces of rank one, namely real spheres and projective spaces. For the latter, explicit computations of the eigenbasis and of the model complexity can be achieved, leading to quantitative non-asymptotic results. The time complexity of our method scales cubically in the size of the graph and exponentially in the regularity of the graphon. Hence, this paper offers an algorithmically and theoretically efficient procedure to estimate smooth NGG. As a by-product, this paper shows a non-asymptotic concentration result on the spectrum of integral operators defined by symmetric kernels (not necessarily positive).
Submitted 6 April, 2020; v1 submitted 7 August, 2017;
originally announced August 2017.
-
Estimator selection: a new method with applications to kernel density estimation
Authors:
Claire Lacour,
Pascal Massart,
Vincent Rivoirard
Abstract:
Estimator selection has become a crucial issue in nonparametric estimation. Two widely used methods are penalized empirical risk minimization (such as penalized log-likelihood estimation) and pairwise comparison (such as Lepski's method). Our aim in this paper is twofold. First we explain some general ideas about the calibration issue of estimator selection methods. We review some known results, putting the emphasis on the concept of minimal penalty, which is helpful to design data-driven selection criteria. Secondly we present a new method for bandwidth selection within the framework of kernel density estimation which is in some sense intermediate between the two main methods mentioned above. We provide some theoretical results which lead to a fully data-driven selection strategy.
Submitted 18 October, 2017; v1 submitted 18 July, 2016;
originally announced July 2016.
-
Minimal penalty for Goldenshluger-Lepski method
Authors:
Claire Lacour,
Pascal Massart
Abstract:
This paper is concerned with adaptive nonparametric estimation using the Goldenshluger-Lepski selection method. This estimator selection method is based on pairwise comparisons between estimators with respect to some loss function. The method also involves a penalty term that typically needs to be large enough in order that the method works (in the sense that one can prove some oracle type inequality for the selected estimator). In the case of density estimation with kernel estimators and a quadratic loss, we show that the procedure fails if the penalty term is chosen smaller than some critical value for the penalty: the minimal penalty. More precisely we show that the quadratic risk of the selected estimator explodes when the penalty is below this critical value while it stays under control when the penalty is above this critical value. This kind of phase transition phenomenon for penalty calibration has already been observed and proved for penalized model selection methods in various contexts but appears here for the first time for the Goldenshluger-Lepski pairwise comparison method. Some simulations illustrate the theoretical results and lead to some hints on how to use the theory to calibrate the method in practice.
Submitted 29 February, 2016; v1 submitted 3 March, 2015;
originally announced March 2015.
-
Minimax adaptive estimation of nonparametric hidden Markov models
Authors:
Yohann De Castro,
Élisabeth Gassiat,
Claire Lacour
Abstract:
We consider stationary hidden Markov models with finite state space and nonparametric modeling of the emission distributions. It has remained unknown until very recently that such models are identifiable. In this paper, we propose a new penalized least-squares estimator for the emission distributions which is statistically optimal and practically tractable. We prove a non-asymptotic oracle inequality for our nonparametric estimator of the emission distributions. A consequence is that this new estimator is rate minimax adaptive up to a logarithmic term. Our methodology is based on projections of the emission distributions onto nested subspaces of increasing complexity. The popular spectral estimators are unable to achieve the optimal rate but may be used as initial points in our procedure. Simulations are given that show the improvement obtained when applying the least-squares minimization consecutively to the spectral estimation.
Submitted 27 December, 2015; v1 submitted 20 January, 2015;
originally announced January 2015.
-
Adaptive pointwise estimation of conditional density function
Authors:
Karine Bertin,
Claire Lacour,
Vincent Rivoirard
Abstract:
In this paper we consider the problem of estimating $f$, the conditional density of $Y$ given $X$, by using an independent sample distributed as $(X,Y)$ in the multivariate setting. We consider the estimation of $f(x,\cdot)$ where $x$ is a fixed point. We define two different estimation procedures, the first one using kernel rules, the second one inspired by projection methods. Both adapted estimators are tuned by using the Goldenshluger and Lepski methodology. After deriving lower bounds, we show that these procedures satisfy oracle inequalities and are optimal from the minimax point of view on anisotropic Hölder balls. Furthermore, our results allow us to measure precisely the influence of $\mathrm{f}_X(x)$ on rates of convergence, where $\mathrm{f}_X$ is the density of $X$. Finally, some simulations illustrate the good behavior of our tuned estimates in practice.
Submitted 29 December, 2014; v1 submitted 28 December, 2013;
originally announced December 2013.
-
Adaptive pointwise estimation for pure jump Lévy processes
Authors:
Mélina Bec,
Claire Lacour
Abstract:
This paper is concerned with adaptive kernel estimation of the Lévy density N(x) for bounded-variation pure-jump Lévy processes. The sample path is observed at n discrete instants in the "high frequency" context (Δ = Δ(n) tends to zero while nΔ tends to infinity). We construct a collection of kernel estimators of the function g(x) = xN(x) and propose a method of local adaptive selection of the bandwidth. We provide an oracle inequality and a rate of convergence for the quadratic pointwise risk. This rate is proved to be the optimal minimax rate. We give examples and simulation results for processes fitting in our framework. We also consider the case of irregular sampling.
Submitted 13 February, 2013; v1 submitted 21 May, 2012;
originally announced May 2012.
-
Goodness-of-fit test for noisy directional data
Authors:
Claire Lacour,
Thanh Mai Pham Ngoc
Abstract:
We consider spherical data $X_i$ noised by a random rotation $\varepsilon_i\in$ SO(3) so that only the sample $Z_i=\varepsilon_iX_i$, $i=1,\dots, N$ is observed. We define a nonparametric test procedure to distinguish $H_0$: "the density $f$ of $X_i$ is the uniform density $f_0$ on the sphere" and $H_1$: "$\|f-f_0\|_2^2\geq \mathcal{C}ψ_N$ and $f$ is in a Sobolev space with smoothness $s$". For a noise density $f_\varepsilon$ with smoothness index $ν$, we show that an adaptive procedure (i.e. $s$ is not assumed to be known) cannot have a faster rate of separation than $ψ_N^{ad}(s)=(N/\sqrt{\log\log(N)})^{-2s/(2s+2ν+1)}$ and we provide a procedure which reaches this rate. We also deal with the case of super smooth noise. We illustrate the theory by implementing our test procedure for various kinds of noise on SO(3) and by comparing it to other procedures. Applications to real data in astrophysics and paleomagnetism are provided.
Submitted 15 November, 2013; v1 submitted 9 March, 2012;
originally announced March 2012.
-
Least squares type estimation of the transition density of a particular hidden Markov chain
Authors:
Claire Lacour
Abstract:
In this paper, we study the following model of hidden Markov chain: $Y_i=X_i+ε_i$, $i=1,\dots,n+1$ with $(X_i)$ a real-valued stationary Markov chain and $(ε_i)_{1\leq i\leq n+1}$ a noise having a known distribution and independent of the sequence $(X_i)$. We present an estimator of the transition density obtained by minimization of an original contrast that takes advantage of the regressive aspect of the problem. It is selected among a collection of projection estimators with a model selection method. The $L^2$-risk and its rate of convergence are evaluated for ordinary smooth noise and some simulations illustrate the method. We obtain uniform risk bounds over classes of Besov balls. In addition, our estimation procedure requires no prior knowledge of the regularity of the true transition. Finally, our estimator avoids the drawbacks of quotient estimators.
Submitted 17 January, 2008;
originally announced January 2008.
-
Rates of convergence for nonparametric deconvolution
Authors:
Claire Lacour
Abstract:
This Note presents original rates of convergence for the deconvolution problem. We assume that both the estimated density and noise density are supersmooth and we compute the risk for two kinds of estimators.
Submitted 22 November, 2006;
originally announced November 2006.
-
Adaptive estimation of the transition density of a particular hidden Markov chain
Authors:
Claire Lacour
Abstract:
We study the following model of hidden Markov chain: $Y_i=X_i+ε_i$, $i=1,\dots,n+1$ with $(X_i)$ a real-valued positive recurrent and stationary Markov chain and $(ε_i)_{1\leq i\leq n+1}$ a noise independent of the sequence $(X_i)$ having a known distribution. We present an adaptive estimator of the transition density based on the quotient of a deconvolution estimator of the density of $X_i$ and an estimator of the density of $(X_i,X_{i+1})$. These estimators are obtained by contrast minimization and model selection. We evaluate the $L^2$ risk and its rate of convergence for ordinary smooth and supersmooth noise with regard to ordinary smooth and supersmooth chains. Some examples are also detailed.
Submitted 22 November, 2006;
originally announced November 2006.
-
Adaptive estimation of the transition density of a Markov chain
Authors:
Claire Lacour
Abstract:
In this paper a new estimator for the transition density $π$ of a homogeneous Markov chain is considered. We introduce an original contrast derived from the regression framework and we use a model selection method to estimate $π$ under mild conditions. The resulting estimate is adaptive with an optimal rate of convergence over a large range of anisotropic Besov spaces $B_{2,\infty}^{(α_1,α_2)}$. Some simulations are also presented.
Submitted 22 November, 2006;
originally announced November 2006.
-
Nonparametric estimation of the stationary density and the transition density of a Markov chain
Authors:
Claire Lacour
Abstract:
In this paper, we first study the problem of nonparametric estimation of the stationary density $f$ of a discrete-time Markov chain $(X_i)$. We consider a collection of projection estimators on finite dimensional linear spaces. We select an estimator among the collection by minimizing a penalized contrast. The same technique makes it possible to estimate the density $g$ of $(X_i, X_{i+1})$ and thus to provide an adaptive estimator of the transition density $π=g/f$. We give bounds in $L^2$ norm for these estimators and we show that they are adaptive in the minimax sense over a large class of Besov spaces. Some examples and simulations are also provided.
Submitted 9 January, 2008; v1 submitted 21 November, 2006;
originally announced November 2006.