-
A Quasi-Bayesian Perspective to Online Clustering
Authors:
Le Li,
Benjamin Guedj,
Sébastien Loustau
Abstract:
When faced with high-frequency streams of data, clustering raises theoretical and algorithmic pitfalls. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO, see https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a convergence guarantee. Finally, numerical experiments illustrate the potential of our procedure.
Submitted 25 May, 2018; v1 submitted 1 February, 2016;
originally announced February 2016.
-
Bandwidth selection in kernel empirical risk minimization via the gradient
Authors:
Michaël Chichignoud,
Sébastien Loustau
Abstract:
In this paper, we deal with the data-driven selection of multidimensional and possibly anisotropic bandwidths in the general framework of kernel empirical risk minimization. We propose a universal selection rule, which leads to optimal adaptive results in a large variety of statistical models such as nonparametric robust regression and statistical learning with errors in variables. These results are stated in the context of smooth loss functions, where the gradient of the risk appears as a good criterion to measure the performance of our estimators. The selection rule consists of a comparison of gradient empirical risks. It can be viewed as a nontrivial extension of the so-called Goldenshluger-Lepski method to nonlinear estimators. Furthermore, one main advantage of our selection rule is that it does not depend on the Hessian matrix of the risk, which is usually involved in standard adaptive procedures.
Submitted 18 August, 2015; v1 submitted 27 January, 2014;
originally announced January 2014.
-
Noisy classification with boundary assumptions
Authors:
Sébastien Loustau,
Clément Marteau
Abstract:
We address the problem of classification when data are collected from two samples with measurement errors. This problem turns out to be an inverse problem and requires a specific treatment. In this context, we investigate the minimax rates of convergence using both a margin assumption and a smoothness condition on the boundary of the set associated with the Bayes classifier. We establish lower and upper bounds (based on a deconvolution classifier) on these rates.
Submitted 12 July, 2013;
originally announced July 2013.
-
Adaptive Noisy Clustering
Authors:
Michael Chichignoud,
Sébastien Loustau
Abstract:
The problem of adaptive noisy clustering is investigated. Given a set of noisy observations $Z_i=X_i+ε_i$, $i=1,...,n$, the goal is to design clusters associated with the law of the $X_i$'s, with unknown density $f$ with respect to the Lebesgue measure. Since we observe a corrupted sample, a direct approach such as the popular $k$-means is not suitable in this case. In this paper, we propose a noisy $k$-means minimization, which is based on the $k$-means loss function and a deconvolution estimator of the density $f$. In particular, this approach suffers from its dependence on a bandwidth involved in the deconvolution kernel. Fast rates of convergence for the excess risk are proposed for a particular choice of the bandwidth, which depends on the smoothness of the density $f$.
We then turn to the main issue of the paper: the data-driven choice of the bandwidth. We state an adaptive upper bound for a new selection rule, called ERC (Empirical Risk Comparison). This selection rule is based on Lepski's principle, where empirical risks associated with different bandwidths are compared. Finally, we illustrate that this adaptive rule can be used in many statistical problems of $M$-estimation where the empirical risk depends on a nuisance parameter.
Submitted 10 June, 2013;
originally announced June 2013.
-
Anisotropic oracle inequalities in noisy quantization
Authors:
Sébastien Loustau
Abstract:
The effect of errors in variables in quantization is investigated. We prove general exact and non-exact oracle inequalities with fast rates for an empirical minimization based on a noisy sample $Z_i=X_i+ε_i,i=1,\ldots,n$, where $X_i$ are i.i.d. with density $f$ and $ε_i$ are i.i.d. with density $η$. These rates depend on the geometry of the density $f$ and the asymptotic behaviour of the characteristic function of $η$.
This general study can be applied to the problem of $k$-means clustering with noisy data. For this purpose, we introduce a deconvolution $k$-means stochastic minimization which reaches fast rates of convergence under Pollard's standard regularity assumptions.
Submitted 3 May, 2013;
originally announced May 2013.
-
Fast rates for noisy clustering
Authors:
Sébastien Loustau
Abstract:
The effect of errors in variables in empirical minimization is investigated. Given a loss $l$ and a set of decision rules $\mathcal{G}$, we prove a general upper bound for an empirical minimization based on a deconvolution kernel and a noisy sample $Z_i=X_i+ε_i,i=1,...,n$. We apply this general upper bound to give the rate of convergence for the expected excess risk in noisy clustering. A recent bound from Levrard proves that this rate is $\mathcal{O}(1/n)$ in the direct case, under Pollard's regularity assumptions. Here the effect of noisy measurements gives a rate of the form $\mathcal{O}(1/n^{\frac{γ}{γ+2β}})$, where $γ$ is the Hölder regularity of the density of $X$ whereas $β$ is the degree of ill-posedness.
Submitted 7 May, 2012;
originally announced May 2012.
-
Statistical learning with indirect observations
Authors:
Sébastien Loustau
Abstract:
Let $(X,Y)\in\mathcal{X}\times \mathcal{Y}$ be a random couple with unknown distribution $P$. Let $\mathcal{G}$ be a class of measurable functions and $\ell$ a loss function. The problem of statistical learning deals with the estimation of the Bayes rule: $$g^*=\arg\min_{g\in\mathcal{G}}\mathbb{E}_P \ell(g(X),Y). $$ In this paper, we study this problem when we deal with a contaminated sample $(Z_1,Y_1),..., (Z_n,Y_n)$ of i.i.d. indirect observations. Each input $Z_i$, $i=1,...,n$, is distributed according to a density $Af$, where $A$ is a known compact linear operator and $f$ is the density of the direct input $X$.
We derive fast rates of convergence for empirical risk minimizers based on regularization methods, such as deconvolution kernel density estimators or spectral cut-off. These results are comparable to the existing fast rates of Koltchinskii for the direct case. They give some insight into the effect of indirect measurements in the presence of fast rates of convergence.
Submitted 10 July, 2012; v1 submitted 30 January, 2012;
originally announced January 2012.
-
Minimax fast rates for discriminant analysis with errors in variables
Authors:
Sébastien Loustau,
Clément Marteau
Abstract:
The effect of measurement errors in discriminant analysis is investigated. Given observations $Z=X+ε$, where $ε$ denotes a random noise, the goal is to predict the density of $X$ among two possible candidates $f$ and $g$. We suppose that we have at our disposal two learning samples. The aim is to approach the best possible decision rule $G^\star$ defined as a minimizer of the Bayes risk. In the noise-free case $(ε=0)$, minimax fast rates of convergence are well-known under the margin assumption in discriminant analysis (see \cite{mammen}) or in the more general classification framework (see \cite{tsybakov2004,AT}). In this paper we intend to establish similar results in the noisy case, i.e. when dealing with errors in variables. We prove minimax lower bounds for this problem and explain how these rates can be attained, using in particular an Empirical Risk Minimizer (ERM) method based on deconvolution kernel estimators.
Submitted 12 May, 2015; v1 submitted 16 January, 2012;
originally announced January 2012.