-
On regression and classification with possibly missing response variables in the data
Authors:
Majid Mojirsheibani,
William Pouliot,
Andre Shakhbandaryan
Abstract:
This paper considers the problem of kernel regression and classification with possibly unobservable response variables in the data, where the mechanism that causes the absence of information is unknown and can depend on both predictors and the response variables. Our proposed approach involves two steps: In the first step, we construct a family of models (possibly infinite dimensional) indexed by the unknown parameter of the missing probability mechanism. In the second step, a search is carried out to find the empirically optimal member of an appropriate cover (or subclass) of the underlying family in the sense of minimizing the mean squared prediction error. The main focus of the paper is to look into the theoretical properties of these estimators. The issue of identifiability is also addressed. Our methods use a data-splitting approach which is quite easy to implement. We also derive exponential bounds on the performance of the resulting estimators in terms of their deviations from the true regression curve in general $L_p$ norms, where we also allow the size of the cover or subclass to diverge as the sample size $n$ increases. These bounds immediately yield various strong convergence results for the proposed estimators. As an application of our findings, we consider the problem of statistical classification based on the proposed regression estimators and also look into their rates of convergence under different settings. Although the results are mainly stated for kernel-type estimators, they can also be extended to other popular local-averaging methods such as nearest-neighbor and histogram estimators.
Submitted 6 December, 2022;
originally announced December 2022.
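The two-step idea in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the missingness model is taken to be logistic with an exponential tilt, $P(\delta=1\mid x,y)=1/(1+\exp(g(x)+\gamma y))$, the kernel is Gaussian, the cover is a finite grid of $\gamma$ values, and the held-out error is computed on a validation half whose responses are assumed fully observed. The names `nw_weights`, `m_hat`, and `select_gamma` are hypothetical.

```python
import numpy as np

def nw_weights(x0, X, h):
    """Gaussian kernel weights K((x0 - X_i) / h), one row per query point."""
    u = (np.asarray(x0, dtype=float)[:, None] - X[None, :]) / h
    return np.exp(-0.5 * u ** 2)

def m_hat(x0, X, Y, delta, gamma, h):
    """Step 1: one member of the family, indexed by gamma (assumed tilting model).

    Uses E[Y|x] = E[delta*Y|x] + P(delta=0|x) * E[Y|x, delta=0], where the last
    term is a tilted average of the observed responses under the assumed model.
    """
    W = nw_weights(x0, X, h)
    Yo = np.where(delta == 1, Y, 0.0)        # missing responses never enter
    tilt = delta * np.exp(gamma * Yo)        # zero weight for missing cases
    total = W.sum(axis=1)
    obs_part = (W @ (delta * Yo)) / total    # estimate of E[delta * Y | x]
    miss_prob = (W @ (1 - delta)) / total    # estimate of P(delta = 0 | x)
    tilted_mean = (W @ (tilt * Yo)) / (W @ tilt)
    return obs_part + miss_prob * tilted_mean

def select_gamma(grid, train, valid, h):
    """Step 2: pick the grid member with the smallest held-out squared error.

    train = (X, Y, delta) builds the estimators; valid = (Xv, Yv) is assumed
    to have fully observed responses (an assumption of this sketch).
    """
    X, Y, delta = train
    Xv, Yv = valid
    errors = [float(np.mean((m_hat(Xv, X, Y, delta, g, h) - Yv) ** 2)) for g in grid]
    return grid[int(np.argmin(errors))], errors
```

With `gamma = 0` the estimator reduces to the complete-case Nadaraya-Watson estimator, which is a convenient sanity check. How the prediction error is actually estimated when validation responses are also missing, and how identifiability is secured, are the questions the paper itself addresses; this sketch does not attempt them.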
-
A simple approach to construct confidence bands for a regression function with incomplete data
Authors:
Ali Al-Sharadqah,
Majid Mojirsheibani
Abstract:
A long-standing problem in the construction of asymptotically correct confidence bands for a regression function $m(x)=E[Y|X=x]$, where $Y$ is the response variable influenced by the covariate $X$, involves the situation where $Y$ values may be missing at random, and where the selection probability, the density function $f(x)$ of $X$, and the conditional variance of $Y$ given $X$ are all completely unknown. This can be particularly complicated in nonparametric situations. In this paper we propose a new kernel-type regression estimator and study the limiting distribution of the properly normalized versions of the maximal deviation of the proposed estimator from the true regression curve. The resulting limiting distribution will be used to construct uniform confidence bands for the underlying regression curve with asymptotically correct coverages. The focus of the current paper is on the case where $X\in \mathbb{R}$. We also perform numerical studies to assess the finite-sample performance of the proposed method. Both the mechanics and the theoretical validity of our methods are discussed.
Submitted 7 December, 2018;
originally announced December 2018.
-
Classification on convex sets in the presence of missing covariates
Authors:
Levon Demirdjian,
Majid Mojirsheibani
Abstract:
A number of results related to statistical classification on convex sets are presented. In particular, the focus is on the case where some of the covariates in the data and in the observation being classified can be missing. The form of the optimal classifier is derived when the class-conditional densities are uniform over convex regions. In practice, the underlying convex sets are often unknown and must be estimated from a set of data. In this case, the convex hull of a set of points is shown to be a consistent estimator of the underlying convex set. The problem of estimation is further complicated since the number of points in each convex hull is itself a random variable. The corresponding plug-in version of the optimal classifier is derived and shown to be Bayes consistent.
Submitted 1 May, 2018;
originally announced May 2018.
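The plug-in idea in the abstract above can be sketched for the simplest case. With uniform class-conditional densities on convex sets $C_k$ and class priors $p_k$, the Bayes rule picks the class maximizing $p_k \, 1\{x \in C_k\} / \mathrm{vol}(C_k)$; the sketch below replaces $C_k$ by the convex hull of the class sample and $p_k$ by the class proportion. It is restricted to fully observed two-dimensional points, so the missing-covariate case treated in the paper is not handled, and the function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def area(hull):
    """Polygon area by the shoelace formula."""
    n = len(hull)
    s = sum(hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
            for i in range(n))
    return 0.5 * abs(s)

def contains(hull, x):
    """True if x lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], x) >= 0 for i in range(n))

def classify(x, samples):
    """Plug-in rule: label with the largest (class proportion) / (hull area)
    among hulls containing x; None if x lies outside every estimated hull."""
    total = sum(len(pts) for pts in samples.values())
    best, best_score = None, 0.0
    for label, pts in samples.items():
        hull = convex_hull(pts)
        a = area(hull)
        if a > 0 and contains(hull, x):
            score = (len(pts) / total) / a
            if score > best_score:
                best, best_score = label, score
    return best
```

Returning `None` for points outside every hull is a design choice of this sketch; a practical rule would need a tie-breaking or fallback convention there.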