-
Nonparametric Exponential Family Regression Under Star-Shaped Constraints
Authors:
Guanghong Yi,
Matey Neykov
Abstract:
We study the minimax rate of estimation in nonparametric exponential family regression under star-shaped constraints. Specifically, the parameter space $K$ is a star-shaped set contained within a bounded box $[-M, M]^n$, where $M$ is a known positive constant. Moreover, we assume that the exponential family is nonsingular and that its cumulant function is twice continuously differentiable. Our main result shows that the minimax rate for this problem is $\varepsilon^{*2} \wedge \operatorname{diam}(K)^2$, up to absolute constants, where $\varepsilon^*$ is defined as
\[ \varepsilon^* = \sup \{\varepsilon: \varepsilon^2 κ(M) \leq \log N^{\operatorname{loc}}(\varepsilon)\}, \]
with $N^{\operatorname{loc}}(\varepsilon)$ denoting the local entropy and $\kappa(M)$ a constant that may depend on $M$. We also provide an example and derive its corresponding minimax optimal rate.
Submitted 13 March, 2025;
originally announced March 2025.
-
Robust density estimation over star-shaped density classes
Authors:
Xiaolong Liu,
Matey Neykov
Abstract:
We establish a novel criterion for comparing the performance of two densities, $g_1$ and $g_2$, in the presence of corrupted data. Using this criterion, we propose an algorithm that constructs a density estimator within a star-shaped density class $\mathcal{F}$ when the data are corrupted. We then derive minimax upper and lower bounds for density estimation over this star-shaped density class, whose densities are uniformly bounded above and below (in the sup norm), in the presence of adversarially corrupted data. Specifically, we assume that a fraction $\epsilon \leq \frac{1}{3}$ of the $N$ observations are arbitrarily corrupted. We obtain the minimax upper bound $\max\{ \tau_{\overline{J}}^2, \epsilon\} \wedge d^2$. Under certain conditions, we obtain the minimax risk, up to proportionality constants, under the squared $L_2$ loss as $$ \max\left\{ \tau^{*2} \wedge d^2, \epsilon \wedge d^2 \right\}, $$ where $\tau^* := \sup\left\{ \tau: N\tau^2 \leq \log \mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c) \right\}$ for a sufficiently large constant $c$. Here, $\mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c)$ denotes the local entropy of the set $\mathcal{F}$, and $d$ is the $L_2$ diameter of $\mathcal{F}$.
Submitted 17 January, 2025;
originally announced January 2025.
-
Information theoretic limits of robust sub-Gaussian mean estimation under star-shaped constraints
Authors:
Akshay Prasadan,
Matey Neykov
Abstract:
We obtain the minimax rate for a mean location model with a bounded star-shaped constraint set $K \subseteq \mathbb{R}^n$ on the mean, in an adversarially corrupted data setting with Gaussian noise. We assume that an unknown fraction $\epsilon \le 1/2-\kappa$ of the $N$ observations, for some fixed $\kappa\in(0,1/2]$, is arbitrarily corrupted. We show that, up to proportionality constants, the minimax risk under the squared $\ell_2$ loss is $\max(\eta^{*2},\sigma^2\epsilon^2)\wedge d^2$ with
\begin{align*}
\eta^* = \sup \bigg\{\eta\ge 0 : \frac{N\eta^2}{\sigma^2} \leq \log \mathcal{M}_K^{\operatorname{loc}}(\eta,c)\bigg\},
\end{align*}
where $\log \mathcal{M}_K^{\operatorname{loc}}(\eta,c)$ denotes the local entropy of the set $K$, $d$ is the diameter of $K$, $\sigma^2$ is the noise variance, and $c$ is some sufficiently large absolute constant. A variant of our algorithm achieves the same rate for settings with known or symmetric sub-Gaussian noise, with a smaller breakdown point that is still of constant order. We further study the case of unknown sub-Gaussian noise and show that the rate is slightly slower: $\max(\eta^{*2},\sigma^2\epsilon^2\log(1/\epsilon))\wedge d^2$. We also generalize our results to the case when $K$ is star-shaped but unbounded.
Submitted 12 June, 2025; v1 submitted 4 December, 2024;
originally announced December 2024.
-
Some facts about the optimality of the LSE in the Gaussian sequence model with convex constraint
Authors:
Akshay Prasadan,
Matey Neykov
Abstract:
We consider a convex constrained Gaussian sequence model and characterize necessary and sufficient conditions for the least squares estimator (LSE) to be minimax optimal. For a closed convex set $K\subset \mathbb{R}^n$, we observe $Y=\mu+\xi$ with $\xi\sim \mathcal{N}(0,\sigma^2\mathbb{I}_n)$ and $\mu\in K$, and aim to estimate $\mu$. We characterize the worst-case risk of the LSE in multiple ways by analyzing the behavior of the local Gaussian width on $K$. We demonstrate that optimality is equivalent to a Lipschitz property of the local Gaussian width mapping. We also provide theoretical algorithms that search for the worst-case risk. We then provide examples showing optimality or suboptimality of the LSE on various sets, including $\ell_p$ balls for $p\in[1,2]$, pyramids, solids of revolution, and multivariate isotonic regression, among others.
Submitted 3 February, 2025; v1 submitted 9 June, 2024;
originally announced June 2024.
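As a toy illustration of the Gaussian width quantity that this characterization is built on, the sketch below estimates by Monte Carlo the (global, not local) Gaussian width $w(K) = \mathbb{E}\sup_{\theta\in K}\langle \xi, \theta\rangle$ of an $\ell_1$ ball, for which the supremum has the closed form $r\|\xi\|_\infty$. It does not compute the local Gaussian width or the LSE risk studied in the paper; the function name and its parameters are hypothetical.

```python
import numpy as np


def l1_ball_gaussian_width(n, radius=1.0, n_mc=2000, seed=0):
    """Monte Carlo estimate of E sup_{||theta||_1 <= radius} <xi, theta>, xi ~ N(0, I_n).

    Over the l1 ball the supremum is attained at a signed, scaled basis vector,
    so it equals radius * max_i |xi_i|.
    """
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_mc, n))           # n_mc independent draws of xi
    return radius * np.abs(xi).max(axis=1).mean()


w = l1_ball_gaussian_width(1000)
# The width of the unit l1 ball grows like sqrt(2 log n).
print(f"estimated width: {w:.3f}; sqrt(2 log n): {np.sqrt(2 * np.log(1000)):.3f}")
```

Because the $\ell_1$ ball has such a simple support function, no optimization is needed; for general $K$ (and for the local width $K \cap B(\mu,\varepsilon)$) the inner supremum would itself require a convex solver.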
-
Semi-Supervised U-statistics
Authors:
Ilmun Kim,
Larry Wasserman,
Sivaraman Balakrishnan,
Matey Neykov
Abstract:
Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. Responding to this demand, we introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data, and investigate their statistical properties. We show that the proposed approach is asymptotically normal and exhibits notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, we derive minimax lower bounds in semi-supervised settings and show that our procedure is semi-parametrically efficient under regularity conditions. Moreover, tailored to bivariate kernels, we propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes, and demonstrate its optimality properties. Simulation studies are conducted to corroborate our findings and to further illustrate our framework.
Submitted 9 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Characterizing the minimax rate of nonparametric regression under bounded star-shaped constraints
Authors:
Akshay Prasadan,
Matey Neykov
Abstract:
We quantify the minimax rate for a nonparametric regression model over a star-shaped function class $\mathcal{F}$ with bounded diameter. We obtain a minimax rate of ${\varepsilon^{\ast}}^2\wedge\mathrm{diam}(\mathcal{F})^2$, where \[\varepsilon^{\ast} =\sup\{\varepsilon>0:n\varepsilon^2 \le \log M_{\mathcal{F}}^{\operatorname{loc}}(\varepsilon,c)\},\] $\log M_{\mathcal{F}}^{\operatorname{loc}}(\cdot, c)$ is the local metric entropy of $\mathcal{F}$, $c$ is some absolute constant scaling down the entropy radius, and our loss function is the squared population $L_2$ distance over our input space $\mathcal{X}$. In contrast to classical works on the topic [cf. Yang and Barron, 1999], our results do not require functions in $\mathcal{F}$ to be uniformly bounded in sup-norm. In fact, we propose a condition that simultaneously generalizes boundedness in sup-norm and the so-called $L$-sub-Gaussian assumption that appears in the prior literature. In addition, we prove that our estimator is adaptive to the true point in the convex-constrained case, and to the best of our knowledge, this is the first such estimator in this general setting. This work builds on the Gaussian sequence framework of Neykov [2022], using a similar algorithmic scheme to achieve the minimax rate. Our algorithmic rate also applies with sub-Gaussian noise. We illustrate the utility of this theory with examples including multivariate monotone functions, linear functionals over ellipsoids, and Lipschitz classes.
Submitted 9 January, 2025; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Signal Detection with Quadratically Convex Orthosymmetric Constraints
Authors:
Matey Neykov
Abstract:
This paper is concerned with signal detection in Gaussian noise under quadratically convex orthosymmetric (QCO) constraints. Specifically, the null hypothesis assumes no signal, whereas the alternative considers a signal that is separated from zero in Euclidean norm and belongs to the QCO constraint set. Our main result establishes the minimax rate of the separation radius between the null and alternative purely in terms of the geometry of the QCO constraint: we argue that the Kolmogorov widths of the constraint determine the critical radius. This parallels the estimation problem under QCO constraints, whose minimax rate was first established by Donoho et al. (1990); however, as expected, the critical separation radius is smaller than the minimax optimal estimation rate. Thus, signals may be detectable even when they cannot be reliably estimated.
Submitted 24 August, 2023;
originally announced August 2023.
-
Nearly Minimax Optimal Wasserstein Conditional Independence Testing
Authors:
Matey Neykov,
Larry Wasserman,
Ilmun Kim,
Sivaraman Balakrishnan
Abstract:
This paper is concerned with minimax conditional independence testing. In contrast to some previous works on the topic, which use the total variation distance to separate the null from the alternative, here we use the Wasserstein distance. In addition, we impose Wasserstein smoothness conditions that, on bounded domains, are weaker than the corresponding total variation smoothness conditions imposed, for instance, by Neykov et al. [2021]. This added flexibility enlarges the class of distributions allowed under the null and the alternative to include, for instance, distributions that contain point masses. We characterize the optimal rate of the critical radius of testing up to logarithmic factors. Our test statistic, which nearly achieves the optimal critical radius, is novel and can be thought of as a weighted multi-resolution version of the U-statistic studied by Neykov et al. [2021].
Submitted 16 August, 2023;
originally announced August 2023.
-
Conditional Independence Testing for Discrete Distributions: Beyond $\chi^2$- and $G$-tests
Authors:
Ilmun Kim,
Matey Neykov,
Sivaraman Balakrishnan,
Larry Wasserman
Abstract:
This paper is concerned with the problem of conditional independence testing for discrete data. In recent years, researchers have shed new light on this fundamental problem, emphasizing finite-sample optimality. The non-asymptotic viewpoint adopted in these works has led to novel conditional independence tests that enjoy certain optimality under various regimes. Despite their attractive theoretical properties, the considered tests are not necessarily practical, as they rely on a Poissonization trick and unspecified constants in their critical values. In this work, we attempt to bridge the gap between theory and practice by reproving optimality without Poissonization and calibrating tests using Monte Carlo permutations. Along the way, we also prove that classical asymptotic $\chi^2$- and $G$-tests are notably sub-optimal in a high-dimensional regime, which justifies the demand for new tools. Our theoretical results are complemented by experiments on both simulated and real-world datasets. Accompanying this paper is an R package, UCI, that implements the proposed tests.
Submitted 28 October, 2023; v1 submitted 10 August, 2023;
originally announced August 2023.
-
Revisiting Le Cam's Equation: Exact Minimax Rates over Convex Density Classes
Authors:
Shamindra Shrotriya,
Matey Neykov
Abstract:
We study the classical problem of deriving minimax rates for density estimation over convex density classes. Building on the pioneering work of Le Cam (1973), Birgé (1983, 1986), Wong and Shen (1995), and Yang and Barron (1999), we determine the exact (up to constants) minimax rate over any convex density class. This work thus extends these known results by demonstrating that the local metric entropy of the density class always captures the minimax optimal rates under such settings. Our bounds provide a unifying perspective across both parametric and nonparametric convex density classes, under weaker assumptions on the richness of the density class than previously considered. Our proposed `multistage sieve' MLE applies to any such convex density class. We further demonstrate that this estimator is also adaptive to the true underlying density of interest. We apply our risk bounds to rederive known minimax rates, including those for bounded total variation and Hölder density classes. We further illustrate the utility of the result by deriving upper bounds for less studied classes, e.g., convex mixtures of densities.
Submitted 23 October, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Adversarial Sign-Corrupted Isotonic Regression
Authors:
Shamindra Shrotriya,
Matey Neykov
Abstract:
Classical univariate isotonic regression involves nonparametric estimation of the true signal under a monotonicity constraint. We consider a variation of this generating process, which we term adversarial sign-corrupted isotonic (\texttt{ASCI}) regression. In the \texttt{ASCI} setting, the adversary has full access to the true isotonic responses and is free to sign-corrupt them. Estimating the true monotonic signal from these sign-corrupted responses is a highly challenging task. Notably, the sign-corruptions are designed to violate monotonicity, and possibly induce heavy dependence between the corrupted response terms. In this sense, \texttt{ASCI} regression may be viewed as an adversarial stress test for isotonic regression. Our motivation is to understand whether efficient robust estimation of the monotone signal is feasible in this adversarial setting. We develop \texttt{ASCIFIT}, a three-step estimation procedure for the \texttt{ASCI} setting. The \texttt{ASCIFIT} procedure is conceptually simple, easy to implement with existing software, and consists of applying the pool adjacent violators algorithm (\texttt{PAVA}) with crucial pre- and post-processing corrections. We formalize this procedure and establish its theoretical guarantees in the form of sharp high-probability upper bounds and minimax lower bounds. We illustrate our findings with detailed simulations.
Submitted 14 July, 2022;
originally announced July 2022.
-
On the minimax rate of the Gaussian sequence model under bounded convex constraints
Authors:
Matey Neykov
Abstract:
We determine the exact minimax rate of a Gaussian sequence model under bounded convex constraints, purely in terms of the local geometry of the given constraint set $K$. Our main result shows that the minimax risk (up to constant factors) under the squared $\ell_2$ loss is given by $\varepsilon^{*2} \wedge \operatorname{diam}(K)^2$ with
\begin{align*}
\varepsilon^* = \sup \bigg\{\varepsilon: \frac{\varepsilon^2}{\sigma^2} \leq \log M^{\operatorname{loc}}(\varepsilon)\bigg\},
\end{align*}
where $\log M^{\operatorname{loc}}(\varepsilon)$ denotes the local entropy of the set $K$, and $\sigma^2$ is the variance of the noise. We use our abstract result to re-derive known minimax rates for some special sets $K$ such as hyperrectangles, ellipsoids, and more generally quadratically convex orthosymmetric sets. Finally, we extend our results to the unbounded case with known $\sigma^2$ to show that the minimax rate in that case is $\varepsilon^{*2}$.
Submitted 7 November, 2022; v1 submitted 18 January, 2022;
originally announced January 2022.
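The rate above is defined through a fixed-point condition on the local entropy. Assuming one has a callable returning (an approximation of) $\log M^{\operatorname{loc}}(\varepsilon)$ that is nonnegative and nonincreasing in $\varepsilon$, the feasible set $\{\varepsilon : \varepsilon^2/\sigma^2 \le \log M^{\operatorname{loc}}(\varepsilon)\}$ is an interval starting at $0$, so $\varepsilon^*$ can be located by bisection. The sketch below does this; `log_local_entropy` is a hypothetical user-supplied function (computing local entropy for a general $K$ is itself hard), and the constant-entropy input in the usage line is only a toy stand-in.

```python
from typing import Callable


def critical_radius(log_local_entropy: Callable[[float], float],
                    sigma: float, upper: float, tol: float = 1e-10) -> float:
    """Approximate eps* = sup{eps > 0 : eps^2 / sigma^2 <= log M_loc(eps)} on (0, upper].

    Assumes log_local_entropy is nonnegative and nonincreasing, so the
    feasible set is an interval starting at 0 and bisection applies.
    """
    def feasible(eps: float) -> bool:
        return eps ** 2 / sigma ** 2 <= log_local_entropy(eps)

    if feasible(upper):          # the condition holds all the way up to the cap
        return upper
    lo, hi = 0.0, upper          # invariant: lo is feasible, hi is infeasible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


def minimax_rate(log_local_entropy: Callable[[float], float],
                 sigma: float, diam: float) -> float:
    """Return min(eps*^2, diam(K)^2), ignoring the suppressed absolute constants."""
    eps_star = critical_radius(log_local_entropy, sigma, upper=diam)
    return min(eps_star ** 2, diam ** 2)


# Toy input: a constant log local entropy of 4.0 (hypothetical), sigma = 1.
print(minimax_rate(lambda eps: 4.0, sigma=1.0, diam=10.0))
```

Capping the search at $\operatorname{diam}(K)$ is a design choice: above the diameter the minimum with $\operatorname{diam}(K)^2$ takes over anyway, so there is no need to search further.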
-
Local permutation tests for conditional independence
Authors:
Ilmun Kim,
Matey Neykov,
Sivaraman Balakrishnan,
Larry Wasserman
Abstract:
In this paper, we investigate local permutation tests for testing conditional independence between two random vectors $X$ and $Y$ given $Z$. The local permutation test determines the significance of a test statistic by locally shuffling samples that share similar values of the conditioning variables $Z$, and it forms a natural extension of the usual permutation approach for unconditional independence testing. Despite its simplicity and empirical support, the theoretical underpinnings of the local permutation test remain unclear. Motivated by this gap, this paper aims to establish theoretical foundations of local permutation tests with a particular focus on binning-based statistics. We start by revisiting the hardness of conditional independence testing and provide an upper bound for the power of any valid conditional independence test, which holds when the probability of observing collisions in $Z$ is small. This negative result naturally motivates us to impose additional restrictions on the possible distributions under the null and alternative. To this end, we focus our attention on certain classes of smooth distributions and identify provably tight conditions under which the local permutation method is universally valid, i.e., it is valid when applied to any (binning-based) test statistic. To complement this result on type I error control, we also show that in some cases, a binning-based statistic calibrated via the local permutation method can achieve minimax optimal power. We also introduce a double-binning permutation strategy, which yields a valid test over less smooth null distributions than the typical single-binning method without compromising much power. Finally, we present simulation results to support our theoretical findings.
Submitted 6 January, 2022; v1 submitted 21 December, 2021;
originally announced December 2021.
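To make the local permutation idea concrete, here is a minimal sketch for scalar $X$, $Y$, $Z$: the range of $Z$ is split into equal-width bins, $Y$ is shuffled only within each bin, and the observed statistic is compared with its permutation distribution. The statistic (a sum of absolute within-bin covariances), the function name, and the defaults are hypothetical choices for illustration, not the binning-based statistics analyzed in the paper.

```python
import numpy as np


def local_permutation_pvalue(x, y, z, n_bins=10, n_perm=199, seed=0):
    """Permutation p-value for X independent of Y given Z, shuffling Y within bins of Z."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(z.min(), z.max(), n_bins + 1)
    bins = np.digitize(z, edges[1:-1])            # bin labels in {0, ..., n_bins - 1}
    groups = [np.flatnonzero(bins == b) for b in range(n_bins)]

    def stat(yy):
        # Sum over bins of |within-bin covariance| (unnormalized); hypothetical choice.
        total = 0.0
        for idx in groups:
            if idx.size < 2:
                continue
            xc = x[idx] - x[idx].mean()
            yc = yy[idx] - yy[idx].mean()
            total += abs(xc @ yc)
        return total

    t_obs = stat(y)
    exceed = 0
    for _ in range(n_perm):
        y_perm = y.copy()
        for idx in groups:                        # shuffle Y only inside each bin
            y_perm[idx] = y[rng.permutation(idx)]
        if stat(y_perm) >= t_obs:
            exceed += 1
    return (1 + exceed) / (n_perm + 1)            # finite-sample valid permutation p-value


rng = np.random.default_rng(1)
z = rng.uniform(size=300)
x = z + 0.5 * rng.standard_normal(300)
y_dep = x + 0.1 * rng.standard_normal(300)        # Y depends on X given Z
y_null = z + 0.5 * rng.standard_normal(300)       # Y independent of X given Z
p_dep = local_permutation_pvalue(x, y_dep, z)
p_null = local_permutation_pvalue(x, y_null, z)
print(p_dep, p_null)
```

Shuffling within bins preserves the empirical relationship between $Y$ and the binned $Z$, which is why the calibration is only approximately correct when $Z$ varies within a bin; the smoothness conditions in the abstract are what control this discrepancy.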
-
Non-Asymptotic Bounds for the $\ell_{\infty}$ Estimator in Linear Regression with Uniform Noise
Authors:
Yufei Yi,
Matey Neykov
Abstract:
The Chebyshev or $\ell_{\infty}$ estimator is an unconventional alternative to ordinary least squares for linear regression. It is defined as the minimizer of the $\ell_{\infty}$ objective function
\begin{align*}
\hat{\boldsymbol{\beta}} := \arg\min_{\boldsymbol{\beta}} \|\boldsymbol{Y} - \mathbf{X}\boldsymbol{\beta}\|_{\infty}.
\end{align*}
The asymptotic distribution of the Chebyshev estimator with a fixed number of covariates was recently studied (Knight, 2020), yet finite-sample guarantees and generalizations to high-dimensional settings remain open. In this paper, we develop non-asymptotic upper bounds on the estimation error $\|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}^*\|_2$ for a Chebyshev estimator $\hat{\boldsymbol{\beta}}$, in a regression setting with uniformly distributed noise $\varepsilon_i\sim U([-a,a])$, where $a$ is either known or unknown. Under relatively mild assumptions on the (random) design matrix $\mathbf{X}$, we bound the error rate by $\frac{C_p}{n}$ with high probability, for some constant $C_p$ depending on the dimension $p$ and the law of the design. Furthermore, we illustrate that there exist designs for which the Chebyshev estimator is (nearly) minimax optimal. On the other hand, we also argue that there exist designs for which this estimator behaves sub-optimally in terms of the dependence of the constant $C_p$ on $p$. In addition, we show that "Chebyshev's LASSO" has advantages over the regular LASSO in high-dimensional situations, provided that the noise is uniform. Specifically, we argue that it achieves a much faster rate of estimation under certain assumptions on the growth rate of the sparsity level and the ambient dimension with respect to the sample size.
Submitted 14 March, 2023; v1 submitted 17 August, 2021;
originally announced August 2021.
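The Chebyshev estimator defined above is a linear program: introducing one scalar $t$ that bounds every absolute residual turns the $\ell_\infty$ objective into $2n$ linear constraints, $-t \le Y_i - \mathbf{x}_i^\top\boldsymbol{\beta} \le t$. A minimal sketch using SciPy's `linprog` with the HiGHS backend follows; the function name, the Gaussian design, and the noise level in the usage lines are illustrative assumptions, not the paper's setup or code.

```python
import numpy as np
from scipy.optimize import linprog


def chebyshev_estimator(X, y):
    """Solve min_beta ||y - X beta||_inf as a linear program over (beta, t)."""
    n, p = X.shape
    c = np.zeros(p + 1)
    c[-1] = 1.0                                   # objective: minimize t
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([X, -ones]),      #  x_i^T beta - t <= y_i
                      np.hstack([-X, -ones])])    # -x_i^T beta - t <= -y_i
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)]     # beta free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Simulated example with uniform noise U([-a, a]); design and sizes are hypothetical.
rng = np.random.default_rng(0)
n, p, a = 500, 3, 1.0
X = rng.standard_normal((n, p))
beta_star = np.array([1.0, -2.0, 0.5])
y = X @ beta_star + rng.uniform(-a, a, size=n)
beta_hat = chebyshev_estimator(X, y)
print(np.linalg.norm(beta_hat - beta_star))
```

Only the noise support matters here: with bounded uniform noise the extreme residuals pin down $\boldsymbol{\beta}^*$ tightly, which is the intuition behind the $C_p/n$ rate rather than the usual $1/\sqrt{n}$ rate of least squares.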
-
A New Perspective on Debiasing Linear Regressions
Authors:
Yufei Yi,
Matey Neykov
Abstract:
In this paper, we propose an abstract procedure for debiasing constrained or regularized, potentially high-dimensional linear models. It is elementary to show that the proposed procedure can produce $\frac{1}{\sqrt{n}}$-confidence intervals for individual coordinates (or even bounded contrasts) in models with unknown covariance, provided that the covariance has bounded spectrum. While the proof of the statistical guarantees of our procedure is simple, its implementation requires more care due to the complexity of the optimization programs we need to solve. We spend the bulk of this paper giving examples in which the proposed algorithm can be implemented in practice. One fairly general class of instances amenable to our procedure is convex constrained least squares. We are able to translate the procedure into an abstract algorithm over this class of models, and we give concrete examples where efficient polynomial-time methods for debiasing exist. These include the constrained version of the group LASSO, regression under monotone constraints, regression with positive monotone constraints, and non-negative least squares. We also demonstrate that our method can debias Minkowski gauge selectors such as the ones proposed by Cai et al. (2016) under a certain condition. This solves an open problem posed by Cai et al. (2016) on how to debias such selectors when the covariance is unknown. In addition, we show that our abstract procedure can be applied to efficiently debias group LASSO, SLOPE and square-root SLOPE, among other popular regularized procedures, under certain assumptions. We provide thorough simulation results in support of our theoretical findings.
Submitted 11 January, 2023; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Minimax Optimal Conditional Density Estimation under Total Variation Smoothness
Authors:
Michael Li,
Matey Neykov,
Sivaraman Balakrishnan
Abstract:
This paper studies the minimax rate of nonparametric conditional density estimation under a weighted absolute value loss function in a multivariate setting. We first demonstrate that conditional density estimation is impossible if one only requires that $p_{X|Z}$ is smooth in $x$ for all values of $z$. This motivates us to consider a sub-class of absolutely continuous distributions, restricting the conditional density $p_{X|Z}(x|z)$ to be not only Hölder smooth in $x$ but also total variation smooth in $z$. We propose a corresponding kernel-based estimator and prove that it achieves the minimax rate. We give some simple examples of densities satisfying our assumptions, which show that our results are not vacuous. Finally, we propose an estimator that achieves the minimax optimal rate adaptively, i.e., without the need to know the smoothness parameter values in advance. Crucially, both of our estimators (the adaptive and non-adaptive ones) impose no assumptions on the marginal density $p_Z$, and neither is obtained as a ratio of two kernel smoothing estimators, which might seem like the go-to approach for this problem.
Submitted 12 March, 2021;
originally announced March 2021.
-
Non-Sparse PCA in High Dimensions via Cone Projected Power Iteration
Authors:
Yufei Yi,
Matey Neykov
Abstract:
In this paper, we propose a cone projected power iteration algorithm to recover the first principal eigenvector from a noisy positive semidefinite matrix. When the true principal eigenvector is assumed to belong to a convex cone, the proposed algorithm is fast and has a tractable error. Specifically, the method achieves polynomial time complexity for certain convex cones equipped with fast projection, such as the monotone cone. It attains a small error when the noisy matrix has a small cone-restricted operator norm. We supplement the above results with a minimax lower bound on the error under the spiked covariance model. Our numerical experiments on simulated and real data show that our method achieves shorter run time and smaller error than ordinary power iteration and some sparse principal component analysis algorithms when the principal eigenvector lies in a convex cone.
Submitted 28 February, 2021; v1 submitted 15 May, 2020;
originally announced May 2020.
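A minimal sketch of the idea for the monotone cone, assuming the iteration alternates a matrix-vector product, Euclidean projection onto the cone of nondecreasing vectors (computed by the pool adjacent violators algorithm), and renormalization. The initialization, the fixed iteration count, and the toy spiked matrix (a rank-one signal plus a small symmetric perturbation) are hypothetical choices; the paper's exact algorithm and guarantees may differ.

```python
import numpy as np


def project_monotone(y):
    """Euclidean projection onto {x : x_1 <= ... <= x_n} via pool adjacent violators."""
    means, counts = [], []
    for val in y:
        means.append(float(val))
        counts.append(1)
        # Merge adjacent blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            means.append((m1 * c1 + m2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.repeat(means, counts)


def cone_projected_power_iteration(A, v0, n_iter=100):
    """Power iteration with a projection onto the monotone cone after each step."""
    v = project_monotone(v0)
    v = v / np.linalg.norm(v)
    for _ in range(n_iter):
        w = project_monotone(A @ v)
        norm = np.linalg.norm(w)
        if norm == 0:                              # projection collapsed; stop early
            break
        v = w / norm
    return v


# Toy spiked matrix with a nondecreasing leading eigenvector (hypothetical example).
rng = np.random.default_rng(0)
n = 50
v_true = np.linspace(0.1, 1.0, n)
v_true /= np.linalg.norm(v_true)
W = 0.02 * rng.standard_normal((n, n))
A = 5.0 * np.outer(v_true, v_true) + (W + W.T) / 2
v_hat = cone_projected_power_iteration(A, np.ones(n))
print(v_hat @ v_true)
```

The projection step costs $O(n)$ amortized, so each iteration is dominated by the matrix-vector product; this is the sense in which cones with fast projections keep the method cheap.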
-
Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping
Authors:
Yichi Zhang,
Molei Liu,
Matey Neykov,
Tianxi Cai
Abstract:
Electronic Health Records (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite this potential, EHR data are currently underutilized for discovery research due to a major limitation: the lack of precise phenotype information. To overcome this difficulty, recent efforts have been devoted to developing supervised algorithms that accurately predict phenotypes based on relatively small training datasets with gold standard labels extracted via chart review. However, supervised methods typically require a sizable training set to yield generalizable algorithms, especially when the number of candidate features, $p$, is large. In this paper, we propose a semi-supervised (SS) EHR phenotyping method that, in a high-dimensional setting, borrows information from both a small labeled dataset, where both the label $Y$ and the feature set $X$ are observed, and a much larger unlabeled dataset with observations on $X$ only, as well as a surrogate variable $S$ that is predictive of $Y$ and available for all patients. Under a working prior assumption that $S$ is related to $X$ only through $Y$, and allowing it to hold only approximately, we propose a prior adaptive semi-supervised (PASS) estimator that adaptively incorporates the prior knowledge by shrinking the estimator towards a direction derived under the prior. We derive asymptotic theory for the proposed estimator and demonstrate its superiority over existing estimators via simulation studies. The proposed method is applied to an EHR phenotyping study of rheumatoid arthritis at Partners HealthCare.
Submitted 12 September, 2021; v1 submitted 26 March, 2020;
originally announced March 2020.
-
Minimax Optimal Conditional Independence Testing
Authors:
Matey Neykov,
Sivaraman Balakrishnan,
Larry Wasserman
Abstract:
We consider the problem of conditional independence testing of $X$ and $Y$ given $Z$, where $X$, $Y$ and $Z$ are three real random variables and $Z$ is continuous. We focus on two main cases: when $X$ and $Y$ are both discrete, and when $X$ and $Y$ are both continuous. In view of recent results on conditional independence testing (Shah and Peters, 2018), one cannot hope to design non-trivial tests that control the type I error for all absolutely continuous conditionally independent distributions while still ensuring power against interesting alternatives. Consequently, we identify various natural smoothness assumptions on the conditional distributions of $X,Y|Z=z$ as $z$ varies in the support of $Z$, and study the hardness of conditional independence testing under these smoothness assumptions. We derive matching lower and upper bounds on the critical radius of separation between the null and alternative hypotheses in the total variation metric. The tests we consider are easily implementable and rely on binning the support of the continuous variable $Z$. To complement these results, we provide a new proof of the hardness result of Shah and Peters.
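The abstract states only that the tests bin the support of $Z$. As a rough illustration of that idea in the discrete case, the sketch below splits $Z$ into equal-width cells and sums a Pearson chi-square independence statistic for $(X, Y)$ over the cells. The function name `binned_ci_statistic`, the assumption that $Z$ takes values in $[0, 1]$, and the choice of a chi-square statistic are all illustrative assumptions; this is not the authors' test statistic, and calibrating a rejection threshold is not shown.

```python
from collections import Counter, defaultdict


def binned_ci_statistic(xs, ys, zs, num_bins):
    """Hypothetical binned statistic for testing X independent of Y given Z.

    Assumes X and Y are discrete and Z lies in [0, 1]. Within each
    equal-width Z-bin, a Pearson chi-square statistic for independence of
    (X, Y) is computed; the per-bin statistics are summed.
    """
    bins = defaultdict(list)
    for x, y, z in zip(xs, ys, zs):
        # Equal-width bins on [0, 1]; z == 1 is placed in the last bin.
        b = min(int(z * num_bins), num_bins - 1)
        bins[b].append((x, y))

    total = 0.0
    for pairs in bins.values():
        m = len(pairs)
        joint = Counter(pairs)              # empirical joint counts in this bin
        px = Counter(x for x, _ in pairs)   # marginal counts of X in this bin
        py = Counter(y for _, y in pairs)   # marginal counts of Y in this bin
        for x in px:
            for y in py:
                expected = px[x] * py[y] / m
                total += (joint[(x, y)] - expected) ** 2 / expected
    return total
```

A large value suggests dependence between $X$ and $Y$ within some bins of $Z$; the number of bins trades off bias (bins too wide) against variance (too few points per bin).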
Submitted 1 July, 2021; v1 submitted 9 January, 2020;
originally announced January 2020.
-
High-Temperature Structure Detection in Ferromagnets
Authors:
Yuan Cao,
Matey Neykov,
Han Liu
Abstract:
This paper studies structure detection problems in high temperature ferromagnetic (positive interactions only) Ising models. The goal is to distinguish whether the underlying graph is empty, i.e., the model consists of independent Rademacher variables, versus the alternative that the underlying graph contains a subgraph of a certain structure. We give matching upper and lower minimax bounds under which testing this problem is possible and impossible, respectively. Our results reveal that a key quantity called graph arboricity drives the testability of the problem. On the computational front, under a conjecture of the computational hardness of sparse principal component analysis, we prove that, unless the signal is strong enough, there are no polynomial time tests capable of testing this problem. To prove this result, we show how to obtain sharp inequalities for the even moments of sums of i.i.d. Rademacher random variables, which may be of independent interest.
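To make the quantity behind the moment inequalities concrete, the sketch below computes the exact moments $\mathbb{E}[S_n^k]$ of $S_n = \sum_{i=1}^n \varepsilon_i$ with i.i.d. Rademacher $\varepsilon_i$, using $S_n = 2B - n$ with $B \sim \mathrm{Binomial}(n, 1/2)$. The function name is hypothetical, and this only evaluates the moments; it does not reproduce the paper's inequalities. For $k = 4$ the closed form is $\mathbb{E}[S_n^4] = n + 3n(n-1) = 3n^2 - 2n$.

```python
from math import comb


def rademacher_sum_moment(n, k):
    """Exact E[S^k] for S = sum of n i.i.d. Rademacher (+1/-1) variables.

    Uses S = 2*B - n with B ~ Binomial(n, 1/2), so
    E[S^k] = 2^{-n} * sum_j C(n, j) * (2j - n)^k.
    """
    return sum(comb(n, j) * (2 * j - n) ** k for j in range(n + 1)) / 2 ** n


# Odd moments vanish by symmetry; the fourth moment matches 3n^2 - 2n.
print(rademacher_sum_moment(3, 4))
```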
Submitted 12 January, 2021; v1 submitted 21 September, 2018;
originally announced September 2018.
-
Misspecified Nonconvex Statistical Optimization for Phase Retrieval
Authors:
Zhuoran Yang,
Lin F. Yang,
Ethan X. Fang,
Tuo Zhao,
Zhaoran Wang,
Matey Neykov
Abstract:
Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models. To address this limitation, we take a first step towards taming model misspecification by studying the high-dimensional sparse phase retrieval problem with misspecified link functions. In particular, we propose a simple variant of the thresholded Wirtinger flow algorithm that, given a proper initialization, converges linearly to an estimator with optimal statistical accuracy for a broad family of unknown link functions. We further provide extensive numerical experiments to support our theoretical findings.
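For readers unfamiliar with thresholded Wirtinger flow, the sketch below shows one generic iteration for the correctly specified, real-valued model $y_i = (a_i^\top x)^2$: a gradient step on $f(x) = \frac{1}{4n}\sum_i \big((a_i^\top x)^2 - y_i\big)^2$, whose gradient is $\frac{1}{n}\sum_i \big((a_i^\top x)^2 - y_i\big)(a_i^\top x)\, a_i$, followed by hard thresholding to enforce sparsity. The function name, the fixed step size and the fixed threshold are assumptions; this is not the paper's misspecification-robust variant, and initialization is not shown.

```python
def twf_step(x, A, y, step, threshold):
    """One hypothetical thresholded Wirtinger flow step (real-valued case).

    x: current estimate (list of p floats); A: list of n design rows a_i;
    y: list of n observations y_i = (a_i^T x_true)^2.
    """
    n = len(y)
    p = len(x)
    grad = [0.0] * p
    for a_i, y_i in zip(A, y):
        inner = sum(a * b for a, b in zip(a_i, x))  # a_i^T x
        resid = inner ** 2 - y_i                     # squared-magnitude misfit
        for j in range(p):
            grad[j] += resid * inner * a_i[j] / n
    # Gradient step, then hard thresholding to keep the iterate sparse.
    z = [xj - step * gj for xj, gj in zip(x, grad)]
    return [zj if abs(zj) > threshold else 0.0 for zj in z]
```

At the true signal the gradient is zero, so the step leaves it unchanged; small spurious coordinates are zeroed by the threshold.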
Submitted 17 December, 2017;
originally announced December 2017.
-
Property Testing in High Dimensional Ising models
Authors:
Matey Neykov,
Han Liu
Abstract:
This paper explores the information-theoretic limitations of graph property testing in zero-field Ising models. Instead of learning the entire graph structure, sometimes testing a basic graph property such as connectivity, cycle presence or maximum clique size is a more relevant and attainable objective. Since property testing is more fundamental than graph recovery, any necessary conditions for property testing imply corresponding conditions for graph recovery, while custom property tests can be statistically and/or computationally more efficient than graph recovery based algorithms. Understanding the statistical complexity of property testing requires distinguishing between ferromagnetic (i.e., positive interactions only) and general Ising models. Using combinatorial constructs such as graph packing and strong monotonicity, we characterize how target properties affect the corresponding minimax upper and lower bounds within the realm of ferromagnets. On the other hand, by studying the detection of an antiferromagnetic (i.e., negative interactions only) Curie-Weiss model buried in Rademacher noise, we show that property testing is strictly more challenging over general Ising models. In terms of methodological development, we propose two types of correlation based tests: computationally efficient screening for ferromagnets, and score type tests for general models, including a fast cycle presence test. Our correlation screening tests match the information-theoretic bounds for property testing in ferromagnets.
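A minimal sketch of the correlation screening idea, assuming $\pm 1$ samples from a zero-field model (so each spin has mean zero and the empirical second moment estimates the pairwise correlation). The function name and the threshold `tau` are hypothetical; in the paper the threshold and the decision rule depend on the target property, which this sketch does not model. Because interactions in a ferromagnet are positive, only large positive correlations are flagged.

```python
from itertools import combinations


def correlation_screen(samples, tau):
    """Return index pairs (i, j) whose empirical correlation exceeds tau.

    samples: list of observations, each a list of +1/-1 spins.
    In a zero-field model E[x_i] = 0, so mean(x_i * x_j) estimates the
    correlation between spins i and j.
    """
    n = len(samples)
    d = len(samples[0])
    flagged = []
    for i, j in combinations(range(d), 2):
        corr = sum(s[i] * s[j] for s in samples) / n
        if corr > tau:  # one-sided: ferromagnets have positive interactions
            flagged.append((i, j))
    return flagged
```

A simple test for the empty-graph null would reject when the returned list is non-empty; property-specific tests would instead examine the graph formed by the flagged pairs.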
Submitted 30 July, 2018; v1 submitted 19 September, 2017;
originally announced September 2017.
-
Adaptive Inferential Method for Monotone Graph Invariants
Authors:
Junwei Lu,
Matey Neykov,
Han Liu
Abstract:
We consider the problem of undirected graphical model inference. In many applications, instead of perfectly recovering the unknown graph structure, a more realistic goal is to infer some graph invariants (e.g., the maximum degree, the number of connected subgraphs, the number of isolated nodes). In this paper, we propose a new inferential framework for testing nested multiple hypotheses and constructing confidence intervals of the unknown graph invariants under undirected graphical models. Compared to perfect graph recovery, our methods require significantly weaker conditions. This paper makes two major contributions: (i) Methodologically, for testing nested multiple hypotheses, we propose a skip-down algorithm on the whole family of monotone graph invariants (invariants that are non-decreasing under the addition of edges). We further show that the same skip-down algorithm also provides valid confidence intervals for the targeted graph invariants. (ii) Theoretically, we prove that the lengths of the obtained confidence intervals are optimal and adaptive to the unknown signal strength. We also prove generic lower bounds on the confidence interval length for various invariants. Numerical results on both synthetic simulations and a brain imaging dataset are provided to illustrate the usefulness of the proposed method.
Submitted 28 July, 2017;
originally announced July 2017.
-
Surrogate Aided Unsupervised Recovery of Sparse Signals in Single Index Models for Binary Outcomes
Authors:
Abhishek Chakrabortty,
Matey Neykov,
Raymond Carroll,
Tianxi Cai
Abstract:
We consider the recovery of regression coefficients, denoted by $\boldsymbol{\beta}_0$, for a single index model (SIM) relating a binary outcome $Y$ to a set of possibly high dimensional covariates $\boldsymbol{X}$, based on a large but 'unlabeled' dataset $\mathcal{U}$, with $Y$ never observed. On $\mathcal{U}$, we fully observe $\boldsymbol{X}$ and, additionally, a surrogate $S$ which, while not being strongly predictive of $Y$ throughout the entirety of its support, can forecast it with high accuracy when it assumes extreme values. Such datasets arise naturally in modern studies involving large databases such as electronic medical records (EMR), where $Y$, unlike $(\boldsymbol{X}, S)$, is difficult and/or expensive to obtain. In EMR studies, an example of $Y$ and $S$ would be the true disease phenotype and the count of the associated diagnostic codes, respectively. Assuming another SIM for $S$ given $\boldsymbol{X}$, we show that, under sparsity assumptions, we can recover $\boldsymbol{\beta}_0$ up to a proportionality constant by simply fitting a least squares LASSO estimator to the subset of the observed data on $(\boldsymbol{X}, S)$ restricted to the extreme sets of $S$, with $Y$ imputed using the surrogacy of $S$. We obtain sharp finite sample performance bounds for our estimator, including deterministic deviation bounds and probabilistic guarantees. We demonstrate the effectiveness of our approach through multiple simulation studies, as well as by application to real data from an EMR study conducted at the Partners HealthCare Systems.
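The estimator described above can be sketched as follows, under stated assumptions: observations with $S \le$ `low` are imputed as $Y = 0$, those with $S \ge$ `high` as $Y = 1$, the middle range is discarded, and a LASSO is fitted to the retained rows by cyclic coordinate descent on $\frac{1}{2m}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$ after centering (so no intercept is needed). The cutoffs `low` and `high`, the penalty `lam`, and all function names are hypothetical; the paper's choice of extreme sets and tuning is not reproduced here. Only the direction of the output is meaningful, since recovery is up to a proportionality constant.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent on centered data (no intercept)."""
    m, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / m for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / m
    r = [yi - y_mean for yi in y]  # residual y - X beta with beta = 0
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in Xc) / m for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Partial-residual correlation for coordinate j.
            rho = sum(Xc[i][j] * r[i] for i in range(m)) / m + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0:
                for i in range(m):
                    r[i] -= Xc[i][j] * delta
                beta[j] = new
    return beta


def extreme_set_lasso(X, S, low, high, lam):
    """Hypothetical surrogate-aided fit: impute Y from extreme values of S."""
    rows, labels = [], []
    for x, s in zip(X, S):
        if s <= low:
            rows.append(x)
            labels.append(0.0)  # low surrogate value: impute Y = 0
        elif s >= high:
            rows.append(x)
            labels.append(1.0)  # high surrogate value: impute Y = 1
    return lasso_cd(rows, labels, lam)
```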
Submitted 30 June, 2018; v1 submitted 18 January, 2017;
originally announced January 2017.
-
Combinatorial Inference for Graphical Models
Authors:
Matey Neykov,
Junwei Lu,
Han Liu
Abstract:
We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference, where the main interest is point estimation or parameter testing, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of certain size, or the maximum degree of the graph. To begin with, we develop a unified theory for the fundamental limits of a large family of combinatorial inference problems. We propose new concepts, including structural packing and buffer entropies, to characterize how the complexity of combinatorial graph structures impacts the corresponding minimax lower bounds. On the algorithmic side, we propose a family of novel and practical structural testing algorithms that match the lower bounds. We provide thorough numerical results on both synthetic graphical models and brain networks to illustrate the usefulness of these proposed methods.
Submitted 12 February, 2018; v1 submitted 10 August, 2016;
originally announced August 2016.
-
L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs
Authors:
Matey Neykov,
Jun S. Liu,
Tianxi Cai
Abstract:
It is known that for a certain class of single index models (SIMs) $Y = f(\boldsymbol{X}_{p \times 1}^\intercal\boldsymbol{\beta}_0, \varepsilon)$, support recovery is impossible when $\boldsymbol{X} \sim \mathcal{N}(0, \mathbb{I}_{p \times p})$ and a model complexity adjusted sample size is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested. These algorithms provably work under the assumption that the design $\boldsymbol{X}$ comes from an i.i.d. Gaussian distribution. In the present paper, we analyze algorithms based on covariance screening and least squares with $L_1$ penalization (i.e., LASSO) and demonstrate that they can also enjoy optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on $f$ and $\varepsilon$ compared to the SIR based algorithms. Furthermore, we show, more generally, that LASSO succeeds in recovering the signed support of $\boldsymbol{\beta}_0$ if $\boldsymbol{X} \sim \mathcal{N}(0, \boldsymbol{\Sigma})$ and the covariance $\boldsymbol{\Sigma}$ satisfies the irrepresentable condition. Our work extends existing results on the support recovery of LASSO for the linear model to a more general class of SIMs.
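Covariance screening ranks coordinates by the magnitude of the empirical covariance $\big|\frac{1}{n}\sum_i X_{ij} Y_i\big|$ and keeps the top ones. The sketch below assumes the sparsity level $s$ is known and that the design is centered, as in the isotropic Gaussian setting above; the function name is hypothetical. Note that this only works when $f$ induces a nonzero covariance between $Y$ and $\boldsymbol{X}^\intercal\boldsymbol{\beta}_0$; for an even link such as $Y = (\boldsymbol{X}^\intercal\boldsymbol{\beta}_0)^2$ the covariance vanishes, which is one reason the assumptions on $f$ differ from those of the SIR based algorithms.

```python
def covariance_screening(X, y, s):
    """Hypothetical covariance screening: indices of the s coordinates with
    the largest |(1/n) * sum_i X[i][j] * y[i]|, returned in sorted order.

    Assumes the columns of X are (approximately) mean zero.
    """
    n, p = len(X), len(X[0])
    scores = [abs(sum(X[i][j] * y[i] for i in range(n)) / n) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:s])
```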
Submitted 22 June, 2016; v1 submitted 25 November, 2015;
originally announced November 2015.
-
Signed Support Recovery for Single Index Models in High-Dimensions
Authors:
Matey Neykov,
Qian Lin,
Jun S. Liu
Abstract:
In this paper, we study the support recovery problem for single index models $Y=f(\boldsymbol{X}^{\intercal} \boldsymbol{\beta},\varepsilon)$, where $f$ is an unknown link function, $\boldsymbol{X}\sim N_p(0,\mathbb{I}_{p})$ and $\boldsymbol{\beta}$ is an $s$-sparse unit vector such that $\boldsymbol{\beta}_{i}\in \{\pm\frac{1}{\sqrt{s}},0\}$. In particular, we look into the performance of two computationally inexpensive algorithms: (a) the diagonal thresholding sliced inverse regression (DT-SIR) introduced by Lin et al. (2015); and (b) a semi-definite programming (SDP) approach inspired by Amini & Wainwright (2008). When $s=O(p^{1-\delta})$ for some $\delta>0$, we demonstrate that both procedures can succeed in recovering the support of $\boldsymbol{\beta}$ as long as the rescaled sample size $\kappa=\frac{n}{s\log(p-s)}$ is larger than a certain critical threshold. On the other hand, when $\kappa$ is smaller than a critical value, any algorithm fails to recover the support with probability at least $\frac{1}{2}$ asymptotically. In other words, we demonstrate that both DT-SIR and the SDP approach are optimal (up to a scalar) for recovering the support of $\boldsymbol{\beta}$ in terms of sample size. We provide extensive simulations, as well as a real dataset application, to help verify our theoretical observations.
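A simplified sketch of the diagonal thresholding idea behind DT-SIR: sort the sample by $Y$, cut it into equal-size slices, and for each coordinate $j$ estimate $\operatorname{Var}(\mathbb{E}[X_j \mid Y])$ by the variance of the slice means of $X_j$; coordinates whose estimate exceeds a threshold are kept. The helper `rescaled_sample_size` evaluates $\kappa = \frac{n}{s\log(p-s)}$ from the abstract. The function names, equal-size slicing (the number of slices is assumed to divide $n$) and the fixed threshold are assumptions; this is not the tuned procedure analyzed in the paper.

```python
import math


def dt_sir(X, y, num_slices, threshold):
    """Hypothetical diagonal-thresholding SIR support estimate.

    Assumes num_slices divides len(y). Returns the coordinates whose
    between-slice variance of X_j means exceeds the threshold.
    """
    n, p = len(X), len(X[0])
    order = sorted(range(n), key=lambda i: y[i])  # sort samples by response
    size = n // num_slices
    selected = []
    for j in range(p):
        slice_means = []
        for h in range(num_slices):
            idx = order[h * size:(h + 1) * size]
            slice_means.append(sum(X[i][j] for i in idx) / size)
        grand = sum(slice_means) / num_slices
        # Estimate of Var(E[X_j | Y]): the j-th diagonal entry of the SIR matrix.
        var_j = sum((m - grand) ** 2 for m in slice_means) / num_slices
        if var_j > threshold:
            selected.append(j)
    return selected


def rescaled_sample_size(n, s, p):
    """kappa = n / (s * log(p - s)); requires p - s > 1."""
    return n / (s * math.log(p - s))
```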
Submitted 22 June, 2016; v1 submitted 6 November, 2015;
originally announced November 2015.
-
A Unified Theory of Confidence Regions and Testing for High Dimensional Estimating Equations
Authors:
Matey Neykov,
Yang Ning,
Jun S. Liu,
Han Liu
Abstract:
We propose a new inferential framework for constructing confidence regions and testing hypotheses in statistical models specified by a system of high dimensional estimating equations. We construct an influence function by projecting the fitted estimating equations onto a sparse direction obtained by solving a large-scale linear program. Our main theoretical contribution is to establish a unified Z-estimation theory of confidence regions for high dimensional problems.
Unlike existing methods, all of which require the specification of the likelihood or pseudo-likelihood, our framework is likelihood-free. As a result, our approach provides valid inference for a broad class of high dimensional constrained estimating equation problems that are not covered by existing methods.
Examples include noisy compressed sensing, instrumental variable regression, undirected graphical models, discriminant analysis and vector autoregressive models. We present detailed theoretical results for all these examples. Finally, we conduct thorough numerical simulations and a real dataset analysis to back up the developed theoretical results.
Submitted 22 June, 2016; v1 submitted 30 October, 2015;
originally announced October 2015.