-
On the estimation of persistence intensity functions and linear representations of persistence diagrams
Authors:
Weichen Wu,
Jisu Kim,
Alessandro Rinaldo
Abstract:
The prevailing statistical approach to analyzing persistence diagrams is concerned with filtering out topological noise. In this paper, we adopt a different viewpoint and aim at estimating the actual distribution of a random persistence diagram, which captures both topological signal and noise. To that effect, Chazal and Divol (2019) proved that, under general conditions, the expected value of a random persistence diagram is a measure admitting a Lebesgue density, called the persistence intensity function. In this paper, we are concerned with estimating the persistence intensity function and a novel, normalized version of it -- called the persistence density function. We present a class of kernel-based estimators based on an i.i.d. sample of persistence diagrams and derive estimation rates in the supremum norm. As a direct corollary, we obtain uniform consistency rates for estimating linear representations of persistence diagrams, including Betti numbers and persistence surfaces. Interestingly, the persistence density function delivers stronger statistical guarantees.
Submitted 25 October, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
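As a rough illustration of the kernel-based estimator mentioned in the abstract above, the sketch below averages Gaussian kernels centred at the points of each diagram in an i.i.d. sample. The Gaussian kernel, the fixed bandwidth `h`, and the function name `intensity_estimate` are assumptions made here for illustration; the paper's own kernel class and bandwidth choice may differ.

```python
import math

def intensity_estimate(diagrams, omega, h):
    """Kernel estimate of the persistence intensity at omega = (birth, death).

    diagrams: list of persistence diagrams, each a list of (birth, death) pairs.
    h: bandwidth (hypothetical fixed choice, not taken from the paper).
    """
    total = 0.0
    for diagram in diagrams:
        for (b, d) in diagram:
            sq_dist = (omega[0] - b) ** 2 + (omega[1] - d) ** 2
            # Isotropic 2D Gaussian kernel with bandwidth h.
            total += math.exp(-sq_dist / (2 * h * h)) / (2 * math.pi * h * h)
    # Average over diagrams, not over points, since the target is the
    # density of the expected diagram measure.
    return total / len(diagrams)
```

Dividing by the number of diagrams rather than the number of points means a sample whose diagrams carry more points yields a proportionally larger estimate, as one would expect of an intensity rather than a probability density.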
-
Inference for Projection Parameters in Linear Regression: beyond $d = o(n^{1/2})$
Authors:
Woonyoung Chang,
Arun Kumar Kuchibhotla,
Alessandro Rinaldo
Abstract:
We consider the problem of inference for projection parameters in linear regression with increasing dimensions. This problem has been studied under a variety of assumptions in the literature. The classical asymptotic normality result for the least squares estimator of the projection parameter only holds when the dimension $d$ of the covariates is of a smaller order than $n^{1/2}$, where $n$ is the sample size. Traditional sandwich estimator-based Wald intervals are asymptotically valid in this regime. In this work, we propose a bias correction for the least squares estimator and prove the asymptotic normality of the resulting debiased estimator. Precisely, we provide an explicit finite sample Berry-Esseen bound on the Normal approximation to the law of the linear contrasts of the proposed estimator normalized by the sandwich standard error estimate. Our bound, under only finite moment conditions on covariates and errors, tends to 0 as long as $d = o(n^{2/3})$ up to polylogarithmic factors. Furthermore, we leverage recent methods of statistical inference that do not require an estimator of the variance to perform asymptotically valid statistical inference, which leads to a sharper miscoverage control compared to Wald's. We provide a discussion of how our techniques can be generalized to increase the allowable range of $d$ even further.
Submitted 11 January, 2024; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Dual Induction CLT for High-dimensional m-dependent Data
Authors:
Heejong Bong,
Arun Kumar Kuchibhotla,
Alessandro Rinaldo
Abstract:
We derive novel and sharp high-dimensional Berry--Esseen bounds for the sum of $m$-dependent random vectors over the class of hyper-rectangles exhibiting only a poly-logarithmic dependence in the dimension. Our results hold under minimal assumptions, such as non-degenerate covariances and finite third moments, and yield a sample complexity of order $\sqrt{m/n}$, aside from logarithmic terms, matching the optimal rates established in the univariate case. When specialized to the sums of independent non-degenerate random vectors, we obtain sharp rates under the weakest possible conditions. On the technical side, we develop an inductive relationship between anti-concentration inequalities and Berry--Esseen bounds, inspired by the classical Lindeberg swapping method and the concentration inequality approach for dependent data, that may be of independent interest.
Submitted 16 November, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
-
High-probability sample complexities for policy evaluation with linear function approximation
Authors:
Gen Li,
Weichen Wu,
Yuejie Chi,
Cong Ma,
Alessandro Rinaldo,
Yuting Wei
Abstract:
This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms: the temporal difference (TD) learning algorithm and the two-timescale linear TD with gradient correction (TDC) algorithm. In both the on-policy setting, where observations are generated from the target policy, and the off-policy setting, where samples are drawn from a behavior policy potentially different from the target policy, we establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level. We also exhibit an explicit dependence on problem-related quantities, and show in the on-policy setting that our upper bound matches the minimax lower bound on crucial problem parameters, including the choice of the feature maps and the problem dimension.
Submitted 2 May, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
-
A Sequential Test for Log-Concavity
Authors:
Aditya Gangrade,
Alessandro Rinaldo,
Aaditya Ramdas
Abstract:
On observing a sequence of i.i.d. data with distribution $P$ on $\mathbb{R}^d$, we ask the question of how one can test the null hypothesis that $P$ has a log-concave density. This paper proves one interesting negative result and one positive result: the non-existence of test (super)martingales, and the consistency of universal inference. To elaborate, the set of log-concave distributions $\mathcal{L}$ is a nonparametric class, which contains the set $\mathcal G$ of all possible Gaussians with any mean and covariance. Developing further the recent geometric concept of fork-convexity, we first prove that there do not exist any nontrivial test martingales or test supermartingales for $\mathcal G$ (a process that is simultaneously a nonnegative supermartingale for every distribution in $\mathcal G$), and hence also for its superset $\mathcal{L}$. Due to this negative result, we turn our attention to constructing an e-process -- a process whose expectation at any stopping time is at most one, under any distribution in $\mathcal{L}$ -- which yields a level-$α$ test by simply thresholding at $1/α$. We take the approach of universal inference, which avoids intractable likelihood asymptotics by taking the ratio of a nonanticipating likelihood over alternatives against the maximum likelihood under the null. Despite its conservatism, we show that the resulting test is consistent (power one), and derive its power against Hellinger alternatives. To the best of our knowledge, there is no other e-process or sequential test for $\mathcal{L}$.
Submitted 9 January, 2023;
originally announced January 2023.
-
High-dimensional Berry-Esseen Bound for $m$-Dependent Random Samples
Authors:
Heejong Bong,
Arun Kumar Kuchibhotla,
Alessandro Rinaldo
Abstract:
In this work, we provide a $(n/m)^{-1/2}$-rate finite sample Berry-Esseen bound for $m$-dependent high-dimensional random vectors over the class of hyper-rectangles. This bound imposes minimal assumptions on the random vectors such as nondegenerate covariances and finite third moments. The proof uses inductive relationships between anti-concentration inequalities and Berry--Esseen bounds, which are inspired by the telescoping method of Chen and Shao (2004) and the recursion method of Kuchibhotla and Rinaldo (2020). Performing a dual induction based on the relationships, we obtain tight Berry-Esseen bounds for dependent samples.
Submitted 10 December, 2022;
originally announced December 2022.
-
Mitigating multiple descents: A model-agnostic framework for risk monotonization
Authors:
Pratik Patil,
Arun Kumar Kuchibhotla,
Yuting Wei,
Alessandro Rinaldo
Abstract:
Recent empirical and theoretical analyses of several commonly used prediction procedures reveal a peculiar risk behavior in high dimensions, referred to as double/multiple descent, in which the asymptotic risk is a non-monotonic function of the limiting aspect ratio of the number of features or parameters to the sample size. To mitigate this undesirable behavior, we develop a general framework for risk monotonization based on cross-validation that takes as input a generic prediction procedure and returns a modified procedure whose out-of-sample prediction risk is, asymptotically, monotonic in the limiting aspect ratio. As part of our framework, we propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting, respectively, and show that, under very mild assumptions, they provably achieve monotonic asymptotic risk behavior. Our results are applicable to a broad variety of prediction procedures and loss functions, and do not require a well-specified (parametric) model. We exemplify our framework with concrete analyses of the minimum $\ell_2$- and $\ell_1$-norm least squares prediction procedures. As one of the ingredients in our analysis, we also derive novel additive and multiplicative forms of oracle risk inequalities for split cross-validation that are of independent interest.
Submitted 25 May, 2022;
originally announced May 2022.
-
Detecting Abrupt Changes in Sequential Pairwise Comparison Data
Authors:
Wanshan Li,
Daren Wang,
Alessandro Rinaldo
Abstract:
The Bradley-Terry-Luce (BTL) model is a classic and very popular statistical approach for eliciting a global ranking among a collection of items using pairwise comparison data. In applications in which the comparison outcomes are observed as a time series, it is often the case that data are non-stationary, in the sense that the true underlying ranking changes over time. In this paper we are concerned with localizing the change points in a high-dimensional BTL model with piece-wise constant parameters. We propose novel and practicable algorithms based on dynamic programming that can consistently estimate the unknown locations of the change points. We provide consistency rates for our methodology that depend explicitly on the model parameters, the temporal spacing between two consecutive change points and the magnitude of the change. We corroborate our findings with extensive numerical experiments and a real-life example.
Submitted 29 November, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
E-detectors: a nonparametric framework for sequential change detection
Authors:
Jaehyeok Shin,
Aaditya Ramdas,
Alessandro Rinaldo
Abstract:
Sequential change detection is a classical problem with a variety of applications. However, the majority of prior work has been parametric, for example, focusing on exponential families. We develop a fundamentally new and general framework for sequential change detection when the pre- and post-change distributions are nonparametrically specified (and thus composite). Our procedures come with clean, nonasymptotic bounds on the average run length (frequency of false alarms). In certain nonparametric cases (like sub-Gaussian or sub-exponential), we also provide near-optimal bounds on the detection delay following a changepoint. The primary technical tool that we introduce is called an e-detector, which is composed of sums of e-processes -- a fundamental generalization of nonnegative supermartingales -- that are started at consecutive times. We first introduce simple Shiryaev-Roberts and CUSUM-style e-detectors, and then show how to design their mixtures in order to achieve both statistical and computational efficiency. Our e-detector framework can be instantiated to recover classical likelihood-based procedures for parametric problems, as well as yielding the first change detection method for many nonparametric problems. As a running example, we tackle the problem of detecting changes in the mean of a bounded random variable without i.i.d. assumptions, with an application to tracking the performance of a basketball team over multiple seasons.
Submitted 29 October, 2023; v1 submitted 7 March, 2022;
originally announced March 2022.
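To make the "sums of e-processes started at consecutive times" idea in the abstract above concrete, here is a minimal sketch for observations in $[0, 1]$ whose pre-change mean is at most `m`. It assumes each e-process is a running product of the same per-step factors $L_i = 1 + λ(X_i - m)$, in which case the Shiryaev-Roberts-style sum satisfies $M_n = L_n (M_{n-1} + 1)$ and the CUSUM-style maximum satisfies $M_n = L_n \max(M_{n-1}, 1)$. The betting factor, the fixed `lam`, the function name `run_e_detector`, and the `threshold` argument are illustrative assumptions, not the paper's tuned procedure.

```python
def run_e_detector(xs, m, lam, threshold, style="SR"):
    """Return the first time the e-detector reaches `threshold`, or None.

    xs: observations in [0, 1]; m: upper bound on the pre-change mean (m > 0).
    lam: fixed bet size in [0, 1/m] (hypothetical choice).
    style: "SR" (sum over start times) or "CUSUM" (max over start times).
    """
    if not 0.0 <= lam <= 1.0 / m:
        raise ValueError("lam must lie in [0, 1/m] so each factor is nonnegative")
    stat = 0.0
    for n, x in enumerate(xs, start=1):
        # Per-step e-value: expectation at most 1 when the mean is at most m.
        factor = 1.0 + lam * (x - m)
        if style == "SR":
            stat = factor * (stat + 1.0)      # adds a fresh e-process each step
        else:
            stat = factor * max(stat, 1.0)    # restarts when the product dips below 1
        if stat >= threshold:
            return n
    return None
```

A larger `threshold` trades fewer false alarms for a longer detection delay; a common choice in this literature is a threshold of the form $1/α$.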
-
Denoising and change point localisation in piecewise-constant high-dimensional regression coefficients
Authors:
Fan Wang,
Oscar Hernan Madrid Padilla,
Yi Yu,
Alessandro Rinaldo
Abstract:
We study the theoretical properties of the fused lasso procedure originally proposed by Tibshirani et al. (2005) in the context of a linear regression model in which the regression coefficients are totally ordered and assumed to be sparse and piecewise constant. Despite its popularity, to the best of our knowledge, estimation error bounds in high-dimensional settings have only been obtained for the simple case in which the design matrix is the identity matrix. We formulate a novel restricted isometry condition on the design matrix that is tailored to the fused lasso estimator and derive estimation bounds for both the constrained version of the fused lasso assuming dense coefficients and for its penalised version. We observe that the estimation error can be dominated by either the lasso or the fused lasso rate, depending on whether the number of non-zero coefficients is larger than the number of piece-wise constant segments. Finally, we devise a post-processing procedure to recover the piecewise-constant pattern of the coefficients. Extensive numerical experiments support our theoretical findings.
Submitted 18 February, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
-
Generalized Results for the Existence and Consistency of the MLE in the Bradley-Terry-Luce Model
Authors:
Heejong Bong,
Alessandro Rinaldo
Abstract:
Ranking problems based on pairwise comparisons, such as those arising in online gaming, often involve a large pool of items to order. In these situations, the gap in performance between any two items can be significant, and the smallest and largest winning probabilities can be very close to zero or one. Furthermore, each item may be compared only to a subset of all the items, so that not all pairwise comparisons are observed. In this paper, we study the performance of the Bradley-Terry-Luce model for ranking from pairwise comparison data under more realistic settings than those considered in the literature so far. In particular, we allow for near-degenerate winning probabilities and arbitrary comparison designs. We obtain novel results about the existence of the maximum likelihood estimator (MLE) and the corresponding $\ell_2$ estimation error without the bounded winning probability assumption commonly used in the literature and for arbitrary comparison graph topologies. Central to our approach is the reliance on the Fisher information matrix to express the dependence on the graph topologies and the impact of the values of the winning probabilities on the estimation risk and on the conditions for the existence of the MLE. Our bounds recover existing results as special cases but are more broadly applicable.
Submitted 15 June, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
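For readers unfamiliar with the model in the abstract above: under BTL, item $i$ beats item $j$ with probability $e^{θ_i} / (e^{θ_i} + e^{θ_j})$. The sketch below fits the MLE with the classical minorisation-maximisation (Zermelo-type) iteration, which is a standard fitting method and not something the abstract specifies. It assumes the MLE exists, that is, every item has both won and lost along a chain of comparisons linking it to every other item; the function names `btl_mle` and `win_prob` and the iteration count are illustrative.

```python
import math

def btl_mle(wins, n_iter=500):
    """wins[i][j] = number of times item i beat item j. Returns log-scores theta.

    Assumes the MLE exists (the directed 'who beat whom' graph is strongly
    connected); otherwise the iteration can hit a zero score or a zero divisor.
    """
    k = len(wins)
    w = [1.0] * k
    for _ in range(n_iter):
        new_w = []
        for i in range(k):
            total_wins = sum(wins[i][j] for j in range(k) if j != i)
            # Each compared pair contributes n_ij / (w_i + w_j); uncompared pairs add 0.
            denom = sum((wins[i][j] + wins[j][i]) / (w[i] + w[j])
                        for j in range(k) if j != i)
            new_w.append(total_wins / denom)
        scale = sum(new_w)
        w = [v / scale for v in new_w]
    # Report theta = log w, centred to sum to zero for identifiability.
    mean_log = sum(math.log(v) for v in w) / k
    return [math.log(v) - mean_log for v in w]

def win_prob(theta, i, j):
    """BTL probability that item i beats item j."""
    return 1.0 / (1.0 + math.exp(theta[j] - theta[i]))
```

Because only pairs with at least one recorded comparison enter `denom`, the same code handles sparse comparison graphs, which is the setting the abstract emphasises.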
-
Optimal partition recovery in general graphs
Authors:
Yi Yu,
Oscar Hernan Madrid Padilla,
Alessandro Rinaldo
Abstract:
We consider a graph-structured change point problem in which we observe a random vector with piecewise constant but unknown mean and whose independent, sub-Gaussian coordinates correspond to the $n$ nodes of a fixed graph. We are interested in the localisation task of recovering the partition of the nodes associated to the constancy regions of the mean vector. When the partition $\mathcal{S}$ consists of only two elements, we characterise the difficulty of the localisation problem in terms of four key parameters: the maximal noise variance $σ^2$, the size $Δ$ of the smaller element of the partition, the magnitude $κ$ of the difference in the signal values across contiguous elements of the partition and the sum of the effective resistance edge weights $|\partial_r(\mathcal{S})|$ of the corresponding cut -- a graph theoretic quantity quantifying the size of the partition boundary. In particular, we demonstrate an information theoretical lower bound implying that, in the low signal-to-noise ratio regime $κ^2 Δσ^{-2} |\partial_r(\mathcal{S})|^{-1} \lesssim 1$, no consistent estimator of the true partition exists. On the other hand, when $κ^2 Δσ^{-2} |\partial_r(\mathcal{S})|^{-1} \gtrsim ζ_n \log\{r(|E|)\}$, with $r(|E|)$ being the sum of effective resistance weighted edges and $ζ_n$ being any diverging sequence in $n$, we show that a polynomial-time, approximate $\ell_0$-penalised least squares estimator delivers a localisation error -- measured by the symmetric difference between the true and estimated partition -- of order $ κ^{-2} σ^2 |\partial_r(\mathcal{S})| \log\{r(|E|)\}$. Aside from the $\log\{r(|E|)\}$ term, this rate is minimax optimal. Finally, we provide discussions on the localisation error for more general partitions of unknown sizes.
Submitted 18 February, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
$\ell_{\infty}$-Bounds of the MLE in the BTL Model under General Comparison Graphs
Authors:
Wanshan Li,
Shamindra Shrotriya,
Alessandro Rinaldo
Abstract:
The Bradley-Terry-Luce (BTL) model is a popular statistical approach for estimating the global ranking of a collection of items using pairwise comparisons. To ensure accurate ranking, it is essential to obtain precise estimates of the model parameters in the $\ell_{\infty}$-loss. The difficulty of this task depends crucially on the topology of the pairwise comparison graph over the given items. However, beyond very few well-studied cases, such as the complete and Erdős-Rényi comparison graphs, little is known about the performance of the maximum likelihood estimator (MLE) of the BTL model parameters in the $\ell_{\infty}$-loss under more general graph topologies. In this paper, we derive novel, general upper bounds on the $\ell_{\infty}$ estimation error of the BTL MLE that depend explicitly on the algebraic connectivity of the comparison graph, the maximal performance gap across items and the sample complexity. We demonstrate that the derived bounds perform well and in some cases are sharper compared to known results obtained using different loss functions and more restricted assumptions and graph topologies. We carefully compare our results to Yan et al. (2012), which is closest in spirit to our work. We further provide minimax lower bounds under $\ell_{\infty}$-error that nearly match the upper bounds over a class of sufficiently regular graph topologies. Finally, we study the implications of our $\ell_{\infty}$-bounds for efficient (offline) tournament design. We illustrate and discuss our findings through various examples and simulations.
Submitted 22 June, 2022; v1 submitted 20 October, 2021;
originally announced October 2021.
-
Lattice partition recovery with dyadic CART
Authors:
Oscar Hernan Madrid Padilla,
Yi Yu,
Alessandro Rinaldo
Abstract:
We study piece-wise constant signals corrupted by additive Gaussian noise over a $d$-dimensional lattice. Data of this form naturally arise in a host of applications, and the tasks of signal detection or testing, de-noising and estimation have been studied extensively in the statistical and signal processing literature. In this paper we consider instead the problem of partition recovery, i.e., of estimating the partition of the lattice induced by the constancy regions of the unknown signal, using the computationally-efficient dyadic classification and regression tree (DCART) methodology proposed by Donoho (1997). We prove that, under appropriate regularity conditions on the shape of the partition elements, a DCART-based procedure consistently estimates the underlying partition at a rate of order $σ^2 k^* \log (N)/κ^2$, where $k^*$ is the minimal number of rectangular sub-graphs obtained using recursive dyadic partitions supporting the signal partition, $σ^2$ is the noise variance, $κ$ is the minimal magnitude of the signal difference among contiguous elements of the partition and $N$ is the size of the lattice. Furthermore, under stronger assumptions, our method attains a sharper estimation error of order $σ^2\log(N)/κ^2$, independent of $k^*$, which we show to be minimax rate optimal. Our theoretical guarantees further extend to the partition estimator based on the optimal regression tree estimator (ORT) of Chatterjee and Goswami (2019) and to the one obtained through an NP-hard exhaustive search method. We corroborate our theoretical findings and the effectiveness of DCART for partition recovery in simulations.
Submitted 27 October, 2021; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Optimal network online change point localisation
Authors:
Yi Yu,
Oscar Hernan Madrid Padilla,
Daren Wang,
Alessandro Rinaldo
Abstract:
We study the problem of online network change point detection. In this setting, a collection of independent Bernoulli networks is collected sequentially, and the underlying distributions change when a change point occurs. The goal is to detect the change point as quickly as possible, if it exists, subject to a constraint on the number or probability of false alarms. In this paper, on the detection delay, we establish a minimax lower bound and two upper bounds based on NP-hard algorithms and polynomial-time algorithms, i.e., \[ \mbox{detection delay} \begin{cases} \gtrsim \log(1/α) \frac{\max\{r^2/n, \, 1\}}{κ_0^2 n ρ},\\ \lesssim \log(Δ/α) \frac{\max\{r^2/n, \, \log(r)\}}{κ_0^2 n ρ}, & \mbox{with NP-hard algorithms},\\ \lesssim \log(Δ/α) \frac{r}{κ_0^2 n ρ}, & \mbox{with polynomial-time algorithms}, \end{cases} \] where $κ_0, n, ρ, r$ and $α$ are the normalised jump size, network size, entrywise sparsity, rank sparsity and the overall Type-I error upper bound. All the model parameters are allowed to vary as $Δ$, the location of the change point, diverges. The polynomial-time algorithms are novel procedures that we propose in this paper, designed for quick detection under two different forms of Type-I error control. The first is based on controlling the overall probability of a false alarm when there are no change points, and the second is based on specifying a lower bound on the expected time of the first false alarm. Extensive experiments show that, under different scenarios and the aforementioned forms of Type-I error control, our proposed approaches outperform state-of-the-art methods.
Submitted 14 January, 2021;
originally announced January 2021.
-
Nonparametric iterated-logarithm extensions of the sequential generalized likelihood ratio test
Authors:
Jaehyeok Shin,
Aaditya Ramdas,
Alessandro Rinaldo
Abstract:
We develop a nonparametric extension of the sequential generalized likelihood ratio (GLR) test and corresponding time-uniform confidence sequences for the mean of a univariate distribution. By utilizing a geometric interpretation of the GLR statistic, we derive a simple analytic upper bound on the probability that it exceeds any prespecified boundary; these are intractable to approximate via simulations due to the infinite horizon of the tests and the composite nonparametric nulls under consideration. Using time-uniform boundary-crossing inequalities, we carry out a unified nonasymptotic analysis of expected sample sizes of one-sided and open-ended tests over nonparametric classes of distributions (including sub-Gaussian, sub-exponential, sub-gamma, and exponential families). Finally, we present a flexible and practical method to construct time-uniform confidence sequences that are easily tunable to be uniformly close to the pointwise Chernoff bound over any target time interval.
Submitted 13 May, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
High-dimensional CLT for Sums of Non-degenerate Random Vectors: $n^{-1/2}$-rate
Authors:
Arun Kumar Kuchibhotla,
Alessandro Rinaldo
Abstract:
In this note, we provide Berry--Esseen bounds for rectangles in high dimensions when the random vectors have non-singular covariance matrices. Under this assumption of non-singularity, we prove an $n^{-1/2}$ scaling for the Berry--Esseen bound for sums of mean independent random vectors with a finite third moment. The proof is essentially the method of compositions proof of multivariate Berry--Esseen bound from Senatov (2011). Similar to other existing works (Kuchibhotla et al. 2018, Fang and Koike 2020a), this note considers the applicability and effectiveness of classical CLT proof techniques for the high-dimensional case.
Submitted 28 September, 2020;
originally announced September 2020.
-
Berry-Esseen Bounds for Projection Parameters and Partial Correlations with Increasing Dimension
Authors:
Arun Kumar Kuchibhotla,
Alessandro Rinaldo,
Larry Wasserman
Abstract:
We provide finite sample bounds on the Normal approximation to the law of the least squares estimator of the projection parameters normalized by the sandwich-based standard errors. Our results hold in the increasing dimension setting and under minimal assumptions on the data generating distribution. In particular, we do not assume a linear regression function and only require the existence of finitely many moments for the response and the covariates. Furthermore, we construct confidence sets for the projection parameters in the form of hyper-rectangles and establish finite sample bounds on their coverage and accuracy. We derive analogous results for partial correlations among the entries of sub-Gaussian vectors.
Submitted 22 October, 2021; v1 submitted 19 July, 2020;
originally announced July 2020.
-
A Note on Online Change Point Detection
Authors:
Yi Yu,
Oscar Hernan Madrid Padilla,
Daren Wang,
Alessandro Rinaldo
Abstract:
We investigate sequential change point estimation and detection in univariate nonparametric settings, where a stream of independent observations from sub-Gaussian distributions with a common variance factor and piecewise-constant but otherwise unknown means is collected. We develop a simple CUSUM-based methodology that provably controls the probability of false alarms or the average run length while minimizing, in a minimax sense, the detection delay. We allow for all the model parameters to vary in order to capture a broad range of levels of statistical hardness for the problem at hand. We further show how our methodology is applicable to the case in which multiple change points are to be estimated sequentially.
Submitted 13 November, 2020; v1 submitted 5 June, 2020;
originally announced June 2020.
-
Nonparametric Estimation in the Dynamic Bradley-Terry Model
Authors:
Heejong Bong,
Wanshan Li,
Shamindra Shrotriya,
Alessandro Rinaldo
Abstract:
We propose a time-varying generalization of the Bradley-Terry model that allows for nonparametric modeling of dynamic global rankings of distinct teams. We develop a novel estimator that relies on kernel smoothing to pre-process the pairwise comparisons over time and is applicable in sparse settings where the Bradley-Terry model may not be fit. We obtain necessary and sufficient conditions for the existence and uniqueness of our estimator. We also derive time-varying oracle bounds for both the estimation error and the excess risk in the model-agnostic setting where the Bradley-Terry model is not necessarily the true data generating process. We thoroughly test the practical effectiveness of our model using both simulated and real-world data and suggest an efficient data-driven approach for bandwidth tuning.
Submitted 28 February, 2020;
originally announced March 2020.
-
On conditional versus marginal bias in multi-armed bandits
Authors:
Jaehyeok Shin,
Aaditya Ramdas,
Alessandro Rinaldo
Abstract:
The bias of the sample means of the arms in multi-armed bandits is an important issue in adaptive data analysis that has recently received considerable attention in the literature. Existing results relate in precise ways the sign and magnitude of the bias to various sources of data adaptivity, but do not apply to the conditional inference setting in which the sample means are computed only if some specific conditions are satisfied. In this paper, we characterize the sign of the conditional bias of monotone functions of the rewards, including the sample mean. Our results hold for arbitrary conditioning events and leverage natural monotonicity properties of the data collection policy. We further demonstrate, through several examples from sequential testing and best arm identification, that the sign of the conditional and marginal bias of the sample mean of an arm can be different, depending on the conditioning event. Our analysis offers new and interesting perspectives on the subtleties of assessing the bias in data adaptive settings.
Submitted 22 February, 2021; v1 submitted 19 February, 2020;
originally announced February 2020.
-
Optimal nonparametric multivariate change point detection and localization
Authors:
Oscar Hernan Madrid Padilla,
Yi Yu,
Daren Wang,
Alessandro Rinaldo
Abstract:
We study the multivariate nonparametric change point detection problem, where the data are a sequence of independent $p$-dimensional random vectors whose distributions are piecewise-constant with Lipschitz densities changing at unknown times, called change points. We quantify the size of the distributional change at any change point with the supremum norm of the difference between the corresponding densities. We are concerned with the localization task of estimating the positions of the change points. In our analysis, we allow for the model parameters to vary with the total number of time points, including the minimal spacing between consecutive change points and the magnitude of the smallest distributional change. We provide information-theoretic lower bounds on both the localization rate and the minimal signal-to-noise ratio required to guarantee consistent localization. We formulate a novel algorithm based on kernel density estimation that nearly achieves the minimax lower bound, save possibly for logarithmic factors. We provide extensive numerical evidence to support our theoretical findings.
Submitted 25 June, 2020; v1 submitted 29 October, 2019;
originally announced October 2019.
-
Localizing Changes in High-Dimensional Vector Autoregressive Processes
Authors:
Daren Wang,
Yi Yu,
Alessandro Rinaldo,
Rebecca Willett
Abstract:
Autoregressive models capture stochastic processes in which past realizations determine the generative distribution of new data; they arise naturally in a variety of industrial, biomedical, and financial settings. A key challenge when working with such data is to determine when the underlying generative model has changed, as this can offer insights into distinct operating regimes of the underlying system. This paper describes a novel dynamic programming approach to localizing changes in high-dimensional autoregressive processes and associated error rates that improve upon the prior state of the art. When the model parameters are piecewise constant over time and the corresponding process is piecewise stable, the proposed dynamic programming algorithm consistently localizes change points even as the dimensionality, the sparsity of the coefficient matrices, the temporal spacing between two consecutive change points, and the magnitude of the difference of two consecutive coefficient matrices are allowed to vary with the sample size. Furthermore, the accuracy of initial, coarse change point localization estimates can be boosted via a computationally-efficient refinement algorithm that provably improves the localization error rate. Finally, comprehensive simulation experiments and a real data analysis are provided to show the numerical superiority of our proposed methods.
Submitted 29 July, 2020; v1 submitted 12 September, 2019;
originally announced September 2019.
-
Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection
Authors:
Xiaoyi Gu,
Leman Akoglu,
Alessandro Rinaldo
Abstract:
Nearest-neighbor (NN) procedures are well studied and widely used in both supervised and unsupervised learning problems. In this paper we are concerned with investigating the performance of NN-based methods for anomaly detection. We first show through extensive simulations that NN methods compare favorably to some of the other state-of-the-art algorithms for anomaly detection based on a set of benchmark synthetic datasets. We further consider the performance of NN methods on real datasets, and relate it to the dimensionality of the problem. Next, we analyze the theoretical properties of NN methods for anomaly detection by studying a more general quantity called distance-to-measure (DTM), originally developed in the literature on robust geometric and topological inference. We provide finite-sample uniform guarantees for the empirical DTM and use them to derive misclassification rates for anomalous observations under various settings. In our analysis we rely on Huber's contamination model and formulate mild geometric regularity assumptions on the underlying distribution of the data.
Submitted 8 July, 2019;
originally announced July 2019.
-
Are sample means in multi-armed bandits positively or negatively biased?
Authors:
Jaehyeok Shin,
Aaditya Ramdas,
Alessandro Rinaldo
Abstract:
It is well known that in stochastic multi-armed bandits (MAB), the sample mean of an arm is typically not an unbiased estimator of its true mean. In this paper, we decouple three different sources of this selection bias: adaptive \emph{sampling} of arms, adaptive \emph{stopping} of the experiment, and adaptively \emph{choosing} which arm to study. Through a new notion called ``optimism'' that captures certain natural monotonic behaviors of algorithms, we provide a clean and unified analysis of how optimistic rules affect the sign of the bias. The main takeaway message is that optimistic sampling induces a negative bias, but optimistic stopping and optimistic choosing both induce a positive bias. These results are derived in a general stochastic MAB setup that is entirely agnostic to the final aim of the experiment (regret minimization or best-arm identification or anything else). We provide examples of optimistic rules of each type, demonstrate that simulations confirm our theoretical predictions, and pose some natural but hard open problems.
△ Less
Submitted 26 October, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
-
Optimal nonparametric change point detection and localization
Authors:
Oscar Hernan Madrid Padilla,
Yi Yu,
Daren Wang,
Alessandro Rinaldo
Abstract:
We study change point detection and localization for univariate data in fully nonparametric settings in which, at each time point, we acquire an i.i.d. sample from an unknown distribution. We quantify the magnitude of the distributional changes at the change points using the Kolmogorov--Smirnov distance. We allow all the relevant parameters -- the minimal spacing between two consecutive change points, the minimal magnitude of the changes in the Kolmogorov--Smirnov distance, and the number of sample points collected at each time point -- to change with the length of the time series. We generalize the renowned binary segmentation (e.g. Scott and Knott, 1974) algorithm and its variant, the wild binary segmentation of Fryzlewicz (2014), both originally designed for univariate mean change point detection problems, to our nonparametric settings and exhibit rates of consistency for both of them. In particular, we prove that the procedure based on wild binary segmentation is nearly minimax rate-optimal. We further demonstrate a phase transition in the space of model parameters that separates parameter combinations for which consistent localization is possible from the ones for which this task is statistically infeasible. Finally, we provide extensive numerical experiments to support our theory. R code is available at https://github.com/hernanmp/NWBS.
Submitted 23 May, 2019;
originally announced May 2019.
-
Homotopy Reconstruction via the Cech Complex and the Vietoris-Rips Complex
Authors:
Jisu Kim,
Jaehyeok Shin,
Frédéric Chazal,
Alessandro Rinaldo,
Larry Wasserman
Abstract:
We derive conditions under which the reconstruction of a target space is topologically correct via the Čech complex or the Vietoris-Rips complex obtained from possibly noisy point cloud data. We provide two novel theoretical results. First, we describe sufficient conditions under which any non-empty intersection of finitely many Euclidean balls intersected with a positive reach set is contractible, so that the Nerve theorem applies for the restricted Čech complex. Second, we demonstrate the homotopy equivalence of a positive $μ$-reach set and its offsets. Applying these results to the restricted Čech complex and using the interleaving relations with the Čech complex (or the Vietoris-Rips complex), we formulate conditions guaranteeing that the target space is homotopy equivalent to the Čech complex (or the Vietoris-Rips complex), in terms of the $μ$-reach. Our results sharpen existing results.
Submitted 12 May, 2020; v1 submitted 16 March, 2019;
originally announced March 2019.
-
On the bias, risk and consistency of sample means in multi-armed bandits
Authors:
Jaehyeok Shin,
Aaditya Ramdas,
Alessandro Rinaldo
Abstract:
The sample mean is among the most well studied estimators in statistics, having many desirable properties such as unbiasedness and consistency. However, when analyzing data collected using a multi-armed bandit (MAB) experiment, the sample mean is biased and much remains to be understood about its properties. For example, when is it consistent, how large is its bias, and can we bound its mean squared error? This paper delivers a thorough and systematic treatment of the bias, risk and consistency of MAB sample means. Specifically, we identify four distinct sources of selection bias (sampling, stopping, choosing and rewinding) and analyze them both separately and together. We further demonstrate that a new notion of \emph{effective sample size} can be used to bound the risk of the sample mean under suitable loss functions. We present several carefully designed examples to provide intuition on the different sources of selection bias we study. Our treatment is nonparametric and algorithm-agnostic, meaning that it is not tied to a specific algorithm or goal. In a nutshell, our proofs combine variational representations of information-theoretic divergences with new martingale concentration inequalities.
Submitted 29 April, 2021; v1 submitted 2 February, 2019;
originally announced February 2019.
-
Univariate Mean Change Point Detection: Penalization, CUSUM and Optimality
Authors:
Daren Wang,
Yi Yu,
Alessandro Rinaldo
Abstract:
The problem of univariate mean change point detection and localization based on a sequence of $n$ independent observations with piecewise constant means has been intensively studied for more than half a century, and serves as a blueprint for change point problems in more complex settings. We provide a complete characterization of this classical problem in a general framework in which the upper bound $σ^2$ on the noise variance, the minimal spacing $Δ$ between two consecutive change points and the minimal magnitude $κ$ of the changes are allowed to vary with $n$. We first show that consistent localization of the change points is impossible when the signal-to-noise ratio satisfies $\frac{κ\sqrt{Δ}}{σ} < \sqrt{\log(n)}$. In contrast, when $\frac{κ\sqrt{Δ}}{σ}$ diverges with $n$ at the rate of at least $\sqrt{\log(n)}$, we demonstrate that two computationally-efficient change point estimators, one based on the solution to an $\ell_0$-penalized least squares problem and the other on the popular wild binary segmentation algorithm, are both consistent and achieve a localization rate of the order $\frac{σ^2}{κ^2} \log(n)$. We further show that this rate is minimax optimal, up to a $\log(n)$ term.
Submitted 6 June, 2019; v1 submitted 22 October, 2018;
originally announced October 2018.
-
Uniform Convergence Rate of the Kernel Density Estimator Adaptive to Intrinsic Volume Dimension
Authors:
Jisu Kim,
Jaehyeok Shin,
Alessandro Rinaldo,
Larry Wasserman
Abstract:
We derive concentration inequalities for the supremum norm of the difference between a kernel density estimator (KDE) and its point-wise expectation that hold uniformly over the selection of the bandwidth and under weaker conditions on the kernel and the data generating distribution than previously used in the literature. We first propose a novel concept, called the volume dimension, to measure the intrinsic dimension of the support of a probability distribution based on the rates of decay of the probability of vanishing Euclidean balls. Our bounds depend on the volume dimension and generalize the existing bounds derived in the literature. In particular, when the data-generating distribution has a bounded Lebesgue density or is supported on a sufficiently well-behaved lower-dimensional manifold, our bound recovers the same convergence rate depending on the intrinsic dimension of the support as the ones known in the literature. At the same time, our results apply to more general cases, such as distributions with unbounded densities or supported on a mixture of manifolds with different dimensions. Analogous bounds are derived for the derivative of the KDE, of any order. Our results are generally applicable but are especially useful for problems in geometric inference and topological data analysis, including level set estimation, density-based clustering, modal clustering and mode hunting, ridge estimation and persistent homology.
Submitted 31 December, 2019; v1 submitted 13 October, 2018;
originally announced October 2018.
-
Markov Properties of Discrete Determinantal Point Processes
Authors:
Kayvan Sadeghi,
Alessandro Rinaldo
Abstract:
Determinantal point processes (DPPs) are probabilistic models for repulsion. When used to represent the occurrence of random subsets of a finite base set, DPPs make it possible to model global negative associations in a mathematically elegant and direct way. Discrete DPPs have become popular and computationally tractable models for solving several machine learning tasks that require the selection of diverse objects, and have been successfully applied in numerous real-life problems. Despite their popularity, the statistical properties of such models have not been adequately explored. In this note, we derive the Markov properties of discrete DPPs and show how they can be expressed using graphical models.
Submitted 27 January, 2019; v1 submitted 4 October, 2018;
originally announced October 2018.
-
Optimal Covariance Change Point Localization in High Dimension
Authors:
Daren Wang,
Yi Yu,
Alessandro Rinaldo
Abstract:
We study the problem of change point detection for covariance matrices in high dimensions. We assume that we observe a sequence $\{X_i\}_{i=1,\ldots,n}$ of independent and centered $p$-dimensional sub-Gaussian random vectors whose covariance matrices are piecewise constant. Our task is to recover with high accuracy the number and locations of the change points, which are assumed unknown. Our generic model setting allows for all the model parameters to change with $n$, including the dimension $p$, the minimal spacing between consecutive change points, the magnitude of the smallest change and the maximal Orlicz-$ψ_2$ norm of the covariance matrices of the sample points. Without imposing any additional structural assumptions, such as low rank matrices or sparse principal components, we set up a general framework and a benchmark result for the covariance change point detection problem. We introduce two procedures, one based on the binary segmentation algorithm (e.g. Vostrikova, 1981) and the other on its extension known as wild binary segmentation of Fryzlewicz (2014), and demonstrate that, under suitable conditions, both procedures are able to consistently estimate the number and locations of change points. Our second algorithm, called Wild Binary Segmentation through Independent Projection (WBSIP), is shown to be optimal in the sense of allowing for the minimax scaling in all the relevant parameters. Our minimax analysis reveals a phase transition effect based on the problem of change point localization. To the best of our knowledge, results of this type have not been established elsewhere in the high-dimensional change point detection literature.
Submitted 21 August, 2018; v1 submitted 28 December, 2017;
originally announced December 2017.
-
On Exchangeability in Network Models
Authors:
Steffen L. Lauritzen,
Alessandro Rinaldo,
Kayvan Sadeghi
Abstract:
We derive representation theorems for exchangeable distributions on finite and infinite graphs using elementary arguments based on geometric and graph-theoretic concepts. Our results elucidate some of the key differences, and their implications, between statistical network models that are finitely exchangeable and models that define a consistent sequence of probability distributions on graphs of increasing size.
Submitted 14 September, 2018; v1 submitted 12 September, 2017;
originally announced September 2017.
-
DBSCAN: Optimal Rates For Density Based Clustering
Authors:
Daren Wang,
Xinyang Lu,
Alessandro Rinaldo
Abstract:
We study the problem of optimal estimation of the density cluster tree under various assumptions on the underlying density. Building on the seminal work of Chaudhuri et al. [2014], we formulate a new notion of clustering consistency which is better suited to smooth densities, and derive minimax rates of consistency for cluster tree estimation for Hölder smooth densities of arbitrary degree α. We present a computationally efficient, rate optimal cluster tree estimator based on a straightforward extension of the popular density-based clustering algorithm DBSCAN by Ester et al. [1996]. The procedure relies on a kernel density estimator with an appropriate choice of the kernel and bandwidth to produce a sequence of nested random geometric graphs whose connected components form a hierarchy of clusters. The resulting optimal rates for cluster tree estimation depend on the degree of smoothness of the underlying density and, interestingly, match minimax rates for density estimation under the supremum norm. Our results complement and extend the analysis of the DBSCAN algorithm in Sriperumbudur and Steinwart [2012]. Finally, we consider level set estimation and cluster consistency for densities with jump discontinuities, where the sizes of the jumps and the distance among clusters are allowed to vanish as the sample size increases. We demonstrate that our DBSCAN-based algorithm remains minimax rate optimal in this setting as well.
Submitted 4 December, 2019; v1 submitted 9 June, 2017;
originally announced June 2017.
-
Estimating the Reach of a Manifold
Authors:
Eddie Aamari,
Jisu Kim,
Frédéric Chazal,
Bertrand Michel,
Alessandro Rinaldo,
Larry Wasserman
Abstract:
Various problems in manifold estimation make use of a quantity called the reach, denoted by $τ_M$, which is a measure of the regularity of the manifold. This paper is the first investigation into the problem of how to estimate the reach. First, we study the geometry of the reach through an approximation perspective. We derive new geometric results on the reach for submanifolds without boundary. An estimator $\hat{τ}$ of $τ_{M}$ is proposed in a framework where tangent spaces are known, and bounds assessing its efficiency are derived. In the case of an i.i.d. random point cloud $\mathbb{X}_{n}$, $\hat{τ}(\mathbb{X}_{n})$ is shown to achieve uniform expected loss bounds over a $\mathcal{C}^3$-like model. Finally, we obtain upper and lower bounds on the minimax rate for estimating the reach.
Submitted 8 April, 2019; v1 submitted 12 May, 2017;
originally announced May 2017.
-
Random Networks, Graphical Models, and Exchangeability
Authors:
Steffen Lauritzen,
Alessandro Rinaldo,
Kayvan Sadeghi
Abstract:
We study conditional independence relationships for random networks and their interplay with exchangeability. We show that, for finitely exchangeable network models, the empirical subgraph densities are maximum likelihood estimates of their theoretical counterparts. We then characterize all possible Markov structures for finitely exchangeable random graphs, thereby identifying a new class of Markov network models corresponding to bidirected Kneser graphs. In particular, we demonstrate that the fundamental property of dissociatedness corresponds to a Markov property for exchangeable networks described by bidirected line graphs. Finally, we study those exchangeable models that are also summarized in the sense that the probability of a network only depends on the degree distribution, and identify a class of models that is dual to the Markov graphs of Frank and Strauss (1986). Particular emphasis is placed on studying consistency properties of network models under the process of forming subnetworks and we show that the only consistent systems of Markov properties correspond to the empty graph, the bidirected line graph of the complete graph, and the complete graph.
Submitted 21 November, 2017; v1 submitted 29 January, 2017;
originally announced January 2017.
-
Bootstrapping and Sample Splitting For High-Dimensional, Assumption-Free Inference
Authors:
Alessandro Rinaldo,
Larry Wasserman,
Max G'Sell,
Jing Lei
Abstract:
Several new methods have been proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and part for inference. In this paper we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-free approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension which we then use to assess the accuracy of regression inference. We show that an alternative, called the image bootstrap, has higher coverage accuracy at the cost of more computation. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. There is an inference-prediction tradeoff: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions.
Submitted 2 April, 2018; v1 submitted 16 November, 2016;
originally announced November 2016.
-
Approximate Recovery in Changepoint Problems, from $\ell_2$ Estimation Error Rates
Authors:
Kevin Lin,
James Sharpnack,
Alessandro Rinaldo,
Ryan J. Tibshirani
Abstract:
In the 1-dimensional multiple changepoint detection problem, we prove that any procedure with a fast enough $\ell_2$ error rate, in terms of its estimation of the underlying piecewise constant mean vector, automatically has an (approximate) changepoint screening property---specifically, each true jump in the underlying mean vector has an estimated jump nearby. We also show, again assuming only knowledge of the $\ell_2$ error rate, that a simple post-processing step can be used to eliminate spurious estimated changepoints, and thus delivers an (approximate) changepoint recovery property---specifically, in addition to the screening property described above, we are assured that each estimated jump has a true jump nearby. As a special case, we focus on the application of these results to the 1-dimensional fused lasso, i.e., 1-dimensional total variation denoising, and compare the implications with existing results from the literature. We also study extensions to related problems, such as changepoint detection over graphs.
Submitted 2 December, 2016; v1 submitted 21 June, 2016;
originally announced June 2016.
-
Statistical Inference for Cluster Trees
Authors:
Jisu Kim,
Yen-Chi Chen,
Sivaraman Balakrishnan,
Alessandro Rinaldo,
Larry Wasserman
Abstract:
A cluster tree provides a highly-interpretable summary of a density function by representing the hierarchy of its high-density clusters. It is estimated using the empirical tree, which is the cluster tree constructed from a density estimator. This paper addresses the basic question of quantifying our uncertainty by assessing the statistical significance of topological features of an empirical cluster tree. We first study a variety of metrics that can be used to compare different trees, analyze their properties and assess their suitability for inference. We then propose methods to construct and summarize confidence sets for the unknown true cluster tree. We introduce a partial ordering on cluster trees which we use to prune some of the statistically insignificant features of the empirical tree, yielding interpretable and parsimonious cluster trees. Finally, we illustrate the proposed methods on a variety of synthetic examples and furthermore demonstrate their utility in the analysis of a Graft-versus-Host Disease (GvHD) data set.
Submitted 12 February, 2017; v1 submitted 20 May, 2016;
originally announced May 2016.
-
Hierarchical Models for Independence Structures of Networks
Authors:
Kayvan Sadeghi,
Alessandro Rinaldo
Abstract:
We introduce a new family of network models, called hierarchical network models, that allow us to represent in an explicit manner the stochastic dependence among the dyads (random ties) of the network. In particular, each member of this family can be associated with a graphical model defining conditional independence clauses among the dyads of the network, called the dependency graph. Every network model with a dyadic independence assumption can be generalized to construct members of this new family. Using this new framework, we generalize the Erdős-Rényi and beta-models to create hierarchical Erdős-Rényi and beta-models. We describe various methods for parameter estimation as well as simulation studies for models with sparse dependency graphs.
Submitted 25 November, 2019; v1 submitted 15 May, 2016;
originally announced May 2016.
-
Minimax Rates for Estimating the Dimension of a Manifold
Authors:
Jisu Kim,
Alessandro Rinaldo,
Larry Wasserman
Abstract:
Many algorithms in machine learning and computational geometry require, as input, the intrinsic dimension of the manifold that supports the probability distribution of the data. This parameter is rarely known and therefore has to be estimated. We characterize the statistical difficulty of this problem by deriving upper and lower bounds on the minimax rate for estimating the dimension. First, we consider the problem of testing the hypothesis that the support of the data-generating probability distribution is a well-behaved manifold of intrinsic dimension $d_1$ versus the alternative that it is of dimension $d_2$, with $d_{1}<d_{2}$. With an i.i.d. sample of size $n$, we provide an upper bound on the probability of choosing the wrong dimension of $O\left( n^{-\left(d_{2}/d_{1}-1-ε\right)n} \right)$, where $ε$ is an arbitrarily small positive number. The proof is based on bounding the length of the traveling salesman path through the data points. We also demonstrate a lower bound of $Ω\left( n^{-(2d_{2}-2d_{1}+ε)n} \right)$, by applying Le Cam's lemma with a specific set of $d_{1}$-dimensional probability distributions. We then extend these results to get minimax rates for estimating the dimension of well-behaved manifolds. We obtain an upper bound of order $O \left( n^{-(\frac{1}{m-1}-ε)n} \right)$ and a lower bound of order $Ω\left( n^{-(2+ε)n} \right)$, where $m$ is the embedding dimension.
Submitted 30 December, 2019; v1 submitted 3 May, 2016;
originally announced May 2016.
-
Distribution-Free Predictive Inference For Regression
Authors:
Jing Lei,
Max G'Sell,
Alessandro Rinaldo,
Ryan J. Tibshirani,
Larry Wasserman
Abstract:
We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called {\it rank-one-out} conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedasticity in the data. Finally, we propose a model-free notion of variable importance, called {\it leave-one-covariate-out} or LOCO inference. Accompanying this paper is an R package {\tt conformalInference} that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
Submitted 8 March, 2017; v1 submitted 14 April, 2016;
originally announced April 2016.
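Illustrative sketch (not taken from the paper or from the conformalInference package): the split conformal idea mentioned in the abstract above can be outlined in a few lines of Python. The function name `split_conformal_interval`, the choice of a simple least-squares line as the regression estimator, and the even train/calibration split are all assumptions made for illustration; any regression estimator could take the place of the line fit.

```python
import math
import random


def split_conformal_interval(xs, ys, x_new, alpha=0.1, seed=0):
    """Return a (lower, upper) split conformal interval for the response at x_new.

    Hypothetical helper: fits a least-squares line on one half of the data and
    calibrates the interval width on the other half.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, calib = idx[:half], idx[half:]

    # Fit y = a + b * x by ordinary least squares on the training half.
    mean_x = sum(xs[i] for i in train) / len(train)
    mean_y = sum(ys[i] for i in train) / len(train)
    sxx = sum((xs[i] - mean_x) ** 2 for i in train)
    sxy = sum((xs[i] - mean_x) * (ys[i] - mean_y) for i in train)
    b = sxy / sxx if sxx > 0 else 0.0
    a = mean_y - b * mean_x

    def predict(x):
        return a + b * x

    # Absolute residuals on the calibration half, sorted in increasing order.
    residuals = sorted(abs(ys[i] - predict(xs[i])) for i in calib)
    m = len(residuals)

    # Conformal quantile: the k-th smallest residual, k = ceil((1 - alpha)(m + 1)).
    k = math.ceil((1 - alpha) * (m + 1))
    q = residuals[k - 1] if k <= m else math.inf

    center = predict(x_new)
    return center - q, center + q
```

The interval width is set by calibration residuals alone, which is why the marginal coverage statement does not rely on the fitted model being correct; a poor fit simply produces a wider band.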
-
On the Geometry and Extremal Properties of the Edge-Degeneracy Model
Authors:
Nicolas Kim,
Dane Wilburne,
Sonja Petrović,
Alessandro Rinaldo
Abstract:
The edge-degeneracy model is an exponential random graph model that uses the graph degeneracy, a measure of the graph's connection density, and number of edges in a graph as its sufficient statistics. We show this model is relatively well-behaved by studying the statistical degeneracy of this model through the geometry of the associated polytope.
Submitted 16 September, 2016; v1 submitted 30 January, 2016;
originally announced February 2016.
-
Uniform Asymptotic Inference and the Bootstrap After Model Selection
Authors:
Ryan J. Tibshirani,
Alessandro Rinaldo,
Robert Tibshirani,
Larry Wasserman
Abstract:
Recently, Tibshirani et al. (2016) proposed a method for making inferences about parameters defined by model selection, in a typical regression setting with normally distributed errors. Here, we study the large sample properties of this method, without assuming normality. We prove that the test statistic of Tibshirani et al. (2016) is asymptotically valid, as the number of samples n grows and the dimension d of the regression problem stays fixed. Our asymptotic result holds uniformly over a wide class of nonnormal error distributions. We also propose an efficient bootstrap version of this test that is provably (asymptotically) conservative, and in practice, often delivers shorter intervals than those from the original normality-based approach. Finally, we prove that the test statistic of Tibshirani et al. (2016) does not enjoy uniform validity in a high-dimensional setting, when the dimension d is allowed to grow.
Submitted 9 August, 2017; v1 submitted 20 June, 2015;
originally announced June 2015.
-
Robust Topological Inference: Distance To a Measure and Kernel Distance
Authors:
Frédéric Chazal,
Brittany T. Fasy,
Fabrizio Lecci,
Bertrand Michel,
Alessandro Rinaldo,
Larry Wasserman
Abstract:
Let P be a distribution with support S. The salient features of S can be quantified with persistent homology, which summarizes topological features of the sublevel sets of the distance function (the distance of any point x to S). Given a sample from P we can infer the persistent homology using an empirical version of the distance function. However, the empirical distance function is highly non-robust to noise and outliers. Even one outlier is deadly. The distance-to-a-measure (DTM), introduced by Chazal et al. (2011), and the kernel distance, introduced by Phillips et al. (2014), are smooth functions that provide useful topological information but are robust to noise and outliers. Chazal et al. (2014) derived concentration bounds for DTM. Building on these results, we derive limiting distributions and confidence sets, and we propose a method for choosing tuning parameters.
Submitted 22 December, 2014;
originally announced December 2014.
-
Statistical Models for Degree Distributions of Networks
Authors:
Kayvan Sadeghi,
Alessandro Rinaldo
Abstract:
We define and study the statistical models in exponential family form whose sufficient statistics are the degree distributions and the bi-degree distributions of undirected labelled simple graphs. Graphs that are constrained by the joint degree distributions are called $dK$-graphs in the computer science literature and this paper attempts to provide the first statistically grounded analysis of this type of model. In addition to formalizing these models, we provide some preliminary results for the parameter estimation and the asymptotic behaviour of the model for degree distribution, and discuss the parameter estimation for the model for bi-degree distribution.
Submitted 14 November, 2014;
originally announced November 2014.
-
$β$ models for random hypergraphs with a given degree sequence
Authors:
Despina Stasi,
Kayvan Sadeghi,
Alessandro Rinaldo,
Sonja Petrović,
Stephen E. Fienberg
Abstract:
We introduce the beta model for random hypergraphs in order to represent the occurrence of multi-way interactions among agents in a social network. This model builds upon and generalizes the well-studied beta model for random graphs, which instead only considers pairwise interactions. We provide two algorithms for fitting the model parameters, IPS (iterative proportional scaling) and a fixed point algorithm, prove that both algorithms converge if the maximum likelihood estimator (MLE) exists, and provide algorithmic and geometric ways of dealing with the issue of MLE existence.
Submitted 3 July, 2014;
originally announced July 2014.
-
Subsampling Methods for Persistent Homology
Authors:
Frédéric Chazal,
Brittany Terese Fasy,
Fabrizio Lecci,
Bertrand Michel,
Alessandro Rinaldo,
Larry Wasserman
Abstract:
Persistent homology is a multiscale method for analyzing the shape of sets and functions from point cloud data arising from an unknown distribution supported on those sets. When the size of the sample is large, direct computation of the persistent homology is prohibitive due to the combinatorial nature of the existing algorithms. We propose to compute the persistent homology of several subsamples of the data and then combine the resulting estimates. We study the risk of two estimators and we prove that the subsampling approach carries stable topological information while achieving a great reduction in computational complexity.
Submitted 7 June, 2014;
originally announced June 2014.
-
Consistency of spectral clustering in stochastic block models
Authors:
Jing Lei,
Alessandro Rinaldo
Abstract:
We analyze the performance of spectral clustering for community extraction in stochastic block models. We show that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of the maximum expected degree is as small as $\log n$, with $n$ the number of nodes. This result applies to some popular polynomial time spectral clustering algorithms and is further extended to degree corrected stochastic block models using a spherical $k$-median spectral clustering method. A key component of our analysis is a combinatorial bound on the spectrum of binary random matrices, which is sharper than the conventional matrix Bernstein inequality and may be of independent interest.
Submitted 30 December, 2014; v1 submitted 6 December, 2013;
originally announced December 2013.
-
Stochastic Convergence of Persistence Landscapes and Silhouettes
Authors:
Frédéric Chazal,
Brittany Terese Fasy,
Fabrizio Lecci,
Alessandro Rinaldo,
Larry Wasserman
Abstract:
Persistent homology is a widely used tool in Topological Data Analysis that encodes multiscale topological information as a multi-set of points in the plane called a persistence diagram. It is difficult to apply statistical theory directly to a random sample of diagrams. Instead, we can summarize the persistent homology with the persistence landscape, introduced by Bubenik, which converts a diagram into a well-behaved real-valued function. We investigate the statistical properties of landscapes, such as weak convergence of the average landscapes and convergence of the bootstrap. In addition, we introduce an alternate functional summary of persistent homology, which we call the silhouette, and derive an analogous statistical theory.
Submitted 1 December, 2013;
originally announced December 2013.
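Illustrative sketch (not code from the paper): the two functional summaries named in the abstract above can be evaluated pointwise from a persistence diagram given as a list of (birth, death) pairs. The function names `tent`, `landscape`, and `silhouette` are hypothetical, and the silhouette is written here as a weighted average of tent functions with weights |death - birth|^p, which is one common way to define it and should be read as an assumption rather than the paper's exact formulation.

```python
def tent(birth, death, t):
    """Triangle function of one diagram point, evaluated at t."""
    return max(0.0, min(t - birth, death - t))


def landscape(diagram, k, t):
    """k-th persistence landscape at t: the k-th largest tent value (k >= 1)."""
    values = sorted((tent(b, d, t) for b, d in diagram), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0


def silhouette(diagram, t, p=1.0):
    """Weighted average of tent values at t, with weights |death - birth| ** p."""
    weights = [abs(d - b) ** p for b, d in diagram]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * tent(b, d, t) for w, (b, d) in zip(weights, diagram)) / total
```

Both summaries turn a multiset of points into an ordinary real-valued function of t, so averages over a sample of diagrams can be taken pointwise. In this sketch, a larger p gives more weight to long-lived features and a smaller p gives more weight to short-lived ones.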