-
Gaussian and Bootstrap Approximation for Matching-based Average Treatment Effect Estimators
Authors:
Zhaoyang Shi,
Chinmoy Bhattacharjee,
Krishnakumar Balasubramanian,
Wolfgang Polonik
Abstract:
We establish Gaussian approximation bounds for covariate and rank-matching-based Average Treatment Effect (ATE) estimators. By analyzing these estimators through the lens of stabilization theory, we employ the Malliavin-Stein method to derive our results. Our bounds precisely quantify the impact of key problem parameters, including the number of matches and treatment balance, on the accuracy of the Gaussian approximation. Additionally, we develop multiplier bootstrap procedures to estimate the limiting distribution in a fully data-driven manner, and we leverage the derived Gaussian approximation results to further obtain bootstrap approximation bounds. Our work not only introduces a novel theoretical framework for commonly used ATE estimators, but also provides data-driven methods for constructing non-asymptotically valid confidence intervals.
Submitted 22 December, 2024;
originally announced December 2024.
-
Multivariate Gaussian Approximation for Random Forest via Region-based Stabilization
Authors:
Zhaoyang Shi,
Chinmoy Bhattacharjee,
Krishnakumar Balasubramanian,
Wolfgang Polonik
Abstract:
We derive Gaussian approximation bounds for $k$-Potential Nearest Neighbor ($k$-PNN) based random forest predictions based on a set of training points given by a Poisson process under fairly mild regularity assumptions on the data generating process. Our approach is based on the key observation that $k$-PNN based random forest predictions satisfy a certain geometric property called region-based stabilization. We also compare the rates with those of $k$-nearest neighbor-based random forests, highlighting a form of universality in our result. In the process of developing our results, we also establish a probabilistic result on multivariate Gaussian approximation bounds for general functionals of Poisson processes that are region-based stabilizing. This general result makes use of the Malliavin-Stein method, and is potentially applicable to various related statistical problems.
Submitted 2 May, 2025; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Nonsmooth Nonparametric Regression via Fractional Laplacian Eigenmaps
Authors:
Zhaoyang Shi,
Krishnakumar Balasubramanian,
Wolfgang Polonik
Abstract:
We develop nonparametric regression methods for the case when the true regression function is not necessarily smooth. More specifically, our approach uses the fractional Laplacian and is designed to handle the case when the true regression function lies in an $L_2$-fractional Sobolev space with order $s\in (0,1)$. This function class is a Hilbert space lying between the space of square-integrable functions and the first-order Sobolev space consisting of differentiable functions. It contains fractional power functions, piecewise constant or polynomial functions, and bump functions as canonical examples. For the proposed approach, we prove upper bounds on the in-sample mean-squared estimation error of order $n^{-\frac{2s}{2s+d}}$, where $d$ is the dimension, $s$ is the aforementioned order parameter and $n$ is the number of observations. We also provide preliminary empirical results validating the practical performance of the developed estimators.
Submitted 10 June, 2025; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Adaptive and non-adaptive minimax rates for weighted Laplacian-eigenmap based nonparametric regression
Authors:
Zhaoyang Shi,
Krishnakumar Balasubramanian,
Wolfgang Polonik
Abstract:
We show both adaptive and non-adaptive minimax rates of convergence for a family of weighted Laplacian-Eigenmap based nonparametric regression methods, when the true regression function belongs to a Sobolev space and the sampling density is bounded from above and below. The adaptation methodology is based on extensions of Lepski's method and is over both the smoothness parameter ($s\in\mathbb{N}_{+}$) and the norm parameter ($M>0$) determining the constraints on the Sobolev space. Our results extend the non-adaptive result in Green et al. (2021), established for a specific normalized graph Laplacian, to a wide class of weighted Laplacian matrices used in practice, including the unnormalized Laplacian and random walk Laplacian.
Submitted 31 October, 2023;
originally announced November 2023.
-
A Flexible Approach for Normal Approximation of Geometric and Topological Statistics
Authors:
Zhaoyang Shi,
Krishnakumar Balasubramanian,
Wolfgang Polonik
Abstract:
We derive normal approximation results for a class of stabilizing functionals of binomial or Poisson point processes that are not necessarily expressible as sums of certain score functions. Our approach is based on a flexible notion of the add-one cost operator, which helps one to deal with the second-order cost operator via suitable first-order operators. We combine this flexible notion with the theory of strong stabilization to establish our results. We illustrate the applicability of our results by establishing normal approximation results for certain geometric and topological statistics arising frequently in practice. Several existing results also emerge as special cases of our approach.
Submitted 19 October, 2022;
originally announced October 2022.
-
Antimodes and Graphical Anomaly Exploration via Adaptive Depth Quantile Functions
Authors:
Gabriel Chandler,
Wolfgang Polonik
Abstract:
This work proposes and investigates a novel method for anomaly detection and shows it to be competitive in a variety of Euclidean and non-Euclidean situations. It is based on an extension of the depth quantile functions (DQF) approach. The DQF approach encodes geometric information about a point cloud via functions of a single variable, with each observation in a data set associated with a single such function. Plotting these functions provides a useful visualization. This technique can be applied to any data lying in a Hilbert space.
The proposed anomaly detection approach is motivated by the geometric insight that the presence of anomalies in data is tied to the existence of antimodes in the data generating distribution. Coupling this insight with a novel theoretical understanding of the shape of the DQFs gives rise to the proposed adaptive DQF (aDQF) methodology. Applications to various data sets illustrate the strong anomaly detection performance of the DQF and aDQF, and the benefits of their visualization aspects.
Submitted 17 July, 2023; v1 submitted 17 January, 2022;
originally announced January 2022.
-
Algorithms for ridge estimation with convergence guarantees
Authors:
Wanli Qiao,
Wolfgang Polonik
Abstract:
The extraction of filamentary structure from a point cloud is discussed. The filaments are modeled as ridge lines or higher dimensional ridges of an underlying density. We propose two novel algorithms and provide theoretical guarantees for their convergence, by which we mean that the algorithms can asymptotically recover the full ridge set. We consider the new algorithms as alternatives to the Subspace Constrained Mean Shift (SCMS) algorithm, for which no such theoretical guarantees are known.
Submitted 31 December, 2024; v1 submitted 25 April, 2021;
originally announced April 2021.
-
Testing For Global Covariate Effects in Dynamic Interaction Event Networks
Authors:
Alexander Kreiss,
Enno Mammen,
Wolfgang Polonik
Abstract:
In statistical network analysis it is common to observe so-called interaction data. Such data is characterized by actors forming the vertices and interacting along edges of the network, where edges are randomly formed and dissolved over the observation horizon. In addition, covariates are observed, and the goal is to model the impact of the covariates on the interactions. We distinguish two types of covariates: global, system-wide covariates (i.e., covariates taking the same value for all individuals, such as seasonality) and local, dyadic covariates modeling interactions between two individuals in the network. Existing continuous time network models are extended to allow for comparing a completely parametric model and a model that is parametric only in the local covariates but has a global non-parametric time component. This allows one, for instance, to test whether global time dynamics can be explained by simple global covariates like weather, seasonality, etc. The procedure is applied to a bike-sharing network by using weather and weekdays as global covariates and distances between the bike stations as local covariates.
Submitted 15 June, 2023; v1 submitted 26 March, 2021;
originally announced March 2021.
-
High Order Adjusted Block-wise Empirical Likelihood For Weakly Dependent Data
Authors:
Guangxing Wang,
Wolfgang Polonik
Abstract:
It is well known that the empirical likelihood ratio confidence region suffers from a finite-sample under-coverage issue, and this severely hampers its application in statistical inference. The root cause of this under-coverage is an upper limit imposed by the convex hull of the estimating functions that is used in the construction of the profile empirical likelihood. For i.i.d. data, various methods have been proposed to solve this issue by modifying the convex hull, but it is not clear how well these methods perform when the data are no longer independent. In this paper, we propose an adjusted blockwise empirical likelihood that is designed for weakly dependent multivariate data. We show that our method not only preserves the much celebrated asymptotic $\chi^2$-distribution, but also improves the finite-sample coverage probability by removing the upper limit imposed by the convex hull. Further, we show that our method is also Bartlett correctable, and thus is able to achieve high-order asymptotic coverage accuracy.
Submitted 13 August, 2021; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Autism Spectrum Disorder Classification using Graph Kernels on Multidimensional Time Series
Authors:
Rushil Anirudh,
Jayaraman J. Thiagarajan,
Irene Kim,
Wolfgang Polonik
Abstract:
We present an approach to model time series data from resting state fMRI for autism spectrum disorder (ASD) severity classification. We propose to adopt kernel machines and employ graph kernels that define a kernel dot product between two graphs. This enables us to take advantage of spatio-temporal information to capture the dynamics of the brain network, as opposed to aggregating them in the spatial or temporal dimension. In addition to the conventional similarity graphs, we explore the use of L1 graphs using sparse coding, and the persistent homology of time delay embeddings, in the proposed pipeline for ASD classification. In our experiments on two datasets from the ABIDE collection, we demonstrate a consistent and significant advantage in using graph kernels over traditional linear or nonlinear kernels for a variety of time series features.
Submitted 29 November, 2016;
originally announced November 2016.
-
Local Neighborhood Fusion in Locally Constant Gaussian Graphical Models
Authors:
Apratim Ganguly,
Wolfgang Polonik
Abstract:
In this paper we examine and extend the notion of local constancy in graphical models that was introduced by Honorio et al. (2009). We propose Neighborhood-Fused Lasso, a method for model selection in high-dimensional graphical models, leveraging locality information. Our approach is based on an extension of the idea of node-wise regression (Meinshausen-Bühlmann, 2006) by adding a fusion penalty. We propose a fast numerical algorithm for our approach, and provide theoretical and numerical evidence that our methodology outperforms related approaches that ignore the locality information. We further investigate the compatibility issues in our proposed methodology and derive bounds for the quadratic prediction error and $l_1$-bounds on the estimated coefficients.
Submitted 31 October, 2014;
originally announced October 2014.