-
GMM with Many Weak Moment Conditions and Nuisance Parameters: General Theory and Applications to Causal Inference
Authors:
Rui Wang,
Kwun Chuen Gary Chan,
Ting Ye
Abstract:
Weak identification is a common issue for many statistical problems -- for example, when instrumental variables are weakly correlated with treatment, or when proxy variables are weakly correlated with unmeasured confounders. Under weak identification, standard estimation methods, such as the generalized method of moments (GMM), can have sizeable bias in finite samples or even asymptotically. In addition, many practical settings involve a growing number of nuisance parameters, adding further complexity to the problem. In this paper, we study estimation and inference under a general nonlinear moment model with many weak moment conditions and many nuisance parameters. To obtain debiased inference for finite-dimensional target parameters, we demonstrate that Neyman orthogonality plays a stronger role than in conventional settings with strong identification. We study a general two-step debiasing estimator that allows for possibly nonparametric first-step estimation of nuisance parameters, and we establish its consistency and asymptotic normality under a many weak moment asymptotic regime. Our theory accommodates both high-dimensional moment conditions and function-valued nuisance parameters. We provide high-level assumptions for a general setting and discuss specific applications to the problems of estimation and inference with weak instruments and weak proxies.
Submitted 25 June, 2025; v1 submitted 12 May, 2025;
originally announced May 2025.
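For readers unfamiliar with the term, Neyman orthogonality is usually stated as follows. This is the standard textbook form, not necessarily the exact condition used in the paper; the symbols $g$, $W$, $\beta_0$, $\eta_0$ are notation assumed here for the moment function, the data, the target parameter, and the nuisance parameter.

```latex
% Moment condition identifying the target parameter \beta_0
% (notation assumed for illustration, not taken from the paper):
\mathbb{E}\bigl[ g(W; \beta_0, \eta_0) \bigr] = 0 .

% Neyman orthogonality: the moment is locally insensitive to
% perturbations of the nuisance parameter around its true value \eta_0.
\frac{\partial}{\partial r}\,
\mathbb{E}\bigl[ g\bigl(W; \beta_0, \eta_0 + r(\eta - \eta_0)\bigr) \bigr]
\Big|_{r = 0} = 0
\quad \text{for all admissible } \eta .
```

Under this condition, first-order errors in the first-step estimate of $\eta_0$ do not carry over to the moment condition, which is what makes two-step debiased estimation possible.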
-
Perfect Clustering in Nonuniform Hypergraphs
Authors:
Ga-Ming Angus Chan,
Zachary Lubberts
Abstract:
While there has been tremendous activity in the area of statistical network inference on graphs, hypergraphs have not enjoyed the same attention, on account of their relative complexity and the lack of tractable statistical models. We introduce a hyper-edge-centric model for analyzing hypergraphs, called the interaction hypergraph, which models natural sampling methods for hypergraphs in neuroscience and communication networks, and accommodates interactions involving different numbers of entities. We define latent embeddings for the interactions in such a network, and analyze their estimators. In particular, we show that a spectral estimate of the interaction latent positions can achieve perfect clustering once enough interactions are observed.
Submitted 11 April, 2025;
originally announced April 2025.
-
From Distributional Robustness to Robust Statistics: A Confidence Sets Perspective
Authors:
Gabriel Chan,
Bart Van Parys,
Amine Bennouna
Abstract:
We establish a connection between distributionally robust optimization (DRO) and classical robust statistics. We demonstrate that this connection arises naturally in the context of estimation under data corruption, where the goal is to construct ``minimal'' confidence sets for the unknown data-generating distribution. Specifically, we show that a DRO ambiguity set, based on the Kullback-Leibler divergence and total variation distance, is uniformly minimal, meaning it represents the smallest confidence set that contains the unknown distribution at a given confidence level. Moreover, we prove that when parametric assumptions are imposed on the unknown distribution, the ambiguity set is never larger than a confidence set based on the optimal estimator proposed by Huber. This insight reveals that the commonly observed conservatism of DRO formulations is not intrinsic to these formulations themselves but rather stems from the non-parametric framework in which these formulations are employed.
Submitted 17 October, 2024;
originally announced October 2024.
-
MOUNTAINEER: Topology-Driven Visual Analytics for Comparing Local Explanations
Authors:
Parikshit Solunke,
Vitoria Guardieiro,
Joao Rulff,
Peter Xenopoulos,
Gromit Yeuk-Yin Chan,
Brian Barr,
Luis Gustavo Nonato,
Claudio Silva
Abstract:
With the increasing use of black-box Machine Learning (ML) techniques in critical applications, there is a growing demand for methods that can provide transparency and accountability for model predictions. As a result, a large number of local explainability methods for black-box models have been developed and popularized. However, machine learning explanations are still hard to evaluate and compare due to the high dimensionality, heterogeneous representations, varying scales, and stochastic nature of some of these methods. Topological Data Analysis (TDA) can be an effective method in this domain since it can be used to transform attributions into uniform graph representations, providing a common ground for comparison across different explanation methods.
We present a novel topology-driven visual analytics tool, Mountaineer, that allows ML practitioners to interactively analyze and compare these representations by linking the topological graphs back to the original data distribution, model predictions, and feature attributions. Mountaineer facilitates rapid and iterative exploration of ML explanations, enabling experts to gain deeper insights into the explanation techniques, understand the underlying data distributions, and thus reach well-founded conclusions about model behavior. Furthermore, we demonstrate the utility of Mountaineer through two case studies using real-world data. In the first, we show how Mountaineer enabled us to compare black-box ML explanations and discern regions of and causes of disagreements between different explanations. In the second, we demonstrate how the tool can be used to compare and understand ML models themselves. Finally, we conducted interviews with three industry experts to help us evaluate our work.
Submitted 21 June, 2024;
originally announced June 2024.
-
The Instrumental Variable Model with Categorical Instrument, Treatment and Outcome: Characterization, Partial Identification, and Statistical Inference
Authors:
Yilin Song,
F. Richard Guo,
K. C. Gary Chan,
Thomas S. Richardson
Abstract:
Instrumental variable (IV) analysis is a crucial tool in estimating causal relationships by addressing the issue of confounding variables that may bias the results. Among other work on IV models with binary exposure and outcomes, Richardson and Robins (2014) studied the instrumental variable model with binary exposure (X) and binary outcome (Y) with an instrument (Z) that takes Q states where Q>=2. However, IV models beyond binary X and Y have been less explored. In this work, we consider the instrumental variable model with categorical X, Y, Z taking values in {1, ..., K}, {1, ..., M}, and {1, ..., Q} respectively. We first give a simple closed-form characterization of the set of joint distributions of the potential outcomes P(Y(x=1), ..., Y(x=K)) compatible with a given observed probability distribution P(X, Y | Z). We further show that the bounds we derive are necessary, sufficient, and non-redundant, and that they hold under various versions of the independence assumptions that have been discussed in the literature. We also show how a confidence region for any convex function of the joint counterfactual probability, including the average causal effect (ATE), can be computed using an algorithm proposed by Guo and Richardson (2021), which is based on a new tail bound for the KL-divergence. We implement our bounds and provide practical recommendations through a real data example of a cash-incentive smoking cessation program.
Submitted 20 November, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Flexible Functional Treatment Effect Estimation
Authors:
Jiayi Wang,
Raymond K. W. Wong,
Xiaoke Zhang,
Kwun Chuen Gary Chan
Abstract:
We study treatment effect estimation with functional treatments where the average potential outcome functional is a function of functions, in contrast to continuous treatment effect estimation where the target is a function of real numbers. By considering a flexible scalar-on-function marginal structural model, a weight-modified kernel ridge regression (WMKRR) is adopted for estimation. The weights are constructed by directly minimizing the uniform balancing error resulting from a decomposition of the WMKRR estimator, instead of being estimated under a particular treatment selection model. Despite the complex structure of the uniform balancing error derived under WMKRR, finite-dimensional convex algorithms can be applied to efficiently solve for the proposed weights thanks to a representer theorem. The optimal convergence rate is shown to be attainable by the proposed WMKRR estimator without any smoothness assumption on the true weight function. Corresponding empirical performance is demonstrated by a simulation study and a real data application.
Submitted 12 November, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Likelihood-based Spacings Goodness-of-Fit Statistics for Univariate Shape-constrained Densities
Authors:
Kwun Chuen Gary Chan,
Hok Kan Ling,
Chuan-Fa Tang,
Sheung Chi Phillip Yam
Abstract:
A variety of statistics based on sample spacings has been studied in the literature for testing goodness-of-fit to parametric distributions. To test the goodness-of-fit to a nonparametric class of univariate shape-constrained densities, including widely studied classes such as k-monotone and log-concave densities, a likelihood ratio test with a working alternative density estimate based on the spacings of the observations is considered, and is shown to be asymptotically normal and distribution-free under the null, to be consistent under fixed alternatives, and to admit bootstrap calibration. The distribution-freeness under the null comes from the fact that the asymptotic dominant term depends only on a function of the spacings of transformed outcomes that are uniformly distributed. Applications and extensions of theoretical results in the literature of shape-constrained estimation are required to show that the average log-density ratio converges to zero at a faster rate than the sample spacing term under the null, and diverges under the alternatives. Numerical studies are conducted to demonstrate that the test is applicable to various classes of shape-constrained densities and has a good balance between type-I error control under the null and power under alternative distributions.
Submitted 25 October, 2024; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Arithmetic circuit tensor networks, multivariable function representation, and high-dimensional integration
Authors:
Ruojing Peng,
Johnnie Gray,
Garnet Kin-Lic Chan
Abstract:
Many computational problems can be formulated in terms of high-dimensional functions. Simple representations of such functions and resulting computations with them typically suffer from the "curse of dimensionality", an exponential cost dependence on dimension. Tensor networks provide a way to represent certain classes of high-dimensional functions with polynomial memory. This results in computations where the exponential cost is ameliorated or in some cases, removed, if the tensor network representation can be obtained. Here, we introduce a direct mapping from the arithmetic circuit of a function to arithmetic circuit tensor networks, avoiding the need to perform any optimization or functional fit. We demonstrate the power of the circuit construction in examples of multivariable integration on the unit hypercube in up to 50 dimensions, where the complexity of integration can be understood from the circuit structure. We find very favorable cost scaling compared to quasi-Monte-Carlo integration for these cases, and further give an example where efficient quasi-Monte-Carlo cannot be theoretically performed without knowledge of the underlying tensor network circuit structure.
Submitted 16 August, 2022;
originally announced September 2022.
-
Matrix Completion with Model-free Weighting
Authors:
Jiayi Wang,
Raymond K. W. Wong,
Xiaojun Mao,
Kwun Chuen Gary Chan
Abstract:
In this paper, we propose a novel method for matrix completion under general non-uniform missing structures. By controlling an upper bound of a novel balancing error, we construct weights that can actively adjust for the non-uniformity in the empirical risk without explicitly modeling the observation probabilities, and can be computed efficiently via convex optimization. The recovered matrix based on the proposed weighted empirical risk enjoys appealing theoretical guarantees. In particular, the proposed method achieves a stronger guarantee than existing work in terms of the scaling with respect to the observation probabilities, under asymptotically heterogeneous missing settings (where entry-wise observation probabilities can be of different orders). These settings can be regarded as a better theoretical model of missing patterns with highly varying probabilities. We also provide a new minimax lower bound under a class of heterogeneous settings. Numerical experiments are also provided to demonstrate the effectiveness of the proposed method.
Submitted 9 June, 2021;
originally announced June 2021.
-
Estimation of fractal dimension for a class of Non-Gaussian stationary processes and fields
Authors:
Grace Chan,
Andrew T. A. Wood
Abstract:
We present the asymptotic distribution theory for a class of increment-based estimators of the fractal dimension of a random field of the form g{X(t)}, where g:R\to R is an unknown smooth function and X(t) is a real-valued stationary Gaussian field on R^d, d=1 or 2, whose covariance function obeys a power law at the origin. The relevant theoretical framework here is ``fixed domain'' (or ``infill'') asymptotics. Surprisingly, the limit theory in this non-Gaussian case is somewhat richer than in the Gaussian case (the latter is recovered when g is affine), in part because estimators of the type considered may have an asymptotic variance which is random in the limit. Broadly, when g is smooth and nonaffine, three types of limit distributions can arise, types (i), (ii) and (iii), say. Each type can be represented as a random integral. More specifically, type (i) can be represented as the integral of a certain random function with respect to Lebesgue measure; type (ii) can be represented as the integral of a second random function
Submitted 25 June, 2004;
originally announced June 2004.
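As a rough illustration of what an increment-based estimator of fractal dimension looks like, here is a minimal sketch for the simplest Gaussian case only (g equal to the identity, d = 1). It is not the estimator class analysed in the paper. It assumes the variogram behaves like |h|^alpha near the origin and uses the standard relation D = d + 1 - alpha/2. The function names and the choice of a simulated Brownian path (alpha = 1, so D = 1.5) are illustrative assumptions.

```python
import math
import random


def variogram(path, lag):
    """Empirical variogram: mean squared increment of the path at the given lag."""
    diffs = [(path[i + lag] - path[i]) ** 2 for i in range(len(path) - lag)]
    return sum(diffs) / len(diffs)


def estimate_fractal_dimension(path, d=1):
    """Estimate alpha from the log-ratio of variograms at lags 2 and 1,
    then convert to a fractal dimension via D = d + 1 - alpha / 2."""
    alpha_hat = math.log(variogram(path, 2) / variogram(path, 1)) / math.log(2)
    return d + 1 - alpha_hat / 2


def brownian_path(n, seed=0):
    """Simulate a Brownian path on [0, 1] with n steps (illustrative test signal)."""
    rng = random.Random(seed)
    step_sd = 1.0 / math.sqrt(n)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


path = brownian_path(20000)
dimension = estimate_fractal_dimension(path)
print(f"estimated fractal dimension: {dimension:.3f}")
```

Observing the path on a finer grid over the same interval corresponds to the ``fixed domain'' (infill) regime mentioned in the abstract. For Brownian motion the estimate should settle near 1.5 as n grows.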