Showing 1–10 of 10 results for author: Chakrabortty, A

  1. arXiv:2305.12789  [pdf, other]

    stat.ME math.ST stat.ML

    The Decaying Missing-at-Random Framework: Model Doubly Robust Causal Inference with Partially Labeled Data

    Authors: Yuqian Zhang, Abhishek Chakrabortty, Jelena Bradic

    Abstract: In modern large-scale observational studies, data collection constraints often result in partially labeled datasets, posing challenges for reliable causal inference, especially due to potential labeling bias and relatively small size of the labeled data. This paper introduces a decaying missing-at-random (decaying MAR) framework and associated approaches for doubly robust causal inference on treat…

    Submitted 21 April, 2025; v1 submitted 22 May, 2023; originally announced May 2023.

  2. arXiv:2201.10208  [pdf, other]

    stat.ME math.ST stat.ML

    Semi-Supervised Quantile Estimation: Robust and Efficient Inference in High Dimensional Settings

    Authors: Abhishek Chakrabortty, Guorong Dai, Raymond J. Carroll

    Abstract: We consider quantile estimation in a semi-supervised setting, characterized by two available data sets: (i) a small or moderate sized labeled data set containing observations for a response and a set of possibly high dimensional covariates, and (ii) a much larger unlabeled data set where only the covariates are observed. We propose a family of semi-supervised estimators for the response quantile(s…

    Submitted 14 August, 2024; v1 submitted 25 January, 2022; originally announced January 2022.

  3. arXiv:2201.00468  [pdf, other]

    stat.ME math.ST stat.ML

    A General Framework for Treatment Effect Estimation in Semi-Supervised and High Dimensional Settings

    Authors: Abhishek Chakrabortty, Guorong Dai

    Abstract: In this article, we aim to provide a general and complete understanding of semi-supervised (SS) causal inference for treatment effects. Specifically, we consider two such estimands: (a) the average treatment effect and (b) the quantile treatment effect, as prototype cases, in an SS setting, characterized by two available data sets: (i) a labeled data set of size $n$, providing observations for a r…

    Submitted 14 August, 2024; v1 submitted 2 January, 2022; originally announced January 2022.

  4. arXiv:2104.06667  [pdf, ps, other]

    stat.ME math.ST stat.ML

    Double Robust Semi-Supervised Inference for the Mean: Selection Bias under MAR Labeling with Decaying Overlap

    Authors: Yuqian Zhang, Abhishek Chakrabortty, Jelena Bradic

    Abstract: Semi-supervised (SS) inference has received much attention in recent years. Apart from a moderate-sized labeled data, L, the SS setting is characterized by an additional, much larger sized, unlabeled data, U. The setting of |U| >> |L|, makes SS inference unique and different from the standard missing data problems, owing to natural violation of the so-called "positivity" or "overlap" assumption. H…

    Submitted 18 May, 2023; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: 88 pages; Revised version; Accepted by Information and Inference: A Journal of the IMA

    Journal ref: Information and Inference: A Journal of the IMA (2023), Vol. 12, No. 3, 2066-2159

  5. arXiv:1911.11345  [pdf, other]

    stat.ME math.ST stat.ML

    High Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework

    Authors: Abhishek Chakrabortty, Jiarui Lu, T. Tony Cai, Hongzhe Li

    Abstract: We consider high dimensional $M$-estimation in settings where the response $Y$ is possibly missing at random and the covariates $\mathbf{X} \in \mathbb{R}^p$ can be high dimensional compared to the sample size $n$. The parameter of interest $\boldsymbol{\theta}_0 \in \mathbb{R}^d$ is defined as the minimizer of the risk of a convex loss, under a fully non-parametric model, and $\boldsymbol{\theta}_0$ itself is…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: 34 pages, 4 tables; (Supplement: 58 pages, 10 tables)

  6. arXiv:1809.10652  [pdf, other]

    stat.ME math.ST stat.ML

    Inference for Individual Mediation Effects and Interventional Effects in Sparse High-Dimensional Causal Graphical Models

    Authors: Abhishek Chakrabortty, Preetam Nandy, Hongzhe Li

    Abstract: We consider the problem of identifying intermediate variables (or mediators) that regulate the effect of a treatment on a response variable. While there has been significant research on this classical topic, little work has been done when the set of potential mediators is high-dimensional (HD). A further complication arises when these mediators are interrelated (with unknown dependencies). In part…

    Submitted 28 July, 2021; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: Revised version; 50 pages, 6 tables, 5 figures

    MSC Class: 62F12; 62H05; 62H10; 62J05; 92B15; 62A09

  7. arXiv:1804.02605  [pdf, other]

    math.ST stat.ME stat.ML

    Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression

    Authors: Arun Kumar Kuchibhotla, Abhishek Chakrabortty

    Abstract: Concentration inequalities form an essential toolkit in the study of high dimensional (HD) statistical methods. Most of the relevant statistics literature in this regard is based on sub-Gaussian or sub-exponential tail assumptions. In this paper, we first bring together various probabilistic inequalities for sums of independent random variables under much more general exponential type (namely sub-…

    Submitted 9 May, 2022; v1 submitted 7 April, 2018; originally announced April 2018.

    Comments: 68 pages; Revised version; To appear in Information and Inference: A Journal of the IMA

    MSC Class: 60G50; 62J05; 60B20; 62J07; 62E17; 60F05; 60E15

    Journal ref: Information and Inference: A Journal of the IMA (2022), Vol. 11, No. 4, 1389-1456

  8. Estimating Average Treatment Effects with a Double-Index Propensity Score

    Authors: David Cheng, Abhishek Chakrabortty, Ashwin N. Ananthakrishnan, Tianxi Cai

    Abstract: We consider estimating average treatment effects (ATE) of a binary treatment in observational data when data-driven variable selection is needed to select relevant covariates from a moderately large number of available covariates $\mathbf{X}$. To leverage covariates among $\mathbf{X}$ predictive of the outcome for efficiency gain while using regularization to fit a parametric propensity score (PS)…

    Submitted 25 October, 2020; v1 submitted 4 February, 2017; originally announced February 2017.

    Comments: 48 pages, 1 figure; Finalized revised version

    Journal ref: Biometrics 76 (2020) 767-777

  9. arXiv:1701.05230  [pdf, other]

    stat.ME math.ST stat.ML

    Surrogate Aided Unsupervised Recovery of Sparse Signals in Single Index Models for Binary Outcomes

    Authors: Abhishek Chakrabortty, Matey Neykov, Raymond Carroll, Tianxi Cai

    Abstract: We consider the recovery of regression coefficients, denoted by $\boldsymbol{\beta}_0$, for a single index model (SIM) relating a binary outcome $Y$ to a set of possibly high dimensional covariates $\boldsymbol{X}$, based on a large but 'unlabeled' dataset $\mathcal{U}$, with $Y$ never observed. On $\mathcal{U}$, we fully observe $\boldsymbol{X}$ and additionally, a surrogate $S$ which, while not being…

    Submitted 30 June, 2018; v1 submitted 18 January, 2017; originally announced January 2017.

    Comments: 50 pages, 3 tables, 1 figure

    MSC Class: 62J12; 62J07; 62H30; 62G32; 62F10; 62F30

  10. arXiv:1701.04889  [pdf, other]

    stat.ME math.ST stat.ML

    Efficient and Adaptive Linear Regression in Semi-Supervised Settings

    Authors: Abhishek Chakrabortty, Tianxi Cai

    Abstract: We consider the linear regression problem under semi-supervised settings wherein the available data typically consists of: (i) a small or moderate sized 'labeled' data, and (ii) a much larger sized 'unlabeled' data. Such data arises naturally from settings where the outcome, unlike the covariates, is expensive to obtain, a frequent scenario in modern studies involving large databases like electron…

    Submitted 19 August, 2017; v1 submitted 17 January, 2017; originally announced January 2017.

    Comments: 51 pages; Revised version - to appear in The Annals of Statistics

    MSC Class: 62F35; 62J05; 62F10; 62F12; 62G08

    Journal ref: The Annals of Statistics 2018, Vol. 46, No. 4, 1541-1572