
Showing 1–50 of 124 results for author: Xu, L

Searching in archive stat.
  1. arXiv:2505.05771  [pdf, ps, other]

    stat.ME

    Statistical methods for cost-effectiveness analysis of left-truncated censored survival data with treatment delays

    Authors: Polyna Khudyakov, Li Xu, Ce Yang, Donna Spiegelman, Molin Wang

    Abstract: The incremental cost-effectiveness ratio (ICER) and incremental net benefit (INB) are widely used for cost-effectiveness analysis. We develop methods for estimation and inference for the ICER and INB which use the semiparametric stratified Cox proportional hazard model, allowing for adjustment for risk factors. Since in public health settings, patients often begin treatment after they become eligi…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 24 pages, 4 figures, has Supplementary

  2. arXiv:2504.09253  [pdf, other]

    stat.ME math.ST

    Statistical Inference for High-Dimensional Robust Linear Regression Models via Recursive Online-Score Estimation

    Authors: Dian Zheng, Lingzhou Xue

    Abstract: This paper introduces a novel framework for estimation and inference in penalized M-estimators applied to robust high-dimensional linear regression models. Traditional methods for high-dimensional statistical inference, which predominantly rely on convex likelihood-based approaches, struggle to address the nonconvexity inherent in penalized M-estimation with nonconvex objective functions. Our prop…

    Submitted 12 April, 2025; originally announced April 2025.

  3. arXiv:2504.05498  [pdf, other]

    stat.ME

    Adaptive Design for Contour Estimation from Computer Experiments with Quantitative and Qualitative Inputs

    Authors: A. Shahrokhian, X. Deng, C. D. Lin, P. Ranjan, L. Xu

    Abstract: Computer experiments with quantitative and qualitative inputs are widely used to study many scientific and engineering processes. Much of the existing work has focused on design and modeling or process optimization for such experiments. This paper proposes an adaptive design approach for estimating a contour from computer experiments with quantitative and qualitative inputs. A new criterion is int…

    Submitted 29 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  4. arXiv:2503.19304  [pdf, ps, other]

    stat.ME

    Statistical Inference for High-dimensional Matrix-variate Factor Models with Missing Observations

    Authors: Yongxia Zhang, Jinwen Liang, Liwen Xu, Keming Yu, Maozai Tian

    Abstract: This paper develops an inferential theory for high-dimensional matrix-variate factor models with missing observations. We propose an easy-to-use all-purpose method that involves two straightforward steps. First, we perform principal component analysis on two re-weighted covariance matrices to obtain the row and column loadings. Second, we utilize these loadings along with the matrix-variate data t…

    Submitted 24 March, 2025; originally announced March 2025.

  5. arXiv:2503.16187  [pdf, other]

    cs.LG stat.ML

    Manifold learning in metric spaces

    Authors: Liane Xu, Amit Singer

    Abstract: Laplacian-based methods are popular for dimensionality reduction of data lying in $\mathbb{R}^N$. Several theoretical results for these algorithms depend on the fact that the Euclidean distance approximates the geodesic distance on the underlying submanifold which the data are assumed to lie on. However, for some applications, other metrics, such as the Wasserstein distance, may provide a more app…

    Submitted 20 March, 2025; originally announced March 2025.

  6. arXiv:2502.02859  [pdf, other]

    stat.ML cs.LG

    Gap-Dependent Bounds for Federated $Q$-learning

    Authors: Haochen Zhang, Zhong Zheng, Lingzhou Xue

    Abstract: We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term scaling with the number of agents $M$, states $S$, and actions $A$, where…

    Submitted 4 February, 2025; originally announced February 2025.

  7. arXiv:2501.14919  [pdf, other]

    stat.ME

    Clustering of functional data prone to complex heteroscedastic measurement error

    Authors: Andi Mai, Lan Xue, Roger Zoh, Carmen Tekwe

    Abstract: Several factors make clustering of functional data challenging, including the infinite-dimensional space to which observations belong and the lack of a defined probability density function for the functional random variable. To overcome these barriers, researchers either assume that observations belong to a finite-dimensional space spanned by basis functions or apply nonparametric smoothing method…

    Submitted 31 January, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

  8. arXiv:2501.08288  [pdf, other]

    stat.ML cs.LG math.PR physics.data-an q-bio.QM

    Avoiding subtraction and division of stochastic signals using normalizing flows: NFdeconvolve

    Authors: Pedro Pessoa, Max Schweiger, Lance W. Q. Xu, Tristan Manha, Ayush Saurabh, Julian Antolin Camarena, Steve Pressé

    Abstract: Across the scientific realm, we find ourselves subtracting or dividing stochastic signals. For instance, consider a stochastic realization, $x$, generated from the addition or multiplication of two stochastic signals $a$ and $b$, namely $x=a+b$ or $x = ab$. For the $x=a+b$ example, $a$ can be fluorescence background and $b$ the signal of interest whose statistics are to be learned from the measure…

    Submitted 14 January, 2025; originally announced January 2025.

  9. arXiv:2412.08064  [pdf, other]

    math.ST math.PR stat.ME stat.ML

    Statistical Convergence Rates of Optimal Transport Map Estimation between General Distributions

    Authors: Yizhe Ding, Runze Li, Lingzhou Xue

    Abstract: This paper studies the convergence rates of optimal transport (OT) map estimators, a topic of growing interest in statistics, machine learning, and various scientific fields. Despite recent advancements, existing results rely on regularity assumptions that are very restrictive in practice and much stricter than those in Brenier's Theorem, including the compactness and convexity of the probability…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: 44 pages, 8 figures

    MSC Class: 62G20; 62G20; 26D10

  10. arXiv:2412.07987  [pdf, other]

    stat.ME math.ST stat.ML

    Hypothesis Testing for High-Dimensional Matrix-Valued Data

    Authors: Shijie Cui, Danning Li, Runze Li, Lingzhou Xue

    Abstract: This paper addresses hypothesis testing for the mean of matrix-valued data in high-dimensional settings. We investigate the minimum discrepancy test, originally proposed by Cragg (1997), which serves as a rank test for lower-dimensional matrices. We evaluate the performance of this test as the matrix dimensions increase proportionally with the sample size, and identify its limitations when matrix…

    Submitted 10 December, 2024; originally announced December 2024.

  11. arXiv:2410.17297  [pdf, ps, other]

    stat.ML cs.LG math.PR

    Error estimates between SGD with momentum and underdamped Langevin diffusion

    Authors: Arnaud Guillin, Yu Wang, Lihu Xu, Haoran Yang

    Abstract: Stochastic gradient descent with momentum is a popular variant of stochastic gradient descent, which has recently been reported to have a close relationship with the underdamped Langevin diffusion. In this paper, we establish a quantitative error estimate between them in the 1-Wasserstein and total variation distances.

    Submitted 22 October, 2024; originally announced October 2024.

  12. arXiv:2410.08934  [pdf, other]

    stat.ML cs.DC cs.LG math.ST stat.CO

    The Effect of Personalization in FedProx: A Fine-grained Analysis on Statistical Accuracy and Communication Efficiency

    Authors: Xin Yu, Zelin He, Ying Sun, Lingzhou Xue, Runze Li

    Abstract: FedProx is a simple yet effective federated learning method that enables model personalization via regularization. Despite remarkable success in practice, a rigorous analysis of how such a regularization provably improves the statistical accuracy of each client's local model hasn't been fully established. Setting the regularization strength heuristically presents a risk, as an inappropriate choice…

    Submitted 4 December, 2024; v1 submitted 11 October, 2024; originally announced October 2024.

  13. arXiv:2410.07574  [pdf, other]

    stat.ML cs.LG

    Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition

    Authors: Zhong Zheng, Haochen Zhang, Lingzhou Xue

    Abstract: We study the gap-dependent bounds of two important algorithms for on-policy Q-learning for finite-horizon episodic tabular Markov Decision Processes (MDPs): UCB-Advantage (Zhang et al. 2020) and Q-EarlySettled-Advantage (Li et al. 2021). UCB-Advantage and Q-EarlySettled-Advantage improve upon the results based on Hoeffding-type bonuses and achieve the almost optimal $\sqrt{T}$-type regret bound in…

    Submitted 9 March, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

  14. arXiv:2409.01570  [pdf, other]

    stat.ML cs.LG eess.SP math.ST stat.ME

    Smoothed Robust Phase Retrieval

    Authors: Zhong Zheng, Lingzhou Xue

    Abstract: The phase retrieval problem in the presence of noise aims to recover the signal vector of interest from a set of quadratic measurements with infrequent but arbitrary corruptions, and it plays an important role in many scientific applications. However, the essential geometric structure of the nonconvex robust phase retrieval based on the $\ell_1$-loss is largely unknown to study spurious local solu…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 32 pages, 8 figures

  15. arXiv:2407.15084  [pdf, other]

    stat.ME stat.AP

    High-dimensional log contrast models with measurement errors

    Authors: Wenxi Tan, Lingzhou Xue, Songshan Yang, Xiang Zhan

    Abstract: High-dimensional compositional data are frequently encountered in many fields of modern scientific research. In regression analysis of compositional data, the presence of covariate measurement errors poses grand challenges for existing statistical error-in-variable regression analysis methods since measurement error in one component of the composition has an impact on others. To simultaneously add…

    Submitted 21 July, 2024; originally announced July 2024.

  16. arXiv:2406.11942  [pdf, other]

    stat.ME stat.AP

    Clustering functional data with measurement errors: a simulation-based approach

    Authors: Tingyu Zhu, Lan Xue, Carmen Tekwe, Keith Diaz, Mark Benden, Roger Zoh

    Abstract: Clustering analysis of functional data, which comprises observations that evolve continuously over time or space, has gained increasing attention across various scientific disciplines. Practical applications often involve functional data that are contaminated with measurement errors arising from imprecise instruments, sampling errors, or other sources. These errors can significantly distort the in…

    Submitted 17 June, 2024; originally announced June 2024.

    MSC Class: 62

  17. arXiv:2406.04743  [pdf, other]

    cs.LG cs.CR cs.DC stat.AP

    When Swarm Learning meets energy series data: A decentralized collaborative learning design based on blockchain

    Authors: Lei Xu, Yulong Chen, Yuntian Chen, Longfeng Nie, Xuetao Wei, Liang Xue, Dongxiao Zhang

    Abstract: Machine learning models offer the capability to forecast future energy production or consumption and infer essential unknown variables from existing data. However, legal and policy constraints within specific energy sectors render the data sensitive, presenting technical hurdles in utilizing data from diverse sources. Therefore, we propose adopting a Swarm Learning (SL) scheme, which replaces the…

    Submitted 7 June, 2024; originally announced June 2024.

  18. arXiv:2405.18795  [pdf, other]

    stat.ML cs.LG

    Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost

    Authors: Zhong Zheng, Haochen Zhang, Lingzhou Xue

    Abstract: In this paper, we consider model-free federated reinforcement learning for tabular episodic Markov decision processes. Under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. Despite recent advances in federated Q-learning algorithms achieving near-linear regret speedup with low communication co…

    Submitted 9 March, 2025; v1 submitted 29 May, 2024; originally announced May 2024.

  19. arXiv:2405.17734  [pdf, other]

    cs.LG stat.AP

    Towards Efficient Disaster Response via Cost-effective Unbiased Class Rate Estimation through Neyman Allocation Stratified Sampling Active Learning

    Authors: Yanbing Bai, Xinyi Wu, Lai Xu, Jihan Pei, Erick Mas, Shunichi Koshimura

    Abstract: With the rapid development of earth observation technology, we have entered an era of massively available satellite remote-sensing data. However, a large amount of satellite remote sensing data lacks a label or the label cost is too high to hinder the potential of AI technology mining satellite data. Especially in such an emergency response scenario that uses satellite data to evaluate the degree…

    Submitted 27 May, 2024; originally announced May 2024.

  20. arXiv:2405.02551  [pdf, other]

    stat.ME math.ST stat.AP

    Power-Enhanced Two-Sample Mean Tests for High-Dimensional Compositional Data with Application to Microbiome Data Analysis

    Authors: Danning Li, Lingzhou Xue, Haoyi Yang, Xiufan Yu

    Abstract: Testing differences in mean vectors is a fundamental task in the analysis of high-dimensional compositional data. Existing methods may suffer from low power if the underlying signal pattern is in a situation that does not favor the deployed test. In this work, we develop two-sample power-enhanced mean tests for high-dimensional compositional data based on the combination of $p$-values, which integ…

    Submitted 7 March, 2025; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: 31 pages

  21. arXiv:2404.10063  [pdf, other]

    stat.ME

    Adjusting for bias due to measurement error in functional quantile regression models with error-prone functional and scalar covariates

    Authors: Xiwei Chen, Yuanyuan Luan, Roger S. Zoh, Lan Xue, Sneha Jadhav, Carmen D. Tekwe

    Abstract: Wearable devices enable the continuous monitoring of physical activity (PA) but generate complex functional data with poorly characterized errors. Most work on functional data views the data as smooth, latent curves obtained at discrete time intervals with some random noise with mean zero and constant variance. Viewing this noise as homoscedastic and independent ignores potential serial correlatio…

    Submitted 15 April, 2024; originally announced April 2024.

  22. arXiv:2404.09353  [pdf, other]

    stat.ME stat.AP stat.ML

    A Unified Combination Framework for Dependent Tests with Applications to Microbiome Association Studies

    Authors: Xiufan Yu, Linjun Zhang, Arun Srinivasan, Min-ge Xie, Lingzhou Xue

    Abstract: We introduce a novel meta-analysis framework to combine dependent tests under a general setting, and utilize it to synthesize various microbiome association tests that are calculated from the same dataset. Our development builds upon the classical meta-analysis methods of aggregating $p$-values and also a more recent general method of combining confidence distributions, but makes generalizations t…

    Submitted 14 April, 2024; originally announced April 2024.

  23. arXiv:2404.06735  [pdf, other]

    stat.ML cs.LG math.ST stat.AP stat.ME

    A Copula Graphical Model for Multi-Attribute Data using Optimal Transport

    Authors: Qi Zhang, Bing Li, Lingzhou Xue

    Abstract: Motivated by modern data forms such as images and multi-view data, the multi-attribute graphical model aims to explore the conditional independence structure among vectors. Under the Gaussian assumption, the conditional independence between vectors is characterized by blockwise zeros in the precision matrix. To relax the restrictive Gaussian assumption, in this paper, we introduce a novel semipara…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 37 pages

  24. arXiv:2402.04933  [pdf, other]

    cs.LG stat.AP

    Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits

    Authors: Biyonka Liang, Lily Xu, Aparna Taneja, Milind Tambe, Lucas Janson

    Abstract: Public health programs often provide interventions to encourage program adherence, and effectively allocating interventions is vital for producing the greatest overall health outcomes, especially in underserved communities where resources are limited. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requi…

    Submitted 5 February, 2025; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 29 pages, 18 figures

  25. arXiv:2401.00461  [pdf, other]

    stat.ME

    A Penalized Functional Linear Cox Regression Model for Spatially-defined Environmental Exposure with an Estimated Buffer Distance

    Authors: Jooyoung Lee, Zhibing He, Charlotte Roscoe, Peter James, Li Xu, Donna Spiegelman, David Zucker, Molin Wang

    Abstract: In environmental health research, it is of interest to understand the effect of the neighborhood environment on health. Researchers have shown a protective association between green space around a person's residential address and depression outcomes. In measuring exposure to green space, distance buffers are often used. However, buffer distances differ across studies. Typically, the buffer distanc…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: 27 pages, 5 figures

  26. arXiv:2312.15023  [pdf, other]

    cs.LG stat.ML

    Federated Q-Learning: Linear Regret Speedup with Low Communication Cost

    Authors: Zhong Zheng, Fengyu Gao, Lingzhou Xue, Jing Yang

    Abstract: In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDP) where, under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. While linear speedup in the number of agents has been achieved for some metrics, such as convergence rate and sample com…

    Submitted 7 May, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: 51 pages

  27. arXiv:2312.08324  [pdf, other]

    stat.AP

    Bayesian Nonparametric Clustering with Feature Selection for Spatially Resolved Transcriptomics Data

    Authors: Bencong Zhu, Guanyu Hu, Yang Xie, Lin Xu, Xiaodan Fan, Qiwei Li

    Abstract: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Nevertheless, there are inherent challenges associated with these new high-dimensional spatial data, such as zero-inflation, over-dispersion, and heterogeneity. These…

    Submitted 13 December, 2023; originally announced December 2023.

  28. arXiv:2311.08661  [pdf, other]

    stat.ML cs.CV cs.LG eess.IV

    Deep Neural Network Identification of Limnonectes Species and New Class Detection Using Image Data

    Authors: Li Xu, Yili Hong, Eric P. Smith, David S. McLeod, Xinwei Deng, Laura J. Freeman

    Abstract: As is true of many complex tasks, the work of discovering, describing, and understanding the diversity of life on Earth (viz., biological systematics and taxonomy) requires many tools. Some of this work can be accomplished as it has been done in the past, but some aspects present us with challenges which traditional knowledge and tools cannot adequately resolve. One such challenge is presented by…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 26 pages, 11 Figures

  29. arXiv:2310.19273  [pdf, other]

    cs.LG cs.AI stat.ML

    The Memory Perturbation Equation: Understanding Model's Sensitivity to Data

    Authors: Peter Nickl, Lu Xu, Dharmesh Tailor, Thomas Möllenhoff, Mohammad Emtiyaz Khan

    Abstract: Understanding model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE) which relates model's sensitivity to perturbation in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide-variety of…

    Submitted 16 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  30. arXiv:2310.07817  [pdf, other]

    stat.ME math.ST

    Nonlinear global Fréchet regression for random objects via weak conditional expectation

    Authors: Satarupa Bhattacharjee, Bing Li, Lingzhou Xue

    Abstract: Random objects are complex non-Euclidean data taking value in general metric space, possibly devoid of any underlying vector space structure. Such data are getting increasingly abundant with the rapid advancement in technology. Examples include probability distributions, positive semi-definite matrices, and data on Riemannian manifolds. However, except for regression for object-valued response wit…

    Submitted 11 October, 2023; originally announced October 2023.

    MSC Class: 62G05; 62J02; 62G08; 62J99

  31. arXiv:2308.04585  [pdf, other]

    stat.ML cs.LG

    Kernel Single Proxy Control for Deterministic Confounding

    Authors: Liyuan Xu, Arthur Gretton

    Abstract: We consider the problem of causal effect estimation with an unobserved confounder, where we observe a single proxy variable that is associated with the confounder. Although it has been shown that the recovery of an average causal effect is impossible in general from a single proxy variable, we show that causal recovery is possible if the outcome is generated deterministically. This generalizes exi…

    Submitted 18 March, 2025; v1 submitted 8 August, 2023; originally announced August 2023.

  32. arXiv:2305.12809  [pdf, other]

    cs.LG cs.AI stat.ML

    Relabeling Minimal Training Subset to Flip a Prediction

    Authors: Jinghan Yang, Linjie Xu, Lequan Yu

    Abstract: When facing an unsatisfactory prediction from a machine learning model, users can be interested in investigating the underlying reasons and exploring the potential for reversing the outcome. We ask: To flip the prediction on a test point $x_t$, how to identify the smallest training subset $\mathcal{S}_t$ that we need to relabel? We propose an efficient algorithm to identify and relabel such a subs…

    Submitted 3 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

  33. arXiv:2304.12522  [pdf, other]

    math.OC cs.LG eess.SP stat.CO stat.ML

    A New Inexact Proximal Linear Algorithm with Adaptive Stopping Criteria for Robust Phase Retrieval

    Authors: Zhong Zheng, Shiqian Ma, Lingzhou Xue

    Abstract: This paper considers the robust phase retrieval problem, which can be cast as a nonsmooth and nonconvex optimization problem. We propose a new inexact proximal linear algorithm with the subproblem being solved inexactly. Our contributions are two adaptive stopping criteria for the subproblem. The convergence behavior of the proposed methods is analyzed. Through experiments on both synthetic and re…

    Submitted 8 February, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: 23 pages

  34. arXiv:2304.02651  [pdf, other]

    stat.ME

    Generalized functional linear regression models with a mixture of complex function-valued and scalar-valued covariates prone to measurement error

    Authors: Yuanyuan Luan, Roger S. Zoh, Sneha Jadhav, Lan Xue, Carmen D. Tekwe

    Abstract: While extensive work has been done to correct for biases due to measurement error in scalar-valued covariates prone to errors in generalized linear regression models, limited work has been done to address biases associated with functional covariates prone to errors or the combination of scalar and functional covariates prone to errors in these models. We propose Simulation Extrapolation (SIMEX) an…

    Submitted 12 May, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  35. arXiv:2302.06075  [pdf, other]

    stat.ME cs.LG stat.AP stat.ML stat.OT

    A Graphical Point Process Framework for Understanding Removal Effects in Multi-Touch Attribution

    Authors: Jun Tao, Qian Chen, James W. Snyder Jr., Arava Sai Kumar, Amirhossein Meisami, Lingzhou Xue

    Abstract: Marketers employ various online advertising channels to reach customers, and they are particularly interested in attribution for measuring the degree to which individual touchpoints contribute to an eventual conversion. The availability of individual customer-level path-to-purchase data and the increasing number of online marketing channels and types of touchpoints bring new challenges to this fun…

    Submitted 12 February, 2023; originally announced February 2023.

    Comments: 38 pages, 10 figures

  36. arXiv:2212.14194  [pdf, ps, other]

    math.ST stat.CO stat.ME stat.ML

    Theoretical Guarantees for Sparse Principal Component Analysis based on the Elastic Net

    Authors: Teng Zhang, Haoyi Yang, Lingzhou Xue

    Abstract: Sparse principal component analysis (SPCA) is widely used for dimensionality reduction and feature extraction in high-dimensional data analysis. Despite many methodological and theoretical developments in the past two decades, the theoretical guarantees of the popular SPCA algorithm proposed by Zou, Hastie & Tibshirani (2006) are still unknown. This paper aims to address this critical gap. We firs…

    Submitted 27 April, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: 60 pages

  37. arXiv:2212.13741  [pdf, other]

    stat.ML cs.LG math.ST

    Distribution Estimation of Contaminated Data via DNN-based MoM-GANs

    Authors: Fang Xie, Lihu Xu, Qiuran Yao, Huiming Zhang

    Abstract: This paper studies the distribution estimation of contaminated data by the MoM-GAN method, which combines generative adversarial net (GAN) and median-of-mean (MoM) estimation. We use a deep neural network (DNN) with a ReLU activation function to model the generator and discriminator of the GAN. Theoretically, we derive a non-asymptotic error bound for the DNN-based MoM-GAN estimator measured by in…

    Submitted 28 December, 2022; originally announced December 2022.

  38. arXiv:2210.06610  [pdf, other]

    cs.LG stat.ME

    A Neural Mean Embedding Approach for Back-door and Front-door Adjustment

    Authors: Liyuan Xu, Arthur Gretton

    Abstract: We consider the estimation of average and counterfactual treatment effects, under two settings: back-door adjustment and front-door adjustment. The goal in both cases is to recover the treatment effect without having an access to a hidden confounder. This objective is attained by first estimating the conditional mean of the desired outcome variable given relevant covariates (the "first stage" regr…

    Submitted 12 October, 2022; originally announced October 2022.

  39. arXiv:2210.00025  [pdf, other]

    cs.LG stat.ML

    Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

    Authors: Siddhartha Banerjee, Sean R. Sinclair, Milind Tambe, Lily Xu, Christina Lee Yu

    Abstract: Most real-world deployments of bandit algorithms exist somewhere in between the offline and online set-up, where some historical data is available upfront and additional data is collected dynamically online. How best to incorporate historical data to "warm start" bandit algorithms is an open question: naively initializing reward estimates using all historical samples can suffer from spurious data…

    Submitted 19 March, 2025; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: 55 pages (30 pages main paper), 9 figures

  40. arXiv:2209.13526  [pdf, other]

    stat.AP

    Hypothesis Testing for Detecting Outlier Evaluators

    Authors: Li Xu, Molin Wang

    Abstract: In epidemiological studies, very often, evaluators obtain measurements of disease outcomes for study participants. In this paper, we propose a two-stage procedure for detecting outlier evaluators. In the first stage, a regression model is fitted to obtain the evaluators' effects. The outlier evaluators are considered as those with different effects compared with the normal evaluators. In the secon…

    Submitted 27 September, 2022; originally announced September 2022.

  41. arXiv:2207.04613  [pdf, other]

    stat.ME math.ST stat.ML

    Nonlinear Sufficient Dimension Reduction for Distribution-on-Distribution Regression

    Authors: Qi Zhang, Bing Li, Lingzhou Xue

    Abstract: We introduce a new approach to nonlinear sufficient dimension reduction in cases where both the predictor and the response are distributional data, modeled as members of a metric space. Our key step is to build universal kernels (cc-universal) on the metric spaces, which results in reproducing kernel Hilbert spaces for the predictor and response that are rich enough to characterize the conditional…

    Submitted 24 April, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: 36 pages

  42. arXiv:2205.09879  [pdf, other]

    stat.AP stat.CO

    Prediction for Distributional Outcomes in High-Performance Computing I/O Variability

    Authors: Li Xu, Yili Hong, Max D. Morris, Kirk W. Cameron

    Abstract: Although high-performance computing (HPC) systems have been scaled to meet the exponentially-growing demand for scientific computing, HPC performance variability remains a major challenge and has become a critical research topic in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performance variability is a critical step in HPC performanc…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: 31 pages, 10 figures

  43. arXiv:2202.04208  [pdf, other]

    stat.ME cs.LG econ.EM

    Validating Causal Inference Methods

    Authors: Harsh Parikh, Carlos Varjao, Louise Xu, Eric Tchetgen Tchetgen

    Abstract: The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including propensity score-based methods, prognostic score-based…

    Submitted 29 July, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: 5 figures, 13 pages

    Journal ref: PMLR 162:17346-17358, 2022

  44. arXiv:2202.02474  [pdf, other]

    stat.ML cs.LG

    Importance Weighting Approach in Kernel Bayes' Rule

    Authors: Liyuan Xu, Yutian Chen, Arnaud Doucet, Arthur Gretton

    Abstract: We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected kernel posterior features, based on regression from learned neural net or kernel features of the observations. All quantities involved in the Bayesian update are learned from observed data, making the method entirely model-free. The resulting algorithm i…

    Submitted 10 August, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  45. arXiv:2201.09766  [pdf, other]

    stat.AP

    Design Strategies and Approximation Methods for High-Performance Computing Variability Management

    Authors: Yueyao Wang, Li Xu, Yili Hong, Rong Pan, Tyler Chang, Thomas Lux, Jon Bernard, Layne Watson, Kirk Cameron

    Abstract: Performance variability management is an active research area in high-performance computing (HPC). We focus on input/output (I/O) variability. To study the performance variability, computer scientists often use grid-based designs (GBDs) to collect I/O variability data, and use mathematical approximation methods to build a prediction model. Mathematical approximation models could be biased particul…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: 29 pages, 6 figures

  46. arXiv:2201.03182  [pdf, other]

    stat.ML cs.LG math.ST

    Non-Asymptotic Guarantees for Robust Statistical Learning under Infinite Variance Assumption

    Authors: Lihu Xu, Fang Yao, Qiuran Yao, Huiming Zhang

    Abstract: There has been a surge of interest in developing robust estimators for models with heavy-tailed and bounded variance data in statistics and machine learning, while few works impose unbounded variance. This paper proposes two types of robust estimators, the ridge log-truncated M-estimator and the elastic net log-truncated M-estimator. The first estimator is applied to convex regressions such as quan…

    Submitted 11 October, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: 44 pages

  47. arXiv:2112.14674  [pdf, other]

    stat.ME cs.LG math.ST stat.ML

    An additive graphical model for discrete data

    Authors: Jun Tao, Bing Li, Lingzhou Xue

    Abstract: We introduce a nonparametric graphical model for discrete node variables based on additive conditional independence. Additive conditional independence is a three way statistical relation that shares similar properties with conditional independence by satisfying the semi-graphoid axioms. Based on this relation we build an additive graphical model for discrete variables that does not suffer from the…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 33 pages

  48. arXiv:2111.05391  [pdf, ps, other]

    cs.SE cs.AI stat.AP

    Statistical Perspectives on Reliability of Artificial Intelligence Systems

    Authors: Yili Hong, Jiayi Lian, Li Xu, Jie Min, Yueyao Wang, Laura J. Freeman, Xinwei Deng

    Abstract: Artificial intelligence (AI) systems have become increasingly popular in many areas. Nevertheless, AI technologies are still in their developing stages, and many issues need to be addressed. Among those, the reliability of AI systems needs to be demonstrated so that the AI systems can be used with confidence by the general public. In this paper, we provide statistical perspectives on the reliabili…

    Submitted 9 November, 2021; originally announced November 2021.

    Comments: 40 pages

  49. arXiv:2111.03950  [pdf, other]

    stat.ME cs.LG econ.EM stat.ML

    Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves

    Authors: Rahul Singh, Liyuan Xu, Arthur Gretton

    Abstract: We propose simple nonparametric estimators for mediated and time-varying dose response curves based on kernel ridge regression. By embedding Pearl's mediation formula and Robins' g-formula with kernels, we allow treatments, mediators, and covariates to be continuous in general spaces, and also allow for nonlinear treatment-confounder feedback. Our key innovation is a reproducing kernel Hilbert spa…

    Submitted 16 March, 2025; v1 submitted 6 November, 2021; originally announced November 2021.

    Comments: Material in this draft previously appeared in a working paper presented at the 2020 NeurIPS Workshop on ML for Economic Policy (arXiv:2010.04855v1). We have divided the original working paper (arXiv:2010.04855v1) into two projects: one paper focusing on time-fixed settings (arXiv:2010.04855) and this paper focusing on time-varying settings

  50. arXiv:2110.00467  [pdf, other]

    stat.ME math.ST stat.ML

    Dimension Reduction for Fréchet Regression

    Authors: Qi Zhang, Lingzhou Xue, Bing Li

    Abstract: With the rapid development of data collection techniques, complex data objects that are not in the Euclidean space are frequently encountered in new statistical applications. The Fréchet regression model (Peterson & Müller 2019) provides a promising framework for regression analysis with metric space-valued responses. In this paper, we introduce a flexible sufficient dimension reduction (SDR) method f…

    Submitted 6 December, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: 36 pages