Showing 1–50 of 106 results for author: Zhao, Q

  1. arXiv:2505.09860  [pdf, ps, other]

    stat.ME math.ST stat.AP stat.CO

    Robust and Computationally Efficient Trimmed L-Moments Estimation for Parametric Distributions

    Authors: Chudamani Poudyal, Qian Zhao, Hari Sitaula

    Abstract: This paper proposes a robust and computationally efficient estimation framework for fitting parametric distributions based on trimmed L-moments. Trimmed L-moments extend classical L-moment theory by downweighting or excluding extreme order statistics, resulting in estimators that are less sensitive to outliers and heavy tails. We construct estimators for both location-scale and shape parameters us…

    Submitted 14 May, 2025; originally announced May 2025.

  2. arXiv:2503.09565  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization

    Authors: Zixiang Chen, Greg Yang, Qingyue Zhao, Quanquan Gu

    Abstract: Despite deep neural networks' powerful representation learning capabilities, theoretical understanding of how networks can simultaneously achieve meaningful feature learning and global convergence remains elusive. Existing approaches like the neural tangent kernel (NTK) are limited because features stay close to their initialization in this parametrization, leaving open questions about feature pro…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 29 pages, 5 figures, 2 tables

  3. arXiv:2502.06051  [pdf, ps, other]

    cs.LG cs.AI math.ST stat.ML

    Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits

    Authors: Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Tong Zhang, Quanquan Gu

    Abstract: Although many popular reinforcement learning algorithms are underpinned by $f$-divergence regularization, their sample complexity with respect to the \emph{regularized objective} still lacks a tight characterization. In this paper, we analyze $f$-divergence-regularized offline policy learning. For reverse Kullback-Leibler (KL) divergence, arguably the most commonly used one, we give the first…

    Submitted 30 May, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

    Comments: 38 pages

  4. arXiv:2501.07046  [pdf, ps, other]

    stat.ML cs.LG

    Differentially Private Kernelized Contextual Bandits

    Authors: Nikola Pavlovic, Sudeep Salgia, Qing Zhao

    Abstract: We consider the problem of contextual kernel bandits with stochastic contexts, where the underlying reward function belongs to a known Reproducing Kernel Hilbert Space (RKHS). We study this problem under the additional constraint of joint differential privacy, where the agent needs to ensure that the sequence of query points is differentially private with respect to both the sequence of contexts…

    Submitted 12 January, 2025; originally announced January 2025.

  5. arXiv:2501.03222  [pdf, ps, other]

    cs.LG cs.IT stat.ML

    Characterizing the Accuracy-Communication-Privacy Trade-off in Distributed Stochastic Convex Optimization

    Authors: Sudeep Salgia, Nikola Pavlovic, Yuejie Chi, Qing Zhao

    Abstract: We consider the problem of differentially private stochastic convex optimization (DP-SCO) in a distributed setting with $M$ clients, where each of them has a local dataset of $N$ i.i.d. data samples from an underlying data distribution. The objective is to design an algorithm to minimize a convex population loss using a collaborative effort across $M$ clients, while ensuring the privacy of the loc…

    Submitted 6 January, 2025; originally announced January 2025.

  6. arXiv:2501.00854  [pdf, other]

    stat.ME cs.LG

    A Graphical Approach to State Variable Selection in Off-policy Learning

    Authors: Joakim Blach Andersen, Qingyuan Zhao

    Abstract: Sequential decision problems are widely studied across many areas of science. A key challenge when learning policies from historical data - a practice commonly referred to as off-policy learning - is how to "identify" the impact of a policy of interest when the observed data are not randomized. Off-policy learning has mainly been studied in two settings: dynamic treatment regimes (DTRs), where t…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: 25 pages (not including appendix and references), 10 figures, 2 tables

  7. arXiv:2412.09431  [pdf, other]

    q-bio.PE q-bio.QM stat.ME

    Explicit modeling of density dependence in spatial capture-recapture models

    Authors: Qing Zhao, Yunyi Shen

    Abstract: Density dependence occurs at the individual level but is often evaluated at the population level, leading to difficulties or even controversies in detecting such a process. Bayesian individual-based models such as spatial capture-recapture (SCR) models provide opportunities to study density dependence at the individual level, but such an approach remains to be developed and evaluated. In this stud…

    Submitted 12 December, 2024; originally announced December 2024.

  8. arXiv:2412.03321  [pdf, other]

    cs.LG stat.ML

    Scalable Bayesian Tensor Ring Factorization for Multiway Data Analysis

    Authors: Zerui Tao, Toshihisa Tanaka, Qibin Zhao

    Abstract: Tensor decompositions play a crucial role in numerous applications related to multi-way data analysis. By employing a Bayesian framework with sparsity-inducing priors, Bayesian Tensor Ring (BTR) factorization offers probabilistic estimates and an effective approach for automatically adapting the tensor ring rank during the learning process. However, the previous BTR method employs an Automatic Relevan…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: ICONIP 2023

  9. arXiv:2411.01625  [pdf, other]

    stat.ML cs.AI cs.LG

    Counterfactual explainability of black-box prediction models

    Authors: Zijun Gao, Qingyuan Zhao

    Abstract: It is crucial to be able to explain black-box prediction models to use them effectively and safely in practice. Most existing tools for model explanations are associational rather than causal, and we use two paradoxical examples to show that such explanations are generally inadequate. Motivated by the concept of genetic heritability in twin studies, we propose a new notion called counterfactual ex…

    Submitted 3 November, 2024; originally announced November 2024.

    Comments: 19 pages, 3 figures

  10. arXiv:2409.02397  [pdf, other]

    stat.ME stat.AP

    High-dimensional Bayesian Model for Disease-Specific Gene Detection in Spatial Transcriptomics

    Authors: Qicheng Zhao, Qihuang Zhang

    Abstract: Identifying disease-indicative genes is critical for deciphering disease mechanisms and has attracted significant interest in biomedical research. Spatial transcriptomics offers unprecedented insights for the detection of disease-specific genes by enabling within-tissue contrasts. However, this new technology poses challenges for conventional statistical models developed for RNA-sequencing, as the…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 23 Pages

  11. arXiv:2406.19531  [pdf, other]

    stat.ML cs.LG

    Off-policy Evaluation with Deeply-abstracted States

    Authors: Meiling Hao, Pingfan Su, Liyuan Hu, Zoltan Szabo, Qingyuan Zhao, Chengchun Shi

    Abstract: Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions -- originally designed for policy learning -- in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abs…

    Submitted 3 March, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 56 pages, 5 figures

    ACM Class: G.3; I.2.6; G.1.2

  12. arXiv:2405.16219  [pdf, other]

    cs.LG stat.ML

    Deep Causal Generative Models with Property Control

    Authors: Qilong Zhao, Shiyu Wang, Guangji Bai, Bo Pan, Zhaohui Qin, Liang Zhao

    Abstract: Generating data with properties of interest by external users while following the right causation among its intrinsic factors is important yet has not been well addressed jointly. This is due to the long-lasting challenge of jointly identifying key latent variables, their causal relations, and their correlation with properties of interest, as well as how to leverage their discoveries toward causal…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures

  13. arXiv:2405.07026  [pdf, other]

    stat.ME

    Selective Randomization Inference for Adaptive Experiments

    Authors: Tobias Freidling, Qingyuan Zhao, Zijun Gao

    Abstract: Adaptive experiments use preliminary analyses of the data to inform the further course of action and are commonly used in many disciplines including medical and social sciences. Because the null hypothesis and experimental design are not pre-specified, it has long been recognized that statistical inference for adaptive experiments is not straightforward. Most existing methods only apply to specific ad…

    Submitted 26 October, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

  14. arXiv:2405.02373  [pdf, other]

    math.OC cs.LG stat.ML

    Exponentially Weighted Algorithm for Online Network Resource Allocation with Long-Term Constraints

    Authors: Ahmed Sid-Ali, Ioannis Lambadaris, Yiqiang Q. Zhao, Gennady Shaikhet, Amirhossein Asgharnia

    Abstract: This paper studies an online optimal resource reservation problem in communication networks with job transfers where the goal is to minimize the reservation cost while maintaining the blocking cost under a certain budget limit. To tackle this problem, we propose a novel algorithm based on a randomized exponentially weighted method that encompasses long-term constraints. We then analyze the perform…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.15558

  15. arXiv:2403.06942  [pdf, other]

    eess.SY cs.LG stat.ML

    Grid Monitoring with Synchro-Waveform and AI Foundation Model Technologies

    Authors: Lang Tong, Xinyi Wang, Qing Zhao

    Abstract: Purpose: This article advocates for the development of a next-generation grid monitoring and control system designed for future grids dominated by inverter-based resources. Leveraging recent progress in generative artificial intelligence (AI), machine learning, and networking technology, we develop a physics-based AI foundation model with high-resolution synchro-waveform measurement technology to e…

    Submitted 25 January, 2025; v1 submitted 11 March, 2024; originally announced March 2024.

  16. arXiv:2402.13870  [pdf, ps, other]

    cs.LG eess.SP stat.AP

    Generative Probabilistic Time Series Forecasting and Applications in Grid Operations

    Authors: Xinyi Wang, Lang Tong, Qing Zhao

    Abstract: Generative probabilistic forecasting produces future time series samples according to the conditional probability distribution given past time series observations. Such techniques are essential in risk-based decision-making and planning under uncertainty with broad applications in grid operations, including electricity price forecasting, risk-based economic dispatch, and stochastic optimizations.…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted at CISS 2024. arXiv admin note: text overlap with arXiv:2306.03782

  17. arXiv:2402.13182  [pdf, other]

    cs.LG cs.DC stat.ML

    Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness

    Authors: Nikola Pavlovic, Sudeep Salgia, Qing Zhao

    Abstract: We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function that lies in a reproducing kernel Hilbert space. Each agent sequentially queries the function to obtain noisy observations at the query points. Agents can share information through a central server, with the objective of minimizing regret that is accumulating over time $T$ and aggrega…

    Submitted 20 February, 2024; originally announced February 2024.

  18. arXiv:2401.17518  [pdf, other]

    stat.ME math.ST

    Model Uncertainty and Selection of Risk Models for Left-Truncated and Right-Censored Loss Data

    Authors: Qian Zhao, Sahadeb Upretee, Daoping Yu

    Abstract: Insurance loss data are usually in the form of left-truncation and right-censoring due to deductibles and policy limits respectively. This paper investigates the model uncertainty and selection procedure when various parametric models are constructed to accommodate such left-truncated and right-censored data. The joint asymptotic properties of the estimators have been established using the Delta m…

    Submitted 30 January, 2024; originally announced January 2024.

    Journal ref: Risks, 2023, 11(11),188

  19. arXiv:2401.16651  [pdf, other]

    stat.ME math.ST stat.AP stat.CO

    A constructive approach to selective risk control

    Authors: Zijun Gao, Wenjie Hu, Qingyuan Zhao

    Abstract: Many modern applications require using data to select the statistical tasks and make valid inference after selection. In this article, we provide a unifying approach to control for a class of selective risks. Our method is motivated by a reformulation of the celebrated Benjamini-Hochberg (BH) procedure for multiple hypothesis testing as the fixed point iteration of the Benjamini-Yekutieli (BY) pro…

    Submitted 8 November, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 9 figures, 2 tables

  20. arXiv:2401.07711  [pdf, other]

    cs.LG stat.ML

    Efficient Nonparametric Tensor Decomposition for Binary and Count Data

    Authors: Zerui Tao, Toshihisa Tanaka, Qibin Zhao

    Abstract: In numerous applications, binary reactions or event counts are observed and stored within high-order tensors. Tensor decompositions (TDs) serve as a powerful tool to handle such high-dimensional and sparse data. However, many traditional TDs are explicitly or implicitly designed based on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-l…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: AAAI-24

  21. arXiv:2311.10023  [pdf, other]

    stat.ML cs.LG

    Online Optimization for Network Resource Allocation and Comparison with Reinforcement Learning Techniques

    Authors: Ahmed Sid-Ali, Ioannis Lambadaris, Yiqiang Q. Zhao, Gennady Shaikhet, Amirhossein Asgharnia

    Abstract: We tackle in this paper an online network resource allocation problem with job transfers. The network is composed of many servers connected by communication links. The system operates in discrete time; at each time slot, the administrator reserves resources at servers for future job requests, and a cost is incurred for the reservations made. Then, after receptions, the jobs may be transferred betw…

    Submitted 16 November, 2023; originally announced November 2023.

  22. arXiv:2310.15351  [pdf, other]

    cs.LG stat.ML

    Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency

    Authors: Sudeep Salgia, Sattar Vakili, Qing Zhao

    Abstract: We consider Bayesian optimization using Gaussian Process models, also referred to as kernel-based bandit optimization. We study the methodology of exploring the domain using random samples drawn from a distribution. We show that this random exploration approach achieves the optimal error rates. Our analysis is based on novel concentration bounds in an infinite dimensional Hilbert space established…

    Submitted 2 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  23. arXiv:2310.07838  [pdf, other]

    cs.LG cs.AI cs.IT math.ST stat.ML

    Towards the Fundamental Limits of Knowledge Transfer over Finite Domains

    Authors: Qingyue Zhao, Banghua Zhu

    Abstract: We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the…

    Submitted 14 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 41 pages, 2 figures; Appendix polished

  24. arXiv:2309.06053  [pdf, ps, other]

    stat.ME math.ST

    Confounder selection via iterative graph expansion

    Authors: F. Richard Guo, Qingyuan Zhao

    Abstract: Confounder selection, namely choosing a set of covariates to control for confounding between a treatment and an outcome, is arguably the most important step in the design of observational studies. Previous methods, such as Pearl's celebrated back-door criterion, typically require pre-specifying a causal graph, which can often be difficult in practice. We propose an interactive procedure for confou…

    Submitted 24 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 29 pages; added link to Shiny web app

  25. arXiv:2308.00950  [pdf, other]

    stat.ME

    Beta-trees: Multivariate histograms with confidence statements

    Authors: Guenther Walther, Qian Zhao

    Abstract: Multivariate histograms are difficult to construct due to the curse of dimensionality. Motivated by $k$-d trees in computer science, we show how to construct an efficient data-adaptive partition of Euclidean space that possesses the following two properties: With high confidence the distribution from which the data are generated is close to uniform on each rectangle of the partition; and despite t…

    Submitted 2 August, 2023; originally announced August 2023.

    MSC Class: 62G15

  26. arXiv:2306.09507  [pdf, ps, other]

    stat.AP math.ST stat.CO stat.ME

    Credibility Theory Based on Winsorizing

    Authors: Qian Zhao, Chudamani Poudyal

    Abstract: The classical Bühlmann credibility model has been widely applied to premium estimation for group insurance contracts and other insurance types. In this paper, we develop a robust Bühlmann credibility model using the winsorized version of loss data, also known as the winsorized mean (a robust alternative to the traditional individual mean). This approach assumes that the observed sample data come f…

    Submitted 22 July, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Journal ref: European Actuarial Journal, 2024

  27. arXiv:2305.15558  [pdf, other]

    math.OC cs.LG stat.ML

    Online Optimization for Randomized Network Resource Allocation with Long-Term Constraints

    Authors: Ahmed Sid-Ali, Ioannis Lambadaris, Yiqiang Q. Zhao, Gennady Shaikhet, Shima Kheradmand

    Abstract: In this paper, we study an optimal online resource reservation problem in a simple communication network. The network is composed of two compute nodes linked by a local communication link. The system operates in discrete time; at each time slot, the administrator reserves resources for servers before the actual job requests are known. A cost is incurred for the reservations made. Then, after the c…

    Submitted 3 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  28. arXiv:2305.08359  [pdf, other]

    cs.LG math.OC stat.ML

    Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

    Authors: Kaixuan Ji, Qingyue Zhao, Jiafan He, Weitong Zhang, Quanquan Gu

    Abstract: Recent studies have shown that episodic reinforcement learning (RL) is no harder than bandits when the total reward is bounded by $1$, and proved regret bounds that have a polylogarithmic dependence on the planning horizon $H$. However, it remains an open question whether such results can be carried over to adversarial RL, where the reward is adversarially chosen at each episode. In this paper, we…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 34 pages

  29. arXiv:2303.01552  [pdf, other]

    stat.ME math.ST stat.AP

    Simultaneous Hypothesis Testing Using Internal Negative Controls with An Application to Proteomics

    Authors: Zijun Gao, Qingyuan Zhao

    Abstract: Negative control is a common technique in scientific investigations and broadly refers to the situation where a null effect ("negative result") is expected. Motivated by a real proteomic dataset, we will present three promising and closely connected methods of using negative controls to assist simultaneous hypothesis testing. The first method uses negative controls to construct a permutation p-v…

    Submitted 19 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 41 pages, 10 figures, 3 tables

  30. arXiv:2301.00040  [pdf, other]

    stat.ME

    Optimization-based Sensitivity Analysis for Unmeasured Confounding using Partial Correlations

    Authors: Tobias Freidling, Qingyuan Zhao

    Abstract: Causal inference necessarily relies upon untestable assumptions; hence, it is crucial to assess the robustness of obtained results to violations of identification assumptions. However, such sensitivity analysis is only occasionally undertaken in practice, as many existing methods require analytically tractable solutions and their results are often difficult to interpret. We take a more flexible ap…

    Submitted 16 May, 2025; v1 submitted 30 December, 2022; originally announced January 2023.

  31. arXiv:2211.08637  [pdf, other]

    stat.OT

    Near-peer mentoring in data science: Two experiences at Stanford University

    Authors: Chiara Sabatti, Qian Zhao

    Abstract: Universities have been expanding the data science programs for undergraduate students, with the simultaneous goal of reaching and retaining students from underrepresented groups in the data science workforce. The set of new programs also offers opportunities to involve graduate students, fostering their growth as future leaders in data science education. We describe two programs that use the near p…

    Submitted 8 June, 2024; v1 submitted 15 November, 2022; originally announced November 2022.

  32. arXiv:2211.04697  [pdf, other]

    stat.ME math.ST

    $L^{\infty}$- and $L^2$-sensitivity analysis for causal inference with unmeasured confounding

    Authors: Yao Zhang, Qingyuan Zhao

    Abstract: Sensitivity analysis for the unconfoundedness assumption is crucial in observational studies. For this purpose, the marginal sensitivity model (MSM) gained popularity recently due to its good interpretability and mathematical properties. However, as a quantification of confounding strength, the $L^{\infty}$-bound it puts on the logit difference between the observed and full data propensity scores…

    Submitted 24 February, 2024; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: 57 pages, 3 figures, 2 tables

  33. arXiv:2210.13358  [pdf, ps, other]

    cs.LG eess.SP stat.ML

    Novelty Detection in Time Series via Weak Innovations Representation: A Deep Learning Approach

    Authors: Xinyi Wang, Mei-jen Lee, Qing Zhao, Lang Tong

    Abstract: We consider novelty detection in time series with unknown and nonparametric probability structures. A deep learning approach is proposed to causally extract an innovations sequence consisting of novelty samples statistically independent of all past samples of the time series. A novelty detection algorithm is developed for the online detection of novel changes in the probability structure in the in…

    Submitted 24 October, 2022; originally announced October 2022.

  34. arXiv:2210.09026  [pdf, other]

    cs.LG stat.ML

    WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments

    Authors: Xi Chen, Tianyu Shi, Qingpeng Zhao, Yuchen Sun, Yunfei Gao, Xiangjun Wang

    Abstract: Recent advances in deep reinforcement learning (RL) have demonstrated complex decision-making capabilities in simulation environments such as Arcade Learning Environment, MuJoCo, and ViZDoom. However, they are hardly extensible to more complicated problems, mainly due to the lack of complexity and variations in the environments they are trained and tested on. Furthermore, they are not extensible t…

    Submitted 14 October, 2022; originally announced October 2022.

  35. arXiv:2210.04360  [pdf, other]

    stat.ME

    A unified analysis of regression adjustment in randomized experiments

    Authors: Katarzyna Reluga, Ting Ye, Qingyuan Zhao

    Abstract: Regression adjustment is broadly applied in randomized trials under the premise that it usually improves the precision of a treatment effect estimator. However, previous work has shown that this is not always true. To further understand this phenomenon, we develop a unified comparison of the asymptotic variance of a class of linear regression-adjusted estimators. Our analysis is based on the class…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: 17 pages, 1 figure, 2 tables

    MSC Class: 62F10; 62J99

    ACM Class: G.3

  36. arXiv:2209.06620  [pdf, other]

    cs.LG cs.AI stat.ML

    Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

    Authors: Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou

    Abstract: Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e.g., a simulator). This paper attempts to address these issues simultaneously with distributionally robust offline RL, where we learn a d…

    Submitted 27 January, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: First two authors contribute equally

  37. arXiv:2208.14035  [pdf, other]

    stat.ME

    Almost exact Mendelian randomization

    Authors: Matthew J Tudball, George Davey Smith, Qingyuan Zhao

    Abstract: Mendelian randomization (MR) is a natural experimental design based on the random transmission of genes from parents to offspring. However, this inferential basis is typically only implicit or used as an informal justification. As parent-offspring data becomes more widely available, we advocate a different approach to MR that is exactly based on this natural randomization, thereby formalizing the…

    Submitted 18 April, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 41 pages, 10 figures

    MSC Class: 62D20; 62G10; 62P10

  38. arXiv:2208.13871  [pdf, ps, other]

    stat.ME math.ST

    Confounder Selection: Objectives and Approaches

    Authors: F. Richard Guo, Anton Rask Lundborg, Qingyuan Zhao

    Abstract: Confounder selection is perhaps the most important step in the design of observational studies. A number of criteria, often with different objectives and approaches, have been proposed, and their validity and practical value have been debated in the literature. Here, we provide a unified review of these criteria and the assumptions behind them. We list several objectives that confounder selection…

    Submitted 24 September, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: 15 pages

  39. arXiv:2208.08944  [pdf, other]

    stat.ME

    An Adaptively Resized Parametric Bootstrap for Inference in High-dimensional Generalized Linear Models

    Authors: Qian Zhao, Emmanuel J. Candes

    Abstract: Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and sample size is not negligible. This is because approximations based on either classical asymptotic theory or bootstrap calculations are grossly off the mark. This paper introduces a resized bootstrap method to infer model parameters in arbitrary dimensions.…

    Submitted 18 August, 2022; originally announced August 2022.

  40. arXiv:2207.07948  [pdf, other]

    stat.ML cs.LG

    Collaborative Learning in Kernel-based Bandits for Distributed Users

    Authors: Sudeep Salgia, Sattar Vakili, Qing Zhao

    Abstract: We study collaborative learning among distributed clients facilitated by a central server. Each client is interested in maximizing a personalized objective function that is a weighted sum of its local objective and a global objective. Each client has direct access to random bandit feedback on its local objective, but only has a partial view of the global objective and relies on information exchang…

    Submitted 17 April, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

  41. arXiv:2206.00099  [pdf, other]

    stat.ML cs.LG

    Provably and Practically Efficient Neural Contextual Bandits

    Authors: Sudeep Salgia, Sattar Vakili, Qing Zhao

    Abstract: We consider the neural contextual bandit problem. In contrast to the existing work which primarily focuses on ReLU neural nets, we consider a general set of smooth activation functions. Under this more general setting, (i) we derive non-asymptotic error bounds on the difference between an overparameterized neural net and its corresponding neural tangent kernel, (ii) we propose an algorithm with a…

    Submitted 31 May, 2022; originally announced June 2022.

  42. arXiv:2204.02477  [pdf, ps, other]

    stat.ME

    Method of Winsorized Moments for Robust Fitting of Truncated and Censored Lognormal Distributions

    Authors: Chudamani Poudyal, Qian Zhao, Vytaras Brazauskas

    Abstract: When constructing parametric models to predict the cost of future claims, several important details have to be taken into account: (i) models should be designed to accommodate deductibles, policy limits, and coinsurance factors, (ii) parameters should be estimated robustly to control the influence of outliers on model predictions, and (iii) all point predictions should be augmented with estimates…

    Submitted 20 February, 2024; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: 35 pages, 4 figures, etc

  43. What is a randomization test?

    Authors: Yao Zhang, Qingyuan Zhao

    Abstract: The meaning of randomization tests has become obscure in statistics education and practice over the last century. This article makes a fresh attempt at rectifying this core concept of statistics. A new term -- "quasi-randomization test" -- is introduced to define significance tests based on theoretical models and distinguish these tests from the "randomization tests" based on the physical act of r…

    Submitted 4 April, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: 45 pages, 2 figures. Accepted for publication in the Journal of the American Statistical Association on 26th March, 2023. arXiv admin note: substantial text overlap with arXiv:2104.10618

    MSC Class: 62G10; 62B15

  44. arXiv:2203.08857  [pdf, other]

    stat.ML cs.AI cs.LG

    Noisy Tensor Completion via Low-rank Tensor Ring

    Authors: Yuning Qiu, Guoxu Zhou, Qibin Zhao, Shengli Xie

    Abstract: Tensor completion is a fundamental tool for incomplete data analysis, where the goal is to predict missing entries from partial observations. However, existing methods often make the explicit or implicit assumption that the observed entries are noise-free to provide a theoretical guarantee of exact recovery of missing entries, which is quite restrictive in practice. To remedy such drawbacks, this…

    Submitted 14 March, 2022; originally announced March 2022.

  45. arXiv:2110.13391  [pdf]

    stat.AP

    Analyzing the Data of COVID-19 with Quasi-Distribution Fitting Based on Piecewise B-spline Curves

    Authors: Qingliang Zhao, Zhenhuan Lu, Yiduo Wang

    Abstract: Facing the worldwide coronavirus disease 2019 (COVID-19) pandemic, a new fitting method (QDF, quasi-distribution fitting) that can be used to analyze COVID-19 data is developed based on piecewise quasi-uniform B-spline curves. For any given country or district, it simulates the distribution histogram data built from the daily confirmed cases (or other data, including daily re…

    Submitted 25 October, 2021; originally announced October 2021.

  46. arXiv:2104.10618  [pdf, other]

    math.ST stat.ME

    Multiple conditional randomization tests for lagged and spillover treatment effects

    Authors: Yao Zhang, Qingyuan Zhao

    Abstract: We consider the problem of constructing multiple independent conditional randomization tests using a single dataset. Because the tests are independent, the randomization p-values can be interpreted individually and combined using standard methods for multiple testing. We give a simple, sequential construction of such tests, and then discuss its application to three problems: Rosenbaum's evidence f…

    Submitted 11 October, 2024; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: 43 pages, 7 figures; Part of the original version of this paper can be found at arXiv:2203.10980; To appear in Biometrika

    MSC Class: 62G10; 62B15

  47. arXiv:2101.11552  [pdf, other]

    cs.LG stat.ML

    Efficient Graph Deep Learning in TensorFlow with tf_geometric

    Authors: Jun Hu, Shengsheng Qian, Quan Fang, Youze Wang, Quan Zhao, Huaiwen Zhang, Changsheng Xu

    Abstract: We introduce tf_geometric, an efficient and friendly library for graph deep learning, which is compatible with both TensorFlow 1.x and 2.x. tf_geometric provides kernel libraries for building Graph Neural Networks (GNNs) as well as implementations of popular GNNs. The kernel libraries consist of infrastructures for building efficient GNNs, including graph data structures, graph map-reduce framewor…

    Submitted 27 January, 2021; originally announced January 2021.

    Comments: 7 pages, 5 figures

  48. arXiv:2011.14047  [pdf, other]

    cs.LG stat.ML

    Learning from Incomplete Features by Simultaneous Training of Neural Networks and Sparse Coding

    Authors: Cesar F. Caiafa, Ziyao Wang, Jordi Solé-Casals, Qibin Zhao

    Abstract: In this paper, the problem of training a classifier on a dataset with incomplete features is addressed. We assume that different subsets of features (random or structured) are available at each data instance. This situation typically occurs in the applications when not all the features are collected for every data sample. A new supervised learning method is developed to train a general classifier,…

    Submitted 17 April, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

    Comments: 11 pages, 7 figures, paper accepted for presentation at L2ID Workshop at CVPR 2021 (19-25 June, 2021)

  49. arXiv:2010.13997  [pdf, other]

    stat.ML cs.LG

    A Domain-Shrinking based Bayesian Optimization Algorithm with Order-Optimal Regret Performance

    Authors: Sudeep Salgia, Sattar Vakili, Qing Zhao

    Abstract: We consider sequential optimization of an unknown function in a reproducing kernel Hilbert space. We propose a Gaussian process-based algorithm and establish its order-optimal regret performance (up to a poly-logarithmic factor). This is the first GP-based algorithm with an order-optimal regret guarantee. The proposed algorithm is rooted in the methodology of domain shrinking realized through a se…

    Submitted 29 October, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: Accepted to NeurIPS 2021

  50. arXiv:2009.11828  [pdf, other]

    stat.ME

    Toward Better Practice of Covariate Adjustment in Analyzing Randomized Clinical Trials

    Authors: Ting Ye, Jun Shao, Yanyao Yi, Qingyuan Zhao

    Abstract: In randomized clinical trials, adjustments for baseline covariates at both design and analysis stages are highly encouraged by regulatory agencies. A recent trend is to use a model-assisted approach for covariate adjustment to gain credibility and efficiency while producing asymptotically valid inference even when the model is incorrect. In this article we present three considerations for better p…

    Submitted 13 July, 2021; v1 submitted 24 September, 2020; originally announced September 2020.