
Showing 1–34 of 34 results for author: Zhang, A R

  1. arXiv:2506.09358

    stat.ME

    Functional Tensor Regression

    Authors: Tongyu Li, Fang Yao, Anru R. Zhang

    Abstract: Tensor regression has attracted significant attention in statistical research. This study tackles the challenge of handling covariates with smooth varying structures. We introduce a novel framework, termed functional tensor regression, which incorporates both the tensor and functional aspects of the covariate. To address the high dimensionality and functional continuity of the regression coefficie…

    Submitted 10 June, 2025; originally announced June 2025.

  2. arXiv:2506.09208

    stat.AP stat.CO stat.ME stat.ML

    Integrated Analysis for Electronic Health Records with Structured and Sporadic Missingness

    Authors: Jianbin Tan, Yan Zhang, Chuan Hong, T. Tony Cai, Tianxi Cai, Anru R. Zhang

    Abstract: Objectives: We propose a novel imputation method tailored for Electronic Health Records (EHRs) with structured and sporadic missingness. Such missingness frequently arises in the integration of heterogeneous EHR datasets for downstream clinical applications. By addressing these gaps, our method provides a practical solution for integrated analysis, enhancing data utility and advancing the understa…

    Submitted 10 June, 2025; originally announced June 2025.

  3. arXiv:2505.23046

    stat.ME math.NA math.ST stat.ML

    Revisit CP Tensor Decomposition: Statistical Optimality and Fast Convergence

    Authors: Runshi Tang, Julien Chhor, Olga Klopp, Anru R. Zhang

    Abstract: Canonical Polyadic (CP) tensor decomposition is a fundamental technique for analyzing high-dimensional tensor data. While the Alternating Least Squares (ALS) algorithm is widely used for computing CP decomposition due to its simplicity and empirical success, its theoretical foundation, particularly regarding statistical optimality and convergence behavior, remains underdeveloped, especially in nois…

    Submitted 28 May, 2025; originally announced May 2025.

  4. arXiv:2501.07336

    stat.AP stat.ME

    Subtype-Aware Registration of Longitudinal Electronic Health Records

    Authors: Xin Gai, Shiyi Jiang, Anru R. Zhang

    Abstract: Electronic Health Records (EHRs) contain extensive patient information that can inform downstream clinical decisions, such as mortality prediction, disease phenotyping, and disease onset prediction. A key challenge in EHR data analysis is the temporal gap between when a condition is first recorded and its actual onset time. Such timeline misalignment can lead to artificially distinct biomarker tre…

    Submitted 13 January, 2025; originally announced January 2025.

  5. arXiv:2501.06652

    math.ST math.NA math.OC stat.ME

    High-order Accurate Inference on Manifolds

    Authors: Chengzhu Huang, Anru R. Zhang

    Abstract: We present a new framework for statistical inference on Riemannian manifolds that achieves high-order accuracy, addressing the challenges posed by non-Euclidean parameter spaces frequently encountered in modern data science. Our approach leverages a novel and computationally efficient procedure to reach higher-order asymptotic precision. In particular, we develop a bootstrap algorithm on Riemannia…

    Submitted 26 January, 2025; v1 submitted 11 January, 2025; originally announced January 2025.

  6. arXiv:2411.15660

    math.ST cs.IT stat.ML

    Federated PCA and Estimation for Spiked Covariance Matrices: Optimal Rates and Efficient Algorithm

    Authors: Jingyang Li, T. Tony Cai, Dong Xia, Anru R. Zhang

    Abstract: Federated Learning (FL) has gained significant recent attention in machine learning for its enhanced privacy and data security, making it indispensable in fields such as healthcare, finance, and personalized services. This paper investigates federated PCA and estimation for spiked covariance matrices under distributed differential privacy constraints. We establish minimax rates of convergence, wit…

    Submitted 23 November, 2024; originally announced November 2024.

  7. arXiv:2410.14046

    stat.ML cs.LG math.NA stat.CO stat.ME

    Tensor Decomposition with Unaligned Observations

    Authors: Runshi Tang, Tamara Kolda, Anru R. Zhang

    Abstract: This paper presents a canonical polyadic (CP) tensor decomposition that addresses unaligned observations. The mode with unaligned observations is represented using functions in a reproducing kernel Hilbert space (RKHS). We introduce a versatile loss function that effectively accounts for various types of data, including binary, integer-valued, and positive-valued types. Additionally, we propose an…

    Submitted 17 October, 2024; originally announced October 2024.

  8. arXiv:2410.03619

    stat.ME math.ST stat.AP stat.CO

    Functional Singular Value Decomposition

    Authors: Jianbin Tan, Pixu Shi, Anru R. Zhang

    Abstract: Heterogeneous functional data commonly arise in time series and longitudinal studies. To uncover the statistical structures of such data, we propose Functional Singular Value Decomposition (FSVD), a unified framework encompassing various tasks for the analysis of functional data with potential heterogeneity. We establish the mathematical foundation of FSVD by proving its existence and providing it…

    Submitted 16 February, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  9. arXiv:2405.03042

    stat.ME stat.AP stat.CO

    Functional Post-Clustering Selective Inference with Applications to EHR Data Analysis

    Authors: Zihan Zhu, Xin Gai, Anru R. Zhang

    Abstract: In electronic health records (EHR) analysis, clustering patients according to patterns in their data is crucial for uncovering new subtypes of diseases. Existing medical literature often relies on classical hypothesis testing methods to test for differences in means between these clusters. Due to selection bias induced by clustering algorithms, the implementation of these classical methods on post…

    Submitted 5 May, 2024; originally announced May 2024.

  10. arXiv:2404.03805

    stat.ME

    Blessing of dimension in Bayesian inference on covariance matrices

    Authors: Shounak Chattopadhyay, Anru R. Zhang, David B. Dunson

    Abstract: Bayesian factor analysis is routinely used for dimensionality reduction in modeling of high-dimensional covariance matrices. Factor analytic decompositions express the covariance as a sum of a low rank and diagonal matrix. In practice, Gibbs sampling algorithms are typically used for posterior computation, alternating between updating the latent factors, loadings, and residual variances. In this a…

    Submitted 4 April, 2024; originally announced April 2024.

  11. arXiv:2311.08629

    stat.AP

    Soft Phenotyping for Sepsis via EHR Time-aware Soft Clustering

    Authors: Shiyi Jiang, Xin Gai, Miriam Treggiari, William W. Stead, Yuankang Zhao, C. David Page, Anru R. Zhang

    Abstract: Objective: Sepsis is one of the most serious hospital conditions associated with high mortality. Sepsis is the result of a dysregulated immune response to infection that can lead to multiple organ dysfunction and death. Due to the wide variability in the causes of sepsis, clinical presentation, and the recovery trajectories, identifying sepsis sub-phenotypes is crucial to advance our understanding…

    Submitted 5 May, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  12. arXiv:2310.14146

    stat.AP

    Cocaine Use Prediction with Tensor-based Machine Learning on Multimodal MRI Connectome Data

    Authors: Anru R. Zhang, Ryan P. Bell, Chen An, Runshi Tang, Shana A. Hall, Cliburn Chan, Kareem Al-Khalil, Christina S. Meade

    Abstract: This paper considers the use of machine learning algorithms for predicting cocaine use based on magnetic resonance imaging (MRI) connectomic data. The study utilized functional MRI (fMRI) and diffusion MRI (dMRI) data collected from 275 individuals, which was then parcellated into 246 regions of interest (ROIs) using the Brainnetome atlas. After data preprocessing, the datasets were transformed in…

    Submitted 21 October, 2023; originally announced October 2023.

  13. arXiv:2307.00575

    stat.ME cs.LG math.NA math.ST

    Mode-wise Principal Subspace Pursuit and Matrix Spiked Covariance Model

    Authors: Runshi Tang, Ming Yuan, Anru R. Zhang

    Abstract: This paper introduces a novel framework called Mode-wise Principal Subspace Pursuit (MOP-UP) to extract hidden variations in both the row and column dimensions for matrix data. To enhance the understanding of the framework, we introduce a class of matrix-variate spiked covariance models that serve as inspiration for the development of the MOP-UP algorithm. The MOP-UP algorithm consists of two step…

    Submitted 4 August, 2024; v1 submitted 2 July, 2023; originally announced July 2023.

    Comments: Journal of the Royal Statistical Society, Series B, to appear

  14. arXiv:2303.05024

    math.ST cs.LG cs.SI stat.ML

    Phase transition for detecting a small community in a large network

    Authors: Jiashun Jin, Zheng Tracy Ke, Paxton Turner, Anru R. Zhang

    Abstract: How to detect a small community in a large network is an interesting problem, including clique detection as a special case, where a naive degree-based $χ^2$-test was shown to be powerful in the presence of an Erdős-Rényi background. Using Sinkhorn's theorem, we show that the signal captured by the $χ^2$-test may be a modeling artifact, and it may disappear once we replace the Erdős-Rényi model by…

    Submitted 8 March, 2023; originally announced March 2023.

  15. arXiv:2209.12715

    cs.CV cs.LG stat.AP stat.ML

    Enhancing convolutional neural network generalizability via low-rank weight approximation

    Authors: Chenyin Gao, Shu Yang, Anru R. Zhang

    Abstract: Noise is ubiquitous during image acquisition. Sufficient denoising is often an important first step for image processing. In recent decades, deep neural networks (DNNs) have been widely used for image denoising. Most DNN-based image denoising methods require a large-scale dataset or focus on supervised settings, in which single/pairs of clean images or a set of noisy images are required. This pose…

    Submitted 1 August, 2024; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: accepted by IET Image Processing

  16. arXiv:2207.12484

    stat.ME

    Core Shrinkage Covariance Estimation for Matrix-variate Data

    Authors: Peter Hoff, Andrew McCormack, Anru R. Zhang

    Abstract: A separable covariance model for a random matrix provides a parsimonious description of the covariances among the rows and among the columns of the matrix, and permits likelihood-based inference with a very small sample size. However, in many applications the assumption of exact separability is unlikely to be met, and data analysis with a separable model may overlook or misrepresent important depe…

    Submitted 25 July, 2022; originally announced July 2022.

    MSC Class: 62H20; 15A23

  17. arXiv:2206.08756

    math.ST cs.LG math.OC stat.ME stat.ML

    Tensor-on-Tensor Regression: Riemannian Optimization, Over-parameterization, Statistical-computational Gap, and Their Interplay

    Authors: Yuetian Luo, Anru R. Zhang

    Abstract: We study the tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates with a low Tucker rank parameter tensor/matrix without the prior knowledge of its intrinsic rank. We propose the Riemannian gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with the challenge of unknown rank by studying the effect of rank over-parameterization. We p…

    Submitted 15 January, 2024; v1 submitted 17 June, 2022; originally announced June 2022.

  18. arXiv:2204.04209

    cs.LG cs.DS stat.ML

    Learning Polynomial Transformations

    Authors: Sitan Chen, Jerry Li, Yuanzhi Li, Anru R. Zhang

    Abstract: We consider the problem of learning high dimensional polynomial transformations of Gaussians. Given samples of the form $p(x)$, where $x\sim N(0, \mathrm{Id}_r)$ is hidden and $p: \mathbb{R}^r \to \mathbb{R}^d$ is a function where every output coordinate is a low-degree polynomial, the goal is to learn the distribution over $p(x)$. This problem is natural in its own right, but is also an important…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: 121 pages, comments welcome

  19. arXiv:2108.04201

    stat.ME math.ST stat.AP

    Guaranteed Functional Tensor Singular Value Decomposition

    Authors: Rungang Han, Pixu Shi, Anru R. Zhang

    Abstract: This paper introduces the functional tensor singular value decomposition (FTSVD), a novel dimension reduction framework for tensors with one functional mode and several tabular modes. The problem is motivated by high-order longitudinal data analysis. Our model assumes the observed data to be a random realization of an approximate CP low-rank functional tensor measured on a discrete time grid. Inco…

    Submitted 25 October, 2023; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: Journal of the American Statistical Association, to appear

  20. arXiv:2108.01772

    math.OC cs.IT cs.LG eess.SP stat.ML

    Nonconvex Factorization and Manifold Formulations are Almost Equivalent in Low-rank Matrix Optimization

    Authors: Yuetian Luo, Xudong Li, Anru R. Zhang

    Abstract: In this paper, we consider the geometric landscape connection of the widely studied manifold and factorization formulations in low-rank positive semidefinite (PSD) and general matrix optimization. We establish a sandwich relation on the spectrum of Riemannian and Euclidean Hessians at first-order stationary points (FOSPs). As a result of that, we obtain an equivalence on the set of FOSPs, second-o…

    Submitted 12 August, 2024; v1 submitted 3 August, 2021; originally announced August 2021.

  21. arXiv:2104.12031

    stat.ML cs.LG math.NA math.OC stat.ME

    Low-rank Tensor Estimation via Riemannian Gauss-Newton: Statistical Optimality and Second-Order Convergence

    Authors: Yuetian Luo, Anru R. Zhang

    Abstract: In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. The general problem covers many specific examples arising from applications, including tensor regression, tensor completion, and tensor PCA/SVD. We consider an efficient Riemannian Gauss-Newton (RGN) method for low Tucker rank tensor estimation. Different from the generic (super)linear…

    Submitted 8 July, 2023; v1 submitted 24 April, 2021; originally announced April 2021.

  22. arXiv:2012.14844

    math.ST cs.LG stat.ME stat.ML

    Inference for Low-rank Tensors -- No Need to Debias

    Authors: Dong Xia, Anru R. Zhang, Yuchen Zhou

    Abstract: In this paper, we consider the statistical inference for several low-rank tensor models. Specifically, in the Tucker low-rank tensor PCA or regression model, provided with any estimates achieving some attainable error rate, we develop the data-driven confidence regions for the singular subspace of the parameter tensor based on the asymptotic distribution of an updated estimate by two-iteration alt…

    Submitted 29 October, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: to appear in the Annals of Statistics

  23. arXiv:2012.09996

    stat.ME math.ST stat.ML

    Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit

    Authors: Rungang Han, Yuetian Luo, Miaoyan Wang, Anru R. Zhang

    Abstract: High-order clustering aims to identify heterogeneous substructures in multiway datasets that arise commonly in neuroimaging, genomics, social network studies, etc. The non-convex and discontinuous nature of this problem poses significant challenges in both statistics and computation. In this paper, we propose a tensor block model and the computationally efficient methods, high-order Lloyd alg…

    Submitted 10 October, 2022; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: Journal of the Royal Statistical Society, Series B, to appear

  24. arXiv:2011.08360

    math.OC cs.LG math.NA stat.CO stat.ML

    Recursive Importance Sketching for Rank Constrained Least Squares: Algorithms and High-order Convergence

    Authors: Yuetian Luo, Wen Huang, Xudong Li, Anru R. Zhang

    Abstract: In this paper, we propose the Recursive Importance Sketching algorithm for Rank constrained least squares Optimization (RISRO). The key step of RISRO is recursive importance sketching, a new sketching framework based on deterministically designed recursive projections, which significantly differs from the randomi…

    Submitted 4 December, 2022; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: Accepted by Operations Research

  25. arXiv:2010.02482

    math.ST cs.LG math.NA stat.CO stat.ME

    Optimal High-order Tensor SVD via Tensor-Train Orthogonal Iteration

    Authors: Yuchen Zhou, Anru R. Zhang, Lili Zheng, Yazhen Wang

    Abstract: This paper studies a general framework for high-order tensor SVD. We propose a new computationally efficient algorithm, tensor-train orthogonal iteration (TTOI), that aims to estimate the low tensor-train rank structure from the noisy high-order tensor observation. The proposed TTOI consists of initialization via TT-SVD (Oseledets, 2011) and new iterative backward/forward updates. We develop the g…

    Submitted 24 January, 2022; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: to appear in IEEE Transactions on Information Theory

  26. arXiv:2009.05870

    stat.ML cs.CC cs.LG math.CO

    Open Problem: Average-Case Hardness of Hypergraphic Planted Clique Detection

    Authors: Yuetian Luo, Anru R. Zhang

    Abstract: We note the significance of hypergraphic planted clique (HPC) detection in the investigation of computational hardness for a range of tensor problems. We ask if more evidence for the computational hardness of HPC detection can be developed. In particular, we conjecture if it is possible to establish the equivalence of the computational hardness between HPC and PC detection.

    Submitted 12 September, 2020; originally announced September 2020.

    Comments: Published in the Proceedings of the Conference on Learning Theory, 2020

  27. arXiv:2008.02437

    math.ST cs.LG math.NA stat.ML

    A Sharp Blockwise Tensor Perturbation Bound for Orthogonal Iteration

    Authors: Yuetian Luo, Garvesh Raskutti, Ming Yuan, Anru R. Zhang

    Abstract: In this paper, we develop novel perturbation bounds for the high-order orthogonal iteration (HOOI) [DLDMV00b]. Under mild regularity conditions, we establish blockwise tensor perturbation bounds for HOOI with guarantees for both tensor reconstruction in Hilbert-Schmidt norm $\|\widehat{\boldsymbol{\mathcal{T}}} - \boldsymbol{\mathcal{T}}\|_{\mathrm{HS}}$ and mode-$k$ singular subspace estimation in Schatten-$q$ norm…

    Submitted 5 June, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

  28. arXiv:2005.10743

    math.ST cs.CC cs.LG stat.ME stat.ML

    Tensor Clustering with Planted Structures: Statistical Optimality and Computational Limits

    Authors: Yuetian Luo, Anru R. Zhang

    Abstract: This paper studies the statistical and computational limits of high-order clustering with planted structures. We focus on two clustering models, constant high-order clustering (CHC) and rank-one higher-order clustering (ROHC), and study the methods and theory for testing whether a cluster exists (detection) and identifying the support of the cluster (recovery). Specifically, we identify the sharp bo…

    Submitted 2 October, 2023; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: Made a few clarifications and added low-degree-polynomial-based evidence for HPDS recovery conjecture 2

  29. arXiv:2002.11255

    math.ST cs.LG stat.ME stat.ML

    An Optimal Statistical and Computational Framework for Generalized Tensor Estimation

    Authors: Rungang Han, Rebecca Willett, Anru R. Zhang

    Abstract: This paper describes a flexible framework for generalized low-rank tensor estimation problems that includes many important instances arising from applications in computational imaging, genomics, and network analysis. The proposed estimator consists of finding a low-rank tensor fit to the data under generalized parametric models. To overcome the difficulty of non-convexity in these problems, we int…

    Submitted 4 February, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

  30. arXiv:1909.09851

    math.ST cs.LG stat.ML

    Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference

    Authors: T. Tony Cai, Anru R. Zhang, Yuchen Zhou

    Abstract: We study sparse group Lasso for high-dimensional double sparse linear regression, where the parameter of interest is simultaneously element-wise and group-wise sparse. This problem is an important instance of the simultaneously structured model -- an actively studied topic in statistics and machine learning. In the noiseless case, matching upper and lower bounds on sample complexity are establishe…

    Submitted 6 May, 2022; v1 submitted 21 September, 2019; originally announced September 2019.

    Comments: IEEE Transactions on Information Theory, to appear

  31. arXiv:1811.11709

    stat.ME math.ST stat.AP

    High-dimensional Log-Error-in-Variable Regression with Applications to Microbial Compositional Data Analysis

    Authors: Pixu Shi, Yuchen Zhou, Anru R. Zhang

    Abstract: In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain criti…

    Submitted 10 March, 2021; v1 submitted 28 November, 2018; originally announced November 2018.

  32. arXiv:1810.08316

    math.ST stat.CO stat.ME stat.ML

    Heteroskedastic PCA: Algorithm, Optimality, and Applications

    Authors: Anru R. Zhang, T. Tony Cai, Yihong Wu

    Abstract: A general framework for principal component analysis (PCA) in the presence of heteroskedastic noise is introduced. We propose an algorithm called HeteroPCA, which involves iteratively imputing the diagonal entries of the sample covariance matrix to remove estimation bias due to heteroskedasticity. This procedure is computationally efficient and provably optimal under the generalized spiked covaria…

    Submitted 1 April, 2021; v1 submitted 18 October, 2018; originally announced October 2018.

  33. arXiv:1711.00101

    stat.ME math.ST stat.AP stat.CO

    Nonparametric covariance estimation for mixed longitudinal studies, with applications in midlife women's health

    Authors: Anru R. Zhang, Kehui Chen

    Abstract: In mixed longitudinal studies, a group of subjects enter the study at different ages (cross-sectional) and are followed for successive years (longitudinal). In the context of such studies, we consider nonparametric covariance estimation with samples of noisy and partially observed functional trajectories. The proposed algorithm is based on a noniterative sequential-aggregation scheme with only bas…

    Submitted 1 December, 2020; v1 submitted 31 October, 2017; originally announced November 2017.

  34. arXiv:1305.4896

    stat.ME

    Methods to Calculate the Upper Bound of Gini Coefficient Based on Grouped Data and the Result for China

    Authors: Pixu Shi, Anru R. Zhang

    Abstract: Determining an upper bound, particularly the optimal upper bound of the Gini coefficient when dealing with grouped data without specified income brackets, remains an important and open question. In this paper, we introduce an efficient algorithm to calculate the exact optimal upper bound of the Gini coefficient with provable guarantees. To exemplify these methods, we also offer computed results fo…

    Submitted 14 January, 2025; v1 submitted 21 May, 2013; originally announced May 2013.