Showing 1–50 of 188 results for author: Li, T

Searching in archive stat.
  1. arXiv:2507.04438  [pdf, ps, other]

    quant-ph cs.DS cs.LG math.OC stat.ML

    Quantum Algorithms for Bandits with Knapsacks with Improved Regret and Time Complexities

    Authors: Yuexin Su, Ziyi Yang, Peiyuan Huang, Tongyang Li, Yinyu Ye

    Abstract: Bandits with knapsacks (BwK) constitute a fundamental model that combines aspects of stochastic integer programming with online learning. Classical algorithms for BwK with a time horizon $T$ achieve a problem-independent regret bound of ${O}(\sqrt{T})$ and a problem-dependent bound of ${O}(\log T)$. In this paper, we initiate the study of the BwK model in the setting of quantum computing, where bo…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: 33 pages

  2. arXiv:2507.00402  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    GRAND: Graph Release with Assured Node Differential Privacy

    Authors: Suqing Liu, Xuan Bi, Tianxi Li

    Abstract: Differential privacy is a well-established framework for safeguarding sensitive information in data. While extensively applied across various domains, its application to network data -- particularly at the node level -- remains underexplored. Existing methods for node-level privacy either focus exclusively on query-based approaches, which restrict output to pre-specified network statistics, or fai…

    Submitted 30 June, 2025; originally announced July 2025.

  3. arXiv:2506.22361  [pdf, ps, other]

    stat.ME

    A General Test for Independent and Identically Distributed Hypothesis

    Authors: Tongyu Li, Jonas Mueller, Fang Yao

    Abstract: We propose a simple and intuitive test for arguably the most prevailing hypothesis in statistics that data are independent and identically distributed (IID), based on a newly introduced off-diagonal sequential U-process. This IID test is fully nonparametric and applicable to random objects in general spaces, while requiring no specific alternatives such as structural breaks or serial dependence, w…

    Submitted 27 June, 2025; originally announced June 2025.

  4. arXiv:2506.13792  [pdf, ps, other]

    cs.AI cs.CL cs.LG stat.AP

    ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution

    Authors: Gonçalo Hora de Carvalho, Lazar S. Popov, Sander Kaatee, Kristinn R. Thórisson, Tangrui Li, Pétur Húni Björnsson, Jilles S. Dibangoye

    Abstract: We introduce ICE-ID, a novel benchmark dataset for historical identity resolution, comprising 220 years (1703-1920) of Icelandic census records. ICE-ID spans multiple generations of longitudinal data, capturing name variations, demographic changes, and rich genealogical links. To the best of our knowledge, this is the first large-scale, open tabular dataset specifically designed to study long-term…

    Submitted 11 June, 2025; originally announced June 2025.

  5. arXiv:2506.13064  [pdf, ps, other]

    cs.LG stat.ML

    CoIFNet: A Unified Framework for Multivariate Time Series Forecasting with Missing Values

    Authors: Kai Tang, Ji Zhang, Hua Meng, Minbo Ma, Qi Xiong, Fengmao Lv, Jie Xu, Tianrui Li

    Abstract: Multivariate time series forecasting (MTSF) is a critical task with broad applications in domains such as meteorology, transportation, and economics. Nevertheless, pervasive missing values caused by sensor failures or human errors significantly degrade forecasting accuracy. Prior efforts usually employ an impute-then-forecast paradigm, leading to suboptimal predictions due to error accumulation an…

    Submitted 20 June, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

  6. arXiv:2506.12207  [pdf, ps, other]

    stat.ME

    Estimating treatment effects with a unified semi-parametric difference-in-differences approach

    Authors: Julia C. Thome, Andrew J. Spieker, Peter F. Rebeiro, Chun Li, Tong Li, Bryan E. Shepherd

    Abstract: Difference-in-differences (DID) approaches are widely used for estimating causal effects with observational data before and after an intervention. DID traditionally estimates the average treatment effect among the treated after making a parallel trends assumption on the means of the outcome. With skewed outcomes, a transformation is often needed; however, the transformation may be difficult to cho…

    Submitted 20 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

  7. arXiv:2506.09358  [pdf, ps, other]

    stat.ME

    Functional Tensor Regression

    Authors: Tongyu Li, Fang Yao, Anru R. Zhang

    Abstract: Tensor regression has attracted significant attention in statistical research. This study tackles the challenge of handling covariates with smooth varying structures. We introduce a novel framework, termed functional tensor regression, which incorporates both the tensor and functional aspects of the covariate. To address the high dimensionality and functional continuity of the regression coefficie…

    Submitted 10 June, 2025; originally announced June 2025.

  8. arXiv:2506.04695  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models

    Authors: Xingwu Chen, Tianle Li, Difan Zou

    Abstract: Reinforcement learning (RL) has demonstrated remarkable success in enhancing model capabilities, including instruction-following, preference learning, and reasoning. Yet despite its empirical successes, the mechanisms by which RL improves reasoning abilities remain poorly understood. We present a systematic study of Reinforcement Learning with Verifiable Rewards (RLVR), showing that its primary be…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 30 pages, 6 figures, 1 table

  9. arXiv:2506.02363  [pdf, ps, other]

    stat.ME

    Function-on-function Differential Regression

    Authors: Tongyu Li, Fang Yao

    Abstract: Function-on-function regression has been a topic of substantial interest due to its broad applicability, where the relation between functional predictor and response is concerned. In this article, we propose a new framework for modeling the regression mapping that extends beyond integral type, motivated by the prevalence of physical phenomena governed by differential relations, which is referred t…

    Submitted 2 June, 2025; originally announced June 2025.

  10. arXiv:2505.24259  [pdf, ps, other]

    stat.ME

    Partially-shared Imaging Regression on Integrating Heterogeneous Brain-Cognition Associations across Alzheimer's Diagnoses

    Authors: Yang Sui, Qi Xu, Ting Li, Yang Bai, Annie Qu

    Abstract: This paper is motivated by the heterogeneous associations among demographic covariates, imaging data, and cognitive performances across different diagnostic groups within the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. We propose a novel PArtially-shared Imaging Regression (PAIR) model with smooth spatial component integration to capture heterogeneous imaging coefficients across mult…

    Submitted 30 May, 2025; originally announced May 2025.

  11. arXiv:2505.10311  [pdf, other]

    eess.IV eess.SP stat.AP stat.ML

    Whitened Score Diffusion: A Structured Prior for Imaging Inverse Problems

    Authors: Jeffrey Alido, Tongyu Li, Yu Sun, Lei Tian

    Abstract: Conventional score-based diffusion models (DMs) may struggle with anisotropic Gaussian diffusion processes due to the required inversion of covariance matrices in the denoising score matching training objective \cite{vincent_connection_2011}. We propose Whitened Score (WS) diffusion models, a novel framework based on stochastic differential equations that learns the Whitened Score function instead…

    Submitted 20 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  12. arXiv:2505.05364  [pdf]

    stat.AP

    Machine learning bridging battery field data and laboratory data

    Authors: Yanbin Zhao, Hao Liu, Zhihua Deng, Tong Li, Haoyi Jiang, Zhenfei Ling, Xingkai Wang, Lei Zhang, Xiaoping Ouyang

    Abstract: Aiming at the dilemma that most laboratory data-driven diagnostic and prognostic methods cannot be applied to field batteries in passenger cars and energy storage systems, this paper proposes a method to bridge field data and laboratory data using machine learning. Only two field real impedances corresponding to a medium frequency and a high frequency are needed to predict laboratory real impedanc…

    Submitted 13 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 73 pages, 21 figures

  13. arXiv:2504.18835  [pdf]

    stat.AP

    Machine learning accelerates fuel cell life testing

    Authors: Yanbin Zhao, Hao Liu, Zhihua Deng, Haoyi Jiang, Zhenfei Ling, Zhiyang Liu, Xingkai Wang, Tong Li, Xiaoping Ouyang

    Abstract: Accelerated life testing (ALT) can significantly reduce the economic, time, and labor costs of life testing in the process of equipment, device, and material research and development (R&D), and improve R&D efficiency. This paper proposes a performance characterization data prediction (PCDP) method and a life prediction-driven ALT (LP-ALT) method to accelerate the life test of polymer electrolyte m…

    Submitted 7 May, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

    Comments: 39 pages, 25 figures

  14. arXiv:2503.05799  [pdf, other]

    eess.SY eess.SP stat.ML

    From Target Tracking to Targeting Track -- Part III: Stochastic Process Modeling and Online Learning

    Authors: Tiancheng Li, Jingyuan Wang, Guchong Li, Dengwei Gao

    Abstract: This is the third part of a series of studies that model the target trajectory, which describes the target state evolution over continuous time, as a sample path of a stochastic process (SP). By adopting a deterministic-stochastic decomposition framework, we decompose the learning of the trajectory SP into two sequential stages: the first fits the deterministic trend of the trajectory using a curv…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Part III of a series of companion papers; 10 pages, 6 figures

  15. arXiv:2503.01728  [pdf, other]

    cs.LG stat.ME

    DeepSuM: Deep Sufficient Modality Learning Framework

    Authors: Zhe Gao, Jian Huang, Ting Li, Xueqin Wang

    Abstract: Multimodal learning has become a pivotal approach in developing robust learning models with applications spanning multimedia, robotics, large language models, and healthcare. The efficiency of multimodal systems is a critical concern, given the varying costs and resource demands of different modalities. This underscores the necessity for effective modality selection to balance performance gains ag…

    Submitted 3 March, 2025; originally announced March 2025.

  16. arXiv:2502.15609  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    On the Robustness of Transformers against Context Hijacking for Linear Classification

    Authors: Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou

    Abstract: Transformer-based Large Language Models (LLMs) have demonstrated powerful in-context learning capabilities. However, their predictions can be disrupted by factually correct context, a phenomenon known as context hijacking, revealing a significant robustness issue. To understand this phenomenon theoretically, we explore an in-context linear classification problem based on recent advances in linear…

    Submitted 21 February, 2025; originally announced February 2025.

  17. arXiv:2501.03883  [pdf, ps, other]

    stat.ME stat.CO

    Spline Quantile Regression

    Authors: Ta-Hsin Li, Nimrod Megiddo

    Abstract: Quantile regression is a powerful tool capable of offering a richer view of the data as compared to least-squares regression. Quantile regression is typically performed individually on a few quantiles or a grid of quantiles without considering the similarity of the underlying regression coefficients at nearby quantiles. When needed, an ad hoc post-processing procedure such as kernel smoothing is e…

    Submitted 8 April, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

  18. arXiv:2412.20840  [pdf, other]

    stat.ME

    Identifying average causal effect in regression discontinuity design with auxiliary data

    Authors: Xinqin Feng, Wenjie Hu, Pu Yang, Tingyu Li, Xiao-Hua Zhou

    Abstract: Regression discontinuity designs are widely used when treatment assignment is determined by whether a running variable exceeds a predefined threshold. However, most research focuses on estimating local causal effects at the threshold, leaving the challenge of identifying treatment effects away from the cutoff largely unaddressed. The primary difficulty in this context is that the treatment assignm…

    Submitted 2 January, 2025; v1 submitted 30 December, 2024; originally announced December 2024.

  19. arXiv:2412.19735  [pdf, other]

    stat.AP

    A General Framework of Brain Region Detection And Genetic Variants Selection in Imaging Genetics

    Authors: Siqiang Su, Zhenghao Li, Long Feng, Ting Li

    Abstract: Imaging genetics is a growing field that employs structural or functional neuroimaging techniques to study individuals with genetic risk variants potentially linked to specific illnesses. This area presents considerable challenges to statisticians due to the heterogeneous information and different data forms it involves. In addition, both imaging and genetic data are typically high-dimensional, cr…

    Submitted 30 December, 2024; v1 submitted 27 December, 2024; originally announced December 2024.

  20. arXiv:2412.19251  [pdf, other]

    stat.ME math.ST

    Network double autoregression

    Authors: Tingting Li, Hao Wang

    Abstract: Modeling high-dimensional time series with simple structures is a challenging problem. This paper proposes a network double autoregression (NDAR) model, which combines the advantages of network structure and the double autoregression (DAR) model, to handle high-dimensional, conditionally heteroscedastic, and network-structured data within a simple framework. The parameters of the model are estimat…

    Submitted 26 December, 2024; originally announced December 2024.

  21. arXiv:2412.18145  [pdf, other]

    stat.ME cs.SI physics.soc-ph

    Supervised centrality via sparse network influence regression: an application to the 2021 Henan floods' social network

    Authors: Yingying Ma, Wei Lan, Chenlei Leng, Ting Li, Hansheng Wang

    Abstract: The social characteristics of players in a social network are closely associated with their network positions and relational importance. Identifying those influential players in a network is of great importance as it helps to understand how ties are formed, how information is propagated, and, in turn, can guide the dissemination of new information. Motivated by a Sina Weibo social network analysis…

    Submitted 27 December, 2024; v1 submitted 23 December, 2024; originally announced December 2024.

  22. arXiv:2412.17622  [pdf, other]

    cs.LG stat.ML

    Be More Diverse than the Most Diverse: Optimal Mixtures of Generative Models via Mixture-UCB Bandit Algorithms

    Authors: Parham Rezaei, Farzan Farnia, Cheuk Ting Li

    Abstract: The availability of multiple training algorithms and architectures for generative models requires a selection mechanism to form a single model over a group of well-trained generation models. The selection task is commonly addressed by identifying the model that maximizes an evaluation score based on the diversity and quality of the generated data. However, such a best-model identification approach…

    Submitted 22 March, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

    Journal ref: Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025

  23. arXiv:2412.17163  [pdf, ps, other]

    stat.ME

    Spline Autoregression Method for Estimation of Quantile Spectrum

    Authors: Ta-Hsin Li

    Abstract: The quantile spectrum was introduced in Li (2012; 2014) as an alternative tool for spectral analysis of time series. It has the capability of providing a richer view of time series data than that offered by the ordinary spectrum especially for nonlinear dynamics such as stochastic volatility. A novel method, called spline autoregression (SAR), is proposed in this paper for estimating the quantile…

    Submitted 30 December, 2024; v1 submitted 22 December, 2024; originally announced December 2024.

  24. arXiv:2412.14291  [pdf, other]

    math.OC cs.LG stat.ML

    Projected gradient methods for nonconvex and stochastic optimization: new complexities and auto-conditioned stepsizes

    Authors: Guanghui Lan, Tianjiao Li, Yangyang Xu

    Abstract: We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the "vanilla" PG method, achieving the best-known iteration complexity for finding an approximate stationary point of the problem. We then develop an "auto-conditioned" projected gradient (AC-PG) variant that achieve…

    Submitted 18 December, 2024; originally announced December 2024.

  25. arXiv:2412.08051  [pdf, other]

    stat.ME cs.SI math.ST stat.CO stat.ML

    Two-way Node Popularity Model for Directed and Bipartite Networks

    Authors: Bing-Yi Jing, Ting Li, Jiangzhou Wang, Ya Wang

    Abstract: There has been extensive research on community detection in directed and bipartite networks. However, these studies often fail to consider the popularity of nodes in different communities, which is a common phenomenon in real-world networks. To address this issue, we propose a new probabilistic framework called the Two-Way Node Popularity Model (TNPM). The TNPM also accommodates edges from differe…

    Submitted 10 December, 2024; originally announced December 2024.

  26. arXiv:2412.02513  [pdf, ps, other]

    stat.ME

    Quantile-Crossing Spectrum and Spline Autoregression Estimation

    Authors: Ta-Hsin Li

    Abstract: The quantile-crossing spectrum is the spectrum of quantile-crossing processes created from a time series by the indicator function that shows whether or not the time series lies above or below a given quantile at a given time. This bivariate function of frequency and quantile level provides a richer view of serial dependence than that offered by the ordinary spectrum. We propose a new method for e…

    Submitted 3 December, 2024; originally announced December 2024.

  27. arXiv:2410.12224  [pdf, other]

    cs.LG stat.ME

    Causally-Aware Unsupervised Feature Selection Learning

    Authors: Zongxin Shen, Yanyong Huang, Dongjie Wang, Minbo Ma, Fengmao Lv, Tianrui Li

    Abstract: Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and…

    Submitted 25 January, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

  28. arXiv:2410.04579  [pdf, other]

    cs.CL cs.LG stat.ML

    Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets

    Authors: Tianjian Li, Haoran Xu, Weiting Tan, Kenton Murray, Daniel Khashabi

    Abstract: Data abundance across different domains exhibits a long-tailed distribution: few domains have abundant data, while most face data scarcity. Our work focuses on a multilingual setting, where available data is heavily skewed towards high-resource languages. Two common strategies to address this disparity are upsampling low-resource data (Temperature Sampling) and upweighting low-resource loss (Scala…

    Submitted 9 March, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

    Comments: 19 pages, 9 figures, accepted to NAACL 2025 main conference

  29. arXiv:2410.04170  [pdf, other]

    stat.ME

    Physics-encoded Spatio-temporal Regression

    Authors: Tongyu Li, Fang Yao

    Abstract: Physics-informed methods have gained a great success in analyzing data with partial differential equation (PDE) constraints, which are ubiquitous when modeling dynamical systems. Different from the common penalty-based approach, this work promotes adherence to the underlying physical mechanism that facilitates statistical procedures. The motivating application concerns modeling fluorescence recove…

    Submitted 5 October, 2024; originally announced October 2024.

  30. arXiv:2410.01979  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Auto-conditioned primal-dual hybrid gradient method and alternating direction method of multipliers

    Authors: Guanghui Lan, Tianjiao Li

    Abstract: Line search procedures are often employed in primal-dual methods for bilinear saddle point problems, especially when the norm of the linear operator is large or difficult to compute. In this paper, we demonstrate that line search is unnecessary by introducing a novel primal-dual method, the auto-conditioned primal-dual hybrid gradient (AC-PDHG) method, which achieves optimal complexity for solving…

    Submitted 2 October, 2024; originally announced October 2024.

  31. arXiv:2410.01163  [pdf, other]

    stat.ME

    Perturbation-Robust Predictive Modeling of Social Effects by Network Subspace Generalized Linear Models

    Authors: Jianxiang Wang, Can M. Le, Tianxi Li

    Abstract: Network-linked data, where multivariate observations are interconnected by a network, are becoming increasingly prevalent in fields such as sociology and biology. These data often exhibit inherent noise and complex relational structures, complicating conventional modeling and statistical inference. Motivated by empirical challenges in analyzing such data sets, this paper introduces a family of net…

    Submitted 19 January, 2025; v1 submitted 1 October, 2024; originally announced October 2024.

    Comments: 52 pages, 7 figures

  32. arXiv:2409.19431  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    Generalization and Robustness of the Tilted Empirical Risk

    Authors: Gholamali Aminian, Amir R. Asadi, Tian Li, Ahmad Beirami, Gesine Reinert, Samuel N. Cohen

    Abstract: The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, \citet{li2020tilted} proposed the tilted empirical risk (TER) as a non-linear risk metric for machine learning applications such as classification and regression problems. In this work, we examine the generalization error…

    Submitted 7 June, 2025; v1 submitted 28 September, 2024; originally announced September 2024.

    Comments: Accepted in ICML 2025

  33. arXiv:2409.01599  [pdf, other]

    stat.ME math.ST

    Multivariate Inference of Network Moments by Subsampling

    Authors: Mingyu Qi, Tianxi Li, Wen Zhou

    Abstract: In this paper, we study the characterization of a network population by analyzing a single observed network, focusing on the counts of multiple network motifs or their corresponding multivariate network moments. We introduce an algorithm based on node subsampling to approximate the nontrivial joint distribution of the network moments, and prove its asymptotic accuracy. By examining the joint distr…

    Submitted 3 September, 2024; originally announced September 2024.

  34. arXiv:2407.21242  [pdf, other]

    stat.AP stat.CO

    Supervised brain node and network construction under voxel-level functional imaging

    Authors: Wanwan Xu, Selena Wang, Chichun Tan, Xilin Shen, Wenjing Luo, Todd Constable, Tianxi Li, Yize Zhao

    Abstract: Recent advancements in understanding the brain's functional organization related to behavior have been pivotal, particularly in the development of predictive models based on brain connectivity. Traditional methods in this domain often involve a two-step process by first constructing a connectivity matrix from predefined brain regions, and then linking these connections to behaviors or clinical out…

    Submitted 30 July, 2024; originally announced July 2024.

  35. arXiv:2407.21154  [pdf, other]

    stat.ME

    Bayesian thresholded modeling for integrating brain node and network predictors

    Authors: Zhe Sun, Wanwan Xu, Tianxi Li, Jian Kang, Gregorio Alanis-Lobato, Yize Zhao

    Abstract: Progress in neuroscience has provided unprecedented opportunities to advance our understanding of brain alterations and their correspondence to phenotypic profiles. With data collected from various imaging techniques, studies have integrated different types of information ranging from brain structure, function, or metabolism. More recently, an emerging way to categorize imaging traits is through a…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 57 pages, 6 figures

    MSC Class: 62C10; 92B15; 62P10

  36. arXiv:2407.20057  [pdf]

    physics.ao-ph cs.LG stat.AP

    Reconstructing Global Daily CO2 Emissions via Machine Learning

    Authors: Tao Li, Lixing Wang, Zihan Qiu, Philippe Ciais, Taochun Sun, Matthew W. Jones, Robbie M. Andrew, Glen P. Peters, Piyu Ke, Xiaoting Huang, Robert B. Jackson, Zhu Liu

    Abstract: High temporal resolution CO2 emission data are crucial for understanding the drivers of emission changes, however, current emission dataset is only available on a yearly basis. Here, we extended a global daily CO2 emissions dataset backwards in time to 1970 using machine learning algorithm, which was trained to predict historical daily emissions on national scales based on relationships between da…

    Submitted 29 July, 2024; originally announced July 2024.

  37. arXiv:2406.13036  [pdf, other]

    stat.ML cs.LG math.PR math.ST stat.CO

    Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

    Authors: Matthew T. C. Li, Tiangang Cui, Fengyi Li, Youssef Marzouk, Olivier Zahm

    Abstract: Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $π$ as a perturbation of a given reference measure $μ$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Ga…

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  38. arXiv:2406.03683  [pdf, other]

    cs.LG stat.ML

    Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

    Authors: Ding Huang, Ting Li, Jian Huang

    Abstract: We propose a Bayesian framework for fine-tuning large diffusion models with a novel network structure called Bayesian Power Steering (BPS). We clarify the meaning behind adaptation from a large probability space to a small probability space and explore the task of fine-tuning pre-trained models using learnable modules from a Bayesian perspective. BPS extracts task-specific knowle…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 25 pages, 26 figures, and 4 tables

    MSC Class: 62G05; 68T07

  39. arXiv:2406.03596  [pdf]

    stat.ME

    A Multivariate Equivalence Test Based on Mahalanobis Distance with a Data-Driven Margin

    Authors: Chao Wang, Yu-Ting Weng, Shaobo Liu, Tengfei Li, Meiyu Shen, Yi Tsong

    Abstract: Multivariate equivalence testing is needed in a variety of scenarios for drug development. For example, drug products obtained from natural sources may contain many components for which the individual effects and/or their interactions on clinical efficacy and safety cannot be completely characterized. Such lack of sufficient characterization poses a challenge for both generic drug developers to de…

    Submitted 5 June, 2024; originally announced June 2024.

  40. arXiv:2406.00317  [pdf, other]

    stat.ML cs.LG stat.ME

    Combining Experimental and Historical Data for Policy Evaluation

    Authors: Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu

    Abstract: This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to min…

    Submitted 1 June, 2024; originally announced June 2024.

  41. arXiv:2405.20782  [pdf, other]

    cs.CR cs.IT stat.ML

    Universal Exact Compression of Differentially Private Mechanisms

    Authors: Yanxiao Liu, Wei-Ning Chen, Ayfer Özgür, Cheuk Ting Li

    Abstract: To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unlike previous simulation-based local differential privacy mechanisms, PPR exactly preserves the joint distribution of the data and the output of the or…

    Submitted 10 November, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 33 pages, 5 figures

  42. arXiv:2405.12838  [pdf, ps, other]

    quant-ph stat.CO

    Quantum Non-Identical Mean Estimation: Efficient Algorithms and Fundamental Limits

    Authors: Jiachen Hu, Tongyang Li, Xinzhao Wang, Yecheng Xue, Chenyi Zhang, Han Zhong

    Abstract: We systematically investigate quantum algorithms and lower bounds for mean estimation given query access to non-identically distributed samples. On the one hand, we give quantum mean estimators with quadratic quantum speed-up given samples from different bounded or sub-Gaussian random variables. On the other hand, we prove that, in general, it is impossible for any quantum algorithm to achieve qua…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 31 pages, 0 figure. To appear in the 19th Theory of Quantum Computation, Communication and Cryptography (TQC 2024)

  43. arXiv:2402.17287  [pdf, other]

    cs.LG cs.CV stat.ML

    An Interpretable Evaluation of Entropy-based Novelty of Generative Models

    Authors: Jingwei Zhang, Cheuk Ting Li, Farzan Farnia

    Abstract: The massive developments of generative model frameworks require principled methods for the evaluation of a model's novelty compared to a reference dataset. While the literature has extensively studied the evaluation of the quality, diversity, and generalizability of generative models, the assessment of a model's novelty compared to a reference model has not been adequately explored in the machine…

    Submitted 13 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  44. arXiv:2402.05802  [pdf, other]

    cs.LG stat.AP stat.ML

    Unsupervised Discovery of Clinical Disease Signatures Using Probabilistic Independence

    Authors: Thomas A. Lasko, John M. Still, Thomas Z. Li, Marco Barbero Mota, William W. Stead, Eric V. Strobl, Bennett A. Landman, Fabien Maldonado

    Abstract: Insufficiently precise diagnosis of clinical disease is likely responsible for many treatment failures, even for common conditions and treatments. With a large enough dataset, it may be possible to use unsupervised machine learning to define clinical disease patterns more precisely. We present an approach to learning these patterns by using probabilistic independence to disentangle the imprint on…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 29 Pages, 8 figures

    ACM Class: I.2.6; I.2.1; J.3

  45. arXiv:2401.16320  [pdf, ps, other]

    quant-ph stat.ML

    A Strategy for Preparing Quantum Squeezed States Using Reinforcement Learning

    Authors: X. L. Zhao, Y. M. Zhao, M. Li, T. T. Li, Q. Liu, S. Guo, X. X. Yi

    Abstract: We propose a scheme leveraging reinforcement learning to engineer control fields for generating non-classical states. It is exemplified by the application to prepare spin-squeezed states for an open collective spin model where a linear control field is designed to govern the dynamics. The reinforcement learning agent determines the temporal sequence of control pulses, commencing from a coherent sp…

    Submitted 14 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  46. arXiv:2312.05579  [pdf, other]

    stat.ML cs.LG

    Conditional Stochastic Interpolation for Generative Learning

    Authors: Ding Huang, Jian Huang, Ting Li, Guohao Shen

    Abstract: We propose a conditional stochastic interpolation (CSI) method for learning conditional distributions. CSI is based on estimating probability flow equations or stochastic differential equations that transport a reference distribution to the target conditional distribution. This is achieved by first learning the conditional drift and score functions based on CSI, which are then used to construct a…

    Submitted 26 August, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: 57 pages, 5 figures

    MSC Class: 62G05; 68T07

  47. arXiv:2311.15598  [pdf, other]

    math.ST cs.LG cs.SI stat.ME stat.ML

    Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block Models, and Multi-layer Networks

    Authors: Zhongyuan Lyu, Ting Li, Dong Xia

    Abstract: In this paper, we first study the fundamental limit of clustering networks when a multi-layer network is present. Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate, which takes an exponential form and is characterized by the Renyi divergence between the edge probability distributions of the component networks. We propose a…

    Submitted 27 November, 2023; originally announced November 2023.

  48. arXiv:2311.05248  [pdf, other]

    stat.ME math.ST

    A General Space of Belief Updates for Model Misspecification in Bayesian Networks

    Authors: Tianjin Li

    Abstract: In an ideal setting for Bayesian agents, a perfect description of the rules of the environment (i.e., the objective observation model) is available, allowing them to reason through the Bayesian posterior to update their beliefs in an optimal way. But such an ideal setting hardly ever exists in the natural world, so agents have to make do with reasoning about how they should update their beliefs si…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 14 pages, 4 figures

  49. arXiv:2311.02532  [pdf, other]

    stat.ME

    Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making

    Authors: Ting Li, Chengchun Shi, Jianing Wang, Fan Zhou, Hongtu Zhu

    Abstract: A/B testing is critical for modern technological companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that aim to maximize the amount of information obtained from online experiments to estimate treatment effects accurately. We propose three optimal allocation strategies in a dynamic setting where treatments are sequentia…

    Submitted 4 November, 2023; originally announced November 2023.

  50. arXiv:2309.08809  [pdf]

    stat.AP

    Associations Between Sleep Efficiency Variability and Cognition Among Older Adults: Cross-Sectional Accelerometer Study

    Authors: Collin Sakal, Tingyou Li, Juan Li, Xinyue Li

    Abstract: Objective: We aimed to determine the relationship between day-to-day sleep efficiency variability and cognitive function among older adults using accelerometer data and three cognitive tests. Methods: Older adults aged 65+ with 5 days of accelerometer data from the National Health and Nutrition Examination Survey (NHANES) who completed the Digit Symbol Substitution Test (DSST), the Consortium to…

    Submitted 5 November, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Revised study design and figures