Showing 1–50 of 199 results for author: LI, Q

  1. arXiv:2505.23302

    stat.CO

    State Space Model Programming in Turing.jl

    Authors: Tim Hargreaves, Qing Li, Charles Knipp, Frederic Wantiez, Simon J. Godsill, Hong Ge

    Abstract: State space models (SSMs) are a powerful and widely-used class of probabilistic models for analysing time-series data across various fields, from econometrics to robotics. Despite their prevalence, existing software frameworks for SSMs often lack compositionality and scalability, hindering experimentation and making it difficult to leverage advanced inference techniques. This paper introduces SSMP…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 16 pages, 6 figures, Presented at LAFI (Languages for Inference) Workshop, POPL 2025

  2. arXiv:2505.19367

    stat.ML cs.LG

    Adaptive Diffusion Guidance via Stochastic Optimal Control

    Authors: Iskander Azangulov, Peter Potaptchik, Qinyu Li, Eddie Aamari, George Deligiannidis, Judith Rousseau

    Abstract: Guidance is a cornerstone of modern diffusion models, playing a pivotal role in conditional generation and enhancing the quality of unconditional samples. However, current approaches to guidance scheduling--determining the appropriate guidance weight--are largely heuristic and lack a solid theoretical foundation. This work addresses these limitations on two fronts. First, we provide a theoretical…

    Submitted 25 May, 2025; originally announced May 2025.

  3. arXiv:2505.12952

    cs.LG stat.ML

    LoD: Loss-difference OOD Detection by Intentionally Label-Noisifying Unlabeled Wild Data

    Authors: Chuanxing Geng, Qifei Li, Xinrui Wang, Dong Liang, Songcan Chen, Pong C. Yuen

    Abstract: Using unlabeled wild data containing both in-distribution (ID) and out-of-distribution (OOD) data to improve the safety and reliability of models has recently received increasing attention. Existing methods either design customized losses for labeled ID and unlabeled wild data then perform joint optimization, or first filter out OOD data from the latter then learn an OOD detector. While achieving…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI2025

  4. arXiv:2504.20360

    stat.ME

    Identification and estimation of vaccine effectiveness in the test-negative design under equi-confounding

    Authors: Christopher B. Boyer, Kendrick Qijun Li, Xu Shi, Eric J. Tchetgen Tchetgen

    Abstract: The test-negative design (TND) is frequently used to evaluate vaccine effectiveness in real-world settings. In a TND study, individuals with similar symptoms who seek care are tested for the disease of interest, and vaccine effectiveness is estimated by comparing the vaccination history of test-positive cases and test-negative controls. Traditional approaches justify the TND by assuming either (a)…

    Submitted 11 May, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  5. arXiv:2504.14169

    stat.ME

    Correction for nonignorable nonresponse bias in the estimation of turnout using callback data

    Authors: Xinyu Li, Naiwen Ying, Kendrick Qijun Li, Xu Shi, Wang Miao

    Abstract: Overestimation of turnout has long been an issue in election surveys, with nonresponse bias or voter overrepresentation regarded as one of the major sources of bias. However, the adjustment for nonignorable nonresponse bias is substantially challenging. Based on the ANES Non-Response Follow-Up Study concerning the 2020 U.S. presidential election, we investigate the role of callback data in adjusti…

    Submitted 19 April, 2025; originally announced April 2025.

  6. arXiv:2504.04547

    stat.ME stat.AP

    Variational Bayesian Multiple Imputation in High-Dimensional Regression Models With Missing Responses

    Authors: Qiushuang Li, Recai Yucel

    Abstract: Multiple imputation has become one of the standard methods in drawing inferences in many incomplete data applications. Applications of multiple imputation in relatively more complex settings, such as high-dimensional clustered data, require specialized methods to overcome the computational burden. Using linear mixed-effects models, we develop such methods that can be applied to continuous, binary,…

    Submitted 6 April, 2025; originally announced April 2025.

  7. arXiv:2504.04539

    stat.ME stat.AP

    Sequential Hierarchical Regression Imputation with Variable Selection Routines

    Authors: Qiushuang Li, Recai Yucel

    Abstract: We aim to incorporate variable selection routines into variable-by-variable (or sequential) imputation in clustered data to achieve computational improvement in applications with large-scale health data. Specifically, we utilize variable selection routines using spike-and-slab priors within the Bayesian variable selection routine. The choice of these priors allows us to "force" variables of impo…

    Submitted 6 April, 2025; originally announced April 2025.

  8. arXiv:2504.00755

    stat.ME stat.CO

    Efficient computation of high-dimensional penalized piecewise constant hazard random effects models

    Authors: Hillary M. Heiling, Naim U. Rashid, Quefeng Li, Xianlu L. Peng, Jen Jen Yeh

    Abstract: Identifying and characterizing relationships between treatments, exposures, or other covariates and time-to-event outcomes has great significance in a wide range of biomedical settings. In research areas such as multi-center clinical trials, recurrent events, and genetic studies, proportional hazard mixed effects models (PHMMs) are used to account for correlations observed in clusters within the d…

    Submitted 1 April, 2025; originally announced April 2025.

    Journal ref: Statistics in Medicine 2025

  9. arXiv:2503.24322

    cs.LG stat.ML

    NoProp: Training Neural Networks without Back-propagation or Forward-propagation

    Authors: Qinyu Li, Yee Whye Teh, Razvan Pascanu

    Abstract: The canonical deep learning approach for learning requires computing a gradient term at each layer by back-propagating the error signal from the output towards each learnable parameter. Given the stacked structure of neural networks, where each layer builds on the representation of the layer below, this approach leads to hierarchical representations. More abstract features live on the top layers o…

    Submitted 31 March, 2025; originally announced March 2025.

  10. arXiv:2502.13453

    stat.AP

    BISON: Bi-clustering of spatial omics data with feature selection

    Authors: Bencong Zhu, Alberto Cassese, Marina Vannucci, Michele Guindani, Qiwei Li

    Abstract: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Understanding gene functions and interactions in different spatial domains is crucial, as it can enhance our comprehension of biological mechanisms, such as cancer-im…

    Submitted 19 February, 2025; originally announced February 2025.

  11. arXiv:2501.03747

    cs.LG cs.CL stat.AP

    Context-Alignment: Activating and Enhancing LLM Capabilities in Time Series

    Authors: Yuxiao Hu, Qian Li, Dongxiao Zhang, Jinyue Yan, Yuntian Chen

    Abstract: Recently, leveraging pre-trained Large Language Models (LLMs) for time series (TS) tasks has gained increasing attention, which involves activating and enhancing LLMs' capabilities. Many methods aim to activate LLMs' capabilities based on token-level alignment but overlook LLMs' inherent strength on natural language processing -- their deep understanding of linguistic logic and structure rather th…

    Submitted 5 April, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: no comment

  12. arXiv:2412.05669

    cs.LG stat.ML

    Detecting outliers by clustering algorithms

    Authors: Qi Li, Shuliang Wang

    Abstract: Clustering and outlier detection are two important tasks in data mining. Outliers frequently interfere with clustering algorithms to determine the similarity between objects, resulting in unreliable clustering results. Currently, only a few clustering algorithms (e.g., DBSCAN) have the ability to detect outliers to eliminate interference. For other clustering algorithms, it is tedious to introduce…

    Submitted 7 December, 2024; originally announced December 2024.

  13. arXiv:2411.00992

    q-bio.NC stat.CO

    Correlation of Correlation Networks: High-Order Interactions in the Topology of Brain Networks

    Authors: Qiang Li, Jingyu Liu, Vince D. Calhoun

    Abstract: To understand collective network behavior in the complex human brain, pairwise correlation networks alone are insufficient for capturing the high-order interactions that extend beyond pairwise interactions and play a crucial role in brain network dynamics. These interactions often reveal intricate relationships among multiple brain networks, significantly influencing cognitive processes. In this s…

    Submitted 5 November, 2024; v1 submitted 1 November, 2024; originally announced November 2024.

    Comments: 4 pages, 2 figures, 1 table; Submitted to IEEE International Symposium on Biomedical Imaging (ISBI 2025)

  14. arXiv:2411.00982

    q-bio.NC stat.CO

    The Dynamics of Triple Interactions in Resting fMRI: Insights into Psychotic Disorders

    Authors: Qiang Li, Vince D. Calhoun, Armin Iraji

    Abstract: The human brain dynamically integrated and configured information to adapt to the environment. To capture these changes over time, dynamic second-order functional connectivity was typically used to capture transient brain patterns. However, dynamic second-order functional connectivity typically ignored interactions beyond pairwise relationships. To address this limitation, we utilized dynamic trip…

    Submitted 5 November, 2024; v1 submitted 1 November, 2024; originally announced November 2024.

    Comments: 4 pages, 3 figures; Submitted to IEEE International Symposium on Biomedical Imaging (ISBI 2025)

  15. arXiv:2410.18076

    cs.LG cs.AI stat.ML

    Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

    Authors: Max Wilcoxson, Qiyang Li, Kevin Frans, Sergey Levine

    Abstract: Unsupervised pretraining has been transformative in many supervised domains. However, applying such ideas to reinforcement learning (RL) presents a unique challenge in that fine-tuning does not involve mimicking task-specific data, but rather exploring and locating the solution through iterative self-improvement. In this work, we study how unlabeled offline trajectory data can be leveraged to lear…

    Submitted 23 February, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: 27 pages, 19 figures

  16. arXiv:2410.00229

    stat.ML cs.LG math.OC math.PR

    Stochastic Inverse Problem: stability, regularization and Wasserstein gradient flow

    Authors: Qin Li, Maria Oprea, Li Wang, Yunan Yang

    Abstract: Inverse problems in physical or biological sciences often involve recovering an unknown parameter that is random. The sought-after quantity is a probability distribution of the unknown parameter, that produces data that aligns with measurements. Consequently, these problems are naturally framed as stochastic inverse problems. In this paper, we explore three aspects of this problem: direct inversio…

    Submitted 30 September, 2024; originally announced October 2024.

  17. arXiv:2409.16308

    cs.LG eess.SY physics.ao-ph physics.data-an stat.AP

    Probabilistic Spatiotemporal Modeling of Day-Ahead Wind Power Generation with Input-Warped Gaussian Processes

    Authors: Qiqi Li, Mike Ludkovski

    Abstract: We design a Gaussian Process (GP) spatiotemporal model to capture features of day-ahead wind power forecasts. We work with hourly-scale day-ahead forecasts across hundreds of wind farm locations, with the main aim of constructing a fully probabilistic joint model across space and hours of the day. To this end, we design a separable space-time kernel, implementing both temporal and spatial input wa…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 29 pages, 12 figures

  18. arXiv:2408.14410

    stat.ME

    Generalized Bayesian nonparametric clustering framework for high-dimensional spatial omics data

    Authors: Bencong Zhu, Guanyu Hu, Xiaodan Fan, Qiwei Li

    Abstract: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has transformed genomic research by enabling high-throughput gene expression profiling while preserving spatial context. Identifying spatial domains within SRT data is a critical task, with numerous computational approaches currently available. However, most existing methods rely on a multi-stage pro…

    Submitted 26 August, 2024; originally announced August 2024.

  19. arXiv:2407.20288

    cs.LG stat.ML

    Supervised Learning based Method for Condition Monitoring of Overhead Line Insulators using Leakage Current Measurement

    Authors: Mile Mitrovic, Dmitry Titov, Klim Volkhov, Irina Lukicheva, Andrey Kudryavzev, Petr Vorobev, Qi Li, Vladimir Terzija

    Abstract: As a new practical and economical solution to the aging problem of overhead line (OHL) assets, the technical policies of most power grid companies in the world experienced a gradual transition from scheduled preventive maintenance to a risk-based approach in asset management. Even though the accumulation of contamination is predictable within a certain degree, there are currently no effective ways…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 10 pages, 9 figures

  20. arXiv:2407.12996

    stat.ML cs.LG

    Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance

    Authors: Haiquan Lu, Xiaotian Liu, Yefan Zhou, Qunli Li, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang

    Abstract: Recent studies on deep ensembles have identified the sharpness of the local minima of individual learners and the diversity of the ensemble members as key factors in improving test-time performance. Building on this, our study investigates the interplay between sharpness and diversity within deep ensembles, illustrating their crucial role in robust generalization to both in-distribution (ID) and o…

    Submitted 17 July, 2024; originally announced July 2024.

  21. arXiv:2406.18189

    stat.ME math.ST

    Functional knockoffs selection with applications to functional data analysis in high dimensions

    Authors: Xinghao Qiao, Mingya Long, Qizhai Li

    Abstract: The knockoffs is a recently proposed powerful framework that effectively controls the false discovery rate (FDR) for variable selection. However, none of the existing knockoff solutions are directly suited to handle multivariate or high-dimensional functional data, which has become increasingly prevalent in various scientific applications. In this paper, we propose a novel functional model-X knock…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  22. arXiv:2406.08209

    stat.ML cs.LG math.OC

    Forward-Euler time-discretization for Wasserstein gradient flows can be wrong

    Authors: Yewei Xu, Qin Li

    Abstract: In this note, we examine the forward-Euler discretization for simulating Wasserstein gradient flows. We provide two counter-examples showcasing the failure of this discretization even for a simple case where the energy functional is defined as the KL divergence against some nicely structured probability densities. A simple explanation of this failure is also discussed.

    Submitted 12 June, 2024; originally announced June 2024.

    MSC Class: 65M12

  23. arXiv:2405.17079

    stat.ML cs.LG

    Learning with User-Level Local Differential Privacy

    Authors: Puning Zhao, Li Shen, Rongfei Fan, Qingming Li, Huiwen Wu, Jiafei Wu, Zhe Liu

    Abstract: User-level privacy is important in distributed systems. Previous research primarily focuses on the central model, while the local models have received much less attention. Under the central model, user-level DP is strictly stronger than the item-level one. However, under the local model, the relationship between user-level and item-level LDP becomes more complex, thus the analysis is crucially dif…

    Submitted 27 May, 2024; originally announced May 2024.

  24. arXiv:2404.09194

    stat.ME

    Bayesian modeling of co-occurrence microbial interaction networks

    Authors: Tejasv Bedi, Bencong Zhu, Michael L. Neugent, Kevin C. Lutz, Nicole J. De Nisco, Qiwei Li

    Abstract: The human body consists of microbiomes associated with the development and prevention of several diseases. These microbial organisms form several complex interactions that are informative to the scientific community for explaining disease progression and prevention. Contrary to the traditional view of the microbiome as a singular, assortative network, we introduce a novel statistical approach usin…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 25 pages

  25. arXiv:2403.17670

    stat.ME

    A family of Chatterjee's correlation coefficients and their properties

    Authors: Muhong Gao, Qizhai Li

    Abstract: Quantifying the strength of functional dependence between random scalars $X$ and $Y$ is an important statistical problem. While many existing correlation coefficients excel in identifying linear or monotone functional dependence, they fall short in capturing general non-monotone functional relationships. In response, we propose a family of correlation coefficients $ξ^{(h,F)}_n$, characterized by a…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 27 pages, 4 figures

    MSC Class: 62H20; 62G05

  26. arXiv:2402.15515

    cs.AI q-bio.QM stat.AP

    Feasibility of Identifying Factors Related to Alzheimer's Disease and Related Dementia in Real-World Data

    Authors: Aokun Chen, Qian Li, Yu Huang, Yongqiu Li, Yu-neng Chuang, Xia Hu, Serena Guo, Yonghui Wu, Yi Guo, Jiang Bian

    Abstract: A comprehensive view of factors associated with AD/ADRD will significantly aid in studies to develop new treatments for AD/ADRD and identify high-risk populations and patients for prevention efforts. In our study, we summarized the risk factors for AD/ADRD by reviewing existing meta-analyses and review articles on risk and preventive factors for AD/ADRD. In total, we extracted 477 risk factors in…

    Submitted 3 February, 2024; originally announced February 2024.

  27. arXiv:2401.09259

    math.NA math.DS stat.ML

    Mitigating distribution shift in machine learning-augmented hybrid simulation

    Authors: Jiaxi Zhao, Qianxiao Li

    Abstract: We study the problem of distribution shift generally arising in machine-learning augmented hybrid simulation, where parts of simulation algorithms are replaced by data-driven surrogates. We first establish a mathematical framework to understand the structure of machine-learning augmented hybrid simulation problems, and the cause and effect of the associated distribution shift. We show correlations…

    Submitted 17 January, 2024; originally announced January 2024.

    MSC Class: 68T99; 65M15; 37M05

  28. arXiv:2401.04856

    cs.LG stat.ML

    A Good Score Does not Lead to A Good Generative Model

    Authors: Sixu Li, Shi Chen, Qin Li

    Abstract: Score-based Generative Models (SGMs) is one leading method in generative modeling, renowned for their ability to generate high-quality samples from complex, high-dimensional data distributions. The method enjoys empirical success and is supported by rigorous theoretical convergence properties. In particular, it has been shown that SGMs can generate samples from a distribution that is close to the…

    Submitted 27 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  29. arXiv:2401.00521

    cs.LG cs.AI stat.AP

    Multi-spatial Multi-temporal Air Quality Forecasting with Integrated Monitoring and Reanalysis Data

    Authors: Yuxiao Hu, Qian Li, Xiaodan Shi, Jinyue Yan, Yuntian Chen

    Abstract: Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is of…

    Submitted 31 December, 2023; originally announced January 2024.

  30. arXiv:2312.08670

    stat.ME cs.AI cs.LG

    Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

    Authors: Tao Hu, Honglong Zhang, Fan Zeng, Min Du, XiangKun Du, Yue Zheng, Quanqi Li, Mengran Zhang, Dan Yang, Jihao Wu

    Abstract: In the field of intracity freight transportation, changes in order volume are significantly influenced by temporal and spatial factors. When building subsidy and pricing strategies, predicting the causal effects of these strategies on order volume is crucial. In the process of calculating causal effects, confounding variables can have an impact. Traditional methods to control confounding variables…

    Submitted 18 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages;

  31. arXiv:2312.08324

    stat.AP

    Bayesian Nonparametric Clustering with Feature Selection for Spatially Resolved Transcriptomics Data

    Authors: Bencong Zhu, Guanyu Hu, Yang Xie, Lin Xu, Xiaodan Fan, Qiwei Li

    Abstract: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Nevertheless, there are inherent challenges associated with these new high-dimensional spatial data, such as zero-inflation, over-dispersion, and heterogeneity. These…

    Submitted 13 December, 2023; originally announced December 2023.

  32. arXiv:2312.07067

    cs.LG cs.CR cs.CV stat.AP

    Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training

    Authors: Qian Li, Yuxiao Hu, Yinpeng Dong, Dongxiao Zhang, Yuntian Chen

    Abstract: Adversarial training is often formulated as a min-max problem, however, concentrating only on the worst adversarial examples causes alternating repetitive confusion of the model, i.e., previously defended or correctly classified samples are not defensible or accurately classifiable in subsequent adversarial training. We characterize such non-ignorable samples as "hiders", which reveal the hidden h…

    Submitted 12 December, 2023; originally announced December 2023.

  33. arXiv:2312.03967

    stat.ME

    Test-negative designs with various reasons for testing: statistical bias and solution

    Authors: Mengxin Yu, Tom Hongyi Liu, Kendrick Qijun Li, Nicholas Jewell, Eric Tchetgen Tchetgen, Dylan Small, Xu Shi, Bingkai Wang

    Abstract: Test-negative designs are widely used for post-market evaluation of vaccine effectiveness, particularly in cases when randomized trials are not feasible. Differing from classical test-negative designs where only healthcare-seekers with symptoms are included, recent test-negative designs have involved individuals with various reasons for testing, especially in an outbreak setting. While including t…

    Submitted 26 April, 2025; v1 submitted 6 December, 2023; originally announced December 2023.

  34. arXiv:2311.05067

    cs.LG cs.AI stat.ML

    Accelerating Exploration with Unlabeled Prior Data

    Authors: Qiyang Li, Jason Zhang, Dibya Ghosh, Amy Zhang, Sergey Levine

    Abstract: Learning to solve tasks from a sparse reward signal is a major challenge for standard reinforcement learning (RL) algorithms. However, in the real world, agents rarely need to solve sparse reward tasks entirely from scratch. More often, we might possess prior experience to draw on that provides considerable guidance about which actions and outcomes are possible in the world, which we can use to ex…

    Submitted 20 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 25 pages, 16 figures, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  35. arXiv:2310.08867

    cs.LG cs.DB stat.ME

    A Survey of Methods for Handling Disk Data Imbalance

    Authors: Shuangshuang Yuan, Peng Wu, Yuehui Chen, Qiang Li

    Abstract: Class imbalance exists in many classification problems, and since the data is designed for accuracy, imbalance in data classes can lead to classification challenges with a few classes having higher misclassification costs. The Backblaze dataset, a widely used dataset related to hard discs, has a small amount of failure data and a large amount of health data, which exhibits a serious class imbalanc…

    Submitted 13 October, 2023; originally announced October 2023.

  36. arXiv:2308.14671

    stat.ME stat.AP stat.CO stat.OT

    A generalized Bayesian stochastic block model for microbiome community detection

    Authors: Kevin C. Lutz, Michael L. Neugent, Tejasv Bedi, Nicole J. De Nisco, Qiwei Li

    Abstract: Advances in next-generation sequencing technology have enabled the high-throughput profiling of metagenomes and accelerated the microbiome study. Recently, there has been a rise in quantitative studies that aim to decipher the microbiome co-occurrence network and its underlying community structure based on metagenomic sequence data. Uncovering the complex microbiome community structure is essentia…

    Submitted 28 August, 2023; originally announced August 2023.

  37. arXiv:2307.01389

    cs.LG stat.ME

    Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

    Authors: Haixing Dai, Mengxuan Hu, Qing Li, Lu Zhang, Lin Zhao, Dajiang Zhu, Ibai Diez, Jorge Sepulcre, Fan Zhang, Xingyu Gao, Manhua Liu, Quanzheng Li, Sheng Li, Tianming Liu, Xiang Li

    Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder that is beginning with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-bet…

    Submitted 3 July, 2023; originally announced July 2023.

  38. arXiv:2306.01675

    stat.ME physics.soc-ph

    Bayesian Segmentation Modeling of Epidemic Growth

    Authors: Tejasv Bedi, Yanxun Xu, Qiwei Li

    Abstract: Tracking the spread of infectious disease during a pandemic has posed a great challenge to the governments and health sectors on a global scale. To facilitate informed public health decision-making, the concerned parties usually rely on short-term daily and weekly projections generated via predictive modeling. Several deterministic and stochastic epidemiological models, including growth and compar…

    Submitted 2 June, 2023; originally announced June 2023.

  39. arXiv:2305.08204

    stat.CO stat.ME

    glmmPen: High Dimensional Penalized Generalized Linear Mixed Models

    Authors: Hillary M. Heiling, Naim U. Rashid, Quefeng Li, Joseph G. Ibrahim

    Abstract: Generalized linear mixed models (GLMMs) are widely used in research for their ability to model correlated outcomes with non-Gaussian conditional distributions. The proper selection of fixed and random effects is a critical part of the modeling process since model misspecification may lead to significant bias. However, the joint selection of fixed and random effects has historically been limited to…

    Submitted 16 April, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

  40. arXiv:2305.08201

    stat.ME stat.CO

    Efficient Computation of High-Dimensional Penalized Generalized Linear Mixed Models by Latent Factor Modeling of the Random Effects

    Authors: Hillary M. Heiling, Naim U. Rashid, Quefeng Li, Xianlu L. Peng, Jen Jen Yeh, Joseph G. Ibrahim

    Abstract: Modern biomedical datasets are increasingly high dimensional and exhibit complex correlation structures. Generalized Linear Mixed Models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with increasing dimension of the random effec…

    Submitted 16 April, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

  41. arXiv:2304.10466

    cs.LG cs.AI stat.ML

    Efficient Deep Reinforcement Learning Requires Regulating Overfitting

    Authors: Qiyang Li, Aviral Kumar, Ilya Kostrikov, Sergey Levine

    Abstract: Deep reinforcement learning algorithms that learn policies by trial-and-error must learn from limited amounts of data collected by actively interacting with the environment. While many prior works have shown that proper regularization techniques are crucial for enabling data-efficient RL, a general understanding of the bottlenecks in data-efficient RL has remained unclear. Consequently, it has bee…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: 26 pages, 18 figures, 3 tables, The International Conference on Learning Representations (ICLR) 2023

  42. arXiv:2303.07050

    stat.AP

    Evaluation of wait time saving effectiveness of triage algorithms

    Authors: Yee Lam Elim Thompson, Gary M Levine, Weijie Chen, Berkman Sahiner, Qin Li, Nicholas Petrick, Jana G Delfino, Miguel A Lago, Qian Cao, Qin Li, Frank W Samuelson

    Abstract: In the past decade, Artificial Intelligence (AI) algorithms have made promising impacts to transform healthcare in all aspects. One application is to triage patients' radiological medical images based on the algorithm's binary outputs. Such AI-based prioritization software is known as computer-aided triage and notification (CADt). Their main benefit is to speed up radiological review of images wit…

    Submitted 13 March, 2023; originally announced March 2023.

  43. arXiv:2212.09160

    math.OC stat.ME

    Stochastic Economic Dispatch Considering Demand Response and Endogenous Uncertainty

    Authors: Nasrin Bayat, Qifeng Li, Joon-Hyuk Park

    Abstract: This paper considers endogenous uncertainty (EnU) in the stochastic economic dispatch (SED) problem, where the endogenous uncertainty means decision dependent uncertainty. In this problem, demand response (DR) commitment is the source of the EnU. Nevertheless, EnU is not well considered in existing literature. Our first contribution is to build up an optimization model of DR-involved SED under EnU…

    Submitted 31 May, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

  44. arXiv:2212.08771  [pdf, other

    stat.AP cs.LG

    Assign Experiment Variants at Scale in Online Controlled Experiments

    Authors: Qike Li, Samir Jamkhande, Pavel Kochetkov, Pai Liu

    Abstract: Online controlled experiments (A/B tests) have become the gold standard for learning the impact of new product features in technology companies. Randomization enables the inference of causality from an A/B test. The randomized assignment maps end users to experiment buckets and balances user characteristics between the groups. Therefore, experiments can attribute any outcome differences between th…

    Submitted 16 December, 2022; originally announced December 2022.

  45. arXiv:2211.03258  [pdf, other

    astro-ph.IM hep-ph physics.data-an stat.CO

    Nested sampling statistical errors

    Authors: Andrew Fowlie, Qiao Li, Huifang Lv, Yecheng Sun, Jia Zhang, Le Zheng

    Abstract: Nested sampling (NS) is a popular algorithm for Bayesian computation. We investigate statistical errors in NS both analytically and numerically. We show two analytic results. First, we show that the leading terms in Skilling's expression using information theory match the leading terms in Keeton's expression from an analysis of moments. This approximate agreement was previously only known numerica…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: 12 pages + appendices, 3 figures

  46. arXiv:2210.06025  [pdf, other

    stat.ME math.ST

    Bregman Divergence-Based Data Integration with Application to Polygenic Risk Score (PRS) Heterogeneity Adjustment

    Authors: Qinmengge Li, Matthew T. Patrick, Haihan Zhang, Chachrit Khunsriraksakul, Philip E. Stuart, Johann E. Gudjonsson, Rajan Nair, James T. Elder, Dajiang J. Liu, Jian Kang, Lam C. Tsoi, Kevin He

    Abstract: Polygenic risk scores (PRS) have recently received much attention for genetics risk prediction. While successful for the Caucasian population, the PRS based on the minority population suffer from small sample sizes, high dimensionality and low signal-to-noise ratios, exacerbating already severe health disparities. Due to population heterogeneity, direct trans-ethnic prediction by utilizing the Cau…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 35 pages, 6 figures

  47. arXiv:2209.13779  [pdf

    astro-ph.SR stat.ML

    Solar Flare Index Prediction Using SDO/HMI Vector Magnetic Data Products with Statistical and Machine Learning Methods

    Authors: Hewei Zhang, Qin Li, Yanxing Yang, Ju Jing, Jason T. L. Wang, Haimin Wang, Zuofeng Shang

    Abstract: Solar flares, especially the M- and X-class flares, are often associated with coronal mass ejections (CMEs). They are the most important sources of space weather effects that can severely impact the near-Earth environment. Thus it is essential to forecast flares (especially the M- and X-class ones) to mitigate their destructive and hazardous consequences. Here, we introduce several statistical and…

    Submitted 1 December, 2022; v1 submitted 27 September, 2022; originally announced September 2022.

    Journal ref: The Astrophysical Journal Supplement Series (2022), Volume 263, Number 2

  48. arXiv:2209.12388  [pdf, other

    stat.ME stat.AP

    Joint and Individual Component Regression

    Authors: Peiyao Wang, Haodong Wang, Quefeng Li, Dinggang Shen, Yufeng Liu

    Abstract: Multi-group data are commonly seen in practice. Such data structure consists of data from multiple groups and can be challenging to analyze due to data heterogeneity. We propose a novel Joint and Individual Component Regression (JICO) model to analyze multi-group data. In particular, our proposed model decomposes the response into shared and group-specific components, which are driven by low-rank…

    Submitted 25 September, 2022; originally announced September 2022.

  49. arXiv:2208.02246  [pdf, other

    cs.LG cs.AI stat.ML

    AdaCat: Adaptive Categorical Discretization for Autoregressive Models

    Authors: Qiyang Li, Ajay Jain, Pieter Abbeel

    Abstract: Autoregressive generative models can estimate complex continuous data distributions, like trajectory rollouts in an RL environment, image intensities, and audio. Most state-of-the-art models discretize continuous data into several bins and use categorical distributions over the bins to approximate the continuous data distribution. The advantage is that the categorical distribution can easily expre…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: Uncertainty in Artificial Intelligence (UAI) 2022; 13 pages, 4 figures

  50. arXiv:2208.01237  [pdf, ps, other

    stat.ME

    Doubly Robust Proximal Causal Inference under Confounded Outcome-Dependent Sampling

    Authors: Kendrick Qijun Li, Xu Shi, Wang Miao, Eric Tchetgen Tchetgen

    Abstract: Unmeasured confounding and selection bias are often of concern in observational studies and may invalidate a causal analysis if not appropriately accounted for. Under outcome-dependent sampling, a latent factor that has causal effects on the treatment, outcome, and sample selection process may cause both unmeasured confounding and selection bias, rendering standard causal parameters unidentifiable…

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: 43 pages, 1 figure