Showing 1–50 of 152 results for author: Wu, C

Searching in archive stat.
  1. arXiv:2506.14762  [pdf, ps, other]

    stat.AP cs.LG cs.RO

    Markov Regime-Switching Intelligent Driver Model for Interpretable Car-Following Behavior

    Authors: Chengyuan Zhang, Cathy Wu, Lijun Sun

    Abstract: Accurate and interpretable car-following models are essential for traffic simulation and autonomous vehicle development. However, classical models like the Intelligent Driver Model (IDM) are fundamentally limited by their parsimonious and single-regime structure. They fail to capture the multi-modal nature of human driving, where a single driving state (e.g., speed, relative speed, and gap) can el…

    Submitted 17 June, 2025; originally announced June 2025.

  2. arXiv:2505.12209  [pdf, ps, other]

    stat.ME

    Estimation of Treatment Harm Rate via Partitioning

    Authors: Wei Liang, Changbao Wu

    Abstract: In causal inference with binary outcomes, there is a growing interest in estimation of treatment harm rate (THR), which is a measure of treatment risk and reveals treatment effect heterogeneity in a subpopulation. The THR is generally non-identifiable even for randomized controlled trials (RCTs), and existing works focus primarily on the estimation of the THR under either untestable identification…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: 38 pages, 4 figures

  3. arXiv:2505.03076  [pdf, other]

    stat.AP

    Statistical Performance of Generalized Direction Detectors with Known Spatial Steering Vector

    Authors: Zhenyu Xu, Weijian Liu, Changfei Wu, Qinglei Du, Jun Liu

    Abstract: The generalized direction detection (GDD) problem involves determining the presence of a signal of interest within matrix-valued data, where the row and column spaces of the signal (if present) are known, but the specific coordinates are unknown. Many detectors have been proposed for GDD, yet there is a lack of analytical results regarding their statistical detection performance. This paper presen…

    Submitted 7 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: 4 pages, 2 figures. This manuscript is accepted by Signal Processing Letters

  4. arXiv:2504.11372  [pdf, other]

    physics.soc-ph eess.SY stat.AP

    A Review of Stop-and-Go Traffic Wave Suppression Strategies: Variable Speed Limit vs. Jam-Absorption Driving

    Authors: Zhengbing He, Jorge Laval, Yu Han, Andreas Hegyi, Ryosuke Nishi, Cathy Wu

    Abstract: The main form of freeway traffic congestion is the familiar stop-and-go wave, characterized by wide moving jams that propagate indefinitely upstream provided enough traffic demand. They cause severe, long-lasting adverse effects, such as reduced traffic efficiency, increased driving risks, and higher vehicle emissions. This underscores the crucial importance of artificial intervention in the propa…

    Submitted 20 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  5. arXiv:2410.19217  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models

    Authors: Changlong Wu, Ananth Grama, Wojciech Szpankowski

    Abstract: Generative models have shown impressive capabilities in synthesizing high-quality outputs across various domains. However, a persistent challenge is the occurrence of "hallucinations", where the model produces outputs that are plausible but invalid. While empirical strategies have been explored to mitigate this issue, a rigorous theoretical understanding remains elusive. In this paper, we develop…

    Submitted 24 October, 2024; originally announced October 2024.

    Journal ref: International Conference on Learning Representations (ICLR 2025). URL: https://openreview.net/pdf?id=OwNoTs2r8e

  6. arXiv:2410.02920  [pdf, ps, other]

    stat.ME math.ST

    Statistical Inference with Nonignorable Non-Probability Survey Samples

    Authors: Yang Liu, Meng Yuan, Pengfei Li, Changbao Wu

    Abstract: Statistical inference with non-probability survey samples is an emerging topic in survey sampling and official statistics and has gained increased attention from researchers and practitioners in the field. Much of the existing literature, however, assumes that the participation mechanism for non-probability samples is ignorable. In this paper, we develop a pseudo-likelihood approach to estimate pa…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: 34 pages, 0 figure

  7. arXiv:2408.04569  [pdf, other]

    cs.LG cs.NE math.AG stat.ML

    Activation degree thresholds and expressiveness of polynomial neural networks

    Authors: Bella Finkel, Jose Israel Rodriguez, Chenxi Wu, Thomas Yahl

    Abstract: We study the expressive power of deep polynomial neural networks through the geometry of their neurovariety. We introduce the notion of the activation degree threshold of a network architecture to express when the dimension of the neurovariety achieves its theoretical maximum. We prove the existence of the activation degree threshold for all polynomial neural networks without width-one bottlenecks…

    Submitted 24 April, 2025; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 24 pages, 1 figure

  8. arXiv:2406.14753  [pdf, other]

    cs.LG stat.ME

    A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms

    Authors: Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain

    Abstract: We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman operator and Q-learning, a new control-policy-variable gradient theorem, and a specific gradient ascent algorithm based on this theorem within the context of a spe…

    Submitted 27 November, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2405.07397  [pdf, other]

    stat.ME

    The Spike-and-Slab Quantile LASSO for Robust Variable Selection in Cancer Genomics Studies

    Authors: Yuwen Liu, Jie Ren, Shuangge Ma, Cen Wu

    Abstract: Data irregularity in cancer genomics studies has been widely observed in the form of outliers and heavy-tailed distributions in the complex traits. In the past decade, robust variable selection methods have emerged as powerful alternatives to the non-robust ones to identify important genes associated with heterogeneous disease traits and build superior predictive models. In this study, to keep the…

    Submitted 12 May, 2024; originally announced May 2024.

  10. arXiv:2405.05294  [pdf, other]

    cs.HC cs.CL cs.IT cs.LG cs.SC stat.ML

    Harmonizing Program Induction with Rate-Distortion Theory

    Authors: Hanqi Zhou, David G. Nagy, Charley M. Wu

    Abstract: Many aspects of human learning have been proposed as a process of constructing mental programs: from acquiring symbolic number representations to intuitive theories about the world. In parallel, there is a long tradition of using information processing to model human cognition through Rate Distortion Theory (RDT). Yet, it is still poorly understood how to apply RDT when mental representations take…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: CogSci 2024

  11. arXiv:2404.04794  [pdf, other]

    stat.ME

    A Deep Learning Approach to Nonparametric Propensity Score Estimation with Optimized Covariate Balance

    Authors: Maosen Peng, Yan Li, Chong Wu, Liang Li

    Abstract: This paper proposes a novel propensity score weighting analysis. We define two sufficient and necessary conditions for a function of the covariates to be the propensity score. The first is "local balance", which ensures the conditional independence of covariates and treatment assignment across a dense grid of propensity score values. The second condition, "local calibration", guarantees that a bal…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Corresponding authors: Chong Wu and Liang Li

  12. arXiv:2403.16283  [pdf, other]

    stat.ME

    Sample Empirical Likelihood Methods for Causal Inference

    Authors: Jingyue Huang, Changbao Wu, Leilei Zeng

    Abstract: Causal inference is crucial for understanding the true impact of interventions, policies, or actions, enabling informed decision-making and providing insights into the underlying mechanisms that shape our world. In this paper, we establish a framework for the estimation and inference of average treatment effects using a two-sample empirical likelihood function. Two different approaches to incorpor…

    Submitted 24 March, 2024; originally announced March 2024.

  13. arXiv:2403.13179  [pdf, other]

    cs.LG cs.CY stat.ML

    Predictive, scalable and interpretable knowledge tracing on structured domains

    Authors: Hanqi Zhou, Robert Bamler, Charley M. Wu, Álvaro Tejero-Cantero

    Abstract: Intelligent tutoring systems optimize the selection and timing of learning materials to enhance understanding and long-term retention. This requires estimates of both the learner's progress ("knowledge tracing"; KT), and the prerequisite structure of the learning domain ("knowledge mapping"). While recent deep learning models achieve high KT accuracy, they do so at the expense of the interpret…

    Submitted 19 March, 2024; originally announced March 2024.

  14. arXiv:2402.01342  [pdf, other]

    cs.LG stat.ML

    Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion

    Authors: Zexi Li, Zhiqi Li, Jie Lin, Tao Shen, Tao Lin, Chao Wu

    Abstract: In deep learning, stochastic gradient descent often yields functionally similar yet widely scattered solutions in the weight space even under the same initialization, causing barriers in the Linear Mode Connectivity (LMC) landscape. Overcoming these barriers is crucial for understanding deep learning dynamics and enhancing model-fusion algorithms. Previous studies highlight the role of permutation…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: preprint

  15. arXiv:2401.15520  [pdf, ps, other]

    cs.LG stat.ML

    Oracle-Efficient Hybrid Online Learning with Unknown Distribution

    Authors: Changlong Wu, Jin Sima, Wojciech Szpankowski

    Abstract: We study the problem of oracle-efficient hybrid online learning when the features are generated by an unknown i.i.d. process and the labels are generated adversarially. Assuming access to an (offline) ERM oracle, we show that there exists a computationally efficient online predictor that achieves a regret upper bounded by $\tilde{O}(T^{\frac{3}{4}})$ for a finite-VC class, and upper bounded by…

    Submitted 27 January, 2024; originally announced January 2024.

    Journal ref: Published at Conference on Learning Theory (COLT 2024). URL: https://proceedings.mlr.press/v247/wu24a.html

  16. arXiv:2401.06919  [pdf, other]

    stat.ME

    Pseudo-Empirical Likelihood Methods for Causal Inference

    Authors: Jingyue Huang, Changbao Wu, Leilei Zeng

    Abstract: Causal inference problems have remained an important research topic over the past several decades due to their general applicability in assessing a treatment effect in many different real-world settings. In this paper, we propose two inferential procedures on the average treatment effect (ATE) through a two-sample pseudo-empirical likelihood (PEL) approach. The first procedure uses the estimated p…

    Submitted 12 January, 2024; originally announced January 2024.

  17. arXiv:2312.10563  [pdf, other]

    stat.ME math.ST

    Mediation Analysis with Mendelian Randomization and Efficient Multiple GWAS Integration

    Authors: Rita Qiuran Lyu, Chong Wu, Xinwei Ma, Jingshen Wang

    Abstract: Mediation analysis is a powerful tool for studying causal pathways between exposure, mediator, and outcome variables of interest. While classical mediation analysis using observational data often requires strong and sometimes unrealistic assumptions, such as unconfoundedness, Mendelian Randomization (MR) avoids unmeasured confounding bias by employing genetic variations as instrumental variables.…

    Submitted 17 May, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

  18. arXiv:2310.07990  [pdf]

    q-bio.GN cs.IR cs.LG stat.AP

    Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

    Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

    Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information f…

    Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 19 pages, 3 figures

  19. arXiv:2309.04957  [pdf, other]

    stat.ME

    Winner's Curse Free Robust Mendelian Randomization with Summary Data

    Authors: Zhongming Xie, Wanheng Zhang, Jingshen Wang, Chong Wu

    Abstract: In the past decade, the increased availability of genome-wide association studies summary data has popularized Mendelian Randomization (MR) for conducting causal inference. MR analyses, incorporating genetic variants as instrumental variables, are known for their robustness against reverse causation bias and unmeasured confounders. Nevertheless, classical MR analyses utilizing summary data may sti…

    Submitted 16 August, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

  20. arXiv:2307.15268  [pdf, other]

    stat.ME

    Multivariate Differential Association Analysis

    Authors: Hoseung Song, Michael C. Wu

    Abstract: Identifying how dependence relationships vary across different conditions plays a significant role in many scientific investigations. For example, it is important for the comparison of biological systems to see if relationships between genomic features differ between cases and controls. In this paper, we seek to evaluate whether the relationships between two sets of variables are different across t…

    Submitted 27 July, 2023; originally announced July 2023.

  21. arXiv:2306.15622  [pdf, other]

    stat.ME stat.AP

    Biclustering random matrix partitions with an application to classification of forensic body fluids

    Authors: Chieh-Hsi Wu, Amy D. Roeder, Geoff K. Nicholls

    Abstract: Classification of unlabeled data is usually achieved by supervised learning from labeled samples. Although there exist many sophisticated supervised machine learning methods that can predict the missing labels with a high level of accuracy, they often lack the required transparency in situations where it is important to provide interpretable results and meaningful measures of confidence. Body flui…

    Submitted 14 October, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: 23 pages and 4 figures (paper); 22 pages and 6 figures (supplement); revision adds model comparisons

    MSC Class: 62F15 (Primary) 62P10 (Secondary)

  22. arXiv:2306.11880  [pdf, other]

    stat.ME

    The Bayesian Regularized Quantile Varying Coefficient Model

    Authors: Fei Zhou, Jie Ren, Shuangge Ma, Cen Wu

    Abstract: The quantile varying coefficient (VC) model can flexibly capture dynamical patterns of regression coefficients. In addition, due to the quantile check loss function, it is robust against outliers and heavy-tailed distributions of the response variable, and can provide a more comprehensive picture of modeling via exploring the conditional quantiles of the response variable. Although extensive studi…

    Submitted 9 July, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

  23. arXiv:2306.07480  [pdf, other]

    stat.ME

    ACE: Active Learning for Causal Inference with Expensive Experiments

    Authors: Difan Song, Simon Mak, C. F. Jeff Wu

    Abstract: Experiments are the gold standard for causal inference. In many applications, experimental units can often be recruited or chosen sequentially, and the adaptive execution of such experiments may offer greatly improved inference of causal quantities over non-adaptive approaches, particularly when experiments are expensive. We thus propose a novel active learning method called ACE (Active learning f…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 6 pages, 4 figures

  24. arXiv:2302.10470  [pdf, other]

    stat.ME math.ST

    Breaking the Winner's Curse in Mendelian Randomization: Rerandomized Inverse Variance Weighted Estimator

    Authors: Xinwei Ma, Jingshen Wang, Chong Wu

    Abstract: Developments in genome-wide association studies and the increasing availability of summary genetic association data have made the application of two-sample Mendelian Randomization (MR) with summary data increasingly popular. Conventional two-sample MR methods often employ the same sample for selecting relevant genetic variants and for constructing final causal estimates. Such a practice often lead…

    Submitted 21 February, 2023; originally announced February 2023.

  25. arXiv:2302.08076  [pdf, ps, other]

    stat.ME

    Augmented two-step estimating equations with nuisance functionals and complex survey data

    Authors: Puying Zhao, Changbao Wu

    Abstract: Statistical inference in the presence of nuisance functionals with complex survey data is an important topic in social and economic studies. The Gini index, Lorenz curves and quantile shares are among the commonly encountered examples. The nuisance functionals are usually handled by a plug-in nonparametric estimator and the main inferential procedure can be carried out through a two-step generaliz…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: 43 pages

  26. arXiv:2211.14671  [pdf, other]

    stat.ME stat.AP

    Efficient Targeted Learning of Heterogeneous Treatment Effects for Multiple Subgroups

    Authors: Waverly Wei, Maya Petersen, Mark J van der Laan, Zeyu Zheng, Chong Wu, Jingshen Wang

    Abstract: In biomedical science, analyzing treatment effect heterogeneity plays an essential role in assisting personalized medicine. The main goals of analyzing treatment effect heterogeneity include estimating treatment effects in clinically relevant subgroups and predicting whether a patient subpopulation might benefit from a particular treatment. Conventional approaches often evaluate the subgroup treat…

    Submitted 26 November, 2022; originally announced November 2022.

    Comments: Accepted by Biometrics 2022

  27. arXiv:2211.00819  [pdf]

    cs.LG cs.LO q-bio.QM stat.AP

    Interpretable estimation of the risk of heart failure hospitalization from a 30-second electrocardiogram

    Authors: Sergio González, Wan-Ting Hsieh, Davide Burba, Trista Pei-Chun Chen, Chun-Li Wang, Victor Chien-Chia Wu, Shang-Hung Chang

    Abstract: Survival modeling in healthcare relies on explainable statistical models; yet, their underlying assumptions are often simplistic and, thus, unrealistic. Machine learning models can estimate more complex relationships and lead to more accurate predictions, but are non-interpretable. This study shows it is possible to estimate hospitalization for congestive heart failure by a 30-second single-lead…

    Submitted 4 November, 2022; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: 4 pages, 4 figures

  28. arXiv:2210.13512  [pdf, other]

    cs.LG cs.AI cs.CV math.OC stat.ML

    Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup

    Authors: Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge

    Abstract: Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regards to generalization and robustness. In this work, we try to explain so…

    Submitted 4 November, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: 37 pages, 2 figures, ICML 2023, minor corrections in latest version

  29. arXiv:2209.13748  [pdf, other]

    stat.ME

    Conglomerate Multi-Fidelity Gaussian Process Modeling, with Application to Heavy-Ion Collisions

    Authors: Yi Ji, Henry Shaowu Yuchi, Derek Soeder, J. -F. Paquet, Steffen A. Bass, V. Roshan Joseph, C. F. Jeff Wu, Simon Mak

    Abstract: In an era where scientific experimentation is often costly, multi-fidelity emulation provides a powerful tool for predictive scientific computing. While there has been notable work on multi-fidelity modeling, existing models do not incorporate an important "conglomerate" property of multi-fidelity simulators, where the accuracies of different simulator components are controlled by different fideli…

    Submitted 28 September, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

  30. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  31. arXiv:2205.06960  [pdf, other]

    stat.AP stat.ME

    Assessing the Most Vulnerable Subgroup to Type II Diabetes Associated with Statin Usage: Evidence from Electronic Health Record Data

    Authors: Xinzhou Guo, Waverly Wei, Molei Liu, Tianxi Cai, Chong Wu, Jingshen Wang

    Abstract: There have been increased concerns that the use of statins, one of the most commonly prescribed drugs for treating coronary artery disease, is potentially associated with the increased risk of new-onset type II diabetes (T2D). Nevertheless, to date, there is no robust evidence as to whether and what kind of populations are indeed vulnerable for developing T2D after taking statins. In th…

    Submitted 21 October, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: 25 pages, 2 figures, 5 tables

  32. arXiv:2203.11740  [pdf, other]

    cs.NE cs.LG stat.ML

    The Deep Learning model of Higher-Lower-Order Cognition, Memory, and Affection- More General Than KAN

    Authors: Jun-Bo Tao, Bai-Qing Sun, Wei-Dong Zhu, Shi-You Qu, Jia-Qiang Li, Guo-Qi Li, Yan-Yan Wang, Ling-Kun Chen, Chong Wu, Yu Xiong, Jiaxuan Zhou

    Abstract: We first simulated disease dynamics by KAN (Kolmogorov-Arnold Networks) nearly 4 years ago, but the kernel functions in the edge include the exponential number of infected and discharged people and is also in line with the Kolmogorov-Arnold representation theorem, and the shared weights in the edge are the infection rate and cure rate, and used activation function by tanh at the node of edge. An…

    Submitted 1 June, 2024; v1 submitted 19 March, 2022; originally announced March 2022.

  33. arXiv:2202.07939  [pdf, other]

    cs.LG eess.SP stat.AP

    Clustering Enabled Few-Shot Load Forecasting

    Authors: Qiyuan Wang, Zhihui Chen, Chenye Wu

    Abstract: While the advanced machine learning algorithms are effective in load forecasting, they often suffer from low data utilization, and hence their superior performance relies on massive datasets. This motivates us to design machine learning algorithms with improved data utilization. Specifically, we consider the load forecasting for a new user in the system by observing only few shots (data points) of…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: *The first two authors contributed equally to this work, and hence are co-first authors of this work. C. Wu is the corresponding author. This work was supported in part by the Shenzhen Institute of Artificial Intelligence and Robotics for Society

  34. arXiv:2202.01210   

    stat.ML cs.LG math.ST

    Deep Layer-wise Networks Have Closed-Form Weights

    Authors: Chieh Wu, Aria Masoomi, Arthur Gretton, Jennifer Dy

    Abstract: There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP). To better mimic the brain, training a network one layer at a time with only a "single forward pass" has been proposed as an alternative to bypass BP; we refer to these networks as "layer-wise" networks. We continue the work on layer-wise networks by answering two…

    Submitted 7 February, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: Since this version is similar to an older version, I should have updated the older version instead of creating a new version. I will now retract this version, and update a previous version to this. See arXiv:2006.08539

    Journal ref: AIStats 2022

  35. arXiv:2201.09706  [pdf, other]

    stat.ME

    Valid belief updates for prequentially additive loss functions arising in Semi-Modular Inference

    Authors: Geoff K. Nicholls, Jeong Eun Lee, Chieh-Hsi Wu, Chris U. Carmona

    Abstract: Model-based Bayesian evidence combination leads to models with multiple parametric modules. In this setting the effects of model misspecification in one of the modules may in some cases be ameliorated by cutting the flow of information from the misspecified module. Semi-Modular Inference (SMI) is a framework allowing partial cuts which modulate but do not completely cut the flow of information be…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: 39 pages including supplement, 6 figures

    MSC Class: 62C10; 62C10 (Primary) 62F35; 65C05 (Secondary)

  36. arXiv:2201.02702  [pdf]

    math.DS cs.LG math.OC stat.AP stat.ME

    An Improved Mathematical Model of Sepsis: Modeling, Bifurcation Analysis, and Optimal Control Study for Complex Nonlinear Infectious Disease System

    Authors: Yuyang Chen, Kaiming Bi, Chih-Hang J. Wu, David Ben-Arieh, Ashesh Sinha

    Abstract: Sepsis is a life-threatening medical emergency, which is a major cause of death worldwide and the second highest cause of mortality in the United States. Researching the optimal control treatment or intervention strategy on the comprehensive sepsis system is key in reducing mortality. For this purpose, first, this paper improves a complex nonlinear sepsis model proposed in our previous work. Then,…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: 25 pages, 7 figures, 1 table

  37. arXiv:2201.00147  [pdf]

    cs.LG math.OC stat.AP stat.ME

    High-dimensional Bayesian Optimization Algorithm with Recurrent Neural Network for Disease Control Models in Time Series

    Authors: Yuyang Chen, Kaiming Bi, Chih-Hang J. Wu, David Ben-Arieh, Ashesh Sinha

    Abstract: Bayesian Optimization algorithm has become a promising approach for nonlinear global optimization problems and many machine learning applications. Over the past few years, improvements and enhancements have been brought forward and they have shown some promising results in solving the complex dynamic problems, systems of ordinary differential equations where the objective functions are computation…

    Submitted 1 January, 2022; originally announced January 2022.

    Comments: 16 pages, 9 figures, 2 tables

  38. arXiv:2109.01785  [pdf, other]

    cs.LG cs.SI stat.ML

    Node Feature Kernels Increase Graph Convolutional Network Robustness

    Authors: Mohamed El Amine Seddik, Changmin Wu, Johannes F. Lutzeyer, Michalis Vazirgiannis

    Abstract: The robustness of the much-used Graph Convolutional Networks (GCNs) to perturbations of their input is becoming a topic of increasing importance. In this paper, the random GCN is introduced for which a random matrix theory analysis is possible. This analysis suggests that if the graph is sufficiently perturbed, or in the extreme case random, then the GCN fails to benefit from the node features. It…

    Submitted 21 February, 2022; v1 submitted 4 September, 2021; originally announced September 2021.

    Comments: 16 pages, 5 figures

  39. arXiv:2108.07301  [pdf, other]

    cs.LG stat.AP

    Understanding the factors driving the opioid epidemic using machine learning

    Authors: Sachin Gavali, Chuming Chen, Julie Cowart, Xi Peng, Shanshan Ding, Cathy Wu, Tammy Anderson

    Abstract: In recent years, the US has experienced an opioid epidemic with an unprecedented number of drug overdose deaths. Research finds such overdose deaths are linked to neighborhood-level traits, thus providing opportunity to identify effective interventions. Typically, techniques such as Ordinary Least Squares (OLS) or Maximum Likelihood Estimation (MLE) are used to document neighborhood-level factors…

    Submitted 6 December, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Accepted to IEEE International Conference on Bioinformatics & Biomedicine 2021

  40. arXiv:2108.02289  [pdf]

    cs.LG math.OC stat.AP stat.ME

    High dimensional Bayesian Optimization Algorithm for Complex System in Time Series

    Authors: Yuyang Chen, Kaiming Bi, Chih-Hang J. Wu, David Ben-Arieh, Ashesh Sinha

    Abstract: At present, high-dimensional global optimization problems with time-series models have received much attention from engineering fields. Since it was proposed, Bayesian optimization has quickly become a popular and promising approach for solving global optimization problems. However, the standard Bayesian optimization algorithm is insufficient for solving the global optimal solution when the model i…

    Submitted 4 August, 2021; originally announced August 2021.

    Comments: 18 pages, 13 figures

  41. arXiv:2108.00062  [pdf]

    stat.ME math.OC stat.AP

    A New Bayesian Optimization Algorithm for Complex High-Dimensional Disease Epidemic Systems

    Authors: Yuyang Chen, Kaiming Bi, Chih-Hang J. Wu, David Ben-Arieh, Ashesh Sinha

    Abstract: This paper presents an Improved Bayesian Optimization (IBO) algorithm to solve complex high-dimensional epidemic models' optimal control solution. Evaluating the total objective function value for disease control models with hundreds of thousands of control time periods incurs a high computational cost. In this paper, we improve the conventional Bayesian Optimization (BO) approach in two respects. The…

    Submitted 30 July, 2021; originally announced August 2021.

    Comments: 17 pages, 14 figures

  42. arXiv:2107.08533  [pdf, other]

    stat.ME

    Sparse group variable selection for gene-environment interactions in the longitudinal study

    Authors: Fei Zhou, Xi Lu, Jie Ren, Kun Fan, Shuangge Ma, Cen Wu

    Abstract: Penalized variable selection for high dimensional longitudinal data has received much attention as accounting for the correlation among repeated measurements and providing additional and essential information for improved identification and prediction performance. Despite the success, in longitudinal studies the potential of penalization methods is far from fully understood for accommodating struc…

    Submitted 18 July, 2021; originally announced July 2021.

  43. arXiv:2106.03748  [pdf, other]

    cs.LG cs.AI cs.NE cs.RO stat.ML

    Towards robust and domain agnostic reinforcement learning competitions

    Authors: William Hebgen Guss, Stephanie Milani, Nicholay Topin, Brandon Houghton, Sharada Mohanty, Andrew Melnik, Augustin Harter, Benoit Buschmaas, Bjarne Jaster, Christoph Berganski, Dennis Heitkamp, Marko Henning, Helge Ritter, Chengjie Wu, Xiaotian Hao, Yiming Lu, Hangyu Mao, Yihuan Mao, Chao Wang, Michal Opanowicz, Anssi Kanervisto, Yanick Schraner, Christian Scheller, Xiren Zhou, Lu Liu, et al. (4 additional authors not shown)

    Abstract: Reinforcement learning competitions have formed the basis for standard research benchmarks, galvanized advances in the state-of-the-art, and shaped the direction of the field. Despite this, a majority of challenges suffer from the same fundamental problems: participant solutions to the posed challenge are usually domain-specific, biased to maximally exploit compute resources, and not guaranteed to…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: 20 pages, several figures, published PMLR

  44. arXiv:2106.02741  [pdf, ps, other]

    math.ST stat.ME

    Semiparametric inference on Gini indices of two semicontinuous populations under density ratio models

    Authors: Meng Yuan, Pengfei Li, Changbao Wu

    Abstract: The Gini index is a popular inequality measure with many applications in social and economic studies. This paper studies semiparametric inference on the Gini indices of two semicontinuous populations. We characterize the distribution of each semicontinuous population by a mixture of a discrete point mass at zero and a continuous skewed positive component. A semiparametric density ratio model is th…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: 49 pages, 1 figure

  45. arXiv:2102.13232  [pdf, ps, other]

    math.ST stat.ME

    Semiparametric empirical likelihood inference with estimating equations under density ratio models

    Authors: Meng Yuan, Pengfei Li, Changbao Wu

    Abstract: The density ratio model (DRM) provides a flexible and useful platform for combining information from multiple sources. In this paper, we consider statistical inference under two-sample DRMs with additional parameters defined through and/or additional auxiliary information expressed as estimating equations. We examine the asymptotic properties of the maximum empirical likelihood estimators (MELEs)…

    Submitted 25 February, 2021; originally announced February 2021.

  46. arXiv:2102.11772  [pdf, other]

    stat.ME

    Identifying Gene-environment interactions with robust marginal Bayesian variable selection

    Authors: Xi Lu, Kun Fan, Jie Ren, Cen Wu

    Abstract: In high-throughput genetics studies, an important aim is to identify gene-environment interactions associated with the clinical outcomes. Recently, multiple marginal penalization methods have been developed and shown to be effective in G$\times$E studies. However, within the Bayesian framework, marginal variable selection has not received much attention. In this study, we propose a novel marginal…

    Submitted 23 February, 2021; originally announced February 2021.

  47. arXiv:2102.11338  [pdf, other]

    stat.ME

    Sharp Inference on Selected Subgroups in Observational Studies

    Authors: Xinzhou Guo, Linqing Wei, Chong Wu, Jingshen Wang

    Abstract: In modern drug development, the broader availability of high-dimensional observational data provides opportunities for scientists to explore subgroup heterogeneity, especially when randomized clinical trials are unavailable due to cost and ethical constraints. However, a common practice that naively searches the subgroup with a high treatment level is often misleading due to the "subgroup selection…

    Submitted 22 February, 2021; originally announced February 2021.

  48. arXiv:2101.06592  [pdf, other]

    stat.ME cs.LG

    TSEC: a framework for online experimentation under experimental constraints

    Authors: Simon Mak, Yuanshuo Zhou, Lavonne Hoang, C. F. Jeff Wu

    Abstract: Thompson sampling is a popular algorithm for solving multi-armed bandit problems, and has been applied in a wide range of applications, from website design to portfolio optimization. In such applications, however, the number of choices (or arms) $N$ can be large, and the data needed to make adaptive decisions require expensive experimentation. One is then faced with the constraint of experimenting…

    Submitted 17 January, 2021; originally announced January 2021.

  49. arXiv:2101.02280  [pdf]

    stat.AP

    Independent Action Models and Prediction of Combination Treatment Effects for Response Rate, Duration of Response and Tumor Size Change in Oncology Drug Development

    Authors: Linda Z. Sun, Cai Wu, Xiaoyun Li, Cong Chen, Emmett V. Schmidt

    Abstract: An unprecedented number of new cancer targets are in development, and most are being developed in combination therapies. Early oncology development is strategically challenged in choosing the best combinations to move forward to late stage development. The most common early endpoints to be assessed in such decision-making include objective response rate, duration of response and tumor size change.…

    Submitted 6 January, 2021; originally announced January 2021.

  50. arXiv:2012.11798  [pdf, other]

    stat.ME cs.LG

    APIK: Active Physics-Informed Kriging Model with Partial Differential Equations

    Authors: Jialei Chen, Zhehui Chen, Chuck Zhang, C. F. Jeff Wu

    Abstract: Kriging (or Gaussian process regression) is a popular machine learning method for its flexibility and closed-form prediction expressions. However, one of the key challenges in applying kriging to engineering systems is that the available measurement data is scarce due to the measurement limitations and high sensing costs. On the other hand, physical knowledge of the engineering system is often ava…

    Submitted 21 December, 2020; originally announced December 2020.