Showing 1–23 of 23 results for author: Liang, W

Searching in archive stat.
  1. arXiv:2505.12209  [pdf, ps, other]

    stat.ME

    Estimation of Treatment Harm Rate via Partitioning

    Authors: Wei Liang, Changbao Wu

    Abstract: In causal inference with binary outcomes, there is a growing interest in estimation of treatment harm rate (THR), which is a measure of treatment risk and reveals treatment effect heterogeneity in a subpopulation. The THR is generally non-identifiable even for randomized controlled trials (RCTs), and existing works focus primarily on the estimation of the THR under either untestable identification…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: 38 pages, 4 figures

  2. arXiv:2411.06184  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Alleviating Hyperparameter-Tuning Burden in SVM Classifiers for Pulmonary Nodules Diagnosis with Multi-Task Bayesian Optimization

    Authors: Wenhao Chi, Haiping Liu, Hongqiao Dong, Wenhua Liang, Bo Liu

    Abstract: In the field of non-invasive medical imaging, radiomic features are utilized to measure tumor characteristics. However, these features can be affected by the techniques used to discretize the images, ultimately impacting the accuracy of diagnosis. To investigate the influence of various image discretization methods on diagnosis, it is common practice to evaluate multiple discretization strategies…

    Submitted 9 November, 2024; originally announced November 2024.

    Comments: 12 pages, 4 figures, 37 references

  3. arXiv:2410.18321  [pdf, other]

    cs.LG cs.CV stat.ML

    Calibrating Deep Neural Network using Euclidean Distance

    Authors: Wenhao Liang, Chang Dong, Liangwei Zheng, Zhengyang Li, Wei Zhang, Weitong Chen

    Abstract: Uncertainty is a fundamental aspect of real-world scenarios, where perfect information is rarely available. Humans naturally develop complex internal models to navigate incomplete data and effectively respond to unforeseen or partially observed events. In machine learning, Focal Loss is commonly used to reduce misclassification rates by emphasizing hard-to-classify samples. However, it does not gu…

    Submitted 23 October, 2024; originally announced October 2024.

  4. arXiv:2407.13302  [pdf, other]

    stat.ME

    Non-zero block selector: A linear correlation coefficient measure for blocking-selection models

    Authors: Weixiong Liang, Yuehan Yang

    Abstract: Multiple-group data is widely used in genomic studies, finance, and social science. This study investigates a block structure that consists of covariate and response groups. It examines the block-selection problem of high-dimensional models with group structures for both responses and covariates, where both the number of blocks and the dimension within each block are allowed to grow larger than th…

    Submitted 26 December, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2404.13707  [pdf, other]

    stat.ME stat.AP

    Robust inference for the unification of confidence intervals in meta-analysis

    Authors: Wei Liang, Haicheng Huang, Hongsheng Dai, Yinghui Wei

    Abstract: Traditional meta-analysis assumes that the effect sizes estimated in individual studies follow a Gaussian distribution. However, this distributional assumption is not always satisfied in practice, leading to potentially biased results. In the situation when the number of studies, denoted as K, is large, the cumulative Gaussian approximation errors from each study could make the final estimation un…

    Submitted 21 April, 2024; originally announced April 2024.

  6. arXiv:2312.06478  [pdf, other]

    stat.ME stat.ML

    Prediction De-Correlated Inference: A safe approach for post-prediction inference

    Authors: Feng Gan, Wanfeng Liang, Changliang Zou

    Abstract: In modern data analysis, it is common to use machine learning methods to predict outcomes on unlabeled datasets and then use these pseudo-outcomes in subsequent statistical inference. Inference in this setting is often called post-prediction inference. We propose a novel assumption-lean framework for statistical inference under post-prediction setting, called Prediction De-Correlated Inference (PD…

    Submitted 23 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  7. arXiv:2311.13767  [pdf, other]

    stat.ME

    Hierarchical False Discovery Rate Control for High-dimensional Survival Analysis with Interactions

    Authors: Weijuan Liang, Qingzhao Zhang, Shuangge Ma

    Abstract: With the development of data collection techniques, analysis with a survival response and high-dimensional covariates has become routine. Here we consider an interaction model, which includes a set of low-dimensional covariates, a set of high-dimensional covariates, and their interactions. This model has been motivated by gene-environment (G-E) interaction analysis, where the E variables have a lo…

    Submitted 22 November, 2023; originally announced November 2023.

  8. arXiv:2308.05027  [pdf, other]

    q-bio.BM cs.LG stat.ML

    AbDiffuser: Full-Atom Generation of in vitro Functioning Antibodies

    Authors: Karolis Martinkus, Jan Ludwiczak, Kyunghyun Cho, Wei-Ching Liang, Julien Lafrance-Vanasse, Isidro Hotzel, Arvind Rajpal, Yan Wu, Richard Bonneau, Vladimir Gligorijevic, Andreas Loukas

    Abstract: We introduce AbDiffuser, an equivariant and physics-informed diffusion model for the joint generation of antibody 3D structures and sequences. AbDiffuser is built on top of a new representation of protein structure, relies on a novel architecture for aligned proteins, and utilizes strong diffusion priors to improve the denoising process. Our approach improves protein diffusion by taking advantage…

    Submitted 6 March, 2024; v1 submitted 28 July, 2023; originally announced August 2023.

    Comments: NeurIPS 2023

  9. arXiv:2306.10577  [pdf, other]

    cs.LG stat.ML

    OpenDataVal: a Unified Benchmark for Data Valuation

    Authors: Kevin Fu Jiang, Weixin Liang, James Zou, Yongchan Kwon

    Abstract: Assessing the quality and impact of individual data points is critical for improving model performance and mitigating undesirable biases within the training dataset. Several data valuation algorithms have been proposed to quantify data quality, however, there lacks a systemic and standardized benchmarking system for data valuation. In this paper, we introduce OpenDataVal, an easy-to-use and unifie…

    Submitted 13 October, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: 25 pages, NeurIPS 2023 Track on Datasets and Benchmarks

  10. arXiv:2301.03705  [pdf, ps, other]

    stat.ME

    Locally sparse quantile estimation for a partially functional interaction model

    Authors: Weijuan Liang, Qingzhao Zhang, Shuangge Ma

    Abstract: Functional data analysis has been extensively conducted. In this study, we consider a partially functional model, under which some covariates are scalars and have linear effects, while some other variables are functional and have unspecified nonlinear effects. Significantly advancing from the existing literature, we consider a model with interactions between the functional and scalar covariates. T…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: 24 pages, 5 figures

    MSC Class: 62R10; ACM Class: G.3

  11. arXiv:2208.06567  [pdf, other]

    stat.ME stat.CO

    A sequential stepwise screening procedure for sparse recovery in high-dimensional multiresponse models with complex group structures

    Authors: Weixiong Liang, Yuehan Yang

    Abstract: Multiresponse data with complex group structures in both responses and predictors arises in many fields, yet, due to the difficulty in identifying complex group structures, only a few methods have been studied on this problem. We propose a novel algorithm called sequential stepwise screening procedure (SeSS) for feature selection in high-dimensional multiresponse models with complex group structur…

    Submitted 13 August, 2022; originally announced August 2022.

  12. arXiv:2208.04369  [pdf, other]

    cs.LG cs.CV math.ST stat.ML

    Understanding Weight Similarity of Neural Networks via Chain Normalization Rule and Hypothesis-Training-Testing

    Authors: Guangcong Wang, Guangrun Wang, Wenqi Liang, Jianhuang Lai

    Abstract: We present a weight similarity measure method that can quantify the weight similarity of non-convex neural networks. To understand the weight similarity of different trained models, we propose to extract the feature representation from the weights of neural networks. We first normalize the weights of neural networks by introducing a chain normalization rule, which is used for weight representation…

    Submitted 8 August, 2022; originally announced August 2022.

    Comments: Weight Similarity of Neural Networks

  13. arXiv:2204.12038  [pdf, other]

    stat.ME stat.CO stat.ML

    Confidence Band Estimation for Survival Random Forests

    Authors: Sarah Elizabeth Formentini, Wei Liang, Ruoqing Zhu

    Abstract: Survival random forest is a popular machine learning tool for modeling censored survival data. However, there is currently no statistically valid and computationally feasible approach for estimating its confidence band. This paper proposes an unbiased confidence band estimation by extending recent developments in infinite-order incomplete U-statistics. The idea is to estimate the variance-covarian…

    Submitted 25 April, 2022; originally announced April 2022.

  14. arXiv:2108.12504  [pdf, ps, other]

    stat.ME

    Statistical methods for Mendelian models with multiple genes and cancers

    Authors: Jane W. Liang, Gregory E. Idos, Christine Hong, Stephen B. Gruber, Giovanni Parmigiani, Danielle Braun

    Abstract: Risk evaluation to identify individuals who are at greater risk of cancer as a result of heritable pathogenic variants is a valuable component of individualized clinical management. Using principles of Mendelian genetics, Bayesian probability theory, and variant-specific knowledge, Mendelian models derive the probability of carrying a pathogenic variant and developing cancer in the future, based o…

    Submitted 7 May, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: 18 pages, 7 figures

  15. arXiv:2010.13011  [pdf, other]

    stat.AP

    PanelPRO: A R package for multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer

    Authors: Gavin Lee, Qing Zhang, Jane W. Liang, Theodore Huang, Christine Choirat, Giovanni Parmigiani, Danielle Braun

    Abstract: Identifying individuals who are at high risk of cancer due to inherited germline mutations is critical for effective implementation of personalized prevention strategies. Most existing models to identify these individuals focus on specific syndromes by including family and personal history for a small number of cancers. Recent evidence from multi-gene panel testing has shown that many syndromes on…

    Submitted 24 October, 2020; originally announced October 2020.

  16. arXiv:2010.01772  [pdf, other]

    stat.ME

    Empirical Likelihood-Based Estimation and Inference in Randomized Controlled Trials with High-Dimensional Covariates

    Authors: Wei Liang, Ying Yan

    Abstract: In this paper, we propose a data-adaptive empirical likelihood-based approach for treatment effect estimation and inference, which overcomes the obstacle of the traditional empirical likelihood-based approaches in the high-dimensional setting by adopting penalized regression and machine learning methods to model the covariate-outcome relationship. In particular, we show that our procedure successf…

    Submitted 11 December, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: 37 pages, 3 figures

  17. arXiv:2008.13539  [pdf, other]

    cs.LG stat.ML

    Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix

    Authors: Weixuan Liang, Sihang Zhou, Jian Xiong, Xinwang Liu, Siwei Wang, En Zhu, Zhiping Cai, Xin Xu

    Abstract: Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data by performing clustering on the learned optimal embedding across views. Though demonstrating promising performance in various applications, most of existing methods usually linearly combine a group of pre-specified first-order Laplacian matrices to construct the optimal Laplacian matrix, which may resu…

    Submitted 31 August, 2020; originally announced August 2020.

  18. arXiv:2008.12989  [pdf, ps, other]

    stat.ME

    Empirical Likelihood Weighted Estimation of Average Treatment Effects

    Authors: Yuanyao Tan, Xialing Wen, Wei Liang, Ying Yan

    Abstract: There has been growing attention on how to effectively and objectively use covariate information when the primary goal is to estimate the average treatment effect (ATE) in randomized clinical trials (RCTs). In this paper, we propose an effective weighting approach to extract covariate information based on the empirical likelihood (EL) method. The resulting two-sample empirical likelihood weighted…

    Submitted 29 August, 2020; originally announced August 2020.

  19. arXiv:2006.03246  [pdf, other]

    stat.ME

    Integrative Sparse Partial Least Squares

    Authors: Weijuan Liang, Shuangge Ma, Qingzhao Zhang, Tingyu Zhu

    Abstract: Partial least squares, as a dimension reduction method, has become increasingly important for its ability to deal with problems with a large number of variables. Since noisy variables may weaken the performance of the model, the sparse partial least squares (SPLS) technique has been proposed to identify important variables and generate more interpretable results. However, the small sample size of…

    Submitted 5 June, 2020; originally announced June 2020.

  20. arXiv:2001.00576  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    DAWSON: A Domain Adaptive Few Shot Generation Framework

    Authors: Weixin Liang, Zixuan Liu, Can Liu

    Abstract: Training a Generative Adversarial Network (GAN) for a new domain from scratch requires an enormous amount of training data and days of training time. To this end, we propose DAWSON, a Domain Adaptive Few Shot Generation Framework For GANs based on meta-learning. A major challenge of applying meta-learning GANs is to obtain gradients for the generator from evaluating it on development sets due to th…

    Submitted 1 January, 2020; originally announced January 2020.

  21. arXiv:1907.11943  [pdf, other]

    cs.LG cs.CV stat.ML

    Learnable Parameter Similarity

    Authors: Guangcong Wang, Jianhuang Lai, Wenqi Liang, Guangrun Wang

    Abstract: Most of the existing approaches focus on specific visual tasks while ignoring the relations between them. Estimating task relation sheds light on the learning of high-order semantic concepts, e.g., transfer learning. How to reveal the underlying relations between different visual tasks remains largely unexplored. In this paper, we propose a novel Learnable Parameter Simi…

    Submitted 27 July, 2019; originally announced July 2019.

    Comments: 9 pages

  22. arXiv:1803.04640  [pdf, other]

    q-bio.QM stat.AP

    Bayesian Detection of Abnormal ADS in Mutant Caenorhabditis elegans Embryos

    Authors: Wei Liang, Yuxiao Yang, Yusi Fang, Zhongying Zhao, Jie Hu

    Abstract: Cell division timing is critical for cell fate specification and morphogenesis during embryogenesis. How division timings are regulated among cells during development is poorly understood. Here we focus on the comparison of asynchrony of division between sister cells (ADS) between wild-type and mutant individuals of Caenorhabditis elegans. Since the replicate number of mutant individuals of each m…

    Submitted 13 March, 2018; originally announced March 2018.

  23. arXiv:1712.05767  [pdf, other]

    stat.CO

    Sparse matrix linear models for structured high-throughput data

    Authors: Jane W. Liang, Saunak Sen

    Abstract: Recent technological advancements have led to the rapid generation of high-throughput biological data, which can be used to address novel scientific questions in broad areas of research. These data can be thought of as a large matrix with covariates annotating both rows and columns of this matrix. Matrix linear models provide a convenient way for modeling such data. In many situations, sparse esti…

    Submitted 25 February, 2021; v1 submitted 15 December, 2017; originally announced December 2017.

    Comments: 35 pages, 7 figures