Search | arXiv e-print repository

A Two-Sample Test of Text Generation Similarity

Authors: Jingbin Xu, Chen Qian, Meimei Liu, Feng Guo

Abstract: The surge in digitized text data requires reliable inferential methods on observed textual patterns. This article proposes a novel two-sample text test for comparing similarity between two groups of documents. The hypothesis is whether the probabilistic mapping generating the textual data is identical across two groups of documents. The proposed test aims to assess text similarity by comparing the entropy of the documents. Entropy is estimated using neural network-based language models. The test statistic is derived from an estimation-and-inference framework, where the entropy is first approximated using an estimation set, followed by inference on the remaining data set. We showed theoretically that under mild conditions, the test statistic asymptotically follows a normal distribution. A multiple data-splitting strategy is proposed to enhance test power, which combines p-values into a unified decision. Various simulation studies and a real data example demonstrated that the proposed two-sample text test maintains the nominal type I error rate while offering greater power compared to existing methods. The proposed method provides a novel solution to assert differences in document classes, particularly in fields where large-scale textual information is crucial.

Submitted 8 May, 2025; originally announced May 2025.

arXiv:2505.04795 [pdf, other]

Assessing Risk Heterogeneity through Heavy-Tailed Frequency and Severity Mixtures

Authors: Michael R. Powers, Jiaxin Xu

Abstract: In operational risk management and actuarial finance, the analysis of risk often begins by dividing a random damage-generation process into its separate frequency and severity components. In the present article, we construct canonical families of mixture distributions for each of these components, based on a Negative Binomial kernel for frequency and a Gamma kernel for severity. The mixtures are employed to assess the heterogeneity of risk factors underlying an empirical distribution through the shape of the implied mixing distribution. From the duality of the Negative Binomial and Gamma distributions, we first derive necessary and sufficient conditions for heavy-tailed (i.e., inverse power-law) canonical mixtures. We then formulate flexible 4-parameter families of mixing distributions for Geometric and Exponential kernels to generate heavy-tailed 4-parameter mixture models, and extend these mixtures to arbitrary Negative Binomial and Gamma kernels, respectively, yielding 5-parameter mixtures for detecting and measuring risk heterogeneity. To check the robustness of such heterogeneity inferences, we show how a fitted 5-parameter model may be re-expressed in terms of alternative Negative Binomial or Gamma kernels whose associated mixing distributions form a "calibrated" family.

Submitted 7 May, 2025; originally announced May 2025.

MSC Class: 60E05; 60E10

arXiv:2505.01467 [pdf, other]

sae4health: An R Shiny Application for Small Area Estimation in Low- and Middle-Income Countries

Authors: Yunhan Wu, Qianyu Dong, Jieyi Xu, Zehang Richard Li, Jon Wakefield

Abstract: Accurate subnational estimation of health indicators is critical for public health planning, especially in low- and middle-income countries (LMICs), where data and tools are often limited. The sae4health R Shiny app, built on the surveyPrev package, provides a user-friendly tool for prevalence mapping using small area estimation (SAE) methods. Both area- and unit-level models with spatial random effects are available, with fast Bayesian inference performed using Integrated Nested Laplace Approximation (INLA). Currently, the app supports analysis of over 150 indicators from Demographic and Health Surveys (DHS) across multiple administrative levels. sae4health simplifies the use of complex prevalence mapping models to support data-driven decision-making. The app provides interactive visualization, summary, and report generation functionalities for a wide range of use cases. This paper outlines the app's statistical framework and demonstrates the workflow through a case study of child stunting in Nigeria. Additional documentation is available on the supporting website (https://sae4health.stat.uw.edu).

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2504.09481 [pdf, other]

Rethinking the generalization of drug target affinity prediction algorithms via similarity aware evaluation

Authors: Chenbin Zhang, Zhiqiang Hu, Chuchu Jiang, Wen Chen, Jie Xu, Shaoting Zhang

Abstract: Drug-target binding affinity prediction is a fundamental task for drug discovery. It has been extensively explored in the literature, and promising results have been reported. However, in this paper, we demonstrate that the results may be misleading and may not generalize well to real practice. The core observation is that the canonical randomized split of a test set in conventional evaluation leaves the test set dominated by samples with high similarity to the training set. The performance of models is severely degraded on samples with lower similarity to the training set, but this drawback is largely overlooked in current evaluation. As a result, the performance can hardly be trusted when the model meets low-similarity samples in real practice. To address this problem, we propose a framework of similarity aware evaluation in which a novel split methodology is proposed to adapt to any desired distribution. This is achieved by a formulation of optimization problems which are approximately and efficiently solved by gradient descent. We perform extensive experiments across five representative methods in four datasets for two typical target evaluations and compare them with various counterpart methods. Results demonstrate that the proposed split methodology can significantly better fit desired distributions and guide the development of models. Code is released at https://github.com/Amshoreline/SAE/tree/main.

Submitted 13 April, 2025; originally announced April 2025.

Comments: ICLR 2025 Oral

arXiv:2503.16321 [pdf]

Balancing the effective sample size in prior across different doses in the curve-free Bayesian decision-theoretic design for dose-finding trials

Authors: Jiapeng Xu, Dehua Bi, Shenghua Kelly Fan, Bee Leng Lee, Ying Lu

Abstract: The primary goal of dose allocation in phase I trials is to minimize patient exposure to subtherapeutic or excessively toxic doses, while accurately recommending a phase II dose that is as close as possible to the maximum tolerated dose (MTD). Fan et al. (2012) introduced a curve-free Bayesian decision-theoretic design (CFBD), which leverages the assumption of a monotonic dose-toxicity relationship without directly modeling dose-toxicity curves. This approach has also been extended to drug combinations for determining the MTD (Lee et al., 2017). Although CFBD has demonstrated improved trial efficiency by using fewer patients while maintaining high accuracy in identifying the MTD, it may artificially inflate the effective sample sizes for the updated prior distributions, particularly at the lowest and highest dose levels. This can lead to either overshooting or undershooting the target dose. In this paper, we propose a modification to CFBD's prior distribution updates that balances effective sample sizes across different doses. Simulation results show that with the modified prior specification, CFBD achieves a more focused dose allocation at the MTD and offers more precise dose recommendations with fewer patients on average. It also demonstrates robustness to other well-known dose-finding designs in the literature.

Submitted 20 March, 2025; originally announced March 2025.

Comments: 24 pages

arXiv:2503.11637 [pdf, other]

Gradient-bridged Posterior: Bayesian Inference for Models with Implicit Functions

Authors: Cheng Zeng, Yaozhi Yang, Jason Xu, Leo L Duan

Abstract: Many statistical problems include model parameters that are defined as the solutions to optimization sub-problems. These include classical approaches such as profile likelihood as well as modern applications involving flow networks or Procrustes distances. In such cases, the likelihood of the data involves an implicit function, often complicating inferential procedures and entailing prohibitive computational cost. In this article, we propose an intuitive and tractable posterior inference approach for this setting. We introduce a class of continuous models that handle implicit function values using the first-order optimality of the sub-problems. Specifically, we apply a shrinkage kernel to the gradient norm, which retains a probabilistic interpretation within a generative model. This can be understood as a generalization of the Gibbs posterior framework to newly enable concentration around partial minimizers in a subset of the parameters. We show that this method, termed the gradient-bridged posterior, is amenable to efficient posterior computation, and enjoys theoretical guarantees, establishing a Bernstein--von Mises theorem for asymptotic normality. The advantages of our approach are highlighted on a synthetic flow network experiment and an application to data integration using Procrustes distances.

Submitted 14 March, 2025; originally announced March 2025.

Comments: 31 pages, 13 figures

arXiv:2503.06381 [pdf, other]

Bayesian Optimization for Robust Identification of Ornstein-Uhlenbeck Model

Authors: Jinwen Xu, Qin Lu, Yaakov Bar-Shalom

Abstract: This paper deals with the identification of the stochastic Ornstein-Uhlenbeck (OU) process error model, which is characterized by an inverse time constant, and the unknown variances of the process and observation noises. Although the availability of the explicit expression of the log-likelihood function allows one to obtain the maximum likelihood estimator (MLE), this entails evaluating the nontrivial gradient and also often struggles with local optima. To address these limitations, we put forth a sample-efficient global optimization approach based on the Bayesian optimization (BO) framework, which relies on a Gaussian process (GP) surrogate model for the objective function that effectively balances exploration and exploitation to select the query points. Specifically, each evaluation of the objective is implemented efficiently through the Kalman filter (KF) recursion. Comprehensive experiments on various parameter settings and sampling intervals corroborate that the BO-based estimator consistently outperforms MLE implemented by the steady-state KF approximation and the expectation-maximization algorithm (whose derivation is a side contribution) in terms of root mean-square error (RMSE) and statistical consistency, confirming the effectiveness and robustness of BO for identification of the stochastic OU process. Notably, the RMSE values produced by the BO-based estimator are smaller than the classical Cramér-Rao lower bound, especially for the inverse time constant, estimating which has been a long-standing challenge. This seemingly counterintuitive result can be explained by the data-driven prior for the learning parameters indirectly injected by BO through the GP prior over the objective function.

Submitted 8 March, 2025; originally announced March 2025.
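
As a rough illustration of the Kalman filter recursion mentioned in the abstract above, the following sketch evaluates the Gaussian log-likelihood of a discretely sampled OU process observed in additive noise. The parameterization (inverse time constant `beta`, stationary process variance `sigma2`, observation noise variance `r`, sampling interval `dt`), the zero-mean stationary initial state, and the function name are all assumptions made for this illustration; it is not the authors' implementation and does not include the Bayesian optimization step.

```python
import math

def ou_kalman_loglik(z, beta, sigma2, r, dt):
    """Log-likelihood of observations z under a noisy, discretely sampled OU model.

    Assumed model (illustrative):
      x[k+1] = a * x[k] + w[k],  a = exp(-beta * dt),  Var(w) = sigma2 * (1 - a^2)
      z[k]   = x[k] + v[k],      Var(v) = r
      x[0] ~ N(0, sigma2)        (stationary start)
    """
    a = math.exp(-beta * dt)
    q = sigma2 * (1.0 - a * a)      # process noise variance over one step
    m, p = 0.0, sigma2              # predicted mean and variance of the state
    loglik = 0.0
    for zk in z:
        # Measurement update: innovation and its variance.
        nu = zk - m
        s = p + r
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + nu * nu / s)
        gain = p / s
        m = m + gain * nu
        p = (1.0 - gain) * p
        # Time update: propagate the state to the next sampling instant.
        m = a * m
        p = a * a * p + q
    return loglik
```

A function of this kind could serve as the objective that a global optimizer queries at candidate values of `(beta, sigma2, r)`; each call costs one pass over the data.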

arXiv:2503.06009 [pdf, ps, other]

Nearly Optimal Differentially Private ReLU Regression

Authors: Meng Ding, Mingxi Lei, Shaowei Wang, Tianhang Zheng, Di Wang, Jinhui Xu

Abstract: In this paper, we investigate one of the most fundamental nonconvex learning problems, ReLU regression, in the Differential Privacy (DP) model. Previous studies on private ReLU regression heavily rely on stringent assumptions, such as constant bounded norms for feature vectors and labels. We relax these assumptions to a more standard setting, where data can be i.i.d. sampled from $O(1)$-sub-Gaussian distributions. We first show that when $\varepsilon = \tilde{O}(\sqrt{\frac{1}{N}})$ and there is some public data, it is possible to achieve an upper bound of $\tilde{O}(\frac{d^2}{N^2 \varepsilon^2})$ for the excess population risk in $(\varepsilon, \delta)$-DP, where $d$ is the dimension and $N$ is the number of data samples. Moreover, we relax the requirement of $\varepsilon$ and public data by proposing and analyzing a one-pass mini-batch Generalized Linear Model Perceptron algorithm (DP-MBGLMtron). Additionally, using the tracing attack argument technique, we demonstrate that the minimax rate of the estimation error for $(\varepsilon, \delta)$-DP algorithms is lower bounded by $\Omega(\frac{d^2}{N^2 \varepsilon^2})$. This shows that DP-MBGLMtron achieves the optimal utility bound up to logarithmic factors. Experiments further support our theoretical results.

Submitted 10 June, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

Comments: 47 pages (UAI2025)

arXiv:2503.03536 [pdf, other]

A Criterion for Extending Continuous-Mixture Identifiability Results

Authors: Michael R. Powers, Jiaxin Xu

Abstract: For continuous mixtures of random variables, we provide a simple criterion -- generating-function accessibility -- to extend previously known kernel-based identifiability (or unidentifiability) results to new kernel distributions. This criterion, based on functional relationships between the relevant kernels' moment-generating functions or Laplace transforms, may be applied to continuous mixtures of both discrete and continuous random variables. To illustrate the proposed approach, we present results for several specific kernels.

Submitted 5 March, 2025; originally announced March 2025.

MSC Class: 62F99; 60E05

arXiv:2502.17814 [pdf, other]

An Overview of Large Language Models for Statisticians

Authors: Wenlong Ji, Weizhe Yuan, Emily Getzen, Kyunghyun Cho, Michael I. Jordan, Song Mei, Jason E Weston, Weijie J. Su, Jing Xu, Linjun Zhang

Abstract: Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI), exhibiting remarkable capabilities across diverse tasks such as text generation, reasoning, and decision-making. While their success has primarily been driven by advances in computational power and deep learning architectures, emerging problems -- in areas such as uncertainty quantification, decision-making, causal inference, and distribution shift -- require a deeper engagement with the field of statistics. This paper explores potential areas where statisticians can make important contributions to the development of LLMs, particularly those that aim to engender trustworthiness and transparency for human users. Thus, we focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation. We also consider possible roles for LLMs in statistical analysis. By bridging AI and statistics, we aim to foster a deeper collaboration that advances both the theoretical foundations and practical applications of LLMs, ultimately shaping their role in addressing complex societal challenges.

Submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.10793 [pdf, other]

Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training

Authors: Jie Xu, Zihan Wu

Abstract: Existing methods for measuring training sample influence on models only provide static, overall measurements, overlooking how sample influence changes during training. We propose Dynamic Influence Tracker (DIT), which captures the time-varying sample influence across arbitrary time windows during training. DIT offers three key insights: 1) Samples show different time-varying influence patterns, with some samples important in the early training stage while others become important later. 2) Sample influences show a weak correlation between early and late stages, demonstrating that the model undergoes distinct learning phases with shifting priorities. 3) Analyzing influence during the convergence period provides more efficient and accurate detection of corrupted samples than full-training analysis. Supported by theoretical guarantees without assuming loss convexity or model convergence, DIT significantly outperforms existing methods, achieving up to 0.99 correlation with ground truth and above 98% accuracy in detecting corrupted samples in complex architectures.

Submitted 15 February, 2025; originally announced February 2025.

arXiv:2502.10409 [pdf, other]

Data Science Students Perspectives on Learning Analytics: An Application of Human-Led and LLM Content Analysis

Authors: Raghda Zahran, Jianfei Xu, Huizhi Liang, Matthew Forshaw

Abstract: Objective: This study is part of a series of initiatives at a UK university designed to cultivate a deep understanding of students' perspectives on analytics that resonate with their unique learning needs. It explores collaborative data processing undertaken by postgraduate students who examined an Open University Learning Analytics Dataset (OULAD). Methods: A qualitative approach was adopted, integrating a Retrieval-Augmented Generation (RAG) and a Large Language Model (LLM) technique with human-led content analysis to gather information about students' perspectives based on their submitted work. The study involved 72 postgraduate students in 12 groups. Findings: The analysis of group work revealed diverse insights into essential learning analytics from the students' perspectives. All groups adopted a structured data science methodology. The questions formulated by the groups were categorised into seven themes, reflecting their specific areas of interest. While there was variation in the selected variables to interpret correlations, a consensus was found regarding the general results. Conclusion: A significant outcome of this study is that students specialising in data science exhibited a deeper understanding of learning analytics, effectively articulating their interests through inferences drawn from their analyses. While human-led content analysis provided a general understanding of students' perspectives, the LLM offered nuanced insights.

Submitted 22 January, 2025; originally announced February 2025.

Comments: 17 Pages, 2 Tables, 1 Figure

arXiv:2502.06168 [pdf, other]

Dynamic Pricing with Adversarially-Censored Demands

Authors: Jianyu Xu, Yining Wang, Xi Chen, Yu-Xiang Wang

Abstract: We study an online dynamic pricing problem where the potential demand at each time period $t=1,2,\ldots, T$ is stochastic and dependent on the price. However, a perishable inventory is imposed at the beginning of each time $t$, censoring the potential demand if it exceeds the inventory level. To address this problem, we introduce a pricing algorithm based on the optimistic estimates of derivatives. We show that our algorithm achieves $\tilde{O}(\sqrt{T})$ optimal regret even with adversarial inventory series. Our findings advance the state-of-the-art in online decision-making problems with censored feedback, offering a theoretically optimal solution against adversarial observations.

Submitted 10 February, 2025; originally announced February 2025.

Comments: 33 pages, 1 figure

MSC Class: 91B06; 91B24; 62P20; 62C20; 90B50 ACM Class: I.2.6

arXiv:2502.00126 [pdf, other]

A Bayesian decision-theoretic approach to sparse estimation

Authors: Aihua Li, Surya T. Tokdar, Jason Xu

Abstract: We extend the work of Hahn and Carvalho (2015) and develop a doubly-regularized sparse regression estimator by synthesizing Bayesian regularization with penalized least squares within a decision-theoretic framework. In contrast to existing Bayesian decision-theoretic formulations chiefly reliant upon the symmetric 0-1 loss, the new method -- which we call Bayesian Decoupling -- employs a family of penalized loss functions indexed by a sparsity-tuning parameter. We propose a class of reweighted l1 penalties, with two specific instances that achieve simultaneous bias reduction and convexity. The design of the penalties incorporates considerations of signal sizes, as enabled by the Bayesian paradigm. The tuning parameter is selected using a posterior benchmarking criterion, which quantifies the drop in predictive power relative to the posterior mean, which is the optimal Bayes estimator under the squared error loss. Additionally, in contrast to the widely used median probability model technique, which selects variables by thresholding posterior inclusion probabilities at the fixed threshold of 1/2, Bayesian Decoupling enables the use of a data-driven threshold which automatically adapts to estimated signal sizes and offers far better performance in high-dimensional settings with highly correlated predictors. Our numerical results in such settings show that certain combinations of priors and loss functions significantly improve the solution path compared to existing methods, prioritizing true signals early along the path before false signals are selected. Consequently, Bayesian Decoupling produces estimates with better prediction and selection performance. Finally, a real data application illustrates the practical advantages of our approaches, which select sparser models with larger coefficient estimates.

Submitted 31 January, 2025; originally announced February 2025.

Comments: Submitted to Biometrika
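
For reference, the median probability model rule that the abstract above uses as a point of comparison reduces to a fixed cutoff on posterior inclusion probabilities. The sketch below shows only that baseline rule; the function name and the example probabilities are hypothetical, and the data-driven threshold of Bayesian Decoupling is not reproduced here.

```python
def median_probability_model(inclusion_probs, threshold=0.5):
    """Return the indices of variables whose posterior inclusion
    probability is at least the threshold (1/2 for the median model)."""
    return [j for j, p in enumerate(inclusion_probs) if p >= threshold]

# Hypothetical posterior inclusion probabilities for four predictors.
pips = [0.92, 0.48, 0.51, 0.07]
selected = median_probability_model(pips)
```

With these made-up values, the second predictor is excluded despite sitting just below the cutoff, which is the kind of rigidity a data-adaptive threshold is meant to address.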

arXiv:2501.18049 [pdf, ps, other]

Joint Pricing and Resource Allocation: An Optimal Online-Learning Approach

Authors: Jianyu Xu, Xuan Wang, Yu-Xiang Wang, Jiashuo Jiang

Abstract: We study an online learning problem on dynamic pricing and resource allocation, where we make joint pricing and inventory decisions to maximize the overall net profit. We consider the stochastic dependence of demands on the price, which complicates the resource allocation process and introduces significant non-convexity and non-smoothness to the problem. To solve this problem, we develop an efficient algorithm that utilizes a "Lower-Confidence Bound (LCB)" meta-strategy over multiple OCO agents. Our algorithm achieves $\tilde{O}(\sqrt{Tmn})$ regret (for $m$ suppliers and $n$ consumers), which is optimal with respect to the time horizon $T$. Our results illustrate an effective integration of statistical learning methodologies with complex operations research problems.

Submitted 21 May, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

MSC Class: 91B06; 90B22; 91B24; 90B50; 90B80; 62P20 ACM Class: I.2.6

arXiv:2501.12453 [pdf]

On the two-step hybrid design for augmenting randomized trials using real-world data

Authors: Jiapeng Xu, Ruben P. A. van Eijk, Alicia Ellis, Tianyu Pan, Lorene M. Nelson, Kit C. B. Roes, Marc van Dijk, Maria Sarno, Leonard H. van den Berg, Lu Tian, Ying Lu

Abstract: Hybrid clinical trials that borrow real-world data (RWD) are gaining interest, especially for rare diseases. They assume that the RWD and the randomized control arm are exchangeable, but violations can bias results, inflate type I error, or reduce power. A two-step hybrid design first tests exchangeability, reducing inappropriate borrowing but potentially inflating type I error (Yuan et al., 2019). We propose four methods to better control type I error. Approach 1 estimates the variance of test statistics, rejecting the null hypothesis based on large sample normal approximation. Approach 2 uses a numerical approach for exact critical value determination. Approach 3 splits type I error rates by equivalence test outcome. Approach 4 adjusts the critical value only when equivalence is established. Simulation studies using a hypothetical ALS scenario evaluate type I error and power under various conditions, compared to the Bayesian power prior approach (Ibrahim et al., 2015). Our methods and the Bayesian power prior control type I error, whereas Yuan et al. (2019) increases it under exchangeability. If exchangeability does not hold, all methods fail to control type I error. Our methods show type I error inflation of 6%-8%, compared to 10% for Yuan et al. (2019) and 16% for the Bayesian power prior.

Submitted 21 January, 2025; originally announced January 2025.

MSC Class: 62 ACM Class: G.3

arXiv:2501.06540 [pdf, other]

CeViT: Copula-Enhanced Vision Transformer in multi-task learning and bi-group image covariates with an application to myopia screening

Authors: Chong Zhong, Yang Li, Jinfeng Xu, Xiang Fu, Yunhao Liu, Qiuyi Huang, Danjuan Yang, Meiyan Li, Aiyi Liu, Alan H. Welsh, Xingtao Zhou, Bo Fu, Catherine C. Liu

Abstract: We aim to assist image-based myopia screening by resolving two longstanding problems, "how to integrate the information of ocular images of a pair of eyes" and "how to incorporate the inherent dependence among high-myopia status and axial length for both eyes." The classification-regression task is modeled as a novel 4-dimensional multi-response regression, where discrete responses are allowed, that relates to two dependent 3rd-order tensors (3D ultrawide-field fundus images). We present a Vision Transformer-based bi-channel architecture, named CeViT, where the common features of a pair of eyes are extracted via a shared Transformer encoder, and the interocular asymmetries are modeled through separated multilayer perceptron heads. Statistically, we model the conditional dependence among a mixture of discrete-continuous responses given the image covariates by a so-called copula loss. We establish a new theoretical framework regarding fine-tuning on CeViT based on latent representations, making the black-box fine-tuning procedure interpretable and guaranteeing higher relative efficiency of fine-tuning weight estimation in the asymptotic setting. We apply CeViT to an annotated ultrawide-field fundus image dataset collected by Shanghai Eye & ENT Hospital, demonstrating that CeViT enhances the baseline model in both accuracy of classifying high-myopia and prediction of AL on both eyes.

Submitted 11 January, 2025; originally announced January 2025.

arXiv:2501.01657 [pdf, other]

Change Point Detection for Random Objects with Possibly Periodic Behavior

Authors: Jiazhen Xu, Andrew T. A. Wood, Tao Zou

Abstract: Time-varying random objects have been increasingly encountered in modern data analysis. Moreover, in a substantial number of these applications, periodic behavior of the random objects has been observed. We introduce a new, powerful scan statistic and corresponding test for the precise identification and localization of abrupt changes in the distribution of non-Euclidean random objects with possibly periodic behavior. Our approach is nonparametric and effectively captures the entire distribution of these random objects. Remarkably, it operates with minimal tuning parameters, requiring only the specification of cut-off intervals near endpoints, where change points are assumed not to occur. Our theoretical contributions include deriving the asymptotic distribution of the test statistic under the null hypothesis of no change points, establishing the consistency of the test in the presence of change points under contiguous alternatives and providing rigorous guarantees on the near-optimal consistency in estimating the number and locations of change points, whether dealing with a single change point or multiple ones. We demonstrate that the most competitive method currently in the literature for change point detection in random objects is degraded by periodic behavior, as periodicity leads to blurring of the changes that this procedure aims to discover. Through comprehensive simulation studies, we demonstrate the superior power and accuracy of our approach in both detecting change points and pinpointing their locations, across scenarios involving both periodic and nonperiodic random objects. Our main application is to weighted networks, represented through graph Laplacians. The proposed method delivers highly interpretable results, as evidenced by the identification of meaningful change points in the New York City Citi Bike sharing system that align with significant historical events.

Submitted 3 January, 2025; originally announced January 2025.

Comments: arXiv admin note: text overlap with arXiv:2311.16025 by other authors

arXiv:2411.17728 [pdf, other]

Analytic Continuation by Feature Learning

Authors: Zhe Zhao, Jingping Xu, Ce Wang, Yaping Yang

Abstract: Analytic continuation aims to reconstruct real-time spectral functions from imaginary-time Green's functions; however, this process is notoriously ill-posed and challenging to solve. We propose a novel neural network architecture, named the Feature Learning Network (FL-net), to enhance the prediction accuracy of spectral functions, achieving an improvement of at least $20\%$ over traditional methods, such as the Maximum Entropy Method (MEM), and previous neural network approaches. Furthermore, we develop an analytical method to evaluate the robustness of the proposed network. Using this method, we demonstrate that increasing the hidden dimensionality of FL-net, while leading to lower loss, results in decreased robustness. Overall, our model provides valuable insights into effectively addressing the complex challenges associated with analytic continuation.

Submitted 22 November, 2024; originally announced November 2024.

Comments: 8 pages, 9 figures

arXiv:2411.15567 [pdf, ps, other]

Regional consistency evaluation and sample size calculation under two MRCTs

Authors: Kunhai Qing, Xinru Ren, Jin Xu

Abstract: The multi-regional clinical trial (MRCT) has become common practice for drug development and global registration. The FDA guidance "Demonstrating Substantial Evidence of Effectiveness for Human Drug and Biological Products Guidance for Industry" (FDA, 2019) requires that substantial evidence of effectiveness of a drug/biologic product be demonstrated for market approval. In situations where two pivotal MRCTs are needed to establish effectiveness of a specific indication for a drug or biological product, a systematic approach of consistency evaluation for regional effect is crucial. In this paper, we first present some existing regional consistency evaluations in a unified way that facilitates regional sample size calculation under the simple fixed effect model. Second, we extend the two commonly used consistency assessment criteria of MHLW (2007) in the context of two MRCTs and provide their evaluation and regional sample size calculation. Numerical studies demonstrate that the proposed regional sample size attains the desired probability of showing regional consistency. A hypothetical example is provided to illustrate the application. We provide an R package for implementation.

Submitted 23 November, 2024; originally announced November 2024.

arXiv:2411.01780 [pdf, other]

Clustering Based on Density Propagation and Subcluster Merging

Authors: Feiping Nie, Yitao Song, Jingjing Xue, Rong Wang, Xuelong Li

Abstract: We propose the DPSM method, a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space. Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process, thereby making it suitable for a graph space. In DPSM, nodes are partitioned into small clusters based on propagated density. The partitioning technique has been proved to be sound and complete. We then extend the concept of spectral clustering from individual nodes to these small clusters, while introducing the CluCut measure to guide cluster merging. This measure is modified in various ways to account for cluster properties, thus providing guidance on when to terminate the merging process. Various experiments have validated the effectiveness of DPSM and the accuracy of these conclusions.

Submitted 3 November, 2024; originally announced November 2024.

arXiv:2411.00075 [pdf, other]

μP$^2$: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling

Authors: Moritz Haas, Jin Xu, Volkan Cevher, Leena Chennuru Vankadara

Abstract: Sharpness Aware Minimization (SAM) enhances performance across various neural architectures and datasets. As models are continually scaled up to improve performance, a rigorous understanding of SAM's scaling behaviour is paramount. To this end, we study the infinite-width limit of neural networks trained with SAM, using the Tensor Programs framework. Our findings reveal that the dynamics of standard SAM effectively reduce to applying SAM solely in the last layer in wide neural networks, even with optimal hyperparameters. In contrast, we identify a stable parameterization with layerwise perturbation scaling, which we call $\textit{Maximal Update and Perturbation Parameterization}$ ($μ$P$^2$), that ensures all layers are both feature learning and effectively perturbed in the limit. Through experiments with MLPs, ResNets and Vision Transformers, we empirically demonstrate that $μ$P$^2$ achieves hyperparameter transfer of the joint optimum of learning rate and perturbation radius across model scales. Moreover, we provide an intuitive condition to derive $μ$P$^2$ for other perturbation rules like Adaptive SAM and SAM-ON, also ensuring balanced perturbation effects across all layers.

Submitted 10 February, 2025; v1 submitted 31 October, 2024; originally announced November 2024.

Comments: Final NeurIPS 2024 camera-ready version. Differences to v1: Cleaner Figure 1, added Appendix H.3.2 showing that even MLPs can transfer optimal HPs in some versions of SP on CIFAR-10, small improvements in writing

arXiv:2410.17392 [pdf, other]

Experimental Designs for Optimizing Last-Mile Delivery

Authors: Nicholas Rios, Jie Xu

Abstract: Companies like Amazon and UPS are heavily invested in last-mile delivery problems. Optimizing last-mile delivery operations not only creates tremendous cost savings for these companies but also generates broader societal and environmental benefits in terms of better delivery service and reduced air pollutants and greenhouse gas emissions. Last-mile delivery is readily formulated as the Travelling Salesman Problem (TSP), where a salesperson must visit several cities and return to the origin with the least cost. A solution to this problem is a Hamiltonian circuit in an undirected graph. Many methods exist for solving the TSP, but they often assume the travel costs are fixed. In practice, travel costs between delivery zones are random quantities, as they are subject to variation from traffic, weather, and other factors. Innovations such as truck-drone last-mile delivery create even more uncertainties due to scarce data. A Bayesian D-optimal experimental design in conjunction with a regression model is proposed to estimate these unknown travel costs and subsequently to search for a highly efficient solution to the TSP. This framework can naturally be extended to incorporate the use of drones and any other emerging technology that has use in last-mile delivery.

Submitted 22 October, 2024; originally announced October 2024.

Comments: 22 Pages, 2 Figures with 4 subfigure panels each, To be submitted to Quality Engineering

arXiv:2410.05444 [pdf, other]

Online scalable Gaussian processes with conformal prediction for guaranteed coverage

Authors: Jinwen Xu, Qin Lu, Georgios B. Giannakis

Abstract: The Gaussian process (GP) is a Bayesian nonparametric paradigm that is widely adopted for uncertainty quantification (UQ) in a number of safety-critical applications, including robotics, healthcare, and surveillance. The consistency of the resulting uncertainty values, however, hinges on the premise that the learning function conforms to the properties specified by the GP model, such as smoothness, periodicity and more, which may not be satisfied in practice, especially with data arriving on the fly. To combat such model mis-specification, we propose to wed the GP with the prevailing conformal prediction (CP), a distribution-free post-processing framework that produces prediction sets with provably valid coverage under the sole assumption of data exchangeability. However, this assumption is usually violated in the online setting, where a prediction set is sought before revealing the true label. To ensure a long-term coverage guarantee, we adaptively set the key threshold parameter based on feedback on whether the true label falls inside the prediction set. Numerical results demonstrate the merits of the online GP-CP approach relative to existing alternatives in long-term coverage performance.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.03937 [pdf, other]

Clustering Alzheimer's Disease Subtypes via Similarity Learning and Graph Diffusion

Authors: Tianyi Wei, Shu Yang, Davoud Ataee Tarzanagh, Jingxuan Bao, Jia Xu, Patryk Orzechowski, Joost B. Wagenaar, Qi Long, Li Shen

Abstract: Alzheimer's disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Due to the heterogeneous nature of AD, its diagnosis and treatment pose critical challenges. Consequently, in recent years there has been growing research interest in identifying homogeneous AD subtypes that can assist in addressing these challenges. In this study, we aim to identify subtypes of AD that represent distinctive clinical features and underlying pathology by utilizing unsupervised clustering with graph diffusion and similarity learning. We adopted SIMLR, a multi-kernel similarity learning framework, and graph diffusion to perform clustering on a group of 829 patients with AD and mild cognitive impairment (MCI, a prodromal stage of AD) based on their cortical thickness measurements extracted from magnetic resonance imaging (MRI) scans. Although the clustering approach we utilized has not been explored for the task of AD subtyping before, it demonstrated significantly better performance than several commonly used clustering methods. Specifically, we showed the power of graph diffusion in reducing the effects of noise in the subtype detection. Our results revealed five subtypes that differed remarkably in their biomarkers, cognitive status, and some other clinical features. To evaluate the resultant subtypes further, a genetic association study was carried out and successfully identified potential genetic underpinnings of different AD subtypes. Our source code is available at: https://github.com/PennShenLab/AD-SIMLR.

Submitted 4 October, 2024; originally announced October 2024.

Comments: ICIBM'23: International Conference on Intelligent Biology and Medicine, Tampa, FL, USA, July 16-19, 2023

arXiv:2410.03833 [pdf, other]

Understanding Fine-tuning in Approximate Unlearning: A Theoretical Perspective

Authors: Meng Ding, Rohan Sharma, Changyou Chen, Jinhui Xu, Kaiyi Ji

Abstract: Machine Unlearning has emerged as a significant area of research, focusing on 'removing' specific subsets of data from a trained model. Fine-tuning (FT) methods have become one of the fundamental approaches for approximating unlearning, as they effectively retain model performance. However, it is consistently observed that naive FT methods struggle to forget the targeted data. In this paper, we present the first theoretical analysis of FT methods for machine unlearning within a linear regression framework, providing a deeper exploration of this phenomenon. Our analysis reveals that while FT models can achieve zero remaining loss, they fail to forget the forgetting data, as the pretrained model retains its influence and the fine-tuning process does not adequately mitigate it. To address this, we propose a novel Retention-Based Masking (RBM) strategy that constructs a weight saliency map based on the remaining dataset, unlike existing methods that focus on the forgetting dataset. Our theoretical analysis demonstrates that RBM not only significantly improves unlearning accuracy (UA) but also ensures higher retaining accuracy (RA) by preserving overlapping features shared between the forgetting and remaining datasets. Experiments on synthetic and real-world datasets validate our theoretical insights, showing that RBM outperforms existing masking approaches in balancing UA, RA, and disparity metrics.

Submitted 7 February, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

Comments: 23 pages, 5 figures

arXiv:2409.06530 [pdf, other]

Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems

Authors: Huaqing Zhang, Lesi Chen, Jing Xu, Jingzhao Zhang

Abstract: This paper studies simple bilevel problems, where a convex upper-level function is minimized over the optimal solutions of a convex lower-level problem. We first show the fundamental difficulty of simple bilevel problems, that the approximate optimal value of such problems is not obtainable by first-order zero-respecting algorithms. Then we follow recent works to pursue the weak approximate solutions. For this goal, we propose a novel method by reformulating them into functionally constrained problems. Our method achieves near-optimal rates for both smooth and nonsmooth problems. To the best of our knowledge, this is the first near-optimal algorithm that works under standard assumptions of smoothness or Lipschitz continuity for the objective functions.

Submitted 27 January, 2025; v1 submitted 10 September, 2024; originally announced September 2024.

Comments: Accepted at NeurIPS 2024

arXiv:2409.04919 [pdf, other]

Learning with Shared Representations: Statistical Rates and Efficient Algorithms

Authors: Xiaochun Niu, Lili Su, Jiaming Xu, Pengkun Yang

Abstract: Collaborative learning through latent shared feature representations enables heterogeneous clients to train personalized models with enhanced performance while reducing sample complexity. Despite its empirical success and extensive research, the theoretical understanding of statistical error rates remains incomplete, even for shared representations constrained to low-dimensional linear subspaces. In this paper, we establish new upper and lower bounds on the error for learning low-dimensional linear representations shared across clients. Our results account for both statistical heterogeneity (including covariate and concept shifts) and heterogeneity in local dataset sizes, a critical aspect often overlooked in previous studies. We further extend our error bounds to more general nonlinear models, including logistic regression and one-hidden-layer ReLU neural networks. More specifically, we design a spectral estimator that leverages independent replicas of local averaging to approximately solve the non-convex least squares problem. We derive a nearly matching minimax lower bound, proving that our estimator achieves the optimal statistical rate when the latent shared linear representation is well-represented across the entire dataset--that is, when no specific direction is disproportionately underrepresented. Our analysis reveals two distinct phases of the optimal rate: in typical cases, the rate matches the standard parameter-counting rate for the representation; however, a statistical penalty arises when the number of clients surpasses a certain threshold or the local dataset sizes fall below a threshold. These findings provide a more precise characterization of when collaboration benefits the overall system or individual clients in transfer learning and private fine-tuning.

Submitted 21 January, 2025; v1 submitted 7 September, 2024; originally announced September 2024.

arXiv:2409.00407 [pdf, other]

Response probability distribution estimation of expensive computer simulators: A Bayesian active learning perspective using Gaussian process regression

Authors: Chao Dang, Marcos A. Valdebenito, Nataly A. Manque, Jun Xu, Matthias G. R. Faes

Abstract: Estimation of the response probability distributions of computer simulators in the presence of randomness is a crucial task in many fields. However, achieving this task with guaranteed accuracy remains an open computational challenge, especially for expensive-to-evaluate computer simulators. In this work, a Bayesian active learning perspective is presented to address the challenge, which is based on the use of Gaussian process (GP) regression. First, estimation of the response probability distributions is conceptually interpreted as a Bayesian inference problem, as opposed to frequentist inference. This interpretation provides several important benefits: (1) it quantifies and propagates discretization error probabilistically; (2) it incorporates prior knowledge of the computer simulator; and (3) it enables the effective reduction of numerical uncertainty in the solution to a prescribed level. The conceptual Bayesian idea is then realized using GP regression, where we derive the posterior statistics of the response probability distributions in semi-analytical form and also provide a numerical solution scheme. Based on the practical Bayesian approach, a Bayesian active learning (BAL) method is further proposed for estimating the response probability distributions. In this context, the key contribution lies in the development of two crucial components for active learning, i.e., stopping criterion and learning function, by taking advantage of posterior statistics. It is empirically demonstrated by five numerical examples that the proposed BAL method can efficiently estimate the response probability distributions with desired accuracy.

Submitted 31 August, 2024; originally announced September 2024.

arXiv:2408.14625 [pdf, other]

A Bayesian approach for fitting semi-Markov mixture models of cancer latency to individual-level data

Authors: Raphael Morsomme, Shannon Holloway, Marc Ryser, Jason Xu

Abstract: Multi-state models of cancer natural history are widely used for designing and evaluating cancer early detection strategies. Calibrating such models against longitudinal data from screened cohorts is challenging, especially when fitting non-Markovian mixture models against individual-level data. Here, we consider a family of semi-Markov mixture models of cancer natural history and introduce an efficient data-augmented Markov chain Monte Carlo sampling algorithm for fitting these models to individual-level screening and cancer diagnosis histories. Our fully Bayesian approach supports rigorous uncertainty quantification and model selection through leave-one-out cross-validation, and it enables the estimation of screening-related overdiagnosis rates. We demonstrate the effectiveness of our approach using synthetic data, showing that the sampling algorithm efficiently explores the joint posterior distribution of model parameters and latent variables. Finally, we apply our method to data from the US Breast Cancer Surveillance Consortium and estimate the extent of breast cancer overdiagnosis associated with mammography screening. The sampler and model comparison method are available in the R package baclava.

Submitted 26 August, 2024; originally announced August 2024.

Comments: Submitted for review

arXiv:2408.10996 [pdf, ps, other]

Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform

Authors: Tong Mao, Jonathan W. Siegel, Jinchao Xu

Abstract: Let $Ω\subset \mathbb{R}^d$ be a bounded domain. We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces $W^s(L_p(Ω))$ with error measured in the $L_q(Ω)$-norm. Utilizing the Radon transform and recent results from discrepancy theory, we provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$. The rates we derive are optimal up to logarithmic factors, and significantly generalize existing results. An interesting consequence is that the adaptivity of shallow ReLU$^k$ neural networks enables them to obtain optimal approximation rates for smoothness up to order $s = k + (d+1)/2$, even though they represent piecewise polynomials of fixed degree $k$.

Submitted 20 August, 2024; originally announced August 2024.

MSC Class: 62M45; 41A25; 41A30

arXiv:2408.06710 [pdf, other]

Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling

Authors: Jian Xu, Shian Du, Junmei Yang, Qianli Ma, Delu Zeng

Abstract: Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing data recovery due to their flexibility and non-linear nature. An importance-weighted version of the Bayesian GPLVMs has been proposed to obtain a tighter variational bound. However, this version of the approach is primarily limited to analyzing simple data structures, as the generation of an effective proposal distribution can become quite challenging in high-dimensional spaces or with complex data sets. In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues. By transforming the posterior into a sequence of intermediate distributions using annealing, we combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution. We further propose an efficient algorithm by reparameterizing all variables in the evidence lower bound (ELBO). Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.03746 [pdf, ps, other]

Flexible Bayesian Last Layer Models Using Implicit Priors and Diffusion Posterior Sampling

Authors: Jian Xu, Zhiqi Lin, Shigui Li, Min Chen, Junmei Yang, Delu Zeng, John Paisley

Abstract: Bayesian Last Layer (BLL) models focus solely on uncertainty in the output layer of neural networks, demonstrating comparable performance to more complex Bayesian models. However, the use of Gaussian priors for last layer weights in BLL models limits their expressive capacity when faced with non-Gaussian, outlier-rich, or high-dimensional datasets. To address this shortfall, we introduce a novel approach that combines diffusion techniques and implicit priors for variational learning of Bayesian last layer weights. This method leverages implicit distributions for modeling weight priors in BLL, coupled with diffusion samplers for approximating true posterior predictions, thereby establishing a comprehensive Bayesian prior and posterior estimation strategy. By delivering an explicit and computationally efficient variational lower bound, our method aims to augment the expressive abilities of BLL models, enhancing model accuracy, calibration, and out-of-distribution detection proficiency. Through detailed exploration and experimental validation, we showcase the method's potential for improving predictive accuracy and uncertainty quantification while ensuring computational efficiency.

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2407.19218 [pdf, other]

A Versatility Measure for Parametric Risk Models

Authors: Michael R. Powers, Jiaxin Xu

Abstract: Parametric statistical methods play a central role in analyzing risk through its underlying frequency and severity components. Given the wide availability of numerical algorithms and high-speed computers, researchers and practitioners often model these separate (although possibly statistically dependent) random variables by fitting a large number of parametric probability distributions to historical data and then comparing goodness-of-fit statistics. However, this approach is highly susceptible to problems of overfitting because it gives insufficient weight to fundamental considerations of functional simplicity and adaptability. To address this shortcoming, we propose a formal mathematical measure for assessing the versatility of frequency and severity distributions prior to their application. We then illustrate this approach by computing and comparing values of the versatility measure for a variety of probability distributions commonly used in risk analysis.

Submitted 15 March, 2025; v1 submitted 27 July, 2024; originally announced July 2024.

MSC Class: 62F07; 62E10

arXiv:2407.17033 [pdf, other]

Sparse Inducing Points in Deep Gaussian Processes: Enhancing Modeling with Denoising Diffusion Variational Inference

Authors: Jian Xu, Delu Zeng, John Paisley

Abstract: Deep Gaussian processes (DGPs) provide a robust paradigm for Bayesian deep learning. In DGPs, a set of sparse integration locations called inducing points are selected to approximate the posterior distribution of the model. This is done to reduce computational complexity and improve model efficiency. However, inferring the posterior distribution of inducing points is not straightforward. Traditional variational inference approaches to posterior approximation often lead to significant bias. To address this issue, we propose an alternative method called Denoising Diffusion Variational Inference (DDVI) that uses a denoising diffusion stochastic differential equation (SDE) to generate posterior samples of inducing variables. We rely on score matching methods for denoising diffusion models to approximate score functions with a neural network. Furthermore, by combining the classical mathematical theory of SDEs with the minimization of the KL divergence between the approximate and true processes, we propose a novel explicit variational lower bound for the marginal likelihood function of DGPs. Through experiments on various datasets and comparisons with baseline methods, we empirically demonstrate the effectiveness of DDVI for posterior inference of inducing points for DGP models.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.13195 [pdf, other]

Scalable Exploration via Ensemble++

Authors: Yingru Li, Jiawei Xu, Baoxiang Wang, Zhi-Quan Luo

Abstract: Thompson Sampling is a principled method for balancing exploration and exploitation, but its real-world adoption faces computational challenges in large-scale or non-conjugate settings. While ensemble-based approaches offer partial remedies, they typically require prohibitively large ensemble sizes. We propose Ensemble++, a scalable exploration framework using a novel shared-factor ensemble architecture with random linear combinations. For linear bandits, we provide theoretical guarantees showing that Ensemble++ achieves regret comparable to exact Thompson Sampling with an ensemble size of only $\Theta(d \log T)$, significantly outperforming prior methods. Crucially, this efficiency holds across both compact and finite action sets with either time-invariant or time-varying contexts without configuration changes. We extend this theoretical foundation to nonlinear rewards by replacing fixed features with learnable neural representations while preserving the same incremental update principle, effectively bridging theory and practice for real-world tasks. Comprehensive experiments across linear, quadratic, neural, and GPT-based contextual bandits validate our theoretical findings and demonstrate Ensemble++'s superior regret-computation tradeoff versus state-of-the-art methods.

Submitted 18 May, 2025; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: 53 pages

arXiv:2405.20970 [pdf, other]

PUAL: A Classifier on Trifurcate Positive-Unlabeled Data

Authors: Xiaoke Wang, Xiaochen Yang, Rui Zhu, Jing-Hao Xue

Abstract: Positive-unlabeled (PU) learning aims to train a classifier using data containing only labeled-positive instances and unlabeled instances. However, existing PU learning methods generally struggle to achieve satisfactory performance on trifurcate data, where the positive instances are distributed on both sides of the negative instances. To address this issue, we first propose a PU classifier with asymmetric loss (PUAL) by introducing a structure of asymmetric loss on positive instances into the objective function of the global and local learning classifier. We then develop a kernel-based algorithm to enable PUAL to obtain a non-linear decision boundary. We show through experiments on both simulated and real-world datasets that PUAL can achieve satisfactory classification on trifurcate data.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 24 pages, 6 figures

arXiv:2405.17479 [pdf, other]

A rationale from frequency perspective for grokking in training neural network

Authors: Zhangchen Zhou, Yaoyu Zhang, Zhi-Qin John Xu

Abstract: Grokking is the phenomenon where neural networks (NNs) initially fit the training data and later generalize to the test data during training. In this paper, we empirically provide a frequency perspective to explain the emergence of this phenomenon in NNs. The core insight is that the networks initially learn the less salient frequency components present in the test data. We observe this phenomenon across both synthetic and real datasets, offering a novel viewpoint for elucidating the grokking phenomenon by characterizing it through the lens of frequency dynamics during the training process. Our empirical frequency-based analysis sheds new light on understanding the grokking phenomenon and its underlying mechanisms.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2403.00968 [pdf, other]

The Bridged Posterior: Optimization, Profile Likelihood and a New Approach to Generalized Bayes

Authors: Cheng Zeng, Eleni Dilma, Jason Xu, Leo L Duan

Abstract: Optimization is widely used in statistics, thanks to its efficiency for delivering point estimates on useful spaces, such as those satisfying low cardinality or combinatorial structure. To quantify uncertainty, the Gibbs posterior exponentiates the negative loss function to form a posterior density. Nevertheless, Gibbs posteriors are supported in a high-dimensional space, and do not inherit the computational efficiency or constraint formulations from optimization. In this article, we explore a new generalized Bayes approach, viewing the likelihood as a function of data, parameters, and latent variables conditionally determined by an optimization sub-problem. Marginally, the latent variable given the data remains stochastic, and is characterized by its posterior distribution. This framework, coined "bridged posterior", conforms to the Bayesian paradigm. Besides providing a novel generative model, we obtain a positively surprising theoretical finding that under mild conditions, the $\sqrt{n}$-adjusted posterior distribution of the parameters under our model converges to the same normal distribution as that of the canonical integrated posterior. Therefore, our result formally dispels a long-held belief that partial optimization of latent variables may lead to under-estimation of parameter uncertainty. We demonstrate the practical advantages of our approach under several settings, including maximum-margin classification, latent normal models, and harmonization of multiple networks.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 42 pages, 8 figures

arXiv:2402.10228 [pdf, other]

Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent

Authors: Yingru Li, Jiawei Xu, Lei Han, Zhi-Quan Luo

Abstract: We propose HyperAgent, a reinforcement learning (RL) algorithm based on the hypermodel framework for exploration in RL. HyperAgent allows for the efficient incremental approximation of posteriors associated with an optimal action-value function ($Q^\star$) without the need for conjugacy and follows the greedy policies w.r.t. these approximate posterior samples. We demonstrate that HyperAgent offers robust performance in large-scale deep RL benchmarks. It can solve Deep Sea hard exploration problems with episodes that optimally scale with problem size and exhibits significant efficiency gains in the Atari suite. Implementing HyperAgent requires minimal code addition to well-established deep RL frameworks like DQN. We theoretically prove that, under tabular assumptions, HyperAgent achieves logarithmic per-step computational complexity while attaining sublinear regret, matching the best known randomized tabular RL algorithm.

Submitted 14 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Proceedings of the $\mathit{41}^{st}$ International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s). Invited talk in Informs Optimization Conference 2024 and International Symposium on Mathematical Programming 2024

arXiv:2402.08493 [pdf, other]

Sparsity via Sparse Group $k$-max Regularization

Authors: Qinghua Tao, Xiangming Xi, Jun Xu, Johan A. K. Suykens

Abstract: For the linear inverse problem with sparsity constraints, the $l_0$ regularized problem is NP-hard, and existing approaches either utilize greedy algorithms to find almost-optimal solutions or approximate the $l_0$ regularization with its convex counterparts. In this paper, we propose a novel and concise regularization, namely the sparse group $k$-max regularization, which not only simultaneously enhances the group-wise and in-group sparsity, but also imposes no additional restraints on the magnitude of variables in each group, which is especially important for variables at different scales, so that it approximates the $l_0$ norm more closely. We also establish an iterative soft thresholding algorithm with local optimality conditions and complexity analysis provided. Through numerical experiments on both synthetic and real-world datasets, we verify the effectiveness and flexibility of the proposed method.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 7 pages, accepted to American Control Conference 2024

arXiv:2401.16421 [pdf, other]

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Authors: Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He

Abstract: In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.

Submitted 17 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 17 pages, 7 figures, 8 tables; ICML 2024 Camera Ready version; Code: https://github.com/zhenyuhe00/BiPE

arXiv:2312.15999 [pdf, other]

Pricing with Contextual Elasticity and Heteroscedastic Valuation

Authors: Jianyu Xu, Yu-Xiang Wang

Abstract: We study an online contextual dynamic pricing problem, where customers decide whether to purchase a product based on its features and price. We introduce a novel approach to modeling a customer's expected demand by incorporating feature-based price elasticity, which can be equivalently represented as a valuation with heteroscedastic noise. To solve the problem, we propose a computationally efficient algorithm called "Pricing with Perturbation (PwP)", which enjoys an $O(\sqrt{dT\log T})$ regret while allowing arbitrary adversarial input context sequences. We also prove a matching lower bound at $\Omega(\sqrt{dT})$ to show the optimality regarding $d$ and $T$ (up to $\log T$ factors). Our results shed light on the relationship between contextual elasticity and heteroscedastic valuation, providing insights for effective and practical pricing strategies.

Submitted 26 December, 2023; originally announced December 2023.

Comments: 29 pages

MSC Class: 91B06; 91B24; 62P20; 62C20; 90B50

ACM Class: I.2.6

arXiv:2312.13875 [pdf, other]

Best Arm Identification in Batched Multi-armed Bandit Problems

Authors: Shengyu Cao, Simai He, Ruoqing Jiang, Jin Xu, Hongsong Yuan

Abstract: The multi-armed bandit problem has recently arisen in many real-life scenarios where arms must be sampled in batches because of the limited time the agent can wait for feedback. Such applications include biological experimentation and online marketing. The problem is further complicated when the number of arms is large and the number of batches is small. We consider pure exploration in a batched multi-armed bandit problem. We introduce a general linear programming framework that can incorporate objectives of different theoretical settings in best arm identification. The linear program leads to a two-stage algorithm that can achieve good theoretical properties. We demonstrate by numerical studies that the algorithm also has good performance compared to certain UCB-type or Thompson sampling methods.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.13484 [pdf, other]

Bayesian Transfer Learning

Authors: Piotr M. Suder, Jason Xu, David B. Dunson

Abstract: Transfer learning is a burgeoning concept in statistical machine learning that seeks to improve inference and/or predictive accuracy on a domain of interest by leveraging data from related domains. While the term "transfer learning" has garnered much recent interest, its foundational principles have existed for years under various guises. Prior literature reviews in computer science and electrical engineering have sought to bring these ideas into focus, primarily surveying general methodologies and works from these disciplines. This article highlights Bayesian approaches to transfer learning, which have received relatively limited attention despite their innate compatibility with the notion of drawing upon prior knowledge to guide new learning tasks. Our survey encompasses a wide range of Bayesian transfer learning frameworks applicable to a variety of practical settings. We discuss how these methods address the problem of finding the optimal information to transfer between domains, which is a central question in transfer learning. We illustrate the utility of Bayesian transfer learning methods via a simulation study where we compare performance against frequentist competitors.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.07741 [pdf, other]

Robust Functional Principal Component Analysis for Non-Euclidean Random Objects

Authors: Jiazhen Xu, Andrew T. A. Wood, Tao Zou

Abstract: Functional data analysis offers a diverse toolkit of statistical methods tailored for analyzing samples of real-valued random functions. Recently, samples of time-varying random objects, such as time-varying networks, have been increasingly encountered in modern data analysis. These data structures represent elements within general metric spaces that lack local or global linear structures, rendering traditional functional data analysis methods inapplicable. Moreover, the existing methodology for time-varying random objects does not work well in the presence of outlying objects. In this paper, we propose a robust method for analysing time-varying random objects. Our method employs pointwise Fréchet medians and then constructs pointwise distance trajectories between the individual time courses and the sample Fréchet medians. This representation effectively transforms time-varying objects into functional data. A novel robust approach to functional principal component analysis based on a Winsorized U-statistic estimator of the covariance structure is introduced. The proposed robust analysis of these distance trajectories is able to identify key features of time-varying objects and is useful for downstream analysis. To illustrate the efficacy of our approach, numerical studies focusing on dynamic networks are conducted. The results indicate that the proposed method exhibits good all-round performance and surpasses the existing approach in terms of robustness, showcasing its superior performance in handling time-varying objects data.

Submitted 6 March, 2025; v1 submitted 28 November, 2023; originally announced December 2023.

arXiv:2311.05806 [pdf, other]

Likelihood ratio tests in random graph models with increasing dimensions

Authors: Ting Yan, Yuanzhang Li, Jinfeng Xu, Yaning Yang, Ji Zhu

Abstract: We explore the Wilks phenomena in two random graph models: the $\beta$-model and the Bradley-Terry model. For two increasing dimensional null hypotheses, including a specified null $H_0: \beta_i=\beta_i^0$ for $i=1,\ldots, r$ and a homogeneous null $H_0: \beta_1=\cdots=\beta_r$, we reveal high dimensional Wilks' phenomena that the normalized log-likelihood ratio statistic, $[2\{\ell(\widehat{\boldsymbol{\beta}}) - \ell(\widehat{\boldsymbol{\beta}}^0)\} - r]/(2r)^{1/2}$, converges in distribution to the standard normal distribution as $r$ goes to infinity. Here, $\ell(\boldsymbol{\beta})$ is the log-likelihood function on the model parameter $\boldsymbol{\beta}=(\beta_1, \ldots, \beta_n)^\top$, $\widehat{\boldsymbol{\beta}}$ is its maximum likelihood estimator (MLE) under the full parameter space, and $\widehat{\boldsymbol{\beta}}^0$ is the restricted MLE under the null parameter space. For the homogeneous null with a fixed $r$, we establish Wilks-type theorems that $2\{\ell(\widehat{\boldsymbol{\beta}}) - \ell(\widehat{\boldsymbol{\beta}}^0)\}$ converges in distribution to a chi-square distribution with $r-1$ degrees of freedom, as the total number of parameters, $n$, goes to infinity. When testing the fixed dimensional specified null, we find that its asymptotic null distribution is a chi-square distribution in the $\beta$-model. However, unexpectedly, this is not true in the Bradley-Terry model. By developing several novel technical methods for asymptotic expansion, we explore Wilks type results in a principled manner; these principled methods should be applicable to a class of random graph models beyond the $\beta$-model and the Bradley-Terry model. Simulation studies and real network data applications further demonstrate the theoretical results.

Submitted 17 March, 2025; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: Major revisions. This paper supersedes arxiv article arXiv:2211.10055 titled "Wilks' theorems in the $β$-model" by T. Yan, Y. Zhang, J. Xu, Y. Yang and J. Zhu

arXiv:2309.15809 [pdf, other]

Fair Canonical Correlation Analysis

Authors: Zhuoping Zhou, Davoud Ataee Tarzanagh, Bojian Hou, Boning Tong, Jia Xu, Yanbo Feng, Qi Long, Li Shen

Abstract: This paper investigates fairness and bias in Canonical Correlation Analysis (CCA), a widely used statistical technique for examining the relationship between two sets of variables. We present a framework that alleviates unfairness by minimizing the correlation disparity error associated with protected attributes. Our approach enables CCA to learn global projection matrices from all data points while ensuring that these matrices yield comparable correlation levels to group-specific projection matrices. Experimental evaluation on both synthetic and real-world datasets demonstrates the efficacy of our method in reducing correlation disparity error without compromising CCA accuracy.

Submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted for publication at NeurIPS 2023, 31 Pages, 14 Figures

arXiv:2309.12658 [pdf, other]

Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes

Authors: Jian Xu, Shian Du, Junmei Yang, Qianli Ma, Delu Zeng

Abstract: Deep Gaussian Process (DGP) models offer a powerful nonparametric approach for Bayesian inference, but exact inference is typically intractable, motivating the use of various approximations. However, existing approaches, such as mean-field Gaussian assumptions, limit the expressiveness and efficacy of DGP models, while stochastic approximation can be computationally expensive. To tackle these challenges, we introduce Neural Operator Variational Inference (NOVI) for Deep Gaussian Processes. NOVI uses a neural generator to obtain a sampler and minimizes the Regularized Stein Discrepancy in L2 space between the generated distribution and true posterior. We solve the minimax problem using Monte Carlo estimation and subsampling stochastic optimization techniques. We demonstrate that the bias introduced by our method can be controlled by multiplying the Fisher divergence with a constant, which leads to robust error control and ensures the stability and precision of the algorithm. Our experiments on datasets ranging from hundreds to tens of thousands demonstrate the effectiveness and the faster convergence rate of the proposed method. We achieve a classification accuracy of 93.56 on the CIFAR10 dataset, outperforming SOTA Gaussian process methods. Furthermore, our method guarantees theoretically controlled prediction error for DGP models and demonstrates remarkable performance on various datasets. We are optimistic that NOVI has the potential to enhance the performance of deep Bayesian nonparametric models and could have significant implications for various practical applications.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.11764 [pdf, other]

Causal inference with outcome dependent sampling and mismeasured outcome

Authors: Min Zeng, Zeyang Jia, Zijian Sui, Jinfeng Xu, Hong Zhang

Abstract: Outcome-dependent sampling designs are extensively utilized in various scientific disciplines, including epidemiology, ecology, and economics, with retrospective case-control studies being specific examples of such designs. Additionally, if the outcome used for sample selection is also mismeasured, then it is even more challenging to estimate the average treatment effect (ATE) accurately. To our knowledge, no existing method can address these two issues simultaneously. In this paper, we establish the identifiability of the ATE and propose a novel method for estimating the ATE in the context of a generalized linear model. The estimator is shown to be consistent under some regularity conditions. To relax the model assumption, we also consider a generalized additive model. We propose to estimate the ATE using penalized B-splines and establish asymptotic properties for the proposed estimator. Our methods are evaluated through extensive simulation studies and an application to a dataset from the UK Biobank, with alcohol intake as the treatment and gout as the outcome.

Submitted 20 September, 2023; originally announced September 2023.

Comments: 49 pages, 5 figures

Showing 1–50 of 298 results for author: Xue, J