Watermark in the Classroom: A Conformal Framework for Adaptive AI Usage Detection

Authors: Yangxinyu Xie, Xuyang Chen, Zhimei Ren, Weijie J. Su

Abstract: As artificial intelligence tools become ubiquitous in education, maintaining academic integrity while accommodating pedagogically beneficial AI assistance presents unprecedented challenges. Current AI detection systems fail to control false positive rates (FPR) and suffer from bias against minority student groups, prompting institutional suspensions of these technologies. Watermarking techniques offer statistical rigor through precise $p$-values but remain untested in educational contexts where students may use varying levels of permitted AI edits. We present the first adaptation of watermarking-based detection methods for classroom settings, introducing conformal methods that effectively control FPR across diverse classroom settings. Using essays from native and non-native English speakers, we simulate seven levels of AI editing interventions--from grammar correction to content expansion--across multiple language models and watermarking schemes, and evaluate our proposal under these different setups. Our findings provide educators with quantitative frameworks to enforce academic integrity standards while embracing AI integration in the classroom.

Submitted 30 July, 2025; originally announced July 2025.

arXiv:2506.18562 [pdf, ps, other]

Multi-Rank Subspace Change-Point Detection for Monitoring Robotic Swarms

Authors: Jonghyeok Lee, Yao Xie, Youngser Park, Jason Hindes, Ira Schwartz, Carey Priebe

Abstract: We study the problem of real-time detection of covariance structure changes in high-dimensional streaming data, motivated by applications such as robotic swarm monitoring. Building upon the spiked covariance model, we propose the multi-rank Subspace-CUSUM procedure, which extends the classical CUSUM framework by tracking the top principal components to approximate a likelihood ratio. We provide a theoretical analysis of the proposed method by characterizing the expected detection statistics under both pre- and post-change regimes and offer principled guidance for selecting the drift and threshold parameters to control the false alarm rate. The effectiveness of our method is demonstrated through simulations and a real-world application to robotic swarm behavior data.

Submitted 23 June, 2025; originally announced June 2025.
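
For readers unfamiliar with the classical CUSUM framework that the abstract above builds on, the recursion can be sketched as follows. This is only the textbook one-sided scalar CUSUM, not the multi-rank Subspace-CUSUM procedure from the paper; the function name, the toy data, and the drift and threshold values are illustrative assumptions.

```python
from typing import Iterable, Optional


def cusum_first_alarm(stream: Iterable[float], drift: float, threshold: float) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if no alarm fires."""
    stat = 0.0
    for t, x in enumerate(stream):
        # Classical recursion: accumulate evidence, subtract the drift, clip at zero.
        stat = max(0.0, stat + x - drift)
        if stat > threshold:
            return t
    return None


# Toy stream (assumed data): the mean shifts upward after the fifth sample.
pre_change = [0.1, -0.2, 0.0, 0.3, -0.1]
post_change = [1.2, 0.9, 1.1, 1.3, 1.0]
alarm = cusum_first_alarm(pre_change + post_change, drift=0.5, threshold=1.5)
```

A larger drift makes the statistic reset to zero more often before the change, and a larger threshold lowers the false alarm rate at the cost of a longer detection delay.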

arXiv:2505.18526 [pdf, ps, other]

Scalable Gaussian Processes with Low-Rank Deep Kernel Decomposition

Authors: Yunqin Zhu, Henry Shaowu Yuchi, Yao Xie

Abstract: Kernels are key to encoding prior beliefs and data structures in Gaussian process (GP) models. The design of expressive and scalable kernels has garnered significant research attention. Deep kernel learning enhances kernel flexibility by feeding inputs through a neural network before applying a standard parametric form. However, this approach remains limited by the choice of base kernels, inherits high inference costs, and often demands sparse approximations. Drawing on Mercer's theorem, we introduce a fully data-driven, scalable deep kernel representation where a neural network directly represents a low-rank kernel through a small set of basis functions. This construction enables highly efficient exact GP inference in linear time and memory without invoking inducing points. It also supports scalable mini-batch training based on a principled variational inference framework. We further propose a simple variance correction procedure to guard against overconfidence in uncertainty estimates. Experiments on synthetic and real-world data demonstrate the advantages of our deep kernel GP in terms of predictive accuracy, uncertainty quantification, and computational efficiency.

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2505.16051 [pdf, ps, other]

PO-Flow: Flow-based Generative Models for Sampling Potential Outcomes and Counterfactuals

Authors: Dongze Wu, David I. Inouye, Yao Xie

Abstract: We propose PO-Flow, a novel continuous normalizing flow (CNF) framework for causal inference that jointly models potential outcomes and counterfactuals. Trained via flow matching, PO-Flow provides a unified framework for individualized potential outcome prediction, counterfactual predictions, and uncertainty-aware density learning. Among generative models, it is the first to enable density learning of potential outcomes without requiring explicit distributional assumptions (e.g., Gaussian mixtures), while also supporting counterfactual prediction conditioned on factual outcomes in general observational datasets. On benchmarks such as ACIC, IHDP, and IBM, it consistently outperforms prior methods across a range of causal inference tasks. Beyond that, PO-Flow succeeds in high-dimensional settings, including counterfactual image generation, demonstrating its broad applicability.

Submitted 21 May, 2025; originally announced May 2025.

arXiv:2504.09706 [pdf, other]

Modeling Discrete Coating Degradation Events via Hawkes Processes

Authors: Matthew Repasky, Henry Yuchi, Fritz Friedersdorf, Yao Xie

Abstract: Forecasting the degradation of coated materials has long been a topic of critical interest in engineering, as it has enormous implications for both system maintenance and sustainable material use. Material degradation is affected by many factors, including the history of corrosion and characteristics of the environment, which can be measured by high-frequency sensors. However, the high volume of data produced by such sensors can inhibit efficient modeling and prediction. To alleviate this issue, we propose novel metrics for representing material degradation, taking the form of discrete degradation events. These events maintain the statistical properties of continuous sensor readings, such as correlation with time to coating failure and coefficient of variation at failure, but are composed of orders of magnitude fewer measurements. To forecast future degradation of the coating system, a marked Hawkes process models the events. We use the forecast of degradation to predict a future time of failure, exhibiting superior performance to the approach based on direct modeling of galvanic corrosion using continuous sensor measurements. While such maintenance is typically done on a regular basis, degradation models can enable informed condition-based maintenance, reducing unnecessary excess maintenance and preventing unexpected failures.

Submitted 13 April, 2025; originally announced April 2025.
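
As background for the Hawkes process mentioned in the abstract above, the conditional intensity of a standard univariate Hawkes process with an exponential kernel is $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$. The sketch below evaluates that formula. It is an unmarked textbook version, not the marked model used in the paper, and the function name and parameter values are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, event_times: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity at time t given the past events (exponential kernel)."""
    # Each past event adds a jump of size alpha that decays at rate beta.
    excitation = sum(alpha * math.exp(-beta * (t - s)) for s in event_times if s < t)
    return mu + excitation


# Assumed toy history: three events, evaluated shortly after the last one.
history = [0.0, 1.0, 1.5]
rate_now = hawkes_intensity(2.0, history, mu=0.2, alpha=0.5, beta=1.0)
```

With no past events the intensity equals the baseline `mu`, and each new event temporarily raises the rate of further events, which is what makes the process self-exciting.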

arXiv:2504.09348

Graph-Based Prediction Models for Data Debiasing

Authors: Dongze Wu, Hanyang Jiang, Yao Xie

Abstract: Bias in data collection, arising from both under-reporting and over-reporting, poses significant challenges in critical applications such as healthcare and public safety. In this work, we introduce Graph-based Over- and Under-reporting Debiasing (GROUD), a novel graph-based optimization framework that debiases reported data by jointly estimating the true incident counts and the associated reporting bias probabilities. By modeling the bias as a smooth signal over a graph constructed from geophysical or feature-based similarities, our convex formulation not only ensures a unique solution but also comes with theoretical recovery guarantees under certain assumptions. We validate GROUD on both challenging simulated experiments and real-world datasets -- including Atlanta emergency calls and COVID-19 vaccine adverse event reports -- demonstrating its robustness and superior performance in accurately recovering debiased counts. This approach paves the way for more reliable downstream decision-making in systems affected by reporting irregularities.

Submitted 18 April, 2025; v1 submitted 12 April, 2025; originally announced April 2025.

Comments: We submitted this arXiv version by mistake. We have decided to update the original submission (arXiv:2307.07898) instead of submitting a separate article

arXiv:2504.06364 [pdf, ps, other]

Deep spatio-temporal point processes: Advances and new directions

Authors: Xiuyuan Cheng, Zheng Dong, Yao Xie

Abstract: Spatio-temporal point processes (STPPs) model discrete events distributed in time and space, with important applications in areas such as criminology, seismology, epidemiology, and social networks. Traditional models often rely on parametric kernels, limiting their ability to capture heterogeneous, nonstationary dynamics. Recent innovations integrate deep neural architectures -- either by modeling the conditional intensity function directly or by learning flexible, data-driven influence kernels -- substantially broadening their expressive power. This article reviews the development of the deep influence kernel approach, which enjoys statistical explainability, since the influence kernel remains in the model to capture the spatiotemporal propagation of event influence and its impact on future events, while also possessing strong expressive power, thereby benefiting from both worlds. We explain the main components in developing deep kernel point processes, leveraging tools such as functional basis decomposition and graph neural networks to encode complex spatial or network structures, as well as estimation using both likelihood-based and likelihood-free methods, and address computational scalability for large-scale data. We also discuss the theoretical foundation of kernel identifiability. Simulated and real-data examples highlight applications to crime analysis, earthquake aftershock prediction, and sepsis prediction modeling, and we conclude by discussing promising directions for the field.

Submitted 22 August, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

Comments: Annual Review of Statistics and Its Application, 2025

arXiv:2502.13394 [pdf, ps, other]

Flow-based generative models as iterative algorithms in probability space

Authors: Yao Xie, Xiuyuan Cheng

Abstract: Generative AI (GenAI) has revolutionized data-driven modeling by enabling the synthesis of high-dimensional data across various applications, including image generation, language modeling, biomedical signal processing, and anomaly detection. Flow-based generative models provide a powerful framework for capturing complex probability distributions, offering exact likelihood estimation, efficient sampling, and deterministic transformations between distributions. These models leverage invertible mappings governed by Ordinary Differential Equations (ODEs), enabling precise density estimation and likelihood evaluation. This tutorial presents an intuitive mathematical framework for flow-based generative models, formulating them as neural network-based representations of continuous probability densities. We explore key theoretical principles, including the Wasserstein metric, gradient flows, and density evolution governed by ODEs, to establish convergence guarantees and bridge empirical advancements with theoretical insights. By providing a rigorous yet accessible treatment, we aim to equip researchers and practitioners with the necessary tools to effectively apply flow-based generative models in signal processing and machine learning.

Submitted 6 September, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

Comments: IEEE Signal Processing Magazine, Special Issue on The Mathematics of Deep Learning, 2025

arXiv:2502.05709 [pdf, other]

Flow-based Conformal Prediction for Multi-dimensional Time Series

Authors: Junghwan Lee, Chen Xu, Yao Xie

Abstract: Conformal prediction for time series presents two key challenges: (1) leveraging sequential correlations in features and non-conformity scores and (2) handling multi-dimensional outcomes. We propose a novel conformal prediction method to address these two key challenges by integrating Transformer and Normalizing Flow. Specifically, the Transformer encodes the historical context of time series, and normalizing flow learns the transformation from the base distribution to the distribution of non-conformity scores conditioned on the encoded historical context. This enables the construction of prediction regions by transforming samples from the base distribution using the learned conditional flow. We ensure the marginal coverage by defining the prediction regions as sets in the transformed space that correspond to a predefined probability mass in the base distribution. The model is trained end-to-end by Flow Matching, avoiding the need for computationally intensive numerical solutions of ordinary differential equations. We demonstrate that our proposed method achieves smaller prediction regions compared to the baselines while satisfying the desired coverage through comprehensive experiments using simulated and real-world time series datasets.

Submitted 8 February, 2025; originally announced February 2025.

arXiv:2501.16393 [pdf, other]

Improving Network Threat Detection by Knowledge Graph, Large Language Model, and Imbalanced Learning

Authors: Lili Zhang, Quanyan Zhu, Herman Ray, Ying Xie

Abstract: Network threat detection has been challenging due to the complexity of attack activities and the limited historical threat data available to learn from. To help enhance the existing practices of using analytics, machine learning, and artificial intelligence methods to detect network threats, we propose an integrated modelling framework, where Knowledge Graph is used to analyze the users' activity patterns, Imbalanced Learning techniques are used to prune and weigh Knowledge Graph, and LLM is used to retrieve and interpret the users' activities from Knowledge Graph. The proposed framework is applied to Agile Threat Detection through Online Sequential Learning. The preliminary results show a 3%-4% improvement in the threat capture rate and increased interpretability of risk predictions based on the users' activities.

Submitted 14 May, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

Comments: Accepted by "Combining AI and OR/MS for Better Trustworthy Decision Making" Bridge Program co-organized by AAAI and INFORMS as poster and demo

arXiv:2412.20556 [pdf, ps, other]

Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces

Authors: Linglingzhi Zhu, Yao Xie

Abstract: We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous, leading to significant computational challenges due to the infinite-dimensional nature of the optimization problem. Recent research has explored learning the worst-case distribution using neural network-based generative models to address these computational challenges but lacks algorithmic convergence guarantees. This paper bridges this theoretical gap by presenting an iterative algorithm to solve such a minimax problem, achieving global convergence under mild assumptions and leveraging technical tools from vector space minimax optimization and convex analysis in the space of continuous probability densities. In particular, leveraging Brenier's theorem, we represent the worst-case distribution as a transport map applied to a continuous reference measure and reformulate the regularized discrepancy-based DRO as a minimax problem in the Wasserstein space. Furthermore, we demonstrate that the worst-case distribution can be efficiently computed using a modified Jordan-Kinderlehrer-Otto (JKO) scheme with sufficiently large regularization parameters for commonly used discrepancy functions, linked to the radius of the ambiguity set. Additionally, we derive the global convergence rate and quantify the total number of subgradient and inexact modified JKO iterations required to obtain approximate stationary points. These results are potentially applicable to nonconvex and nonsmooth scenarios, with broad relevance to modern machine learning applications.

Submitted 29 December, 2024; originally announced December 2024.

arXiv:2412.16523 [pdf, other]

Physics-Guided Fair Graph Sampling for Water Temperature Prediction in River Networks

Authors: Erhu He, Declan Kutscher, Yiqun Xie, Jacob Zwart, Zhe Jiang, Huaxiu Yao, Xiaowei Jia

Abstract: This work introduces a novel graph neural network (GNN)-based method to predict stream water temperature and reduce model bias across locations of different income and education levels. Traditional physics-based models often have limited accuracy because they are necessarily approximations of reality. Recently, there has been increasing interest in using GNNs to model complex water dynamics in stream networks. Despite their promise in improving the accuracy, GNNs can bring additional model bias through the aggregation process, where node features are updated by aggregating neighboring nodes. The bias can be especially pronounced when nodes with similar sensitive attributes are frequently connected. We introduce a new method that leverages physical knowledge to represent the node influence in GNNs, and then utilizes physics-based influence to refine the selection and weights over the neighbors. The objective is to facilitate equitable treatment over different sensitive groups in the graph aggregation, which helps reduce spatial bias over locations, especially for those in underprivileged groups. The results on the Delaware River Basin demonstrate the effectiveness of the proposed method in preserving equitable performance across locations in different sensitive groups.

Submitted 21 December, 2024; originally announced December 2024.

arXiv:2412.15315 [pdf, other]

Enhancing Masked Time-Series Modeling via Dropping Patches

Authors: Tianyu Qiu, Yi Xie, Yun Xiong, Hao Niu, Xiaofeng Gao

Abstract: This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence level patches of time series. On this basis, a simple yet effective method named DropPatch is proposed, which has two remarkable advantages: 1) It improves the pre-training efficiency by a square-level advantage; 2) It provides additional advantages for modeling in scenarios such as in-domain, cross-domain, few-shot learning and cold start. This paper conducts comprehensive experiments to verify the effectiveness of the method and analyze its internal mechanism. Empirically, DropPatch strengthens the attention mechanism, reduces information redundancy and serves as an efficient means of data augmentation. Theoretically, it is proved that DropPatch slows down the rate at which the Transformer representations collapse into the rank-1 linear subspace by randomly dropping patches, thus optimizing the quality of the learned representations.

Submitted 19 December, 2024; originally announced December 2024.

arXiv:2412.01098 [pdf, other]

Spatial Conformal Inference through Localized Quantile Regression

Authors: Hanyang Jiang, Yao Xie

Abstract: Reliable uncertainty quantification at unobserved spatial locations, especially in the presence of complex and heterogeneous datasets, remains a core challenge in spatial statistics. Traditional approaches like Kriging rely heavily on assumptions such as normality, which often break down in large-scale, diverse datasets, leading to unreliable prediction intervals. While machine learning methods have emerged as powerful alternatives, they primarily focus on point predictions and provide limited mechanisms for uncertainty quantification. Conformal prediction, a distribution-free framework, offers valid prediction intervals without relying on parametric assumptions. However, existing conformal prediction methods are either not tailored for spatial settings or, when designed for spatial data, rely on rather restrictive i.i.d. assumptions. In this paper, we propose Localized Spatial Conformal Prediction (LSCP), a conformal prediction method designed specifically for spatial data. LSCP leverages localized quantile regression to construct prediction intervals. Instead of i.i.d. assumptions, our theoretical analysis builds on weaker conditions of stationarity and spatial mixing, which are natural for spatial data, providing finite-sample bounds on the conditional coverage gap and establishing asymptotic guarantees for conditional coverage. We present experiments on both synthetic and real-world datasets to demonstrate that LSCP achieves accurate coverage with significantly tighter and more consistent prediction intervals across the spatial domain compared to existing methods.

Submitted 15 February, 2025; v1 submitted 1 December, 2024; originally announced December 2024.
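
To make the notion of a distribution-free prediction interval in the abstract above concrete, here is a minimal sketch of standard split conformal prediction with absolute-residual scores. It relies on the exchangeability assumption that LSCP is designed to relax, so it is a baseline illustration rather than the LSCP method; the function name, the toy predictor, and the calibration data are all assumptions.

```python
import math
from typing import Callable, Sequence, Tuple


def split_conformal_interval(predict: Callable[[float], float],
                             calib_x: Sequence[float], calib_y: Sequence[float],
                             x_new: float, alpha: float) -> Tuple[float, float]:
    """Prediction interval with marginal coverage >= 1 - alpha under exchangeability."""
    # Non-conformity scores: absolute residuals on held-out calibration data.
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    # If the rank exceeds n, only the trivial infinite interval is valid.
    radius = scores[k - 1] if k <= n else math.inf
    center = predict(x_new)
    return center - radius, center + radius


# Assumed toy setup: a fixed linear predictor and nine calibration points.
predict = lambda x: 2.0 * x
calib_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9]
calib_y = [2.0 * x + e for x, e in zip(calib_x, noise)]
lower, upper = split_conformal_interval(predict, calib_x, calib_y, x_new=10.0, alpha=0.25)
```

The interval width here is the same at every input, which is one reason localized approaches such as quantile regression on nearby scores can give tighter, more adaptive intervals.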

arXiv:2411.17099 [pdf, other]

Spatio-Temporal Conformal Prediction for Power Outage Data

Authors: Hanyang Jiang, Yao Xie, Feng Qiu

Abstract: In recent years, increasingly unpredictable and severe global weather patterns have frequently caused long-lasting power outages. Building resilience, the ability to withstand, adapt to, and recover from major disruptions, has become crucial for the power industry. To enable rapid recovery, accurately predicting future outage numbers is essential. Rather than relying on simple point estimates, we analyze extensive quarter-hourly outage data and develop a graph conformal prediction method that delivers accurate prediction regions for outage numbers across the states for a time period. We demonstrate the effectiveness of this method through extensive numerical experiments in several states affected by extreme weather events that led to widespread outages.

Submitted 25 November, 2024; originally announced November 2024.

arXiv:2411.11203 [pdf, ps, other]

Debiasing Watermarks for Large Language Models via Maximal Coupling

Authors: Yangxinyu Xie, Xiang Li, Tanwi Mallick, Weijie J. Su, Ruixun Zhang

Abstract: Watermarking language models is essential for distinguishing between human and machine-generated text and thus maintaining the integrity and trustworthiness of digital communication. We present a novel green/red list watermarking approach that partitions the token set into "green" and "red" lists, subtly increasing the generation probability for green tokens. To correct token distribution bias, our method employs maximal coupling, using a uniform coin flip to decide whether to apply bias correction, with the result embedded as a pseudorandom watermark signal. Theoretical analysis confirms this approach's unbiased nature and robust detection capabilities. Experimental results show that it outperforms prior techniques by preserving text quality while maintaining high detectability, and it demonstrates resilience to targeted modifications aimed at improving text quality. This research provides a promising watermarking solution for language models, balancing effective detection with minimal impact on text quality.

Submitted 12 June, 2025; v1 submitted 17 November, 2024; originally announced November 2024.

Comments: To appear in Journal of the American Statistical Association (JASA)
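
The green/red list idea in the abstract above is usually detected with a one-proportion z-test on the number of green tokens. The sketch below shows that generic test, not the maximal-coupling detector proposed in the paper. It also assumes a single fixed green set, whereas practical schemes typically re-derive the green list at each position from a keyed hash; the function names and toy tokens are hypothetical.

```python
import math
from typing import Hashable, Sequence, Set


def green_list_z_score(tokens: Sequence[Hashable], green: Set[Hashable], gamma: float) -> float:
    """z-score of the green-token count against the unwatermarked null hypothesis.

    Under the null, each token is assumed green independently with probability gamma.
    """
    total = len(tokens)
    if total == 0:
        raise ValueError("need at least one token")
    hits = sum(1 for tok in tokens if tok in green)
    return (hits - gamma * total) / math.sqrt(total * gamma * (1.0 - gamma))


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Assumed toy text: 100 tokens, 75 of which fall in the green set.
green_set = {"g"}
text = ["g"] * 75 + ["r"] * 25
z = green_list_z_score(text, green_set, gamma=0.5)
p = one_sided_p_value(z)
```

A small p-value is evidence that the text was generated with the watermark, and thresholding it at a chosen level controls the false positive rate under the stated null.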

arXiv:2411.02694 [pdf, other]

Point processes with event time uncertainty

Authors: Xiuyuan Cheng, Tingnan Gong, Yao Xie

Abstract: Point processes are widely used statistical models for uncovering the temporal patterns in dependent event data. In many applications, the event time cannot be observed exactly, calling for the incorporation of time uncertainty into the modeling of point process data. In this work, we introduce a framework to model time-uncertain point processes possibly on a network. We start by deriving the formulation in the continuous-time setting under a few assumptions motivated by application scenarios. After imposing a time grid, we obtain a discrete-time model that facilitates inference and can be computed by first-order optimization methods such as Gradient Descent or Variational Inequality (VI) using batch-based Stochastic Gradient Descent (SGD). The parameter recovery guarantee is proved for VI inference at an $O(1/k)$ convergence rate using $k$ SGD steps. Our framework handles non-stationary processes by modeling the inference kernel as a matrix (or tensor on a network) and it covers the stationary process, such as the classical Hawkes process, as a special case. We experimentally show that the proposed approach outperforms previous General Linear model (GLM) baselines on simulated and real data and reveals meaningful causal relations on a Sepsis-associated Derangements dataset.

Submitted 4 November, 2024; originally announced November 2024.

arXiv:2410.17882 [pdf, ps, other]

Identifiable Representation and Model Learning for Latent Dynamic Systems

Authors: Congxi Zhang, Yongchun Xie

Abstract: Learning identifiable representations and models from low-level observations is helpful for an intelligent spacecraft to complete downstream tasks reliably. For temporal observations, to ensure that the data generating process is provably inverted, most existing works either assume the noise variables in the dynamic mechanisms are (conditionally) independent or require that the interventions can directly affect each latent variable. However, in practice, the relationship between the exogenous inputs/interventions and the latent variables may follow some complex deterministic mechanisms. In this work, we study the problem of identifiable representation and model learning for latent dynamic systems. The key idea is to use an inductive bias inspired by controllable canonical forms, which are sparse and input-dependent by definition. We prove that, for linear and affine nonlinear latent dynamic systems with sparse input matrices, it is possible to identify the latent variables up to scaling and determine the dynamic models up to some simple transformations. The results have the potential to provide some theoretical guarantees for developing more trustworthy decision-making and control methods for intelligent spacecraft.

Submitted 4 December, 2024; v1 submitted 23 October, 2024; originally announced October 2024.

arXiv:2410.02548 [pdf, ps, other]

Local Flow Matching Generative Models

Authors: Chen Xu, Xiuyuan Cheng, Yao Xie

Abstract: Flow Matching (FM) is a simulation-free method for learning a continuous and invertible flow to interpolate between two distributions, and in particular to generate data from noise. Inspired by the variational nature of the diffusion process as a gradient flow, we introduce a stepwise FM model called Local Flow Matching (LFM), which consecutively learns a sequence of FM sub-models, each matching a diffusion process up to the time of the step size in the data-to-noise direction. In each step, the two distributions to be interpolated by the sub-flow model are closer to each other than data vs. noise, and this enables the use of smaller models with faster training. This variational perspective also allows us to theoretically prove a generation guarantee of the proposed flow model in terms of the $χ^2$-divergence between the generated and true data distributions, utilizing the contraction property of the diffusion process. In practice, the stepwise structure of LFM lends itself naturally to distillation, and different distillation techniques can be adopted to speed up generation. We empirically demonstrate improved training efficiency and competitive generative performance of LFM compared to FM on the unconditional generation of tabular data and image datasets, and also on the conditional generation of robotic manipulation policies.

Submitted 11 July, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

arXiv:2410.02078 [pdf, other]

Posterior sampling via Langevin dynamics based on generative priors

Authors: Vishal Purohit, Matthew Repasky, Jianfeng Lu, Qiang Qiu, Yao Xie, Xiuyuan Cheng

Abstract: Posterior sampling in high-dimensional spaces using generative models holds significant promise for various applications, including but not limited to inverse problems and guided generation tasks. Despite many recent developments, generating diverse posterior samples remains a challenge, as existing methods require restarting the entire generative process for each new sample, making the procedure computationally expensive. In this work, we propose efficient posterior sampling by simulating Langevin dynamics in the noise space of a pre-trained generative model. By exploiting the mapping between the noise and data spaces which can be provided by distilled flows or consistency models, our method enables seamless exploration of the posterior without the need to re-run the full sampling chain, drastically reducing computational overhead. Theoretically, we prove a guarantee for the proposed noise-space Langevin dynamics to approximate the posterior, assuming that the generative model sufficiently approximates the prior distribution. Our framework is experimentally validated on image restoration tasks involving noisy linear and nonlinear forward operators applied to LSUN-Bedroom (256 x 256) and ImageNet (64 x 64) datasets. The results demonstrate that our approach generates high-fidelity samples with enhanced semantic diversity even under a limited number of function evaluations, offering superior efficiency and performance compared to existing diffusion-based posterior sampling techniques.

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.20547 [pdf, other]

Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions

Authors: Dongze Wu, Yao Xie

Abstract: Sampling from high-dimensional, multi-modal distributions remains a fundamental challenge across domains such as statistical Bayesian inference and physics-based machine learning. In this paper, we propose Annealing Flow (AF), a method built on Continuous Normalizing Flow (CNF) for sampling from high-dimensional and multi-modal distributions. AF is trained with a dynamic Optimal Transport (OT) objective incorporating Wasserstein regularization, and guided by annealing procedures, facilitating effective exploration of modes in high-dimensional spaces. Compared to recent NF methods, AF greatly improves training efficiency and stability, with minimal reliance on MC assistance. We demonstrate the superior performance of AF compared to state-of-the-art methods through experiments on various challenging distributions and real-world datasets, particularly in high-dimensional and multi-modal settings. We also highlight AF's potential for sampling the least favorable distributions.

Submitted 27 May, 2025; v1 submitted 30 September, 2024; originally announced September 2024.

Comments: This paper has been accepted to ICML 2025 and will appear in the Proceedings of Machine Learning Research (PMLR)

arXiv:2409.15597 [pdf, other]

Higher-criticism for sparse multi-stream change-point detection

Authors: Tingnan Gong, Alon Kipnis, Yao Xie

Abstract: We study a statistical procedure based on higher criticism (HC) to address the sparse multi-stream quickest change-point detection problem. Namely, we aim to detect a potential change in the distribution of multiple data streams at some unknown time. If a change occurs, only a few streams are affected, whereas the identity of the affected streams is unknown. The HC-based procedure involves testing for a change point in individual streams and combining multiple tests using higher criticism. Relying on HC thresholding, the procedure also indicates a set of streams suspected to be affected by the change. We provide a theoretical analysis under a sparse heteroscedastic normal change-point model. We establish an information-theoretic detection delay lower bound when individual tests are based on the likelihood ratio or the generalized likelihood ratio statistics and show that the delay of the HC-based method converges in distribution to this bound. In the special case of constant variance, our bound coincides with known results in (Chan, 2017). We demonstrate the effectiveness of the HC-based method compared to other methods in detecting sparse changes through extensive numerical evaluations.

Submitted 19 April, 2025; v1 submitted 23 September, 2024; originally announced September 2024.

Comments: Authors are listed in alphabetical order

arXiv:2409.10882 [pdf, ps, other]

Spatio-Temporal-Network Point Processes for Modeling Crime Events with Landmarks

Authors: Zheng Dong, Jorge Mateu, Yao Xie

Abstract: Self-exciting point processes are widely used to model the contagious effects of crime events living within continuous geographic space, using their occurrence time and locations. However, in urban environments, most events are naturally constrained within the city's street network structure, and the contagious effects of crime are governed by such a network geography. Meanwhile, the complex distribution of urban infrastructures also plays an important role in shaping crime patterns across space. We introduce a novel spatio-temporal-network point process framework for crime modeling that integrates these urban environmental characteristics by incorporating self-attention graph neural networks. Our framework incorporates the street network structure as the underlying event space, where crime events can occur at random locations on the network edges. To realistically capture criminal movement patterns, distances between events are measured using street network distances. We then propose a new mark for a crime event by concatenating the event's crime category with the type of its nearby landmark, aiming to capture how the urban design influences the mixing structures of various crime types. A graph attention network architecture is adopted to learn the existence of mark-to-mark interactions. Extensive experiments on crime data from Valencia, Spain, demonstrate the effectiveness of our framework in understanding the crime landscape and forecasting crime risks across regions.

Submitted 30 September, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.03986 [pdf, other]

An Efficient and Generalizable Symbolic Regression Method for Time Series Analysis

Authors: Yi Xie, Tianyu Qiu, Yun Xiong, Xiuqi Huang, Xiaofeng Gao, Chao Chen

Abstract: Time series analysis and prediction methods currently excel in quantitative analysis, offering accurate future predictions and diverse statistical indicators, but generally falling short in elucidating the underlying evolution patterns of time series. To gain a more comprehensive understanding and provide insightful explanations, we utilize symbolic regression techniques to derive explicit expressions for the non-linear dynamics in the evolution of time series variables. However, these techniques face challenges in computational efficiency and generalizability across diverse real-world time series data. To overcome these challenges, we propose Neural-Enhanced Monte-Carlo Tree Search (NEMoTS) for time series. NEMoTS leverages the exploration-exploitation balance of Monte-Carlo Tree Search (MCTS), significantly reducing the search space in symbolic regression and improving expression quality. Furthermore, by integrating neural networks with MCTS, NEMoTS not only capitalizes on their superior fitting capabilities to concentrate on more pertinent operations post-search space reduction, but also replaces the complex and time-consuming simulation process, thereby substantially improving computational efficiency and generalizability in time series analysis. NEMoTS offers an efficient and comprehensive approach to time series analysis. Experiments with three real-world datasets demonstrate NEMoTS's significant superiority in performance, efficiency, reliability, and interpretability, making it well-suited for large-scale real-world time series data.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2408.09672 [pdf, other]

Regularization for Adversarial Robust Learning

Authors: Jie Wang, Rui Gao, Yao Xie

Abstract: Despite the growing prevalence of artificial neural networks in real-world applications, their vulnerability to adversarial attacks remains a significant concern, which motivates us to investigate the robustness of machine learning models. While various heuristics aim to optimize the distributionally robust risk using the $\infty$-Wasserstein metric, such a notion of robustness frequently encounters computational intractability. To tackle the computational challenge, we develop a novel approach to adversarial training that integrates $φ$-divergence regularization into the distributionally robust risk function. This regularization brings a notable improvement in computation compared with the original formulation. We develop stochastic gradient methods with biased oracles to solve this problem efficiently, achieving the near-optimal sample complexity. Moreover, we establish its regularization effects and demonstrate its asymptotic equivalence to a regularized empirical risk minimization framework, by considering various scaling regimes of the regularization parameter and robustness level. These regimes yield gradient norm regularization, variance regularization, or a smoothed gradient norm regularization that interpolates between these extremes. We numerically validate our proposed method in supervised learning, reinforcement learning, and contextual learning and showcase its state-of-the-art performance against various adversarial attacks.

Submitted 22 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

Comments: 51 pages, 5 figures

arXiv:2408.09258 [pdf, other]

Atlanta Gun Violence Modeling via Nonstationary Spatio-temporal Point Processes

Authors: Zheng Dong, Yao Xie

Abstract: Analysis of gun violence in the United States has utilized various models based on spatiotemporal point processes. Previous studies have identified a contagion effect in gun violence, characterized by bursts of diffusion across urban environments, which can be effectively represented using the self-excitatory spatiotemporal Hawkes process. The Hawkes process and its variants have been successful in modeling self-excitatory events, including earthquakes, disease outbreaks, financial market movements, neural activity, and the viral spread of memes on social networks. However, existing Hawkes models applied to gun violence often rely on simplistic stationary kernels, which fail to account for the complex, non-homogeneous spread of influence and impact over space and time. To address this limitation, we adopt a non-stationary spatiotemporal point process model that incorporates a neural network-based kernel to better represent the varied correlations among events of gun violence. Our study analyzes a comprehensive dataset of approximately 16,000 gunshot events in the Atlanta metropolitan area from 2021 to 2023. The cornerstone of our approach is the innovative non-stationary kernel, designed to enhance the model's expressiveness while preserving its interpretability. This approach not only demonstrates strong predictive performance but also provides insights into the spatiotemporal dynamics of gun violence and its propagation within urban settings.

Submitted 17 August, 2024; originally announced August 2024.

arXiv:2408.07219 [pdf, other]

Causal Effect Estimation using identifiable Variational AutoEncoder with Latent Confounders and Post-Treatment Variables

Authors: Yang Xie, Ziqi Xu, Debo Cheng, Jiuyong Li, Lin Liu, Yinghao Zhang, Zaiwen Feng

Abstract: Estimating causal effects from observational data is challenging, especially in the presence of latent confounders. Much work has been done on addressing this challenge, but most of the existing research ignores the bias introduced by the post-treatment variables. In this paper, we propose a novel method of joint Variational AutoEncoder (VAE) and identifiable Variational AutoEncoder (iVAE) for learning the representations of latent confounders and latent post-treatment variables from their proxy variables, termed CPTiVAE, to achieve unbiased causal effect estimation from observational data. We further prove the identifiability in terms of the representation of latent post-treatment variables. Extensive experiments on synthetic and semi-synthetic datasets demonstrate that the CPTiVAE outperforms the state-of-the-art methods in the presence of latent confounders and post-treatment variables. We further apply CPTiVAE to a real-world dataset to show its potential application.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2407.10976 [pdf, other]

Learning Cellular Network Connection Quality with Conformal

Authors: Hanyang Jiang, Elizabeth Belding, Ellen Zegure, Yao Xie

Abstract: In this paper, we address the problem of uncertainty quantification for cellular network speed. It is a well-known fact that the actual internet speed experienced by a mobile phone can fluctuate significantly, even when remaining in a single location. This high degree of variability underscores that mere point estimation of network speed is insufficient. Rather, it is advantageous to establish a prediction interval that can encompass the expected range of speed variations. In order to build an accurate network estimation map, a large amount of mobile data needs to be collected at different locations. Currently, public datasets rely on users to upload data through apps. Although massive data has been collected, the datasets suffer from significant noise due to the nature of cellular networks and various other factors. Additionally, the uneven distribution of population density affects the spatial consistency of data collection, leading to substantial uncertainty in the network quality maps derived from this data. We focus our analysis on large-scale internet-quality datasets provided by Ookla to construct an estimated map of connection quality. To improve the reliability of this map, we introduce a novel conformal prediction technique to build an uncertainty map. We identify regions with heightened uncertainty to prioritize targeted, manual data collection. In addition, the uncertainty map quantifies how reliable the prediction is in different areas. Our method also leads to a sampling strategy that guides researchers to selectively gather high-quality data that best complement the current dataset to improve the overall accuracy of the prediction model.

Submitted 4 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2311.05641

arXiv:2407.09964 [pdf, other]

TrIM: Transformed Iterative Mondrian Forests for Gradient-based Dimension Reduction and High-Dimensional Regression

Authors: Ricardo Baptista, Eliza O'Reilly, Yangxinyu Xie

Abstract: We propose a computationally efficient algorithm for gradient-based linear dimension reduction and high-dimensional regression. The algorithm initially computes a Mondrian forest and uses this estimator to identify a relevant feature subspace of the inputs from an estimate of the expected gradient outer product (EGOP) of the regression function. In addition, we introduce an iterative approach known as Transformed Iterative Mondrian (TrIM) forest to improve the Mondrian forest estimator by using the EGOP estimate to update the set of features and weights used by the Mondrian partitioning mechanism. We obtain consistency guarantees and convergence rates for the estimation of the EGOP matrix and the random forest estimator obtained from one iteration of the TrIM algorithm. Lastly, we demonstrate the effectiveness of our proposed algorithm for learning the relevant feature subspace across a variety of settings with both simulated and real data.

Submitted 13 July, 2024; originally announced July 2024.

Comments: 39 pages, 10 figures

arXiv:2406.16136 [pdf, other]

Distribution-Free Online Change Detection for Low-Rank Images

Authors: Tingnan Gong, Seong-Hee Kim, Yao Xie

Abstract: We present a distribution-free CUSUM procedure designed for online change detection in a time series of low-rank images, particularly when the change causes a mean shift. We represent images as matrix data and allow for temporal dependence, in addition to inherent spatial dependence, before and after the change. The marginal distributions are assumed to be general, not limited to any specific parametric distribution. We propose new monitoring statistics that utilize the low-rank structure of the in-control mean matrix. Additionally, we study the properties of the proposed detection procedure, assessing whether the monitoring statistics effectively capture a mean shift and evaluating the rate of increase in the average run length relative to the control limit in both the in-control and out-of-control cases. The effectiveness of our procedure is demonstrated through simulated and real data experiments.

Submitted 27 February, 2025; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: 30 pages, 7 figures

arXiv:2406.06894 [pdf, other]

Nonlinear time-series embedding by monotone variational inequality

Authors: Jonathan Y. Zhou, Yao Xie

Abstract: In the wild, we often encounter collections of sequential data such as electrocardiograms, motion capture, genomes, and natural language, and sequences may be multichannel or symbolic with nonlinear dynamics. We introduce a new method that learns low-dimensional representations of nonlinear time series without supervision and can have provable recovery guarantees. The learned representation can be used for downstream machine-learning tasks such as clustering and classification. The method is based on the assumption that the observed sequences arise from a common domain, but each sequence obeys its own autoregressive model, and these models are related to each other through low-rank regularization. We cast the problem as a computationally efficient convex matrix parameter recovery problem using monotone Variational Inequality and encode the common domain assumption via a low-rank constraint across the learned representations, which can learn the geometry for the entire domain as well as faithful representations for the dynamics of each individual sequence using the domain information in totality. We show the competitive performance of our method on real-world time-series data with the baselines and demonstrate its effectiveness for symbolic text modeling and RNA sequence clustering.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.04859 [pdf, other]

Stochastic full waveform inversion with deep generative prior for uncertainty quantification

Authors: Yuke Xie, Hervé Chauris, Nicolas Desassis

Abstract: To obtain high-resolution images of subsurface structures from seismic data, seismic imaging techniques such as Full Waveform Inversion (FWI) serve as crucial tools. However, FWI involves solving a nonlinear and often non-unique inverse problem, presenting challenges such as local minima trapping and inadequate handling of inherent uncertainties. In addressing these challenges, we propose leveraging deep generative models as the prior distribution of geophysical parameters for stochastic Bayesian inversion. This approach integrates the adjoint state gradient for efficient back-propagation from the numerical solution of partial differential equations. Additionally, we introduce explicit and implicit variational Bayesian inference methods. The explicit method computes variational distribution density using a normalizing flow-based neural network, enabling computation of the Bayesian posterior of parameters. Conversely, the implicit method employs an inference network attached to a pretrained generative model to estimate density, incorporating an entropy estimator. Furthermore, we also experimented with the Stein Variational Gradient Descent (SVGD) method as another variational inference technique, using particles. We compare these variational Bayesian inference methods with conventional Markov chain Monte Carlo (McMC) sampling. Each method is able to quantify uncertainties and to generate seismic data-conditioned realizations of subsurface geophysical parameters. This framework provides insights into subsurface structures while accounting for inherent uncertainties.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.16828 [pdf, ps, other]

Kernel-based Optimally Weighted Conformal Prediction Intervals

Authors: Jonghyeok Lee, Chen Xu, Yao Xie

Abstract: In this work, we present a novel conformal prediction method for time-series, which we call Kernel-based Optimally Weighted Conformal Prediction Intervals (KOWCPI). Specifically, KOWCPI adapts the classic Reweighted Nadaraya-Watson (RNW) estimator for quantile regression on dependent data and learns optimal data-adaptive weights. Theoretically, we tackle the challenge of establishing a conditional coverage guarantee for non-exchangeable data under strong mixing conditions on the non-conformity scores. We demonstrate the superior performance of KOWCPI on real and synthetic time-series data against state-of-the-art methods, where KOWCPI achieves narrower confidence intervals without losing coverage.

Submitted 31 May, 2025; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.15441 [pdf, ps, other]

Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances

Authors: Jie Wang, March Boedihardjo, Yao Xie

Abstract: Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when applied to high-dimensional data with low-dimensional structures. The kernel max-sliced (KMS) Wasserstein distance is developed for this purpose by finding an optimal nonlinear mapping that reduces data into $1$ dimension before computing the Wasserstein distance. However, its theoretical properties have not yet been fully developed. In this paper, we provide sharp finite-sample guarantees under milder technical assumptions compared with state-of-the-art for the KMS $p$-Wasserstein distance between two empirical distributions with $n$ samples for general $p\in[1,\infty)$. Algorithm-wise, we show that computing the KMS $2$-Wasserstein distance is NP-hard, and then we further propose a semidefinite relaxation (SDR) formulation (which can be solved efficiently in polynomial time) and provide a relaxation gap for the obtained solution. We provide numerical examples to demonstrate the good performance of our scheme for high-dimensional two-sample testing.

Submitted 18 July, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted by ICML-2025

arXiv:2404.18838 [pdf, other]

Accurate adaptive deep learning method for solving elliptic problems

Authors: Jingyong Ying, Yaqi Xie, Jiao Li, Hongqiao Wang

Abstract: Deep learning methods are of great importance in solving partial differential equations. In this paper, inspired by the failure-informed idea proposed by Gao et al. (SIAM Journal on Scientific Computing 45(4)(2023)) and as an improvement, a new accurate adaptive deep learning method is proposed for solving elliptic problems, including the interface problems and the convection-dominated problems. Based on the failure probability framework, the piece-wise uniform distribution is used to approximate the optimal proposal distribution and a kernel-based method is proposed for efficient sampling. Together with the improved Levenberg-Marquardt optimization method, the proposed adaptive deep learning method shows great potential in improving solution accuracy. Numerical tests on the elliptic problems without interface conditions, on the elliptic interface problem, and on the convection-dominated problems demonstrate the effectiveness of the proposed method, as it reduces the relative errors by a factor varying from $10^2$ to $10^4$ for different cases.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.11509 [pdf, other]

VC Theory for Inventory Policies

Authors: Yaqi Xie, Will Ma, Linwei Xin

Abstract: Advances in computational power and AI have increased interest in reinforcement learning approaches to inventory management. This paper provides a theoretical foundation for these approaches and investigates the benefits of restricting to policy structures that are well-established by inventory theory. In particular, we prove generalization guarantees for learning several well-known classes of inventory policies, including base-stock and (s, S) policies, by leveraging the celebrated Vapnik-Chervonenkis (VC) theory. We apply the Pseudo-dimension and Fat-shattering dimension from VC theory to determine the generalization error of inventory policies, that is, the difference between an inventory policy's performance on training data and its expected performance on unseen data. We focus on a classical setting without contexts, but allow for an arbitrary distribution over demand sequences and do not make any assumptions such as independence over time. We corroborate our supervised learning results using numerical simulations. Managerially, our theory and simulations translate to the following insights. First, there is a principle of ``learning less is more'' in inventory management: depending on the amount of data available, it may be beneficial to restrict oneself to a simpler, albeit suboptimal, class of inventory policies to minimize overfitting errors. Second, the number of parameters in a policy class may not be the correct measure of overfitting error: in fact, the class of policies defined by T time-varying base-stock levels exhibits a generalization error an order of magnitude lower than that of the two-parameter (s, S) policy class. Finally, our research suggests situations in which it could be beneficial to incorporate the concepts of base-stock and inventory position into black-box learning machines, instead of having these machines directly learn the order quantity actions.

Submitted 7 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.03329 [pdf]

COMPILED: Deep Metric Learning for Defect Classification of Threaded Pipe Connections using Multichannel Partially Observed Functional Data

Authors: Juan Du, Yukun Xie, Chen Zhang

Abstract: In modern manufacturing, most products are conforming. Few products are nonconforming with different defect types. The identification of defect types can help further root cause diagnosis of production lines. With the development of sensing technology, process variables that evolve over time can be collected in high resolution as multichannel functional data. These functional data have rich information to characterize the process and help identify the defect types. Motivated by a real example from the threaded pipe connection process, we focus on defect classification where each sample is represented as partially observed multichannel functional data. However, the available samples for each defect type are limited and imbalanced. The functional data is partially observed since the pre-connection process before the threaded pipe connection process is unobserved as there is no sensor installed in the production line. Therefore, the defect classification based on imbalanced, multichannel, and partially observed functional data is very important but challenging. To deal with these challenges, we propose an innovative classification approach named COMPILED based on deep metric learning. The framework leverages the power of deep metric learning to train on imbalanced datasets. A novel neural network structure is proposed to handle multichannel partially observed functional data. The results from a real-world case study demonstrate the superior accuracy of our framework when compared to existing benchmarks.

Submitted 8 December, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Submitted version to IISE Transactions

arXiv:2403.14822 [pdf, other]

Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets

Authors: Jie Wang, Rui Gao, Yao Xie

Abstract: We present a new framework to address the non-convex robust hypothesis testing problem, wherein the goal is to seek the optimal detector that minimizes the maximum of worst-case type-I and type-II risk functions. The distributional uncertainty sets are constructed to center around the empirical distribution derived from samples based on Sinkhorn discrepancy. Given that the objective involves non-convex, non-smooth probabilistic functions that are often intractable to optimize, existing methods resort to approximations rather than exact solutions. To tackle the challenge, we introduce an exact mixed-integer exponential conic reformulation of the problem, which can be solved to a global optimum with a moderate amount of input data. Subsequently, we propose a convex approximation, demonstrating its superiority over current state-of-the-art methodologies in the literature. Furthermore, we establish connections between robust hypothesis testing and regularized formulations of non-robust risk functions, offering insightful interpretations. Our numerical study highlights the satisfactory testing performance and computational efficiency of the proposed framework.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 26 pages, 2 figures

arXiv:2403.09042 [pdf, other]

Recurrent Events Modeling Based on a Reflected Brownian Motion with Application to Hypoglycemia

Authors: Yingfa Xie, Haoda Fu, Yuan Huang, Vladimir Pozdnyakov, Jun Yan

Abstract: Patients with type 2 diabetes need to closely monitor blood sugar levels as their routine diabetes self-management. Although many treatment agents aim to tightly control blood sugar, hypoglycemia often stands as an adverse event. In practice, patients can observe hypoglycemic events more easily than hyperglycemic events due to the perception of neurogenic symptoms. We propose to model each patient's observed hypoglycemic event as a lower-boundary crossing event for a reflected Brownian motion with an upper reflection barrier. The lower boundary is set by clinical standards. To capture patient heterogeneity and within-patient dependence, covariates and a patient-level frailty are incorporated into the volatility and the upper reflection barrier. This framework provides quantification for the underlying glucose level variability, patient heterogeneity, and risk factors' impact on glucose. We make inferences based on a Bayesian framework using Markov chain Monte Carlo. Two model comparison criteria, the Deviance Information Criterion and the Logarithm of the Pseudo-Marginal Likelihood, are used for model selection. The methodology is validated in simulation studies. In analyzing a dataset from the diabetic patients in the DURABLE trial, our model provides adequate fit, generates data similar to the observed data, and offers insights that could be missed by other models.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.03850 [pdf, other]

Conformal prediction for multi-dimensional time series by ellipsoidal sets

Authors: Chen Xu, Hanyang Jiang, Yao Xie

Abstract: Conformal prediction (CP) has been a popular method for uncertainty quantification because it is distribution-free, model-agnostic, and theoretically sound. For forecasting problems in supervised learning, most CP methods focus on building prediction intervals for univariate responses. In this work, we develop a sequential CP method called $\texttt{MultiDimSPCI}$ that builds prediction $\textit{regions}$ for a multivariate response, especially in the context of multivariate time series, which are not exchangeable. Theoretically, we estimate $\textit{finite-sample}$ high-probability bounds on the conditional coverage gap. Empirically, we demonstrate that $\texttt{MultiDimSPCI}$ maintains valid coverage on a wide range of multivariate time series while producing smaller prediction regions than CP and non-CP baselines.

Submitted 23 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: Accepted by the Forty-first International Conference on Machine Learning (ICML 2024)

arXiv:2401.15262 [pdf, other]

Asymptotic Behavior of Adversarial Training Estimator under $\ell_\infty$-Perturbation

Authors: Yiling Xie, Xiaoming Huo

Abstract: Adversarial training has been proposed to protect machine learning models against adversarial attacks. This paper focuses on adversarial training under $\ell_\infty$-perturbation, which has recently attracted much research attention. The asymptotic behavior of the adversarial training estimator is investigated in the generalized linear model. The results imply that the asymptotic distribution of the adversarial training estimator under $\ell_\infty$-perturbation could put a positive probability mass at $0$ when the true parameter is $0$, providing a theoretical guarantee of the associated sparsity-recovery ability. Alternatively, a two-step procedure is proposed -- adaptive adversarial training, which could further improve the performance of adversarial training under $\ell_\infty$-perturbation. Specifically, the proposed procedure could achieve asymptotic variable-selection consistency and unbiasedness. Numerical experiments are conducted to show the sparsity-recovery ability of adversarial training under $\ell_\infty$-perturbation and to compare the empirical performance between classic adversarial training and adaptive adversarial training.

Submitted 2 March, 2025; v1 submitted 26 January, 2024; originally announced January 2024.

arXiv:2312.08324 [pdf, other]

Bayesian Nonparametric Clustering with Feature Selection for Spatially Resolved Transcriptomics Data

Authors: Bencong Zhu, Guanyu Hu, Yang Xie, Lin Xu, Xiaodan Fan, Qiwei Li

Abstract: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Nevertheless, there are inherent challenges associated with these new high-dimensional spatial data, such as zero-inflation, over-dispersion, and heterogeneity. These challenges pose obstacles to effective clustering, which is a fundamental problem in SRT data analysis. Current computational approaches often rely on heuristic data preprocessing and arbitrary cluster number prespecification, leading to considerable information loss and consequently, suboptimal downstream analysis. In response to these challenges, we introduce BNPSpace, a novel Bayesian nonparametric spatial clustering framework that directly models SRT count data. BNPSpace facilitates the partitioning of the whole spatial domain, which is characterized by substantial heterogeneity, into homogeneous spatial domains with similar molecular characteristics while identifying a parsimonious set of discriminating genes among different spatial domains. Moreover, BNPSpace incorporates spatial information through a Markov random field prior model, encouraging a smooth and biologically meaningful partition pattern.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.05404 [pdf, other]

Disentangled Latent Representation Learning for Tackling the Confounding M-Bias Problem in Causal Inference

Authors: Debo Cheng, Yang Xie, Ziqi Xu, Jiuyong Li, Lin Liu, Jixue Liu, Yinghao Zhang, Zaiwen Feng

Abstract: In causal inference, it is a fundamental task to estimate the causal effect from observational data. However, latent confounders pose major challenges in causal inference in observational data, for example, confounding bias and M-bias. Recent data-driven causal effect estimators tackle the confounding bias problem via balanced representation learning, but assume no M-bias in the system, thus they fail to handle the M-bias. In this paper, we identify a challenging and unsolved problem caused by a variable that leads to confounding bias and M-bias simultaneously. To address this problem with co-occurring M-bias and confounding bias, we propose a novel Disentangled Latent Representation learning framework for learning latent representations from proxy variables for unbiased Causal effect Estimation (DLRCE) from observational data. Specifically, DLRCE learns three sets of latent representations from the measured proxy variables to adjust for the confounding bias and M-bias. Extensive experiments on both synthetic and three real-world datasets demonstrate that DLRCE significantly outperforms the state-of-the-art estimators in the case of the presence of both confounding bias and M-bias.

Submitted 8 December, 2023; originally announced December 2023.

Comments: 10 pages, 3 figures and 5 tables. Accepted by ICDM2023

arXiv:2312.02959 [pdf, other]

Detecting algorithmic bias in medical-AI models using trees

Authors: Jeffrey Smith, Andre Holder, Rishikesan Kamaleswaran, Yao Xie

Abstract: With the growing prevalence of machine learning and artificial intelligence-based medical decision support systems, it is equally important to ensure that these systems provide patient outcomes in a fair and equitable fashion. This paper presents an innovative framework for detecting areas of algorithmic bias in medical-AI decision support systems. Our approach efficiently identifies potential biases in medical-AI models, specifically in the context of sepsis prediction, by employing the Classification and Regression Trees (CART) algorithm with conformity scores. We verify our methodology by conducting a series of synthetic data experiments, showcasing its ability to estimate areas of bias in controlled settings precisely. The effectiveness of the concept is further validated by experiments using electronic medical records from Grady Memorial Hospital in Atlanta, Georgia. These tests demonstrate the practical implementation of our strategy in a clinical environment, where it can function as a vital instrument for guaranteeing fairness and equity in AI-based medical decisions.

Submitted 29 October, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: 26 pages, 9 figures

arXiv:2311.05641 [pdf, other]

Mobile Internet Quality Estimation using Self-Tuning Kernel Regression

Authors: Hanyang Jiang, Henry Shaowu Yuchi, Elizabeth Belding, Ellen Zegura, Yao Xie

Abstract: Modeling and estimation for spatial data are ubiquitous in real life, frequently appearing in weather forecasting, pollution detection, and agriculture. Spatial data analysis often involves processing datasets of enormous scale. In this work, we focus on large-scale internet-quality open datasets from Ookla. We look into estimating mobile (cellular) internet quality at the scale of a state in the United States. In particular, we aim to conduct estimation based on highly {\it imbalanced} data: Most of the samples are concentrated in limited areas, while very few are available in the rest, posing significant challenges to modeling efforts. We propose a new adaptive kernel regression approach that employs self-tuning kernels to alleviate the adverse effects of data imbalance in this problem. Through comparative experimentation on two distinct mobile network measurement datasets, we demonstrate that the proposed self-tuning kernel regression method produces more accurate predictions, with the potential to be applied in other applications.

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2310.19787 [pdf]

$e^{\text{RPCA}}$: Robust Principal Component Analysis for Exponential Family Distributions

Authors: Xiaojun Zheng, Simon Mak, Liyan Xie, Yao Xie

Abstract: Robust Principal Component Analysis (RPCA) is a widely used method for recovering low-rank structure from data matrices corrupted by significant and sparse outliers. These corruptions may arise from occlusions, malicious tampering, or other causes for anomalies, and the joint identification of such corruptions with low-rank background is critical for process monitoring and diagnosis. However, existing RPCA methods and their extensions largely do not account for the underlying probabilistic distribution for the data matrices, which in many applications are known and can be highly non-Gaussian. We thus propose a new method called Robust Principal Component Analysis for Exponential Family distributions ($e^{\text{RPCA}}$), which can perform the desired decomposition into low-rank and sparse matrices when such a distribution falls within the exponential family. We present a novel alternating direction method of multipliers optimization algorithm for efficient $e^{\text{RPCA}}$ decomposition. The effectiveness of $e^{\text{RPCA}}$ is then demonstrated in two applications: the first for steel sheet defect detection, and the second for crime activity monitoring in the Atlanta metropolitan area.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.19253 [pdf, other]

Flow-based Distributionally Robust Optimization

Authors: Chen Xu, Jonghyeok Lee, Xiuyuan Cheng, Yao Xie

Abstract: We present a computationally efficient framework, called $\texttt{FlowDRO}$, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets while aiming to find a continuous worst-case distribution (also called the Least Favorable Distribution, LFD) and sample from it. The LFD is required to be continuous so that the algorithm can scale to problems with larger sample sizes and achieve better generalization capability for the induced robust algorithms. To tackle the computationally challenging infinite-dimensional optimization problem, we leverage flow-based models and continuous-time invertible transport maps between the data distribution and the target distribution and develop a Wasserstein proximal gradient flow type algorithm. In theory, we establish the equivalence of the solution by optimal transport map to the original formulation, as well as the dual form of the problem through Wasserstein calculus and Brenier theorem. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on high-dimensional real data.

Submitted 24 February, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

Comments: IEEE Journal on Selected Areas in Information Theory (JSAIT). Accepted. 2024

arXiv:2310.17582 [pdf, other]

doi 10.1109/TIT.2024.3422412

Convergence of flow-based generative models via proximal gradient descent in Wasserstein space

Authors: Xiuyuan Cheng, Jianfeng Lu, Yixin Tan, Yao Xie

Abstract: Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both forward (data-to-noise) and reverse (noise-to-data) directions, remains sparse. In this paper, we provide a theoretical guarantee of generating data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of the proximal gradient descent (GD) in Wasserstein space, we prove the Kullback-Leibler (KL) guarantee of data generation by a JKO flow model to be $O(\varepsilon^2)$ when using $N \lesssim \log (1/\varepsilon)$ many JKO steps ($N$ Residual Blocks in the flow) where $\varepsilon$ is the error in the per-step first-order condition. The assumption on data density is merely a finite second moment, and the theory extends to data distributions without density and when there are inversion errors in the reverse process where we obtain KL-$W_2$ mixed error guarantees. The non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest. The analysis framework can extend to other first-order Wasserstein optimization schemes applied to flow-based generative models.

Submitted 3 July, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.03258 [pdf, other]

Assessing Electricity Service Unfairness with Transfer Counterfactual Learning

Authors: Song Wei, Xiangrui Kong, Alinson Santos Xavier, Shixiang Zhu, Yao Xie, Feng Qiu

Abstract: Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in counterfactual effects, and limited data availability. First, this paper demonstrates how one can evaluate counterfactual unfairness in a power system by analyzing the average causal effect of a specific protected attribute. Subsequently, we use subgroup analysis to handle model heterogeneity and introduce a novel method for estimating counterfactual unfairness based on transfer learning, which helps to alleviate the data scarcity in each subgroup. In our numerical analysis, we apply our method to a unique large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages under both daily and post-disaster operations, and such discrimination is exacerbated under severe conditions. These findings suggest a widespread, systematic issue of injustice in the power service systems and emphasize the necessity for focused interventions in disadvantaged communities.

Submitted 24 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: The preliminary version titled "Detecting Electricity Service Equity Issues with Transfer Counterfactual Learning on Large-Scale Outage Datasets" is presented at NeurIPS 2023 Workshops on Causal Representation Learning (CRL) and Algorithmic Fairness through the Lens of Time (AFT); See v1

arXiv:2309.08911 [pdf, ps, other]

Efficient Methods for Non-stationary Online Learning

Authors: Peng Zhao, Yan-Feng Xie, Lijun Zhang, Zhi-Hua Zhou

Abstract: Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of non-stationarity, in which multiple base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises concerns about computational complexity -- such methods typically maintain $O(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret that reduce the number of projections per round from $O(\log T)$ to $1$. The proposed algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial modifications for non-stationary online methods. Furthermore, we study an even stronger measure, namely "interval dynamic regret", and reduce the number of projections per round from $O(\log^2 T)$ to $1$ for minimizing it. Our reduction demonstrates broad generality and applies to two important applications: online stochastic control and online principal component analysis, resulting in methods that are both efficient and optimal. Finally, empirical studies verify our theoretical findings.

Submitted 8 September, 2025; v1 submitted 16 September, 2023; originally announced September 2023.

Comments: V3 changes: accepted by JMLR 2025 and improve the writing; V2/V1 changes: investigate interval dynamic regret and add two applications (online non-stochastic control and online PCA) and improve the presentation; preliminary version published at NeurIPS'22

Journal ref: Journal of Machine Learning Research, 2025

Showing 1–50 of 212 results for author: Xie, Y