-
Uncertainty Quantification for Physics-Informed Neural Networks with Extended Fiducial Inference
Authors:
Frank Shih,
Zhenghao Jiang,
Faming Liang
Abstract:
Uncertainty quantification (UQ) in scientific machine learning is increasingly critical as neural networks are widely adopted to tackle complex problems across diverse scientific disciplines. For physics-informed neural networks (PINNs), a prominent model in scientific machine learning, uncertainty is typically quantified using Bayesian or dropout methods. However, both approaches suffer from a fu…
▽ More
Uncertainty quantification (UQ) in scientific machine learning is increasingly critical as neural networks are widely adopted to tackle complex problems across diverse scientific disciplines. For physics-informed neural networks (PINNs), a prominent model in scientific machine learning, uncertainty is typically quantified using Bayesian or dropout methods. However, both approaches suffer from a fundamental limitation: the prior distribution or dropout rate required to construct honest confidence sets cannot be determined without additional information. In this paper, we propose a novel method within the framework of extended fiducial inference (EFI) to provide rigorous uncertainty quantification for PINNs. The proposed method leverages a narrow-neck hyper-network to learn the parameters of the PINN and quantify their uncertainty based on imputed random errors in the observations. This approach overcomes the limitations of Bayesian and dropout methods, enabling the construction of honest confidence sets based solely on observed data. This advancement represents a significant breakthrough for PINNs, greatly enhancing their reliability, interpretability, and applicability to real-world scientific and engineering challenges. Moreover, it establishes a new theoretical framework for EFI, extending its application to large-scale models, eliminating the need for sparse hyper-networks, and significantly improving the automaticity and robustness of statistical inference.
△ Less
Submitted 25 May, 2025;
originally announced May 2025.
-
Missing Data Imputation by Reducing Mutual Information with Rectified Flows
Authors:
Jiahao Yu,
Qizhen Ying,
Leyang Wang,
Ziyue Jiang,
Song Liu
Abstract:
This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method explicitly targets the reduction of mutual information. Specifically, our algorithm iteratively minim…
▽ More
This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method explicitly targets the reduction of mutual information. Specifically, our algorithm iteratively minimizes the KL divergence between the joint distribution of the imputed data and missing mask, and the product of their marginals from the previous iteration. We show that the optimal imputation under this framework corresponds to solving an ODE, whose velocity field minimizes a rectified flow training objective. We further illustrate that some existing imputation techniques can be interpreted as approximate special cases of our mutual-information-reducing framework. Comprehensive experiments on synthetic and real-world datasets validate the efficacy of our proposed approach, demonstrating superior imputation performance.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Tutorial on Bayesian Functional Regression Using Stan
Authors:
Ziren Jiang,
Ciprian Crainiceanu,
Erjia Cui
Abstract:
This manuscript provides step-by-step instructions for implementing Bayesian functional regression models using Stan. Extensive simulations indicate that the inferential performance of the methods is comparable to that of state-of-the-art frequentist approaches. However, Bayesian approaches allow for more flexible modeling and provide an alternative when frequentist methods are not available or ma…
▽ More
This manuscript provides step-by-step instructions for implementing Bayesian functional regression models using Stan. Extensive simulations indicate that the inferential performance of the methods is comparable to that of state-of-the-art frequentist approaches. However, Bayesian approaches allow for more flexible modeling and provide an alternative when frequentist methods are not available or may require additional development. Methods and software are illustrated using the accelerometry data from the National Health and Nutrition Examination Survey (NHANES).
△ Less
Submitted 8 May, 2025;
originally announced May 2025.
-
Wide & Deep Learning for Node Classification
Authors:
Yancheng Chen,
Wenguo Yang,
Zhipeng Jiang
Abstract:
Wide & Deep, a simple yet effective learning architecture for recommendation systems developed by Google, has had a significant impact in both academia and industry due to its combination of the memorization ability of generalized linear models and the generalization ability of deep models. Graph convolutional networks (GCNs) remain dominant in node classification tasks; however, recent studies ha…
▽ More
Wide & Deep, a simple yet effective learning architecture for recommendation systems developed by Google, has had a significant impact in both academia and industry due to its combination of the memorization ability of generalized linear models and the generalization ability of deep models. Graph convolutional networks (GCNs) remain dominant in node classification tasks; however, recent studies have highlighted issues such as heterophily and expressiveness, which focus on graph structure while seemingly neglecting the potential role of node features. In this paper, we propose a flexible framework GCNIII, which leverages the Wide & Deep architecture and incorporates three techniques: Intersect memory, Initial residual and Identity mapping. We provide comprehensive empirical evidence showing that GCNIII can more effectively balance the trade-off between over-fitting and over-generalization on various semi- and full- supervised tasks. Additionally, we explore the use of large language models (LLMs) for node feature engineering to enhance the performance of GCNIII in cross-domain node classification tasks. Our implementation is available at https://github.com/CYCUCAS/GCNIII.
△ Less
Submitted 4 May, 2025;
originally announced May 2025.
-
Pre-Training Estimators for Structural Models: Application to Consumer Search
Authors:
Yanhao 'Max' Wei,
Zhenling Jiang
Abstract:
We explore pretraining estimators for structural econometric models. The estimator is "pretrained" in the sense that the bulk of the computational cost and researcher effort occur during the construction of the estimator. Subsequent applications of the estimator to different datasets require little computational cost or researcher effort. The estimation leverages a neural net to recognize the stru…
▽ More
We explore pretraining estimators for structural econometric models. The estimator is "pretrained" in the sense that the bulk of the computational cost and researcher effort occur during the construction of the estimator. Subsequent applications of the estimator to different datasets require little computational cost or researcher effort. The estimation leverages a neural net to recognize the structural model's parameter from data patterns. As an initial trial, this paper builds a pretrained estimator for a sequential search model that is known to be difficult to estimate. We evaluate the pretrained estimator on 12 real datasets. The estimation takes seconds to run and shows high accuracy. We provide the estimator at pnnehome.github.io. More generally, pretrained, off-the-shelf estimators can make structural models more accessible to researchers and practitioners.
△ Less
Submitted 19 May, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
-
Online Fault Detection and Classification of Chemical Process Systems Leveraging Statistical Process Control and Riemannian Geometric Analysis
Authors:
Alireza Miraliakbar,
Fangyuan Ma,
Zheyu Jiang
Abstract:
In this work, we study an integrated fault detection and classification framework called FARM for fast, accurate, and robust online chemical process monitoring. The FARM framework integrates the latest advancements in statistical process control (SPC) for monitoring nonparametric and heterogeneous data streams with novel data analysis approaches based on Riemannian geometry together in a hierarchi…
▽ More
In this work, we study an integrated fault detection and classification framework called FARM for fast, accurate, and robust online chemical process monitoring. The FARM framework integrates the latest advancements in statistical process control (SPC) for monitoring nonparametric and heterogeneous data streams with novel data analysis approaches based on Riemannian geometry together in a hierarchical framework for online process monitoring. We conduct a systematic evaluation of the FARM monitoring framework using the Tennessee Eastman Process (TEP) dataset. Results show that FARM performs competitively against state-of-the-art process monitoring algorithms by achieving a good balance among fault detection rate (FDR), fault detection speed (FDS), and false alarm rate (FAR). Specifically, FARM achieved an average FDR of 96.97% while also outperforming benchmark methods in successfully detecting hard-to-detect faults that are previously known, including Faults 3, 9 and 15, with FDRs being 97.08%, 96.30% and 95.99%, respectively. In terms of FAR, our FARM framework allows practitioners to customize their choice of FAR, thereby offering great flexibility. Moreover, we report a significant improvement in average fault classification accuracy during online monitoring from 61% to 82% when leveraging Riemannian geometric analysis, and further to 84.5% when incorporating additional features from SPC. This illustrates the synergistic effect of integrating fault detection and classification in a holistic, hierarchical monitoring framework.
△ Less
Submitted 1 April, 2025;
originally announced April 2025.
-
Generative Reliability-Based Design Optimization Using In-Context Learning Capabilities of Large Language Models
Authors:
Zhonglin Jiang,
Qian Tang,
Zequn Wang
Abstract:
Large Language Models (LLMs) have demonstrated remarkable in-context learning capabilities, enabling flexible utilization of limited historical information to play pivotal roles in reasoning, problem-solving, and complex pattern recognition tasks. Inspired by the successful applications of LLMs in multiple domains, this paper proposes a generative design method by leveraging the in-context learnin…
▽ More
Large Language Models (LLMs) have demonstrated remarkable in-context learning capabilities, enabling flexible utilization of limited historical information to play pivotal roles in reasoning, problem-solving, and complex pattern recognition tasks. Inspired by the successful applications of LLMs in multiple domains, this paper proposes a generative design method by leveraging the in-context learning capabilities of LLMs with the iterative search mechanisms of metaheuristic algorithms for solving reliability-based design optimization problems. In detail, reliability analysis is performed by engaging the LLMs and Kriging surrogate modeling to overcome the computational burden. By dynamically providing critical information of design points to the LLMs with prompt engineering, the method enables rapid generation of high-quality design alternatives that satisfy reliability constraints while achieving performance optimization. With the Deepseek-V3 model, three case studies are used to demonstrated the performance of the proposed approach. Experimental results indicate that the proposed LLM-RBDO method successfully identifies feasible solutions that meet reliability constraints while achieving a comparable convergence rate compared to traditional genetic algorithms.
△ Less
Submitted 28 March, 2025;
originally announced March 2025.
-
Semiparametric estimation for multivariate Hawkes processes using dependent Dirichlet processes: An application to order flow data in financial markets
Authors:
Alex Ziyu Jiang,
Abel Rodriguez
Abstract:
The order flow in high-frequency financial markets has been of particular research interest in recent years, as it provides insights into trading and order execution strategies and leads to better understanding of the supply-demand interplay and price formation. In this work, we propose a semiparametric multivariate Hawkes process model that relies on (mixtures of) dependent Dirichlet processes to…
▽ More
The order flow in high-frequency financial markets has been of particular research interest in recent years, as it provides insights into trading and order execution strategies and leads to better understanding of the supply-demand interplay and price formation. In this work, we propose a semiparametric multivariate Hawkes process model that relies on (mixtures of) dependent Dirichlet processes to analyze order flow data. Such a formulation avoids the kind of strong parametric assumptions about the excitation functions of the Hawkes process that often accompany traditional models and which, as we show, are not justified in the case of order flow data. It also allows us to borrow information across dimensions, improving estimation of the individual excitation functions. To fit the model, we develop two algorithms, one using Markov chain Monte Carlo methods and one using a stochastic variational approximation. In the context of simulation studies, we show that our model outperforms benchmark methods in terms of lower estimation error for both algorithms. In the context of real order flow data, we show that our model can capture features of the excitation functions such as non-monotonicity that cannot be accommodated by standard parametric models.
△ Less
Submitted 24 February, 2025;
originally announced February 2025.
-
Estimating Parameters of Structural Models Using Neural Networks
Authors:
Yanhao,
Wei,
Zhenling Jiang
Abstract:
We study an alternative use of machine learning. We train neural nets to provide the parameter estimate of a given (structural) econometric model, for example, discrete choice or consumer search. Training examples consist of datasets generated by the econometric model under a range of parameter values. The neural net takes the moments of a dataset as input and tries to recognize the parameter valu…
▽ More
We study an alternative use of machine learning. We train neural nets to provide the parameter estimate of a given (structural) econometric model, for example, discrete choice or consumer search. Training examples consist of datasets generated by the econometric model under a range of parameter values. The neural net takes the moments of a dataset as input and tries to recognize the parameter value underlying that dataset. Besides the point estimate, the neural net can also output statistical accuracy. This neural net estimator (NNE) tends to limited-information Bayesian posterior as the number of training datasets increases. We apply NNE to a consumer search model. It gives more accurate estimates at lighter computational costs than the prevailing approach. NNE is also robust to redundant moment inputs. In general, NNE offers the most benefits in applications where other estimation approaches require very heavy simulation costs. We provide code at: https://nnehome.github.io.
△ Less
Submitted 7 February, 2025;
originally announced February 2025.
-
Development and Validation of a Dynamic Kidney Failure Prediction Model based on Deep Learning: A Real-World Study with External Validation
Authors:
Jingying Ma,
Jinwei Wang,
Lanlan Lu,
Yexiang Sun,
Mengling Feng,
Peng Shen,
Zhiqin Jiang,
Shenda Hong,
Luxia Zhang
Abstract:
Background: Chronic kidney disease (CKD), a progressive disease with high morbidity and mortality, has become a significant global public health problem. At present, most of the models used for predicting the progression of CKD are static models. We aim to develop a dynamic kidney failure prediction model based on deep learning (KFDeep) for CKD patients, utilizing all available data on common clin…
▽ More
Background: Chronic kidney disease (CKD), a progressive disease with high morbidity and mortality, has become a significant global public health problem. At present, most of the models used for predicting the progression of CKD are static models. We aim to develop a dynamic kidney failure prediction model based on deep learning (KFDeep) for CKD patients, utilizing all available data on common clinical indicators from real-world Electronic Health Records (EHRs) to provide real-time predictions.
Findings: A retrospective cohort of 4,587 patients from EHRs of Yinzhou, China, is used as the development dataset (2,752 patients for training, 917 patients for validation) and internal validation dataset (917 patients), while a prospective cohort of 934 patients from the Peking University First Hospital CKD cohort (PKUFH cohort) is used as the external validation dataset. The AUROC of the KFDeep model reaches 0.946 (95\% CI: 0.922-0.970) on the internal validation dataset and 0.805 (95\% CI: 0.763-0.847) on the external validation dataset, both surpassing existing models. The KFDeep model demonstrates stable performance in simulated dynamic scenarios, with the AUROC progressively increasing over time. Both the calibration curve and decision curve analyses confirm that the model is unbiased and safe for practical use, while the SHAP analysis and hidden layer clustering results align with established medical knowledge.
Interpretation: The KFDeep model built from real-world EHRs enhances the prediction accuracy of kidney failure without increasing clinical examination costs and can be easily integrated into existing hospital systems, providing physicians with a continuously updated decision-support tool due to its dynamic design.
△ Less
Submitted 25 January, 2025;
originally announced January 2025.
-
Identification and Estimation of Simultaneous Equation Models Using Higher-Order Cumulant Restrictions
Authors:
Ziyu Jiang
Abstract:
Identifying structural parameters in linear simultaneous equation models is a fundamental challenge in economics and related fields. Recent work leverages higher-order distributional moments, exploiting the fact that non-Gaussian data carry more structural information than the Gaussian framework. While many of these contributions still require zero-covariance assumptions for structural errors, thi…
▽ More
Identifying structural parameters in linear simultaneous equation models is a fundamental challenge in economics and related fields. Recent work leverages higher-order distributional moments, exploiting the fact that non-Gaussian data carry more structural information than the Gaussian framework. While many of these contributions still require zero-covariance assumptions for structural errors, this paper shows that such an assumption can be dispensed with. Specifically, we demonstrate that under any diagonal higher-cumulant condition, the structural parameter matrix can be identified by solving an eigenvector problem. This yields a direct identification argument and motivates a simple sample-analogue estimator that is both consistent and asymptotically normal. Moreover, when uncorrelatedness may still be plausible -- such as in vector autoregression models -- our framework offers a transparent way to test for it, all within the same higher-order orthogonality setting employed by earlier studies. Monte Carlo simulations confirm desirable finite-sample performance, and we further illustrate the method's practical value in two empirical applications.
△ Less
Submitted 12 January, 2025;
originally announced January 2025.
-
Physics-Guided Fair Graph Sampling for Water Temperature Prediction in River Networks
Authors:
Erhu He,
Declan Kutscher,
Yiqun Xie,
Jacob Zwart,
Zhe Jiang,
Huaxiu Yao,
Xiaowei Jia
Abstract:
This work introduces a novel graph neural networks (GNNs)-based method to predict stream water temperature and reduce model bias across locations of different income and education levels. Traditional physics-based models often have limited accuracy because they are necessarily approximations of reality. Recently, there has been an increasing interest of using GNNs in modeling complex water dynamic…
▽ More
This work introduces a novel graph neural networks (GNNs)-based method to predict stream water temperature and reduce model bias across locations of different income and education levels. Traditional physics-based models often have limited accuracy because they are necessarily approximations of reality. Recently, there has been an increasing interest of using GNNs in modeling complex water dynamics in stream networks. Despite their promise in improving the accuracy, GNNs can bring additional model bias through the aggregation process, where node features are updated by aggregating neighboring nodes. The bias can be especially pronounced when nodes with similar sensitive attributes are frequently connected. We introduce a new method that leverages physical knowledge to represent the node influence in GNNs, and then utilizes physics-based influence to refine the selection and weights over the neighbors. The objective is to facilitate equitable treatment over different sensitive groups in the graph aggregation, which helps reduce spatial bias over locations, especially for those in underprivileged groups. The results on the Delaware River Basin demonstrate the effectiveness of the proposed method in preserving equitable performance across locations in different sensitive groups.
△ Less
Submitted 21 December, 2024;
originally announced December 2024.
-
Combining Priors with Experience: Confidence Calibration Based on Binomial Process Modeling
Authors:
Jinzong Dong,
Zhaohui Jiang,
Dong Pan,
Haoyang Yu
Abstract:
Confidence calibration of classification models is a technique to estimate the true posterior probability of the predicted class, which is critical for ensuring reliable decision-making in practical applications. Existing confidence calibration methods mostly use statistical techniques to estimate the calibration curve from data or fit a user-defined calibration function, but often overlook fully…
▽ More
Confidence calibration of classification models is a technique to estimate the true posterior probability of the predicted class, which is critical for ensuring reliable decision-making in practical applications. Existing confidence calibration methods mostly use statistical techniques to estimate the calibration curve from data or fit a user-defined calibration function, but often overlook fully mining and utilizing the prior distribution behind the calibration curve. However, a well-informed prior distribution can provide valuable insights beyond the empirical data under the limited data or low-density regions of confidence scores. To fill this gap, this paper proposes a new method that integrates the prior distribution behind the calibration curve with empirical data to estimate a continuous calibration curve, which is realized by modeling the sampling process of calibration data as a binomial process and maximizing the likelihood function of the binomial process. We prove that the calibration curve estimating method is Lipschitz continuous with respect to data distribution and requires a sample size of $3/B$ of that required for histogram binning, where $B$ represents the number of bins. Also, a new calibration metric ($TCE_{bpm}$), which leverages the estimated calibration curve to estimate the true calibration error (TCE), is designed. $TCE_{bpm}$ is proven to be a consistent calibration measure. Furthermore, realistic calibration datasets can be generated by the binomial process modeling from a preset true calibration curve and confidence score distribution, which can serve as a benchmark to measure and compare the discrepancy between existing calibration metrics and the true calibration error. The effectiveness of our calibration method and metric are verified in real-world and simulated data.
△ Less
Submitted 18 February, 2025; v1 submitted 13 December, 2024;
originally announced December 2024.
-
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning
Authors:
Prajwal Koirala,
Zhanhong Jiang,
Soumik Sarkar,
Cody Fleming
Abstract:
In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by lea…
▽ More
In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by learning a conservatively safe policy through the use of Conditional Variational Autoencoders, which model the latent safety constraints. Subsequently, we frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints. This is achieved by training an encoder with a reward-Advantage Weighted Regression objective within the latent constraint space. Our methodology is supported by theoretical analysis, including bounds on policy performance and sample complexity. Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach.
△ Less
Submitted 11 December, 2024;
originally announced December 2024.
-
BAMITA: Bayesian Multiple Imputation for Tensor Arrays
Authors:
Ziren Jiang,
Gen Li,
Eric F. Lock
Abstract:
Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing v…
▽ More
Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework, that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, for scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints and used to infer trends in species diversity for the population. Documented R code to perform our multiple imputation approach is available at https://github.com/lockEF/MultiwayImputation .
△ Less
Submitted 30 October, 2024;
originally announced October 2024.
-
Longitudinal Causal Inference with Selective Eligibility
Authors:
Zhichao Jiang,
Eli Ben-Michael,
D. James Greiner,
Ryan Halen,
Kosuke Imai
Abstract:
Dropout poses a significant challenge to causal inference in longitudinal studies with time-varying treatments. However, existing research does not simultaneously address dropout and time-varying treatments. We examine selective eligibility, an important yet overlooked source of non-ignorable dropout in such settings. This problem arises when a unit's prior treatment history influences its eligibi…
▽ More
Dropout poses a significant challenge to causal inference in longitudinal studies with time-varying treatments. However, existing research does not simultaneously address dropout and time-varying treatments. We examine selective eligibility, an important yet overlooked source of non-ignorable dropout in such settings. This problem arises when a unit's prior treatment history influences its eligibility for subsequent treatments, a common scenario in medical and other settings. We propose a general methodological framework for longitudinal causal inference with selective eligibility. By focusing on a subgroup of units who would become eligible for treatment given a specific past treatment sequence, we define the time-specific eligible treatment effect and expected number of outcome events under a treatment sequence of interest. Under a generalized version of sequential ignorability, we derive two nonparametric identification formulae, each leveraging different parts of the observed data distribution. We then derive the efficient influence function of each causal estimand, yielding the corresponding doubly robust estimator. Finally, we apply the proposed methodology to an impact evaluation of a pre-trial risk assessment instrument in the criminal justice system, in which selective eligibility arises due to recidivism.
△ Less
Submitted 15 March, 2025; v1 submitted 23 October, 2024;
originally announced October 2024.
-
Fast Calculation of Feature Contributions in Boosting Trees
Authors:
Zhongli Jiang,
Min Zhang,
Dabao Zhang
Abstract:
Recently, several fast algorithms have been proposed to decompose predicted value into Shapley values, enabling individualized feature contribution analysis in tree models. While such local decomposition offers valuable insights, it underscores the need for a global evaluation of feature contributions. Although coefficients of determination ($R^2$) allow for comparative assessment of individual fe…
▽ More
Recently, several fast algorithms have been proposed to decompose predicted value into Shapley values, enabling individualized feature contribution analysis in tree models. While such local decomposition offers valuable insights, it underscores the need for a global evaluation of feature contributions. Although coefficients of determination ($R^2$) allow for comparative assessment of individual features, individualizing $R^2$ is challenged by the underlying quadratic losses. To address this, we propose Q-SHAP, an efficient algorithm that reduces the computational complexity of calculating Shapley values for quadratic losses to polynomial time. Our simulations show that Q-SHAP not only improves computational efficiency but also enhances the accuracy of feature-specific $R^2$ estimates.
△ Less
Submitted 26 May, 2025; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Estimating Treatment Effects under Recommender Interference: A Structured Neural Networks Approach
Authors:
Ruohan Zhan,
Shichao Han,
Yuchen Hu,
Zhenling Jiang
Abstract:
Recommender systems are essential for content-sharing platforms by curating personalized content. To evaluate updates to recommender systems targeting content creators, platforms frequently rely on creator-side randomized experiments. The treatment effect measures the change in outcomes when a new algorithm is implemented compared to the status quo. We show that the standard difference-in-means es…
▽ More
Recommender systems are essential for content-sharing platforms by curating personalized content. To evaluate updates to recommender systems targeting content creators, platforms frequently rely on creator-side randomized experiments. The treatment effect measures the change in outcomes when a new algorithm is implemented compared to the status quo. We show that the standard difference-in-means estimator can lead to biased estimates due to recommender interference that arises when treated and control creators compete for exposure. We propose a "recommender choice model" that describes which item gets exposed from a pool containing both treated and control items. By combining a structural choice model with neural networks, this framework directly models the interference pathway while accounting for rich viewer-content heterogeneity. We construct a debiased estimator of the treatment effect and prove it is $\sqrt n$-consistent and asymptotically normal with potentially correlated samples. We validate our estimator's empirical performance with a field experiment on Weixin short-video platform. In addition to the standard creator-side experiment, we conduct a costly double-sided randomization design to obtain a benchmark estimate free from interference bias. We show that the proposed estimator yields results comparable to the benchmark, whereas the standard difference-in-means estimator can exhibit significant bias and even produce reversed signs.
△ Less
Submitted 5 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies
Authors:
Eli Ben-Michael,
D. James Greiner,
Melody Huang,
Kosuke Imai,
Zhichao Jiang,
Sooahn Shin
Abstract:
The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to empirica…
▽ More
The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to empirically answer this question with a minimal set of assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded and unconfounded treatment assignment, where the provision of AI-generated recommendations is assumed to be randomized across cases with humans making final decisions. Under this study design, we show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone. Importantly, the AI-alone system includes any individualized treatment assignment, including those that are not used in the original study. We also show when AI recommendations should be provided to a human-decision maker, and when one should follow such recommendations. We apply the proposed methodology to our own randomized controlled trial evaluating a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Furthermore, we find that replacing a human judge with algorithms--the risk assessment score and a large language model in particular--leads to a worse classification performance.
△ Less
Submitted 11 October, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Multiple testing using uniform filtering of ordered p-values
Authors:
Zhiwen Jiang,
Stephan Morgenthaler
Abstract:
We investigate the multiplicity model with m values of some test statistic independently drawn from a mixture of no effect (null) and positive effect (alternative), where we seek to identify, the alternative test results with a controlled error rate. We are interested in the case where the alternatives are rare. A number of multiple testing procedures filter the set of ordered p-values in order to…
▽ More
We investigate the multiplicity model with m values of some test statistic independently drawn from a mixture of no effect (null) and positive effect (alternative), where we seek to identify, the alternative test results with a controlled error rate. We are interested in the case where the alternatives are rare. A number of multiple testing procedures filter the set of ordered p-values in order to eliminate the nulls. Such an approach can only work if the p-values originating from the alternatives form one or several identifiable clusters. The Benjamini and Hochberg (BH) method, for example, assumes that this cluster occurs in a small interval $(0,Δ)$ and filters out all or most of the ordered p-values $p_{(r)}$ above a linear threshold $s \times r$. In repeated applications this filter controls the false discovery rate via the slope s. We propose a new adaptive filter that deletes the p-values from regions of uniform distribution. In cases where a single cluster remains, the p-values in an interval are declared alternatives, with the mid-point and the length of the interval chosen by controlling the data-dependent FDR at a desired level.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery
Authors:
Pengwei Yan,
Kaisong Song,
Zhuoren Jiang,
Yangyang Kang,
Tianqianjin Lin,
Changlong Sun,
Xiaozhong Liu
Abstract:
While self-supervised graph pretraining techniques have shown promising results in various domains, their application still experiences challenges of limited topology learning, human knowledge dependency, and incompetent multi-level interactions. To address these issues, we propose a novel solution, Dual-level Graph self-supervised Pretraining with Motif discovery (DGPM), which introduces a unique…
▽ More
While self-supervised graph pretraining techniques have shown promising results in various domains, their application still experiences challenges of limited topology learning, human knowledge dependency, and incompetent multi-level interactions. To address these issues, we propose a novel solution, Dual-level Graph self-supervised Pretraining with Motif discovery (DGPM), which introduces a unique dual-level pretraining structure that orchestrates node-level and subgraph-level pretext tasks. Unlike prior approaches, DGPM autonomously uncovers significant graph motifs through an edge pooling module, aligning learned motif similarities with graph kernel-based similarities. A cross-matching task enables sophisticated node-motif interactions and novel representation learning. Extensive experiments on 15 datasets validate DGPM's effectiveness and generalizability, outperforming state-of-the-art methods in unsupervised representation learning and transfer learning settings. The autonomously discovered motifs demonstrate the potential of DGPM to enhance robustness and interpretability.
△ Less
Submitted 19 December, 2023;
originally announced December 2023.
-
Towards Human-like Perception: Learning Structural Causal Model in Heterogeneous Graph
Authors:
Tianqianjin Lin,
Kaisong Song,
Zhuoren Jiang,
Yangyang Kang,
Weikang Yuan,
Xurui Li,
Changlong Sun,
Cui Huang,
Xiaozhong Liu
Abstract:
Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or underlying causal relationships for the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic…
▽ More
Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or underlying causal relationships for the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic the human perception and decision process through two key steps: constructing intelligible variables based on semantics derived from the graph schema and automatically learning task-level causal relationships among these variables by incorporating advanced causal discovery techniques. We compared HG-SCM to seven state-of-the-art baseline models on three real-world datasets, under three distinct and ubiquitous out-of-distribution settings. HG-SCM achieved the highest average performance rank with minimal standard deviation, substantiating its effectiveness and superiority in terms of both predictive power and generalizability. Additionally, the visualization and analysis of the auto-learned causal diagrams for the three tasks aligned well with domain knowledge and human cognition, demonstrating prominent interpretability. HG-SCM's human-like nature and its enhanced generalizability and interpretability make it a promising solution for special scenarios where transparency and trustworthiness are paramount.
△ Less
Submitted 9 December, 2023;
originally announced December 2023.
-
Modified treatment policy effect estimation with weighted energy distance
Authors:
Ziren Jiang,
Jared D. Huling
Abstract:
The causal effects of continuous treatments are often characterized through the average dose response function, which is challenging to estimate from observational data due to confounding and positivity violations. Modified treatment policies (MTPs) are an alternative approach that aim to assess the effect of a modification to observed treatment values and work under relaxed assumptions. Estimator…
▽ More
The causal effects of continuous treatments are often characterized through the average dose response function, which is challenging to estimate from observational data due to confounding and positivity violations. Modified treatment policies (MTPs) are an alternative approach that aim to assess the effect of a modification to observed treatment values and work under relaxed assumptions. Estimators for MTPs generally focus on estimating the conditional density of treatment given covariates and using it to construct weights. However, weighting using conditional density models has well-documented challenges. Further, MTPs with larger treatment modifications have stronger confounding and no tools exist to help choose an appropriate modification magnitude. This paper investigates the role of weights for MTPs showing that to control confounding, weights should balance the weighted data to an unobserved hypothetical target population that can be characterized with observed data. Leveraging this insight, we present a versatile set of tools to enhance estimation for MTPs. We introduce a distance that measures imbalance of covariate distributions under the MTP and use it to develop new weighting methods and tools to aid in the estimation of MTPs. Using our methods we study the effect of mechanical power of ventilation on in-hospital mortality.
△ Less
Submitted 18 January, 2025; v1 submitted 17 October, 2023;
originally announced October 2023.
-
CODA: Temporal Domain Generalization via Concept Drift Simulator
Authors:
Chia-Yuan Chang,
Yu-Neng Chuang,
Zhimeng Jiang,
Kwei-Herng Lai,
Anxiao Jiang,
Na Zou
Abstract:
In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized predicti…
▽ More
In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized prediction model architectures. To this end, there is an urgent demand for a model-agnostic temporal domain generalization approach that maintains generality across diverse data modalities and architectures. In this work, we aim to address the concept drift problem from a data-centric perspective to bypass considering the interaction between data and model. Developing such a framework presents non-trivial challenges: (i) existing generative models struggle to generate out-of-distribution future data, and (ii) precisely capturing the temporal trends of joint distribution along chronological source domains is computationally infeasible. To tackle the challenges, we propose the COncept Drift simulAtor (CODA) framework incorporating a predicted feature correlation matrix to simulate future data for model training. Specifically, CODA leverages feature correlations to represent data characteristics at specific time points, thereby circumventing the daunting computational costs. Experimental results demonstrate that using CODA-generated data as training input effectively achieves temporal domain generalization across different model architectures.
△ Less
Submitted 2 October, 2023;
originally announced October 2023.
-
Improvements on Scalable Stochastic Bayesian Inference Methods for Multivariate Hawkes Process
Authors:
Alex Ziyu Jiang,
Abel Rodríguez
Abstract:
Multivariate Hawkes Processes (MHPs) are a class of point processes that can account for complex temporal dynamics among event sequences. In this work, we study the accuracy and computational efficiency of three classes of algorithms which, while widely used in the context of Bayesian inference, have rarely been applied in the context of MHPs: stochastic gradient expectation-maximization, stochast…
▽ More
Multivariate Hawkes Processes (MHPs) are a class of point processes that can account for complex temporal dynamics among event sequences. In this work, we study the accuracy and computational efficiency of three classes of algorithms which, while widely used in the context of Bayesian inference, have rarely been applied in the context of MHPs: stochastic gradient expectation-maximization, stochastic gradient variational inference and stochastic gradient Langevin Monte Carlo. An important contribution of this paper is a novel approximation to the likelihood function that allows us to retain the computational advantages associated with conjugate settings while reducing approximation errors associated with the boundary effects. The comparisons are based on various simulated scenarios as well as an application to the study the risk dynamics in the Standard & Poor's 500 intraday index prices among its 11 sectors.
△ Less
Submitted 20 February, 2025; v1 submitted 26 September, 2023;
originally announced September 2023.
-
BARTSIMP: flexible spatial covariate modeling and prediction using Bayesian additive regression trees
Authors:
Alex Ziyu Jiang,
Jon Wakefield
Abstract:
Prediction is a classic challenge in spatial statistics and the inclusion of spatial covariates can greatly improve predictive performance when incorporated into a model with latent spatial effects. It is desirable to develop flexible regression models that allow for nonlinearities and interactions in the covariate specification. Existing machine learning approaches that allow for spatial dependen…
▽ More
Prediction is a classic challenge in spatial statistics and the inclusion of spatial covariates can greatly improve predictive performance when incorporated into a model with latent spatial effects. It is desirable to develop flexible regression models that allow for nonlinearities and interactions in the covariate specification. Existing machine learning approaches that allow for spatial dependence in the residuals fail to provide reliable uncertainty estimates. In this paper, we investigate the combination of a Gaussian process spatial model with a Bayesian Additive Regression Tree (BART) model. The computational burden of the approach is reduced by combining Markov chain Monte Carlo (MCMC) with the Integrated Nested Laplace Approximation (INLA) technique. We study the performance of the method first via simulation. We then use the model to predict anthropometric responses in Kenya, with the data collected via a complex sampling design. In particular, household survey data are collected via stratified two-stage unequal probability cluster sampling, which requires special care when modeled.
△ Less
Submitted 20 February, 2025; v1 submitted 23 September, 2023;
originally announced September 2023.
-
Principal Stratification with Continuous Post-Treatment Variables: Nonparametric Identification and Semiparametric Estimation
Authors:
Sizhu Lu,
Zhichao Jiang,
Peng Ding
Abstract:
Post-treatment variables often complicate causal inference. They appear in many scientific problems, including noncompliance, truncation by death, mediation, and surrogate endpoint evaluation. Principal stratification is a strategy to address these challenges by adjusting for the potential values of the post-treatment variables, defined as the principal strata. It allows for characterizing treatme…
▽ More
Post-treatment variables often complicate causal inference. They appear in many scientific problems, including noncompliance, truncation by death, mediation, and surrogate endpoint evaluation. Principal stratification is a strategy to address these challenges by adjusting for the potential values of the post-treatment variables, defined as the principal strata. It allows for characterizing treatment effect heterogeneity across principal strata and unveiling the mechanism of the treatment's impact on the outcome related to post-treatment variables. However, the existing literature has primarily focused on binary post-treatment variables, leaving the case with continuous post-treatment variables largely unexplored. This gap persists due to the complexity of infinitely many principal strata, which present challenges to both the identification and estimation of causal effects. We fill this gap by providing nonparametric identification and semiparametric estimation theory for principal stratification with continuous post-treatment variables. We propose to use working models to approximate the underlying causal effect surfaces and derive the efficient influence functions of the corresponding model parameters. Based on the theory, we construct doubly robust estimators and implement them in an R package.
△ Less
Submitted 3 April, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Approximate Causal Effect Identification under Weak Confounding
Authors:
Ziwei Jiang,
Lai Wei,
Murat Kocaoglu
Abstract:
Causal effect estimation has been studied by many researchers when only observational data is available. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers developed polynomial programs to estimate tight bounds on causal effect. However, these are computationally difficult to optimize for varia…
▽ More
Causal effect estimation has been studied by many researchers when only observational data is available. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers developed polynomial programs to estimate tight bounds on causal effect. However, these are computationally difficult to optimize for variables with large support sizes. In this paper, we analyze the effect of "weak confounding" on causal estimands. More specifically, under the assumption that the unobserved confounders that render a query non-identifiable have small entropy, we propose an efficient linear program to derive the upper and lower bounds of the causal effect. We show that our bounds are consistent in the sense that as the entropy of unobserved confounders goes to zero, the gap between the upper and lower bound vanishes. Finally, we conduct synthetic and real data simulations to compare our bounds with the bounds obtained by the existing work that cannot incorporate such entropy constraints and show that our bounds are tighter for the setting with weak confounders.
△ Less
Submitted 22 June, 2023;
originally announced June 2023.
-
Causal Mediation Analysis with Multi-dimensional and Indirectly Observed Mediators
Authors:
Ziyang Jiang,
Yiling Liu,
Michael H. Klein,
Ahmed Aloui,
Yiman Ren,
Keyu Li,
Vahid Tarokh,
David Carlson
Abstract:
Causal mediation analysis (CMA) is a powerful method to dissect the total effect of a treatment into direct and mediated effects within the potential outcome framework. This is important in many scientific applications to identify the underlying mechanisms of a treatment effect. However, in many scientific applications the mediator is unobserved, but there may exist related measurements. For examp…
▽ More
Causal mediation analysis (CMA) is a powerful method to dissect the total effect of a treatment into direct and mediated effects within the potential outcome framework. This is important in many scientific applications to identify the underlying mechanisms of a treatment effect. However, in many scientific applications the mediator is unobserved, but there may exist related measurements. For example, we may want to identify how changes in brain activity or structure mediate an antidepressant's effect on behavior, but we may only have access to electrophysiological or imaging brain measurements. To date, most CMA methods assume that the mediator is one-dimensional and observable, which oversimplifies such real-world scenarios. To overcome this limitation, we introduce a CMA framework that can handle complex and indirectly observed mediators based on the identifiable variational autoencoder (iVAE) architecture. We prove that the true joint distribution over observed and latent variables is identifiable with the proposed method. Additionally, our framework captures a disentangled representation of the indirectly observed mediator and yields accurate estimation of the direct and mediated effects in synthetic and semi-synthetic experiments, providing evidence of its potential utility in real-world applications.
△ Less
Submitted 13 June, 2023;
originally announced June 2023.
-
The ASNR-MICCAI Brain Tumor Segmentation (BraTS) Challenge 2023: Intracranial Meningioma
Authors:
Dominic LaBella,
Maruf Adewole,
Michelle Alonso-Basanta,
Talissa Altes,
Syed Muhammad Anwar,
Ujjwal Baid,
Timothy Bergquist,
Radhika Bhalerao,
Sully Chen,
Verena Chung,
Gian-Marco Conte,
Farouk Dako,
James Eddy,
Ivan Ezhov,
Devon Godfrey,
Fathi Hilal,
Ariana Familiar,
Keyvan Farahani,
Juan Eugenio Iglesias,
Zhifan Jiang,
Elaine Johanson,
Anahita Fathi Kazerooni,
Collin Kent,
John Kirkpatrick,
Florian Kofler
, et al. (35 additional authors not shown)
Abstract:
Meningiomas are the most common primary intracranial tumor in adults and can be associated with significant morbidity and mortality. Radiologists, neurosurgeons, neuro-oncologists, and radiation oncologists rely on multiparametric MRI (mpMRI) for diagnosis, treatment planning, and longitudinal treatment monitoring; yet automated, objective, and quantitative tools for non-invasive assessment of men…
▽ More
Meningiomas are the most common primary intracranial tumor in adults and can be associated with significant morbidity and mortality. Radiologists, neurosurgeons, neuro-oncologists, and radiation oncologists rely on multiparametric MRI (mpMRI) for diagnosis, treatment planning, and longitudinal treatment monitoring; yet automated, objective, and quantitative tools for non-invasive assessment of meningiomas on mpMRI are lacking. The BraTS meningioma 2023 challenge will provide a community standard and benchmark for state-of-the-art automated intracranial meningioma segmentation models based on the largest expert annotated multilabel meningioma mpMRI dataset to date. Challenge competitors will develop automated segmentation models to predict three distinct meningioma sub-regions on MRI including enhancing tumor, non-enhancing tumor core, and surrounding nonenhancing T2/FLAIR hyperintensity. Models will be evaluated on separate validation and held-out test datasets using standardized metrics utilized across the BraTS 2023 series of challenges including the Dice similarity coefficient and Hausdorff distance. The models developed during the course of this challenge will aid in incorporation of automated meningioma MRI segmentation into clinical practice, which will ultimately improve care of patients with meningioma.
△ Less
Submitted 12 May, 2023;
originally announced May 2023.
-
A Multi-Arm Two-Stage (MATS) Design for Proof-of-Concept and Dose Optimization in Early-Phase Oncology Trials
Authors:
Zhenghao Jiang,
Gu Mi,
Ji Lin,
Christelle Lorenzato,
Yuan Ji
Abstract:
The Project Optimus initiative by the FDA's Oncology Center of Excellence is widely viewed as a groundbreaking effort to change the $\textit{status quo}$ of conventional dose-finding strategies in oncology. Unlike in other therapeutic areas where multiple doses are evaluated thoroughly in dose ranging studies, early-phase oncology dose-finding studies are characterized by the practice of identifyi…
▽ More
The Project Optimus initiative by the FDA's Oncology Center of Excellence is widely viewed as a groundbreaking effort to change the $\textit{status quo}$ of conventional dose-finding strategies in oncology. Unlike in other therapeutic areas where multiple doses are evaluated thoroughly in dose ranging studies, early-phase oncology dose-finding studies are characterized by the practice of identifying a single dose, such as the maximum tolerated dose (MTD) or the recommended phase 2 dose (RP2D). Following the spirit of Project Optimus, we propose an Multi-Arm Two-Stage (MATS) design for proof-of-concept (PoC) and dose optimization that allows the evaluation of two selected doses from a dose-escalation trial. The design assess the higher dose first across multiple indications in the first stage, and adaptively enters the second stage for an indication if the higher dose exhibits promising anti-tumor activities. In the second stage, a randomized comparison between the higher and lower doses is conducted to achieve proof-of-concept (PoC) and dose optimization. A Bayesian hierarchical model governs the statistical inference and decision making by borrowing information across doses, indications, and stages. Our simulation studies show that the proposed MATS design yield desirable performance. An R Shiny application has been developed and made available at https://matsdesign.shinyapps.io/mats/.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
A Survey on Uncertainty Quantification Methods for Deep Learning
Authors:
Wenchong He,
Zhe Jiang,
Tingsong Xiao,
Zelin Xu,
Yukun Li
Abstract:
Deep neural networks (DNNs) have achieved tremendous success in making accurate predictions for computer vision, natural language processing, as well as science and engineering domains. However, it is also well-recognized that DNNs sometimes make unexpected, incorrect, but overconfident predictions. This can cause serious consequences in high-stake applications, such as autonomous driving, medical…
▽ More
Deep neural networks (DNNs) have achieved tremendous success in making accurate predictions for computer vision, natural language processing, as well as science and engineering domains. However, it is also well-recognized that DNNs sometimes make unexpected, incorrect, but overconfident predictions. This can cause serious consequences in high-stake applications, such as autonomous driving, medical diagnosis, and disaster response. Uncertainty quantification (UQ) aims to estimate the confidence of DNN predictions beyond prediction accuracy. In recent years, many UQ methods have been developed for DNNs. It is of great practical value to systematically categorize these UQ methods and compare their advantages and disadvantages. However, existing surveys mostly focus on categorizing UQ methodologies from a neural network architecture perspective or a Bayesian perspective and ignore the source of uncertainty that each methodology can incorporate, making it difficult to select an appropriate UQ method in practice. To fill the gap, this paper presents a systematic taxonomy of UQ methods for DNNs based on the types of uncertainty sources (data uncertainty versus model uncertainty). We summarize the advantages and disadvantages of methods in each category. We show how our taxonomy of UQ methodologies can potentially help guide the choice of UQ method in different machine learning problems (e.g., active learning, robustness, and reinforcement learning). We also identify current research gaps and propose several future research directions.
△ Less
Submitted 19 January, 2025; v1 submitted 26 February, 2023;
originally announced February 2023.
-
Domain Adaptation via Rebalanced Sub-domain Alignment
Authors:
Yiling Liu,
Juncheng Dong,
Ziyang Jiang,
Ahmed Aloui,
Keyu Li,
Hunter Klein,
Vahid Tarokh,
David Carlson
Abstract:
Unsupervised domain adaptation (UDA) is a technique used to transfer knowledge from a labeled source domain to a different but related unlabeled target domain. While many UDA methods have shown success in the past, they often assume that the source and target domains must have identical class label distributions, which can limit their effectiveness in real-world scenarios. To address this limitati…
▽ More
Unsupervised domain adaptation (UDA) is a technique used to transfer knowledge from a labeled source domain to a different but related unlabeled target domain. While many UDA methods have shown success in the past, they often assume that the source and target domains must have identical class label distributions, which can limit their effectiveness in real-world scenarios. To address this limitation, we propose a novel generalization bound that reweights source classification error by aligning source and target sub-domains. We prove that our proposed generalization bound is at least as strong as existing bounds under realistic assumptions, and we empirically show that it is much stronger on real-world data. We then propose an algorithm to minimize this novel generalization bound. We demonstrate by numerical experiments that this approach improves performance in shifted class distribution scenarios compared to state-of-the-art methods.
△ Less
Submitted 3 February, 2023;
originally announced February 2023.
-
Estimating Causal Effects using a Multi-task Deep Ensemble
Authors:
Ziyang Jiang,
Zhuoran Hou,
Yiling Liu,
Yiman Ren,
Keyu Li,
David Carlson
Abstract:
A number of methods have been proposed for causal effect estimation, yet few have demonstrated efficacy in handling data with complex structures, such as images. To fill this gap, we propose Causal Multi-task Deep Ensemble (CMDE), a novel framework that learns both shared and group-specific information from the study population. We provide proofs demonstrating equivalency of CDME to a multi-task G…
▽ More
A number of methods have been proposed for causal effect estimation, yet few have demonstrated efficacy in handling data with complex structures, such as images. To fill this gap, we propose Causal Multi-task Deep Ensemble (CMDE), a novel framework that learns both shared and group-specific information from the study population. We provide proofs demonstrating equivalency of CDME to a multi-task Gaussian process (GP) with a coregionalization kernel a priori. Compared to multi-task GP, CMDE efficiently handles high-dimensional and multi-modal covariates and provides pointwise uncertainty estimates of causal effects. We evaluate our method across various types of datasets and tasks and find that CMDE outperforms state-of-the-art methods on a majority of these tasks.
△ Less
Submitted 27 May, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
Informing policy via dynamic models: Cholera in Haiti
Authors:
Jesse Wheeler,
AnnaElaine Rosengart,
Zhuoxun Jiang,
Kevin Tan,
Noah Treutle,
Edward Ionides
Abstract:
Public health decisions must be made about when and how to implement interventions to control an infectious disease epidemic. These decisions should be informed by data on the epidemic as well as current understanding about the transmission dynamics. Such decisions can be posed as statistical questions about scientifically motivated dynamic models. Thus, we encounter the methodological task of bui…
▽ More
Public health decisions must be made about when and how to implement interventions to control an infectious disease epidemic. These decisions should be informed by data on the epidemic as well as current understanding about the transmission dynamics. Such decisions can be posed as statistical questions about scientifically motivated dynamic models. Thus, we encounter the methodological task of building credible, data-informed decisions based on stochastic, partially observed, nonlinear dynamic models. This necessitates addressing the tradeoff between biological fidelity and model simplicity, and the reality of misspecification for models at all levels of complexity. We assess current methodological approaches to these issues via a case study of the 2010-2019 cholera epidemic in Haiti. We consider three dynamic models developed by expert teams to advise on vaccination policies. We evaluate previous methods used for fitting these models, and we demonstrate modified data analysis strategies leading to improved statistical fit. Specifically, we present approaches for diagnosing model misspecification and the consequent development of improved models. Additionally, we demonstrate the utility of recent advances in likelihood maximization for high-dimensional nonlinear dynamic models, enabling likelihood-based inference for spatiotemporal incidence data using this class of models. Our workflow is reproducible and extendable, facilitating future investigations of this disease system.
△ Less
Submitted 4 March, 2024; v1 submitted 21 January, 2023;
originally announced January 2023.
-
An instrumental variable method for point processes: generalised Wald estimation based on deconvolution
Authors:
Zhichao Jiang,
Shizhe Chen,
Peng Ding
Abstract:
Point processes are probabilistic tools for modeling event data. While there exists a fast-growing literature studying the relationships between point processes, it remains unexplored how such relationships connect to causal effects. In the presence of unmeasured confounders, parameters from point process models do not necessarily have causal interpretations. We propose an instrumental variable me…
▽ More
Point processes are probabilistic tools for modeling event data. While there exists a fast-growing literature studying the relationships between point processes, it remains unexplored how such relationships connect to causal effects. In the presence of unmeasured confounders, parameters from point process models do not necessarily have causal interpretations. We propose an instrumental variable method for causal inference with point process treatment and outcome. We define causal quantities based on potential outcomes and establish nonparametric identification results with a binary instrumental variable. We extend the traditional Wald estimation to deal with point process treatment and outcome, showing that it should be performed after a Fourier transform of the intention-to-treat effects on the treatment and outcome and thus takes the form of deconvolution. We term this as the generalised Wald estimation and propose an estimation strategy based on well-established deconvolution methods.
△ Less
Submitted 9 January, 2023;
originally announced January 2023.
-
On the Efficient Implementation of High Accuracy Optimality of Profile Maximum Likelihood
Authors:
Moses Charikar,
Zhihao Jiang,
Kirankumar Shiragur,
Aaron Sidford
Abstract:
We provide an efficient unified plug-in approach for estimating symmetric properties of distributions given $n$ independent samples. Our estimator is based on profile-maximum-likelihood (PML) and is sample optimal for estimating various symmetric properties when the estimation error $ε\gg n^{-1/3}$. This result improves upon the previous best accuracy threshold of $ε\gg n^{-1/4}$ achievable by pol…
▽ More
We provide an efficient unified plug-in approach for estimating symmetric properties of distributions given $n$ independent samples. Our estimator is based on profile-maximum-likelihood (PML) and is sample optimal for estimating various symmetric properties when the estimation error $ε\gg n^{-1/3}$. This result improves upon the previous best accuracy threshold of $ε\gg n^{-1/4}$ achievable by polynomial time computable PML-based universal estimators [ACSS21, ACSS20]. Our estimator reaches a theoretical limit for universal symmetric property estimation as [Han21] shows that a broad class of universal estimators (containing many well known approaches including ours) cannot be sample optimal for every $1$-Lipschitz property when $ε\ll n^{-1/3}$.
△ Less
Submitted 13 October, 2022;
originally announced October 2022.
-
Distributed Online Non-convex Optimization with Composite Regret
Authors:
Zhanhong Jiang,
Aditya Balu,
Xian Yeow Lee,
Young M. Lee,
Chinmay Hegde,
Soumik Sarkar
Abstract:
Regret has been widely adopted as the metric of choice for evaluating the performance of online optimization algorithms for distributed, multi-agent systems. However, data/model variations associated with agents can significantly impact decisions and requires consensus among agents. Moreover, most existing works have focused on developing approaches for (either strongly or non-strongly) convex los…
▽ More
Regret has been widely adopted as the metric of choice for evaluating the performance of online optimization algorithms for distributed, multi-agent systems. However, data/model variations associated with agents can significantly impact decisions and requires consensus among agents. Moreover, most existing works have focused on developing approaches for (either strongly or non-strongly) convex losses, and very few results have been obtained regarding regret bounds in distributed online optimization for general non-convex losses. To address these two issues, we propose a novel composite regret with a new network regret-based metric to evaluate distributed online optimization algorithms. We concretely define static and dynamic forms of the composite regret. By leveraging the dynamic form of our composite regret, we develop a consensus-based online normalized gradient (CONGD) approach for pseudo-convex losses, and it provably shows a sublinear behavior relating to a regularity term for the path variation of the optimizer. For general non-convex losses, we first shed light on the regret for the setting of distributed online non-convex learning based on recent advances such that no deterministic algorithm can achieve the sublinear regret. We then develop the distributed online non-convex optimization with composite regret (DINOCO) without access to the gradients, depending on an offline optimization oracle. DINOCO is shown to achieve sublinear regret; to our knowledge, this is the first regret bound for general distributed online non-convex learning.
△ Less
Submitted 21 September, 2022;
originally announced September 2022.
-
Spectrum of non-Hermitian deep-Hebbian neural networks
Authors:
Zijian Jiang,
Ziming Chen,
Tianqi Hou,
Haiping Huang
Abstract:
Neural networks with recurrent asymmetric couplings are important to understand how episodic memories are encoded in the brain. Here, we integrate the experimental observation of wide synaptic integration window into our model of sequence retrieval in the continuous time dynamics. The model with non-normal neuron-interactions is theoretically studied by deriving a random matrix theory of the Jacob…
▽ More
Neural networks with recurrent asymmetric couplings are important to understand how episodic memories are encoded in the brain. Here, we integrate the experimental observation of wide synaptic integration window into our model of sequence retrieval in the continuous time dynamics. The model with non-normal neuron-interactions is theoretically studied by deriving a random matrix theory of the Jacobian matrix in neural dynamics. The spectra bears several distinct features, such as breaking rotational symmetry about the origin, and the emergence of nested voids within the spectrum boundary. The spectral density is thus highly non-uniformly distributed in the complex plane. The random matrix theory also predicts a transition to chaos. In particular, the edge of chaos provides computational benefits for the sequential retrieval of memories. Our work provides a systematic study of time-lagged correlations with arbitrary time delays, and thus can inspire future studies of a broad class of memory models, and even big data analysis of biological time series.
△ Less
Submitted 16 January, 2023; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Policy Learning with Asymmetric Counterfactual Utilities
Authors:
Eli Ben-Michael,
Kosuke Imai,
Zhichao Jiang
Abstract:
Data-driven decision making plays an important role even in high stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility…
▽ More
Data-driven decision making plays an important role even in high stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility function is more properly characterized by the joint set of potential outcomes under all actions. For example, the Hippocratic principle to "do no harm" implies that the cost of causing death to a patient who would otherwise survive without treatment is greater than the cost of forgoing life-saving treatment. We consider optimal policy learning with asymmetric counterfactual utility functions of this form that consider the joint set of potential outcomes. We show that asymmetric counterfactual utilities lead to an unidentifiable expected utility function, and so we first partially identify it. Drawing on statistical decision theory, we then derive minimax decision rules by minimizing the maximum expected utility loss relative to different alternative policies. We show that one can learn minimax loss decision rules from observed data by solving intermediate classification problems, and establish that the finite sample excess expected utility loss of this procedure is bounded by the regret of these intermediate classifiers. We apply this conceptual framework and methodology to the decision about whether or not to use right heart catheterization for patients with possible pulmonary hypertension.
△ Less
Submitted 28 November, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Incorporating Prior Knowledge into Neural Networks through an Implicit Composite Kernel
Authors:
Ziyang Jiang,
Tongshu Zheng,
Yiling Liu,
David Carlson
Abstract:
It is challenging to guide neural network (NN) learning with prior knowledge. In contrast, many known properties, such as spatial smoothness or seasonality, are straightforward to model by choosing an appropriate kernel in a Gaussian process (GP). Many deep learning applications could be enhanced by modeling such known properties. For example, convolutional neural networks (CNNs) are frequently us…
▽ More
It is challenging to guide neural network (NN) learning with prior knowledge. In contrast, many known properties, such as spatial smoothness or seasonality, are straightforward to model by choosing an appropriate kernel in a Gaussian process (GP). Many deep learning applications could be enhanced by modeling such known properties. For example, convolutional neural networks (CNNs) are frequently used in remote sensing, which is subject to strong seasonal effects. We propose to blend the strengths of deep learning and the clear modeling capabilities of GPs by using a composite kernel that combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties (e.g., seasonality). We implement this idea by combining a deep network and an efficient mapping based on the Nystrom approximation, which we call Implicit Composite Kernel (ICK). We then adopt a sample-then-optimize approach to approximate the full GP posterior distribution. We demonstrate that ICK has superior performance and flexibility on both synthetic and real-world data sets. We believe that ICK framework can be used to include prior information into neural networks in many applications.
△ Less
Submitted 28 February, 2024; v1 submitted 15 May, 2022;
originally announced May 2022.
-
Towards the Generalization of Contrastive Self-Supervised Learning
Authors:
Weiran Huang,
Mingyang Yi,
Xuyang Zhao,
Zihao Jiang
Abstract:
Recently, self-supervised learning has attracted great attention, since it only requires unlabeled data for model training. Contrastive learning is one popular method for self-supervised learning and has achieved promising empirical performance. However, the theoretical understanding of its generalization ability is still limited. To this end, we define a kind of $(σ,δ)$-measure to mathematically…
▽ More
Recently, self-supervised learning has attracted great attention, since it only requires unlabeled data for model training. Contrastive learning is one popular method for self-supervised learning and has achieved promising empirical performance. However, the theoretical understanding of its generalization ability is still limited. To this end, we define a kind of $(σ,δ)$-measure to mathematically quantify the data augmentation, and then provide an upper bound of the downstream classification error rate based on the measure. It reveals that the generalization ability of contrastive self-supervised learning is related to three key factors: alignment of positive samples, divergence of class centers, and concentration of augmented data. The first two factors are properties of learned representations, while the third one is determined by pre-defined data augmentation. We further investigate two canonical contrastive losses, InfoNCE and cross-correlation, to show how they provably achieve the first two factors. Moreover, we conduct experiments to study the third factor, and observe a strong correlation between downstream performance and the concentration of augmented data.
△ Less
Submitted 2 March, 2023; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Safe Policy Learning through Extrapolation: Application to Pre-trial Risk Assessment
Authors:
Eli Ben-Michael,
D. James Greiner,
Kosuke Imai,
Zhichao Jiang
Abstract:
Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. We examine a particular case of algorithmic pre-trial risk assessments in the US criminal justice system, which provide deterministic classificati…
▽ More
Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and interpretability. We examine a particular case of algorithmic pre-trial risk assessments in the US criminal justice system, which provide deterministic classification scores and recommendations to help judges make release decisions. Our goal is to analyze data from a unique field experiment on an algorithmic pre-trial risk assessment to investigate whether the scores and recommendations can be improved. Unfortunately, prior methods for policy learning are not applicable because they require existing policies to be stochastic. We develop a maximin robust optimization approach that partially identifies the expected utility of a policy, and then finds a policy that maximizes the worst-case expected utility. The resulting policy has a statistical safety property, limiting the probability of producing a worse policy than the existing one, under structural assumptions about the outcomes. Our analysis of data from the field experiment shows that we can safely improve certain components of the risk assessment instrument by classifying arrestees as lower risk under a wide range of utility specifications, though the analysis is not informative about several components of the instrument.
△ Less
Submitted 31 March, 2025; v1 submitted 21 September, 2021;
originally announced September 2021.
-
Model-based Pre-clinical Trials for Medical Devices Using Statistical Model Checking
Authors:
Haochen Yang,
Jicheng Gu,
Zhihao Jiang
Abstract:
Clinical trials are considered as the golden standard for medical device validation. However, many sacrifices have to be made during the design and conduction of the trials due to cost considerations and partial information, which may compromise the significance of the trial results. In this paper, we proposed a model-based pre-clinical trial framework using statistical model checking. Physiologic…
▽ More
Clinical trials are considered as the golden standard for medical device validation. However, many sacrifices have to be made during the design and conduction of the trials due to cost considerations and partial information, which may compromise the significance of the trial results. In this paper, we proposed a model-based pre-clinical trial framework using statistical model checking. Physiological models represent disease mechanism, which enable automated adjudication of simulation results. Sampling of the patient parameters and hypothesis testing are performed by statistical model checker. The framework enables a broader range of hypothesis to be tested with guaranteed statistical significance, which are useful complements to the clinical trials. We demonstrated our framework with a pre-clinical trial on implantable cardioverter defibrillators.
△ Less
Submitted 1 June, 2021;
originally announced June 2021.
-
Experimental Evaluation of Algorithm-Assisted Human Decision-Making: Application to Pretrial Public Safety Assessment
Authors:
Kosuke Imai,
Zhichao Jiang,
James Greiner,
Ryan Halen,
Sooahn Shin
Abstract:
Despite an increasing reliance on fully-automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While there exists a fast-growing literature evaluating the bias and fairne…
▽ More
Despite an increasing reliance on fully-automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While there exists a fast-growing literature evaluating the bias and fairness of such algorithmic recommendations, an overlooked question is whether they help humans make better decisions. We develop a statistical methodology for experimentally evaluating the causal impacts of algorithmic recommendations on human decisions. We also show how to examine whether algorithmic recommendations improve the fairness of human decisions and derive the optimal decision rules under various settings. We apply the proposed methodology to preliminary data from the first-ever randomized controlled trial that evaluates the pretrial Public Safety Assessment (PSA) in the criminal justice system. A goal of the PSA is to help judges decide which arrested individuals should be released. On the basis of the preliminary data available, we find that providing the PSA to the judge has little overall impact on the judge's decisions and subsequent arrestee behavior. However, our analysis yields some potentially suggestive evidence that the PSA may help avoid unnecessarily harsh decisions for female arrestees regardless of their risk levels while it encourages the judge to make stricter decisions for male arrestees who are deemed to be risky. In terms of fairness, the PSA appears to increase the gender bias against males while having little effect on any existing racial differences in judges' decision. Finally, we find that the PSA's recommendations might be unnecessarily severe unless the cost of a new crime is sufficiently high.
△ Less
Submitted 11 December, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Multiply robust estimation of causal effects under principal ignorability
Authors:
Zhichao Jiang,
Shu Yang,
Peng Ding
Abstract:
Causal inference concerns not only the average effect of the treatment on the outcome but also the underlying mechanism through an intermediate variable of interest. Principal stratification characterizes such a mechanism by targeting subgroup causal effects within principal strata, which are defined by the joint potential values of an intermediate variable. Due to the fundamental problem of causa…
▽ More
Causal inference concerns not only the average effect of the treatment on the outcome but also the underlying mechanism through an intermediate variable of interest. Principal stratification characterizes such a mechanism by targeting subgroup causal effects within principal strata, which are defined by the joint potential values of an intermediate variable. Due to the fundamental problem of causal inference, principal strata are inherently latent, rendering it challenging to identify and estimate subgroup effects within them. A line of research leverages the principal ignorability assumption that the latent principal strata are mean independent of the potential outcomes conditioning on the observed covariates. Under principal ignorability, we derive various nonparametric identification formulas for causal effects within principal strata in observational studies, which motivate estimators relying on the correct specifications of different parts of the observed-data distribution. Appropriately combining these estimators yields triply robust estimators for the causal effects within principal strata. These triply robust estimators are consistent if two of the treatment, intermediate variable, and outcome models are correctly specified, and moreover, they are locally efficient if all three models are correctly specified. We show that these estimators arise naturally from either the efficient influence functions in the semiparametric theory or the model-assisted estimators in the survey sampling theory. We evaluate different estimators based on their finite-sample performance through simulation and apply them to two observational studies.
△ Less
Submitted 27 March, 2022; v1 submitted 2 December, 2020;
originally announced December 2020.
-
Statistical Inference and Power Analysis for Direct and Spillover Effects in Two-Stage Randomized Experiments
Authors:
Zhichao Jiang,
Kosuke Imai,
Anup Malani
Abstract:
Two-stage randomized experiments are becoming an increasingly popular experimental design for causal inference when the outcome of one unit may be affected by the treatment assignments of other units in the same cluster. In this paper, we provide a methodological framework for general tools of statistical inference and power analysis for two-stage randomized experiments. Under the randomization-ba…
▽ More
Two-stage randomized experiments are becoming an increasingly popular experimental design for causal inference when the outcome of one unit may be affected by the treatment assignments of other units in the same cluster. In this paper, we provide a methodological framework for general tools of statistical inference and power analysis for two-stage randomized experiments. Under the randomization-based framework, we consider the estimation of a new direct effect of interest as well as the average direct and spillover effects studied in the literature. We provide unbiased estimators of these causal quantities and their conservative variance estimators in a general setting. Using these results, we then develop hypothesis testing procedures and derive sample size formulas. We theoretically compare the two-stage randomized design with the completely randomized and cluster randomized designs, which represent two limiting designs. Finally, we conduct simulation studies to evaluate the empirical performance of our sample size formulas. For empirical illustration, the proposed methodology is applied to the randomized evaluation of the Indian national health insurance program. An open-source software package is available for implementing the proposed methodology.
△ Less
Submitted 20 October, 2022; v1 submitted 15 November, 2020;
originally announced November 2020.
-
Decentralized Deep Learning using Momentum-Accelerated Consensus
Authors:
Aditya Balu,
Zhanhong Jiang,
Sin Yong Tan,
Chinmay Hedge,
Young M Lee,
Soumik Sarkar
Abstract:
We consider the problem of decentralized deep learning where multiple agents collaborate to learn from a distributed dataset. While there exist several decentralized deep learning approaches, the majority consider a central parameter-server topology for aggregating the model parameters from the agents. However, such a topology may be inapplicable in networked systems such as ad-hoc mobile networks…
▽ More
We consider the problem of decentralized deep learning where multiple agents collaborate to learn from a distributed dataset. While there exist several decentralized deep learning approaches, the majority consider a central parameter-server topology for aggregating the model parameters from the agents. However, such a topology may be inapplicable in networked systems such as ad-hoc mobile networks, field robotics, and power network systems where direct communication with the central parameter server may be inefficient. In this context, we propose and analyze a novel decentralized deep learning algorithm where the agents interact over a fixed communication topology (without a central server). Our algorithm is based on the heavy-ball acceleration method used in gradient-based optimization. We propose a novel consensus protocol where each agent shares with its neighbors its model parameters as well as gradient-momentum values during the optimization process. We consider both strongly convex and non-convex objective functions and theoretically analyze our algorithm's performance. We present several empirical comparisons with competing decentralized learning methods to demonstrate the efficacy of our approach under different communication topologies.
△ Less
Submitted 28 November, 2020; v1 submitted 21 October, 2020;
originally announced October 2020.
-
Semi-supervised Learning with the EM Algorithm: A Comparative Study between Unstructured and Structured Prediction
Authors:
Wenchong He,
Zhe Jiang
Abstract:
Semi-supervised learning aims to learn prediction models from both labeled and unlabeled samples. There has been extensive research in this area. Among existing work, generative mixture models with Expectation-Maximization (EM) is a popular method due to clear statistical properties. However, existing literature on EM-based semi-supervised learning largely focuses on unstructured prediction, assum…
▽ More
Semi-supervised learning aims to learn prediction models from both labeled and unlabeled samples. There has been extensive research in this area. Among existing work, generative mixture models with Expectation-Maximization (EM) is a popular method due to clear statistical properties. However, existing literature on EM-based semi-supervised learning largely focuses on unstructured prediction, assuming that samples are independent and identically distributed. Studies on EM-based semi-supervised approach in structured prediction is limited. This paper aims to fill the gap through a comparative study between unstructured and structured methods in EM-based semi-supervised learning. Specifically, we compare their theoretical properties and find that both methods can be considered as a generalization of self-training with soft class assignment of unlabeled samples, but the structured method additionally considers structural constraint in soft class assignment. We conducted a case study on real-world flood mapping datasets to compare the two methods. Results show that structured EM is more robust to class confusion caused by noise and obstacles in features in the context of the flood mapping application.
△ Less
Submitted 27 August, 2020;
originally announced August 2020.
-
Spatiotemporal Attention for Multivariate Time Series Prediction and Interpretation
Authors:
Tryambak Gangopadhyay,
Sin Yong Tan,
Zhanhong Jiang,
Rui Meng,
Soumik Sarkar
Abstract:
Multivariate time series modeling and prediction problems are abundant in many machine learning application domains. Accurate interpretation of such prediction outcomes from a machine learning model that explicitly captures temporal correlations can significantly benefit the domain experts. In this context, temporal attention has been successfully applied to isolate the important time steps for th…
▽ More
Multivariate time series modeling and prediction problems are abundant in many machine learning application domains. Accurate interpretation of such prediction outcomes from a machine learning model that explicitly captures temporal correlations can significantly benefit the domain experts. In this context, temporal attention has been successfully applied to isolate the important time steps for the input time series. However, in multivariate time series problems, spatial interpretation is also critical to understand the contributions of different variables on the model outputs. We propose a novel deep learning architecture, called spatiotemporal attention mechanism (STAM) for simultaneous learning of the most important time steps and variables. STAM is a causal (i.e., only depends on past inputs and does not use future inputs) and scalable (i.e., scales well with an increase in the number of variables) approach that is comparable to the state-of-the-art models in terms of computational tractability. We demonstrate our models' performance on two popular public datasets and a domain-specific dataset. When compared with the baseline models, the results show that STAM maintains state-of-the-art prediction accuracy while offering the benefit of accurate spatiotemporal interpretability. The learned attention weights are validated from a domain knowledge perspective for these real-world datasets.
△ Less
Submitted 26 October, 2020; v1 submitted 11 August, 2020;
originally announced August 2020.