-
Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation
Authors:
He Li,
Haoang Chi,
Mingyu Liu,
Wanrong Huang,
Liyang Xu,
Wenjing Yang
Abstract:
The real world naturally has dimensions of time and space. Therefore, estimating the counterfactual outcomes with spatial-temporal attributes is a crucial problem. However, previous methods are based on classical statistical models, which still have limitations in performance and generalization. This paper proposes a novel framework for estimating counterfactual outcomes with spatial-temporal attributes using the Transformer, exhibiting stronger estimation ability. Under mild assumptions, the proposed estimator within this framework is consistent and asymptotically normal. To validate the effectiveness of our approach, we conduct simulation experiments and real data experiments. Simulation experiments show that our estimator has a stronger estimation capability than baseline methods. Real data experiments provide a valuable conclusion to the causal effect of conflicts on forest loss in Colombia. The source code is available at https://github.com/lihe-maxsize/DeppSTCI_Release_Version-master.
Submitted 26 June, 2025;
originally announced June 2025.
-
Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
Authors:
Wenhao Yang,
Sifan Yang,
Lijun Zhang
Abstract:
Reflecting the greater significance of recent history over the distant past in non-stationary environments, $λ$-discounted regret has been introduced in online convex optimization (OCO) to gracefully forget past data as new information arrives. When the discount factor $λ$ is given, online gradient descent with an appropriate step size achieves an $O(1/\sqrt{1-λ})$ discounted regret. However, the value of $λ$ is often not predetermined in real-world scenarios. This gives rise to a significant open question: is it possible to develop a discounted algorithm that adapts to an unknown discount factor? In this paper, we affirmatively answer this question by providing a novel analysis to demonstrate that smoothed OGD (SOGD) achieves a uniform $O(\sqrt{\log T/(1-λ)})$ discounted regret, holding for all values of $λ$ across a continuous interval simultaneously. The basic idea is to maintain multiple OGD instances to handle different discount factors, and aggregate their outputs sequentially by an online prediction algorithm named Discounted-Normal-Predictor (DNP) (Kapralov and Panigrahy, 2010). Our analysis reveals that DNP can combine the decisions of two experts, even when they operate on discounted regret with different discount factors.
Submitted 26 May, 2025;
originally announced May 2025.
-
Wide & Deep Learning for Node Classification
Authors:
Yancheng Chen,
Wenguo Yang,
Zhipeng Jiang
Abstract:
Wide & Deep, a simple yet effective learning architecture for recommendation systems developed by Google, has had a significant impact in both academia and industry due to its combination of the memorization ability of generalized linear models and the generalization ability of deep models. Graph convolutional networks (GCNs) remain dominant in node classification tasks; however, recent studies have highlighted issues such as heterophily and expressiveness, which focus on graph structure while seemingly neglecting the potential role of node features. In this paper, we propose a flexible framework GCNIII, which leverages the Wide & Deep architecture and incorporates three techniques: Intersect memory, Initial residual and Identity mapping. We provide comprehensive empirical evidence showing that GCNIII can more effectively balance the trade-off between over-fitting and over-generalization on various semi- and full-supervised tasks. Additionally, we explore the use of large language models (LLMs) for node feature engineering to enhance the performance of GCNIII in cross-domain node classification tasks. Our implementation is available at https://github.com/CYCUCAS/GCNIII.
Submitted 4 May, 2025;
originally announced May 2025.
-
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems
Authors:
Song Xia,
Yi Yu,
Wenhan Yang,
Meiwen Ding,
Zhuo Chen,
Ling-Yu Duan,
Alex C. Kot,
Xudong Jiang
Abstract:
By locally encoding raw data into intermediate features, collaborative inference enables end users to leverage powerful deep learning models without exposure of sensitive raw data to cloud servers. However, recent studies have revealed that these intermediate features may not sufficiently preserve privacy, as information can be leaked and raw data can be reconstructed via model inversion attacks (MIAs). Obfuscation-based methods, such as noise corruption, adversarial representation learning, and information filters, enhance the inversion robustness by obfuscating the task-irrelevant redundancy empirically. However, methods for quantifying such redundancy remain elusive, and the explicit mathematical relation between this redundancy minimization and inversion robustness enhancement has not yet been established. To address that, this work first theoretically proves that the conditional entropy of inputs given intermediate features provides a guaranteed lower bound on the reconstruction mean square error (MSE) under any MIA. Then, we derive a differentiable and solvable measure for bounding this conditional entropy based on the Gaussian mixture estimation and propose a conditional entropy maximization (CEM) algorithm to enhance the inversion robustness. Experimental results on four datasets demonstrate the effectiveness and adaptability of our proposed CEM; without compromising feature utility and computing efficiency, plugging the proposed CEM into obfuscation-based defense mechanisms consistently boosts their inversion robustness, achieving average gains ranging from 12.9% to 48.2%. Code is available at https://github.com/xiasong0501/CEM.
Submitted 3 April, 2025; v1 submitted 1 March, 2025;
originally announced March 2025.
-
The Capabilities and Limitations of Weak-to-Strong Generalization: Generalization and Calibration
Authors:
Wei Yao,
Wenkai Yang,
Gengze Xu,
Ziqiao Wang,
Yankai Lin,
Yong Liu
Abstract:
Weak-to-strong generalization, where weakly supervised strong models outperform their weaker teachers, offers a promising approach to aligning superhuman models with human values. To deepen the understanding of this approach, we provide theoretical insights into its capabilities and limitations. First, in the classification setting, we establish upper and lower generalization error bounds for the strong model, identifying the primary limitations as stemming from the weak model's generalization error and the optimization objective itself. Additionally, we derive lower and upper bounds on the calibration error of the strong model. These theoretical bounds reveal two critical insights: (1) the weak model should demonstrate strong generalization performance and maintain well-calibrated predictions, and (2) the strong model's training process must strike a careful balance, as excessive optimization could undermine its generalization capability by over-relying on the weak supervision signals. Finally, in the regression setting, we extend the work of Charikar et al. (2024) to a loss function based on Kullback-Leibler (KL) divergence, offering guarantees that the strong student can outperform its weak teacher by at least the magnitude of their disagreement. We conduct sufficient experiments to validate our theory.
Submitted 3 June, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
MEP-Net: Generating Solutions to Scientific Problems with Limited Knowledge by Maximum Entropy Principle
Authors:
Wuyue Yang,
Liangrong Peng,
Guojie Li,
Liu Hong
Abstract:
Maximum entropy principle (MEP) offers an effective and unbiased approach to inferring unknown probability distributions when faced with incomplete information, while neural networks provide the flexibility to learn complex distributions from data. This paper proposes a novel neural network architecture, the MEP-Net, which combines the MEP with neural networks to generate probability distributions from moment constraints. We also provide a comprehensive overview of the fundamentals of the maximum entropy principle, its mathematical formulations, and a rigorous justification for its applicability for non-equilibrium systems based on the large deviations principle. Through fruitful numerical experiments, we demonstrate that the MEP-Net can be particularly useful in modeling the evolution of probability distributions in biochemical reaction networks and in generating complex distributions from data.
Submitted 2 December, 2024;
originally announced December 2024.
-
Adaptive Sphericity Tests for High Dimensional Data
Authors:
Ping Zhao,
Wenwan Yang,
Long Feng,
Zhaojun Wang
Abstract:
In this paper, we investigate sphericity testing in high-dimensional settings, where existing methods primarily rely on sum-type test procedures that often underperform under sparse alternatives. To address this limitation, we propose two max-type test procedures utilizing the sample covariance matrix and the sample spatial-sign covariance matrix, respectively. Furthermore, we introduce two Cauchy combination test procedures that integrate both sum-type and max-type tests, demonstrating their superiority across a wide range of sparsity levels in the alternative hypothesis. Our simulation studies corroborate these findings, highlighting the enhanced performance of our proposed methodologies in high-dimensional sphericity testing.
Submitted 31 October, 2024;
originally announced October 2024.
-
Golden Ratio-Based Sufficient Dimension Reduction
Authors:
Wenjing Yang,
Yuhong Yang
Abstract:
Many machine learning applications deal with high dimensional data. To make computations feasible and learning more efficient, it is often desirable to reduce the dimensionality of the input variables by finding linear combinations of the predictors that can retain as much original information as possible in the relationship between the response and the original predictors. We propose a neural network based sufficient dimension reduction method that not only identifies the structural dimension effectively, but also estimates the central space well. It takes advantage of the approximation capabilities of neural networks for functions in Barron classes and leads to reduced computation cost compared to other dimension reduction methods in the literature. Additionally, the framework can be extended to fit practical dimension reduction, making the methodology more applicable in practical settings.
Submitted 29 January, 2025; v1 submitted 25 October, 2024;
originally announced October 2024.
-
Limit Theorems for Stochastic Gradient Descent with Infinite Variance
Authors:
Jose Blanchet,
Aleksandar Mijatović,
Wenhao Yang
Abstract:
Stochastic gradient descent is a classic algorithm that has gained great popularity especially in the last decades as the most common approach for training models in machine learning. While the algorithm has been well-studied when stochastic gradients are assumed to have a finite variance, there is significantly less research addressing its theoretical properties in the case of infinite variance gradients. In this paper, we establish the asymptotic behavior of stochastic gradient descent in the context of infinite variance stochastic gradients, assuming that the stochastic gradient is regularly varying with index $α\in(1,2)$. The closest result in this context was established in 1969, in the one-dimensional case and assuming that stochastic gradients belong to a more restrictive class of distributions. We extend it to the multidimensional case, covering a broader class of infinite variance distributions. As we show, the asymptotic distribution of the stochastic gradient descent algorithm can be characterized as the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate stable Lévy process. Additionally, we explore the applications of these results in linear regression and logistic regression models.
Submitted 5 December, 2024; v1 submitted 21 October, 2024;
originally announced October 2024.
-
Analysis of vessel traffic flow characteristics in inland restricted waterways using multi-source data
Authors:
Wenzhang Yang,
Peng Liao,
Shangkun Jiang,
Hao Wang
Abstract:
To effectively manage vessel traffic and alleviate congestion on busy inland waterways, a comprehensive understanding of vessel traffic flow characteristics is crucial. However, limited data availability has resulted in minimal research on the traffic flow characteristics of inland waterway vessels. This study addresses this gap by conducting vessel-following experiments and fixed-point video monitoring in inland waterways, collecting multi-source data to analyze vessel traffic flow characteristics. First, the analysis of vessel speed distribution identifies the economic speed for vessels operating in these environments. Next, the relationship between microscopic vessel speed and gap distance is examined, with the logarithmic model emerging as the most accurate among various tested models. Additionally, the study explores the relationships among macroscopic speed, density, and flow rate, proposing a novel piecewise fundamental diagram model to describe these relationships. Lastly, the inland vessel traffic states are categorized using K-means clustering algorithm and applied to vessel navigation services. These findings provide valuable insights for enhancing inland waterway transportation and advancing the development of an integrated waterway transportation system.
Submitted 21 September, 2024;
originally announced October 2024.
-
Marginal Structural Modeling of Representative Treatment Trajectories
Authors:
Jiewen Liu,
Todd A. Miano,
Stephen Griffiths,
Michael G. S. Shashaty,
Wei Yang
Abstract:
Marginal structural models (MSMs) are widely used in observational studies to estimate the causal effect of time-varying treatments. Despite their popularity, limited attention has been paid to summarizing the treatment history in the outcome model, which proves particularly challenging when individuals' treatment trajectories exhibit complex patterns over time. Commonly used metrics such as the average treatment level fail to adequately capture the treatment history, hindering causal interpretation. For scenarios where treatment histories exhibit distinct temporal patterns, we develop a new approach to parameterize the outcome model. We apply latent growth curve analysis to identify representative treatment trajectories from the observed data and use the posterior probability of latent class membership to summarize the different treatment trajectories. We demonstrate its use in parameterizing the MSMs, which facilitates the interpretation of the results. We apply the method to analyze data from an existing cohort of lung transplant recipients to estimate the effect of Tacrolimus concentrations on the risk of incident chronic kidney disease.
Submitted 16 September, 2024; v1 submitted 7 September, 2024;
originally announced September 2024.
-
Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions
Authors:
Patrick Kuiper,
Ali Hasan,
Wenhao Yang,
Yuting Ng,
Hoda Bidkhori,
Jose Blanchet,
Vahid Tarokh
Abstract:
The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics. EVT supports using semi-parametric models called max-stable distributions built from spatial Poisson point processes. While powerful, these models are only asymptotically valid for large samples. However, since extreme data is by definition scarce, the potential for model misspecification error is inherent to these applications, thus DRO estimators are natural. In order to mitigate over-conservative estimates while enhancing out-of-sample performance, we study DRO estimators informed by semi-parametric max-stable constraints in the space of point processes. We study both tractable convex formulations for some problems of interest (e.g. CVaR) and more general neural network based estimators. Both approaches are validated using synthetically generated data, recovering prescribed characteristics, and verifying the efficacy of the proposed techniques. Additionally, the proposed method is applied to a real data set of financial returns for comparison to a previous analysis. We established the proposed model as a novel formulation in the multivariate EVT domain, and innovative with respect to performance when compared to relevant alternate proposals.
Submitted 31 July, 2024;
originally announced August 2024.
-
Hypothesis Testing for Class-Conditional Noise Using Local Maximum Likelihood
Authors:
Weisong Yang,
Rafael Poyiadzi,
Niall Twomey,
Raul Santos Rodriguez
Abstract:
In supervised learning, automatically assessing the quality of the labels before any learning takes place remains an open research question. In certain particular cases, hypothesis testing procedures have been proposed to assess whether a given instance-label dataset is contaminated with class-conditional label noise, as opposed to uniform label noise. The existing theory builds on the asymptotic properties of the Maximum Likelihood Estimate for parametric logistic regression. However, the parametric assumptions on top of which these approaches are constructed are often too strong and unrealistic in practice. To alleviate this problem, in this paper we propose an alternative path by showing how similar procedures can be followed when the underlying model is a product of Local Maximum Likelihood Estimation that leads to more flexible nonparametric logistic regression models, which in turn are less susceptible to model misspecification. This different view allows for wider applicability of the tests by offering users access to a richer model class. Similarly to existing works, we assume we have access to anchor points which are provided by the users. We introduce the necessary ingredients for the adaptation of the hypothesis tests to the case of nonparametric logistic regression and empirically compare against the parametric approach presenting both synthetic and real-world case studies and discussing the advantages and limitations of the proposed approach.
Submitted 15 December, 2023;
originally announced December 2023.
-
A Novel Human-Based Meta-Heuristic Algorithm: Dragon Boat Optimization
Authors:
Xiang Li,
Long Lan,
Husam Lahza,
Shaowu Yang,
Shuihua Wang,
Wenjing Yang,
Hengzhu Liu,
Yudong Zhang
Abstract:
(Aim) Dragon Boat Racing, a popular aquatic folklore team sport, is traditionally held during the Dragon Boat Festival. Inspired by this event, we propose a novel human-based meta-heuristic algorithm called dragon boat optimization (DBO) in this paper. (Method) It models the unique behaviors of each crew member on the dragon boat during the race by introducing social psychology mechanisms (social loafing, social incentive). Throughout this process, the focus is on the interaction and collaboration among the crew members, as well as their decision-making in different situations. During each iteration, DBO implements different state updating strategies. By modelling the crew's behavior and adjusting the state updating strategies, DBO is able to maintain high-performance efficiency. (Results) We have tested the DBO algorithm with 29 mathematical optimization problems and 2 structural design problems. (Conclusion) The experimental results demonstrate that DBO is competitive with state-of-the-art meta-heuristic algorithms as well as conventional methods.
Submitted 27 November, 2023;
originally announced November 2023.
-
Iterative missing value imputation based on feature importance
Authors:
Cong Guo,
Chun Liu,
Wei Yang
Abstract:
Many datasets suffer from missing values due to various reasons, which not only increases the processing difficulty of related tasks but also reduces the accuracy of classification. To address this problem, the mainstream approach is to use missing value imputation to complete the dataset. Existing imputation methods estimate the missing parts based on the observed values in the original feature space, and they treat all features as equally important during data completion, while in fact different features have different importance. Therefore, we have designed an imputation method that considers feature importance. This algorithm iteratively performs matrix completion and feature importance learning, and specifically, matrix completion is based on a filling loss that incorporates feature importance. Our experimental analysis involves three types of datasets: synthetic datasets with different noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets originally containing missing values. The results on these datasets consistently show that the proposed method outperforms five existing imputation algorithms. To the best of our knowledge, this is the first work that considers feature importance in the imputation model.
Submitted 14 November, 2023;
originally announced November 2023.
-
Bayesian modelling of response to therapy and drug-sensitivity in acute lymphoblastic leukemia
Authors:
Andrea Cremaschi,
Wenjian Yang,
Maria De Iorio,
William E. Evans,
Jun J. Yang,
Gary L. Rosner
Abstract:
Acute lymphoblastic leukemia (ALL) is a heterogeneous hematologic malignancy involving the abnormal proliferation of immature lymphocytes, accounting for most pediatric cancer cases. ALL management in children has seen great improvement in the last decades thanks to better understanding of the disease leading to improved treatment strategies evidenced through clinical trials. Commonly a first course of chemotherapy (induction phase) is administered, followed by treatment with a combination of anti-leukemia drugs. A measure of the efficacy early in the course of therapy is minimal residual disease (MRD). MRD quantifies residual tumor cells and indicates the effectiveness of the treatment over the course of therapy. MRD positivity is defined for values of MRD greater than 0.01%, yielding left-censored observations. We propose a Bayesian model to study the relationship between patient features and MRD observed at two time points during the induction phase. Specifically, we model the observed MRD values via an auto-regressive model, accounting for left-censoring of the data and for the fact that some patients are already in remission after the induction phase. Patient characteristics are included in the model via linear regression terms. In particular, patient-specific drug sensitivity based on ex-vivo assays of patient samples is exploited to identify groups of subjects with similar profiles. We include this information as a covariate in the model for MRD. We adopt horseshoe priors for the regression coefficients to perform variable selection to identify important covariates. We fit the proposed approach to data from three prospective pediatric ALL clinical trials carried out at the St. Jude Children's Research Hospital. Our results highlight that drug sensitivity profiles and leukemic subtypes play an important role in the response to induction therapy as measured by serial MRD measures.
Submitted 7 November, 2023;
originally announced November 2023.
-
Backward Joint Model for the Dynamic Prediction of Both Competing Risk and Longitudinal Outcomes
Authors:
Wenhao Li,
Brad C. Astor,
Wei Yang,
Tom H. Greene,
Liang Li
Abstract:
Joint modeling is a useful approach to dynamic prediction of clinical outcomes using longitudinally measured predictors. When the outcomes are competing risk events, fitting the conventional shared random effects joint model often involves intensive computation, especially when multiple longitudinal biomarkers are used as predictors, as is often desired in prediction problems. This paper proposes a new joint model for the dynamic prediction of competing risk outcomes. The model factorizes the likelihood into the distribution of the competing risks data and the distribution of longitudinal data given the competing risks data. It extends the basic idea of the recently published backward joint model (BJM) to the competing risk setting, and we call this model crBJM. This model also enables the prediction of future longitudinal data trajectories conditional on being at risk at a future time, a practically important problem that has not been studied in the statistical literature. The model fitting with the EM algorithm is efficient, stable and computationally fast, with a one-dimensional integral in the E-step and convex optimization for most parameters in the M-step, regardless of the number of longitudinal predictors. The model also comes with a consistent albeit less efficient estimation method that can be quickly implemented with standard software, ideal for model building and diagnostics. We study the numerical properties of the proposed method using simulations and illustrate its use in a chronic kidney disease study.
Submitted 30 August, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Estimation and Inference in Distributional Reinforcement Learning
Authors:
Liangyu Zhang,
Yang Peng,
Jiadong Liang,
Wenhao Yang,
Zhihua Zhang
Abstract:
In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, aiming to estimate the complete return distribution (denoted $η^π$) attained by a given policy $π$. We use the certainty-equivalence method to construct our estimator $\hat{η}^π$, assuming a generative model is available. In this setting, we need a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2p}(1-γ)^{2p+2}}\right)$ to guarantee that the $p$-Wasserstein metric between $\hat{η}^π$ and $η^π$ is less than $\varepsilon$ with high probability. This implies that the distributional policy evaluation problem can be solved with sample efficiency. Also, we show that under different mild assumptions a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2}(1-γ)^{4}}\right)$ suffices to ensure that the Kolmogorov metric and total variation metric between $\hat{η}^π$ and $η^π$ are below $\varepsilon$ with high probability. Furthermore, we investigate the asymptotic behavior of $\hat{η}^π$. We demonstrate that the "empirical process" $\sqrt{n}(\hat{η}^π-η^π)$ converges weakly to a Gaussian process in the space of bounded functionals on Lipschitz function class $\ell^\infty(\mathcal{F}_{\text{W}})$, also in the space of bounded functionals on indicator function class $\ell^\infty(\mathcal{F}_{\text{KS}})$ and bounded measurable function class $\ell^\infty(\mathcal{F}_{\text{TV}})$ when some mild conditions hold. Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $η^π$.
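As a quick consistency check on the two sample sizes quoted in this abstract (a worked substitution, not a statement taken from the paper), setting $p=1$ in the Wasserstein bound gives the same order as the bound stated for the Kolmogorov and total variation metrics. Here `\gamma` stands for the discount factor written as γ above:

```latex
\[
\left.\widetilde O\!\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2p}(1-\gamma)^{2p+2}}\right)\right|_{p=1}
= \widetilde O\!\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2}(1-\gamma)^{4}}\right).
\]
```

For $p>1$ the exponents on both $\varepsilon$ and $1-γ$ grow, so the Wasserstein requirement becomes more demanding as $p$ increases.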
Submitted 19 September, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Globally Convergent Accelerated Algorithms for Multilinear Sparse Logistic Regression with $\ell_0$-constraints
Authors:
Weifeng Yang,
Wenwen Min
Abstract:
Tensor data represents a multidimensional array. Regression methods based on low-rank tensor decomposition leverage structural information to reduce the parameter count. Multilinear logistic regression serves as a powerful tool for the analysis of multidimensional data. To improve its efficacy and interpretability, we present a Multilinear Sparse Logistic Regression model with $\ell_0$-constraints ($\ell_0$-MLSR). In contrast to the $\ell_1$-norm and $\ell_2$-norm, the $\ell_0$-norm constraint is better suited for feature selection. However, due to its nonconvex and nonsmooth properties, solving it is challenging and convergence guarantees are lacking. Additionally, the multilinear operation in $\ell_0$-MLSR also brings non-convexity. To tackle these challenges, we propose an Accelerated Proximal Alternating Linearized Minimization with Adaptive Momentum (APALM$^+$) method to solve the $\ell_0$-MLSR model. We provide a proof that APALM$^+$ can ensure the convergence of the objective function of $\ell_0$-MLSR. We also demonstrate that APALM$^+$ is globally convergent to a first-order critical point and establish its convergence rate by using the Kurdyka-Lojasiewicz property. Empirical results obtained from synthetic and real-world datasets validate the superior performance of our algorithm in terms of both accuracy and speed compared to other state-of-the-art methods.
Submitted 17 September, 2023;
originally announced September 2023.
-
A Likelihood Approach to Incorporating Self-Report Data in HIV Recency Classification
Authors:
Wenlong Yang,
Danping Liu,
Le Bao,
Runze Li
Abstract:
Estimating new HIV infections is significant yet challenging due to the difficulty in distinguishing between recent and long-term infections. We demonstrate that HIV recency status (recent vs. long-term) could be determined from the combination of self-report testing history and biomarkers, which are increasingly available in bio-behavioral surveys. HIV recency status is partially observed, given the self-report testing history. For example, people who tested positive for HIV over one year ago should have a long-term infection. Based on the nationally representative samples collected by the Population-based HIV Impact Assessment (PHIA) Project, we propose a likelihood-based probabilistic model for HIV recency classification. The model incorporates both labeled and unlabeled data and integrates the mechanism of how HIV recency status depends on biomarkers and the mechanism of how HIV recency status, together with the self-report time of the most recent HIV test, impacts the test results, via a set of logistic regression models. We compare our method to logistic regression and the binary classification tree (current practice) on Malawi, Zimbabwe, and Zambia PHIA data, as well as on simulated data. Our model obtains more efficient and less biased parameter estimates and is relatively robust to potential reporting error and model misspecification.
Submitted 12 November, 2024; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Perfect simulation from unbiased simulation
Authors:
George M. Leigh,
Wen-Hsi Yang,
Montana E. Wickens,
Amanda R. Northrop
Abstract:
We show that any application of the technique of unbiased simulation becomes perfect simulation when coalescence of the two coupled Markov chains can be practically assured in advance. This happens when a fixed number of iterations is high enough that the probability of needing any more to achieve coalescence is negligible; we suggest a value of $10^{-20}$. This finding enormously increases the range of problems for which perfect simulation, which exactly follows the target distribution, can be implemented. We design a new algorithm to make practical use of the high number of iterations by producing extra perfect sample points with little extra computational effort, at a cost of a small, controllable amount of serial correlation within sample sets of about 20 points. Different sample sets remain completely independent. The algorithm includes maximal coupling for continuous processes, to bring together chains that are already close. We illustrate the methodology on a simple, two-state Markov chain and on standard normal distributions up to 20 dimensions. Our technical formulation involves a nonzero probability, which can be made arbitrarily small, that a single perfect sample point may have its place taken by a "string" of many points which are assigned weights, each equal to $\pm 1$, that sum to $1$. A point with a weight of $-1$ is a "hole", which is an object that can be cancelled by an equivalent point that has the same value but opposite weight $+1$.
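The abstract mentions maximal coupling for continuous processes. As a rough illustration of that general idea, the sketch below uses the standard rejection-based maximal coupling of two unit-variance normal distributions. It is not the authors' algorithm; the function names `normal_pdf` and `maximal_coupling` and the chosen means are assumptions made for this example.

```python
import math
import numpy as np

def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def maximal_coupling(rng, mean_p, mean_q):
    """Draw (X, Y) with X ~ N(mean_p, 1) and Y ~ N(mean_q, 1),
    constructed so that P(X == Y) is as large as possible."""
    x = rng.normal(mean_p, 1.0)
    # Accept X as the common value with probability min(1, q(X) / p(X)).
    if rng.uniform(0.0, normal_pdf(x, mean_p)) <= normal_pdf(x, mean_q):
        return x, x
    # Otherwise draw Y from the part of q that is not shared with p.
    while True:
        y = rng.normal(mean_q, 1.0)
        if rng.uniform(0.0, normal_pdf(y, mean_q)) > normal_pdf(y, mean_p):
            return x, y

rng = np.random.default_rng(1)
pairs = [maximal_coupling(rng, 0.0, 1.0) for _ in range(20_000)]
meet_rate = sum(x == y for x, y in pairs) / len(pairs)
mean_y = sum(y for _, y in pairs) / len(pairs)
print(f"fraction of draws that coincide: {meet_rate:.3f}")
print(f"sample mean of Y: {mean_y:.3f}")
```

For these two distributions the coincidence probability should be close to one minus their total variation distance, roughly 0.617, while each marginal keeps its own distribution.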
Submitted 14 August, 2023;
originally announced August 2023.
-
Semi-Infinitely Constrained Markov Decision Processes and Efficient Reinforcement Learning
Authors:
Liangyu Zhang,
Yang Peng,
Wenhao Yang,
Zhihua Zhang
Abstract:
We propose a novel generalization of constrained Markov decision processes (CMDPs) that we call the \emph{semi-infinitely constrained Markov decision process} (SICMDP). Particularly, we consider a continuum of constraints instead of a finite number of constraints as in the case of ordinary CMDPs. We also devise two reinforcement learning algorithms for SICMDPs that we call SI-CRL and SI-CPO. SI-CRL is a model-based reinforcement learning algorithm. Given an estimate of the transition model, we first transform the reinforcement learning problem into a linear semi-infinite programming (LSIP) problem and then use the dual exchange method in the LSIP literature to solve it. SI-CPO is a policy optimization algorithm. Borrowing the ideas from the cooperative stochastic approximation approach, we make alternating updates to the policy parameters to maximize the reward or minimize the cost. To the best of our knowledge, we are the first to apply tools from semi-infinite programming (SIP) to solve constrained reinforcement learning problems. We present theoretical analysis for SI-CRL and SI-CPO, identifying their iteration complexity and sample complexity. We also present extensive numerical examples to illustrate the SICMDP model and demonstrate that our proposed algorithms are able to solve complex sequential decision-making tasks leveraging modern deep reinforcement learning techniques.
Submitted 29 April, 2023;
originally announced May 2023.
-
Tests for ultrahigh-dimensional partially linear regression models
Authors:
Hongwei Shi,
Bowen Sun,
Weichao Yang,
Xu Guo
Abstract:
In this paper, we consider tests for ultrahigh-dimensional partially linear regression models. The presence of ultrahigh-dimensional nuisance covariates and unknown nuisance function makes the inference problem very challenging. We adopt machine learning methods to estimate the unknown nuisance function and introduce quadratic-form test statistics. Interestingly, though the machine learning methods can be very complex, under suitable conditions, we establish the asymptotic normality of our introduced test statistics under the null hypothesis and local alternative hypotheses. We further propose a power-enhanced procedure to improve the test statistics' performance. Two thresholding determination methods are provided for the power-enhanced procedure. We show that the power-enhanced procedure is powerful to detect signals under either sparse or dense alternatives and it can still control the type-I error asymptotically under the null hypothesis. Numerical studies are carried out to illustrate the empirical performance of our introduced procedures.
Submitted 15 April, 2023;
originally announced April 2023.
-
Semiparametric efficient estimation of genetic relatedness with machine learning methods
Authors:
Xu Guo,
Yiyuan Qian,
Hongwei Shi,
Weichao Yang,
Niwen Zhou
Abstract:
In this paper, we propose semiparametric efficient estimators of genetic relatedness between two traits in a model-free framework. Most existing methods require specifying certain parametric models involving the traits and genetic variants. However, the bias due to model misspecification may yield misleading statistical results. Moreover, the semiparametric efficient bounds for estimators of genetic relatedness are still lacking. In this paper, we develop semiparametric efficient estimators with machine learning methods and construct valid confidence intervals for two important measures of genetic relatedness: genetic covariance and genetic correlation, allowing both continuous and discrete responses. Based on the derived efficient influence functions of genetic relatedness, we propose a consistent estimator of the genetic covariance as long as one of the genetic values is consistently estimated. The data of two traits may be collected from the same group or different groups of individuals. Various numerical studies are performed to illustrate our introduced procedures. We also apply the proposed procedures to analyze Carworth Farms White mice genome-wide association study data.
Submitted 2 June, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Capturing episodic impacts of environmental signals
Authors:
Manuela Mendiolar,
Jerzy A. Filar,
Wen-Hsi Yang,
Susannah Leahy,
Anthony Courtney
Abstract:
Environmental scientists frequently rely on time series of explanatory variables to explain their impact on an important response variable. However, sometimes, researchers are less interested in raw observations of an explanatory variable than in derived indices induced by episodes embedded in its time series. Often these episodes are intermittent, occur within a specific limited memory, persist for varying durations, at varying levels of intensity, and overlap important periods with respect to the response variable. We develop a generic, parametrised family of weighted indices, called IMPIT indices, extracted from an environmental signal. To facilitate their construction and calibration, we developed a user-friendly app in Shiny R referred to as IMPIT-a. We construct examples of IMPIT indices extracted from the Southern Oscillation Index and sea surface temperature signals. We illustrate their applications to two fished species in Queensland waters (i.e., snapper and saucer scallop) and wheat yield in New South Wales.
Submitted 24 March, 2023;
originally announced March 2023.
-
A novel approach of empirical likelihood with massive data
Authors:
Yang Liu,
Xia Chen,
Wei-min Yang
Abstract:
In this paper, we propose a novel approach for tackling the obstacles of empirical likelihood in the face of massive data, which is called split sample mean empirical likelihood (SSMEL). Our approach provides a unique perspective for solving big data problems. We show that the SSMEL estimator has the same estimation efficiency as the empirical likelihood estimator with the full dataset, and maintains the important statistical property of Wilks' theorem, allowing our proposed approach to be used for statistical inference without estimating the covariance matrix. This effectively tackles the hurdle of the Divide and Conquer (DC) algorithm for statistical inference. We further illustrate the proposed approach via simulation studies and real data analysis.
Submitted 5 June, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Discovering a change point and piecewise linear structure in a time series of organoid networks via the iso-mirror
Authors:
Tianyi Chen,
Youngser Park,
Ali Saad-Eldin,
Zachary Lubberts,
Avanti Athreya,
Benjamin D. Pedigo,
Joshua T. Vogelstein,
Francesca Puppo,
Gabriel A. Silva,
Alysson R. Muotri,
Weiwei Yang,
Christopher M. White,
Carey E. Priebe
Abstract:
Recent advancements have been made in the development of cell-based in-vitro neuronal networks, or organoids. In order to better understand the network structure of these organoids, a super-selective algorithm has been proposed for inferring the effective connectivity networks from multi-electrode array data. In this paper, we apply a novel statistical method called spectral mirror estimation to the time series of inferred effective connectivity organoid networks. This method produces a one-dimensional iso-mirror representation of the dynamics of the time series of the networks which exhibits a piecewise linear structure. A classical change point algorithm is then applied to this representation, which successfully detects a change point coinciding with the neuroscientifically significant time at which inhibitory neurons start appearing and the percentage of astrocytes increases dramatically. This finding demonstrates the potential utility of applying the iso-mirror dynamic structure discovery method to inferred effective connectivity time series of organoid networks.
Submitted 12 April, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Approximately optimal domain adaptation with Fisher's Linear Discriminant
Authors:
Hayden S. Helm,
Ashwin De Silva,
Joshua T. Vogelstein,
Carey E. Priebe,
Weiwei Yang
Abstract:
We propose a class of models based on Fisher's Linear Discriminant (FLD) in the context of domain adaptation. The class is the convex combination of two hypotheses: i) an average hypothesis representing previously seen source tasks and ii) a hypothesis trained on a new target task. For a particular generative setting we derive the optimal convex combination of the two models under 0-1 loss, propose a computable approximation, and study the effect of various parameter settings on the relative risks between the optimal hypothesis, hypothesis i), and hypothesis ii). We demonstrate the effectiveness of the proposed optimal classifier in the context of EEG- and ECG-based classification settings and argue that the optimal classifier can be computed without access to direct information from any of the individual source tasks. We conclude by discussing further applications, limitations, and possible future directions.
Submitted 1 March, 2024; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Robust Markov Decision Processes without Model Estimation
Authors:
Wenhao Yang,
Han Wang,
Tadashi Kozuno,
Scott M. Jordan,
Zhihua Zhang
Abstract:
Robust Markov Decision Processes (MDPs) are receiving much attention in learning a robust policy which is less sensitive to environment changes. There are an increasing number of works analyzing sample-efficiency of robust MDPs. However, there are two major barriers to applying robust MDPs in practice. First, most works study robust MDPs in a model-based regime, where the transition probability needs to be estimated and requires a large amount of memory, $\mathcal{O}(|\mathcal{S}|^2|\mathcal{A}|)$. Second, prior work typically assumes a strong oracle to obtain the optimal solution as an intermediate step to solve robust MDPs. However, in practice, such an oracle usually does not exist. To remove the oracle, we transform the original robust MDPs into an alternative form, which allows us to use stochastic gradient methods to solve the robust MDPs. Moreover, we prove the alternative form still plays a similar role as the original form. With this new formulation, we devise a sample-efficient algorithm to solve the robust MDPs in a model-free regime, which does not require an oracle and trades off a lower storage requirement $\mathcal{O}(|\mathcal{S}||\mathcal{A}|)$ with being able to generate samples from a generative model or Markovian chain. Finally, we validate our theoretical findings via numerical experiments, showing the efficiency of the alternative form of robust MDPs.
Submitted 12 September, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Score function-based tests for ultrahigh-dimensional linear models
Authors:
Weichao Yang,
Xu Guo,
Lixing Zhu
Abstract:
In this paper, we investigate score function-based tests to check the significance of an ultrahigh-dimensional sub-vector of the model coefficients when the nuisance parameter vector is also ultrahigh-dimensional in linear models. We first reanalyze and extend a recently proposed score function-based test to derive, under weaker conditions, its limiting distributions under the null and local alternative hypotheses. As it may fail to work when the correlation between testing covariates and nuisance covariates is high, we propose an orthogonalized score function-based test with two merits: debiasing to make the non-degenerate error term degenerate and reducing the asymptotic variance to enhance power performance. Simulations evaluate the finite-sample performance of the proposed tests, and a real data analysis illustrates their application.
Submitted 9 November, 2024; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Inference and Prediction Using Functional Principal Components Analysis: Application to Diabetic Kidney Disease Progression in the Chronic Renal Insufficiency Cohort (CRIC) Study
Authors:
Brian Kwan,
Wei Yang,
Daniel Montemayor,
Jing Zhang,
Tobias Fuhrer,
Amanda H. Anderson,
Cheryl A. M. Anderson,
Jing Chen,
Ana C. Ricardo,
Sylvia E. Rosas,
Loki Natarajan,
the CRIC Study Investigators
Abstract:
Repeated longitudinal measurements are commonly used to model long-term disease progression, and timing and number of assessments per patient may vary, leading to irregularly spaced and sparse data. Longitudinal trajectories may exhibit curvilinear patterns, in which mixed linear regression methods may fail to capture true trends in the data. We applied functional principal components analysis to model kidney disease progression via estimated glomerular filtration rate (eGFR) trajectories. In a cohort of 2641 participants with diabetes and up to 15 years of annual follow-up from the Chronic Renal Insufficiency Cohort (CRIC) study, we detected novel dominant modes of variation and patterns of diabetic kidney disease (DKD) progression among subgroups defined by the presence of albuminuria. We conducted inferential permutation tests to assess differences in longitudinal eGFR patterns between groups. To determine whether fitting a full cohort model or separate group-specific models is more optimal for modeling long-term trajectories, we evaluated model fit, using our goodness-of-fit procedure, and future prediction accuracy. Our findings indicated advantages for both modeling approaches in accomplishing different objectives. Beyond DKD, the methods described are applicable to other settings with longitudinally assessed biomarkers as indicators of disease progression. Supplementary materials for this article are available online.
Submitted 20 October, 2022;
originally announced October 2022.
-
Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach
Authors:
Miao Lu,
Wenhao Yang,
Liangyu Zhang,
Zhihua Zhang
Abstract:
In a Markov decision process (MDP), unobservable confounders may exist and have impacts on the data generating process, so that the classic off-policy evaluation (OPE) estimators may fail to identify the true value function of the target policy. In this paper, we study the statistical properties of OPE in confounded MDPs with observable instrumental variables. Specifically, we propose a two-stage estimator based on the instrumental variables and establish its statistical properties in the confounded MDPs with a linear structure. For non-asymptotic analysis, we prove an $\mathcal{O}(n^{-1/2})$-error bound where $n$ is the number of samples. For asymptotic analysis, we prove that the two-stage estimator is asymptotically normal with a typical rate of $n^{1/2}$. To the best of our knowledge, we are the first to show such statistical results of the two-stage estimator for confounded linear MDPs via instrumental variables.
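To make the two-stage, instrumental-variable idea concrete, here is a minimal sketch of classical two-stage least squares on a synthetic static regression with an unobserved confounder. It is only an analogy for the intuition: it is not the paper's estimator for confounded linear MDPs, and the data-generating model, the true slope of 2.0, and the function names are assumptions made for this example.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

def two_stage_least_squares(z, x, y):
    """Classical 2SLS with one regressor x and one instrument z."""
    n = len(z)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: project the confounded regressor onto the instrument.
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma
    # Stage 2: regress the outcome on the stage-1 fitted values.
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)                      # unobserved confounder
z = rng.normal(size=n)                      # instrument, independent of u
x = z + u + rng.normal(size=n)              # regressor depends on z and u
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # assumed true slope is 2.0

beta_ols = ols_slope(x, y)
beta_iv = two_stage_least_squares(z, x, y)
print(f"OLS slope (confounded): {beta_ols:.3f}")
print(f"2SLS slope:             {beta_iv:.3f}")
```

In this toy setup the plain regression is pulled away from 2.0 because `u` drives both `x` and `y`, while the two-stage estimate uses only the variation in `x` induced by the instrument.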
Submitted 12 September, 2022;
originally announced September 2022.
-
Bayesian Circular Lattice Filters for Computationally Efficient Estimation of Multivariate Time-Varying Autoregressive Models
Authors:
Yuelei Sui,
Scott H. Holan,
Wen-Hsi Yang
Abstract:
Nonstationary time series data exist in various scientific disciplines, including environmental science, biology, signal processing, and econometrics, among others. Many Bayesian models have been developed to handle nonstationary time series. The time-varying vector autoregressive (TV-VAR) model is a well-established model for multivariate nonstationary time series. Nevertheless, in most cases, the large number of parameters presented by the model results in a high computational burden, ultimately limiting its usage. This paper proposes a computationally efficient multivariate Bayesian Circular Lattice Filter to extend the usage of the TV-VAR model to a broader class of high-dimensional problems. Our fully Bayesian framework allows both the autoregressive (AR) coefficients and innovation covariance to vary over time. Our estimation method is based on the Bayesian lattice filter (BLF), which is extremely computationally efficient and stable in univariate cases. To illustrate the effectiveness of our approach, we conduct a comprehensive comparison with other competing methods through simulation studies and find that, in most cases, our approach performs better in terms of average squared error between the estimated and true time-varying spectral density. Finally, we demonstrate our methodology through applications to quarterly Gross Domestic Product (GDP) data and Northern California wind data.
Submitted 24 June, 2022;
originally announced June 2022.
-
A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation
Authors:
Wenjing Yang,
Ganghua Wang,
Jie Ding,
Yuhong Yang
Abstract:
The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance. As a result, computation and memory costs in resource-limited applications may be significantly reduced by dropping redundant weights, neurons, or layers. There have been many model compression algorithms proposed that provide impressive empirical success. However, a theoretical understanding of model compression is still limited. One problem is understanding if a network is more compressible than another of the same structure. Another problem is quantifying how much one can prune a network with theoretically guaranteed accuracy degradation. In this work, we propose to use the sparsity-sensitive $\ell_q$-norm ($0<q<1$) to characterize compressibility and provide a relationship between soft sparsity of the weights in the network and the degree of compression with a controlled accuracy degradation bound. We also develop adaptive algorithms for pruning each neuron in the network informed by our theory. Numerical studies demonstrate the promising performance of the proposed methods compared with standard pruning algorithms.
Submitted 8 November, 2022; v1 submitted 11 June, 2022;
originally announced June 2022.
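The abstract above uses an $\ell_q$-norm with $0<q<1$ as a sparsity-sensitive measure of compressibility. A minimal sketch of that quantity follows, assuming the usual definition $(\sum_i |w_i|^q)^{1/q}$; the function name `lq_quasi_norm` and the toy weight vectors are illustrative and are not taken from the paper.

```python
def lq_quasi_norm(weights, q):
    """Return (sum_i |w_i|^q)^(1/q) for 0 < q < 1 (assumed definition)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return sum(abs(w) ** q for w in weights) ** (1.0 / q)


# Two hypothetical weight vectors with the same Euclidean norm (1.0).
concentrated = [1.0, 0.0, 0.0, 0.0]  # one dominant weight
spread_out = [0.5, 0.5, 0.5, 0.5]    # mass spread evenly over all weights

sparse_score = lq_quasi_norm(concentrated, q=0.5)
dense_score = lq_quasi_norm(spread_out, q=0.5)
```

At equal Euclidean norm, the concentrated vector gets the smaller value, which is the sense in which a small $\ell_q$-norm indicates weights that are closer to sparse and plausibly easier to prune.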
-
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal
Authors:
Tadashi Kozuno,
Wenhao Yang,
Nino Vieillard,
Toshinori Kitamura,
Yunhao Tang,
Jincheng Mei,
Pierre Ménard,
Mohammad Gheshlaghi Azar,
Michal Valko,
Rémi Munos,
Olivier Pietquin,
Matthieu Geist,
Csaba Szepesvári
Abstract:
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. Particularly, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses the Kullback-Leibler divergence and entropy regularization in its value and policy updates. Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small. This is the first theoretical result that demonstrates that a simple model-free algorithm without variance-reduction can be nearly minimax-optimal under the considered setting.
Submitted 27 May, 2022;
originally announced May 2022.
-
Federated Reinforcement Learning with Environment Heterogeneity
Authors:
Hao Jin,
Yang Peng,
Wenhao Yang,
Shusen Wang,
Zhihua Zhang
Abstract:
We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. We stress the constraint of environment heterogeneity, which means $n$ environments corresponding to these $n$ agents have different state transitions. To obtain a value function or a policy function which optimizes the overall performance in all environments, we propose two federated RL algorithms, \texttt{QAvg} and \texttt{PAvg}. We theoretically prove that these algorithms converge to suboptimal solutions, while such suboptimality depends on how heterogeneous these $n$ environments are. Moreover, we propose a heuristic that achieves personalization by embedding the $n$ environments into $n$ vectors. The personalization heuristic not only improves the training but also allows for better generalization to new environments.
Submitted 6 April, 2022;
originally announced April 2022.
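The idea of averaging Q-functions across agents whose environments differ in their state transitions can be sketched as below. This is only an illustration under stated assumptions, not the authors' QAvg: it assumes each agent knows its own transition table and performs exact Bellman backups, that all agents share one reward table, and that a server averages the Q-tables after a fixed number of local steps. All names (`bellman_backup`, `q_avg_sketch`, `rounds`, `local_steps`) are hypothetical.

```python
def bellman_backup(Q, P, R, gamma):
    """One exact Bellman optimality backup on a tabular Q-function.

    P[s][a] is a list of next-state probabilities, R[s][a] is the reward.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [max(row) for row in Q]  # greedy state values
    return [
        [
            R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
            for a in range(n_actions)
        ]
        for s in range(n_states)
    ]


def q_avg_sketch(envs, R, gamma, rounds, local_steps):
    """Alternate local backups in each environment with server-side averaging."""
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(rounds):
        local_tables = []
        for P in envs:  # each agent only touches its own transition table
            Q_local = [row[:] for row in Q]
            for _ in range(local_steps):
                Q_local = bellman_backup(Q_local, P, R, gamma)
            local_tables.append(Q_local)
        # Server step: element-wise average of the agents' Q-tables.
        Q = [
            [sum(t[s][a] for t in local_tables) / len(local_tables)
             for a in range(n_actions)]
            for s in range(n_states)
        ]
    return Q
```

When all environments are identical, the averaging step changes nothing and the sketch reduces to ordinary value iteration; with heterogeneous transitions the averaged table is in general not optimal for any single environment, which matches the suboptimality the abstract describes.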
-
Mental State Classification Using Multi-graph Features
Authors:
Guodong Chen,
Hayden S. Helm,
Kate Lytvynets,
Weiwei Yang,
Carey E. Priebe
Abstract:
We consider the problem of extracting features from passive, multi-channel electroencephalogram (EEG) devices for downstream inference tasks related to high-level mental states such as stress and cognitive load. Our proposed method leverages recently developed multi-graph tools and applies them to the time series of graphs implied by the statistical dependence structure (e.g., correlation) amongst the multiple sensors. We compare the effectiveness of the proposed features to traditional band power-based features in the context of three classification experiments and find that the two feature sets offer complementary predictive information. We conclude by showing that the importance of particular channels and pairs of channels for classification when using the proposed features is neuroscientifically valid.
Submitted 25 February, 2022;
originally announced March 2022.
-
A Statistical Analysis of Polyak-Ruppert Averaged Q-learning
Authors:
Xiang Li,
Wenhao Yang,
Jiadong Liang,
Zhihua Zhang,
Michael I. Jordan
Abstract:
We study Q-learning with Polyak-Ruppert averaging in a discounted Markov decision process in synchronous and tabular settings. Under a Lipschitz condition, we establish a functional central limit theorem for the averaged iteration $\bar{\boldsymbol{Q}}_T$ and show that its standardized partial-sum process converges weakly to a rescaled Brownian motion. The functional central limit theorem implies a fully online inference method for reinforcement learning. Furthermore, we show that $\bar{\boldsymbol{Q}}_T$ is the regular asymptotically linear (RAL) estimator for the optimal Q-value function $\boldsymbol{Q}^*$ that has the most efficient influence function. We present a nonasymptotic analysis for the $\ell_{\infty}$ error, $\mathbb{E}\|\bar{\boldsymbol{Q}}_T-\boldsymbol{Q}^*\|_{\infty}$, showing that it matches the instance-dependent lower bound for polynomial step sizes. Similar results are provided for entropy-regularized Q-learning without the Lipschitz condition.
Submitted 19 February, 2023; v1 submitted 29 December, 2021;
originally announced December 2021.
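Synchronous tabular Q-learning with Polyak-Ruppert averaging, as studied in the abstract above, can be sketched as follows. The sketch assumes a generative model that returns one sampled next state for every state-action pair per iteration and a polynomial step size $\eta_t = t^{-\alpha}$ with $\alpha \in (1/2, 1)$; the function name, the default `alpha=0.7`, and the table layout are illustrative choices rather than the paper's exact setup.

```python
import random


def averaged_q_learning(P, R, gamma, T, alpha=0.7, seed=0):
    """Return the running average of synchronous Q-learning iterates.

    P[s][a] is a list of next-state probabilities, R[s][a] is the reward.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(R), len(R[0])
    states = list(range(n_states))
    Q = [[0.0] * n_actions for _ in range(n_states)]
    Q_bar = [[0.0] * n_actions for _ in range(n_states)]
    for t in range(1, T + 1):
        eta = t ** (-alpha)          # polynomial step size (assumed schedule)
        V = [max(row) for row in Q]  # greedy values of the previous iterate
        Q_next = [row[:] for row in Q]
        for s in range(n_states):
            for a in range(n_actions):
                # Synchronous setting: a fresh sample for every (s, a) pair.
                s_next = rng.choices(states, weights=P[s][a])[0]
                target = R[s][a] + gamma * V[s_next]
                Q_next[s][a] = (1.0 - eta) * Q[s][a] + eta * target
        Q = Q_next
        # Polyak-Ruppert averaging: running mean of Q_1, ..., Q_t.
        for s in range(n_states):
            for a in range(n_actions):
                Q_bar[s][a] += (Q[s][a] - Q_bar[s][a]) / t
    return Q_bar
```

The averaged table, not the last iterate, is the estimator whose limiting distribution the abstract describes.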
-
Merlion: A Machine Learning Library for Time Series
Authors:
Aadyot Bhatnagar,
Paul Kassianik,
Chenghao Liu,
Tian Lan,
Wenzhuo Yang,
Rowan Cassius,
Doyen Sahoo,
Devansh Arpit,
Sri Subramanian,
Gerald Woo,
Amrita Saha,
Arun Kumar Jagota,
Gokulakrishnan Gopalakrishnan,
Manpreet Singh,
K C Krithika,
Sukumar Maddineni,
Daeki Cho,
Bo Zong,
Yingbo Zhou,
Caiming Xiong,
Silvio Savarese,
Steven Hoi,
Huan Wang
Abstract:
We introduce Merlion, an open-source machine learning library for time series. It features a unified interface for many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series, along with standard pre/post-processing layers. It has several modules to improve ease-of-use, including visualization, anomaly score calibration to improve interpretability, AutoML for hyperparameter tuning and model selection, and model ensembling. Merlion also provides a unique evaluation framework that simulates the live deployment and re-training of a model in production. This library aims to provide engineers and researchers a one-stop solution to rapidly develop models for their specific time series needs and benchmark them across multiple time series datasets. In this technical report, we highlight Merlion's architecture and major functionalities, and we report benchmark numbers across different baseline models and ensembles.
Submitted 19 September, 2021;
originally announced September 2021.
-
A Survey of Uncertainty in Deep Neural Networks
Authors:
Jakob Gawlikowski,
Cedrique Rovile Njieutcheu Tassi,
Mohsin Ali,
Jongseok Lee,
Matthias Humt,
Jianxiang Feng,
Anna Kruspe,
Rudolph Triebel,
Peter Jung,
Ribana Roscher,
Muhammad Shahzad,
Wen Yang,
Richard Bamler,
Xiao Xiang Zhu
Abstract:
As neural networks have spread, confidence in their predictions has become increasingly important. However, basic neural networks do not deliver certainty estimates or suffer from over- or under-confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. A comprehensive introduction to the most crucial sources of uncertainty is given and their separation into reducible model uncertainty and irreducible data uncertainty is presented. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensembles of neural networks, and test-time data augmentation approaches is introduced and different branches of these fields as well as the latest developments are discussed. For a practical application, we discuss different measures of uncertainty and approaches for the calibration of neural networks, and give an overview of existing baselines and implementations. Different examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainties in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real world applications are discussed and an outlook on the next steps towards a broader usage of such methods is given.
Submitted 18 January, 2022; v1 submitted 7 July, 2021;
originally announced July 2021.
-
Leveraging semantically similar queries for ranking via combining representations
Authors:
Hayden S. Helm,
Marah Abdin,
Benjamin D. Pedigo,
Shweti Mahajan,
Vince Lyzinski,
Youngser Park,
Amitabh Basu,
Piali Choudhury,
Christopher M. White,
Weiwei Yang,
Carey E. Priebe
Abstract:
In modern ranking problems, different and disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the amount of labeled data available for a particular query can lead to a highly variable and ineffective ranking function. One way to mitigate the effect of the small amount of data is to leverage information from semantically similar queries. Indeed, as we demonstrate in simulation settings and real data examples, when semantically similar queries are available it is possible to gainfully use them when ranking with respect to a particular query. We describe and explore this phenomenon in the context of the bias-variance trade-off and apply it to the data-scarce settings of a Bing navigational graph and the Drosophila larva connectome.
Submitted 23 June, 2021;
originally announced June 2021.
-
Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics
Authors:
Wenhao Yang,
Liangyu Zhang,
Zhihua Zhang
Abstract:
In this paper, we study the non-asymptotic and asymptotic performances of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are solved only from a generative model. While prior work focusing on non-asymptotic performances of robust MDPs is restricted to the setting of the KL uncertainty set and the $(s,a)$-rectangular assumption, we improve their results and also consider other uncertainty sets, including $L_1$ and $χ^2$ balls. Our results show that under the $(s,a)$-rectangular assumption on uncertainty sets, the sample complexity is about $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2ρ^2(1-γ)^4}\right)$. In addition, we extend our results from the $(s,a)$-rectangular assumption to the $s$-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty sets and is generally larger than in the case under the $(s,a)$-rectangular assumption. Moreover, we also show that the optimal robust value function is asymptotically normal with a typical rate $\sqrt{n}$ under the $(s,a)$- and $s$-rectangular assumptions from both theoretical and empirical perspectives.
Submitted 12 August, 2022; v1 submitted 9 May, 2021;
originally announced May 2021.
-
Envelope Methods with Ignorable Missing Data
Authors:
Linquan Ma,
Lan Liu,
Wei Yang
Abstract:
The envelope method was recently proposed as a way to reduce the dimension of responses in multivariate regressions. However, when data are missing, the envelope method using the complete case observations may lead to biased and inefficient results. In this paper, we generalize the envelope estimation to the case where the predictors and/or the responses are missing at random. Specifically, we incorporate the envelope structure in the expectation-maximization (EM) algorithm. As the parameters under the envelope method are not pointwise identifiable, the EM algorithm for the envelope method is not straightforward and requires a special decomposition. Our method is guaranteed to be at least as efficient as the standard EM algorithm. Moreover, our method has the potential to outperform the full data MLE. We give asymptotic properties of our method under both normal and non-normal cases. The efficiency gain over the standard EM is confirmed in simulation studies and in an application to the Chronic Renal Insufficiency Cohort (CRIC) study.
Submitted 23 March, 2021;
originally announced March 2021.
-
Inducing a hierarchy for multi-class classification problems
Authors:
Hayden S. Helm,
Weiwei Yang,
Sujeeth Bharadwaj,
Kate Lytvynets,
Oriana Riva,
Christopher White,
Ali Geisa,
Carey E. Priebe
Abstract:
In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not. Unfortunately, the majority of classification datasets do not come pre-equipped with a hierarchical structure and classical flat classifiers must be employed. In this paper, we investigate a class of methods that induce a hierarchy that can similarly improve classification performance over flat classifiers. The class of methods follows the structure of first clustering the conditional distributions and subsequently using a hierarchical classifier with the induced hierarchy. We demonstrate the effectiveness of the class of methods both for discovering a latent hierarchy and for improving accuracy in principled simulation settings and three real data applications.
Submitted 20 February, 2021;
originally announced February 2021.
-
Seasonal association between viral causes of hospitalised acute lower respiratory infections and meteorological factors in China: a retrospective study
Authors:
Bing Xu,
Jinfeng Wang,
Zhongjie Li,
Chengdong Xu,
Yilan Liao,
Maogui Hu,
Jing Yang,
Shengjie Lai,
Liping Wang,
Weizhong Yang
Abstract:
Acute lower respiratory infections (ALRI) caused by respiratory viruses are common and persistent infectious diseases worldwide and in China, and they have pronounced seasonal patterns. Meteorological factors have important roles in the seasonality of some major viruses. Our aim was to identify the dominant meteorological factors and to model their effects on common respiratory viruses in different regions of China. We analysed monthly virus data on patients from 81 sentinel hospitals in 22 provinces in mainland China from 2009 to 2013. The geographical detector method was used to quantify the explanatory power of each meteorological factor, individually and interacting in pairs. Of 28,369 hospitalised patients with ALRI who were tested, 10,387 were positive for at least one virus, including RSV, influenza virus, PIV, ADV, hBoV, hCoV and hMPV. RSV and influenza virus had annual peaks in the north and biannual peaks in the south. PIV and hBoV had higher positive rates in the spring and summer months. hMPV had an annual peak in winter and spring, especially in the north. ADV and hCoV exhibited no clear annual seasonality. Temperature, atmospheric pressure, vapour pressure, and rainfall had the most explanatory power for most respiratory viruses in each region. Relative humidity was only dominant in the north, but had no significant explanatory power for most viruses in the south. Hours of sunlight had significant explanatory power for RSV and influenza virus in the north, and for most viruses in the south. Wind speed was the only factor with significant explanatory power for human coronavirus in the south. For all viruses, interactions between any two of the paired factors resulted in enhanced explanatory power, either bivariately or non-linearly.
Submitted 15 April, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
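The geographical detector method mentioned in the abstract above quantifies explanatory power with a q-statistic, commonly written as $q = 1 - \mathrm{SSW}/\mathrm{SST}$, where SSW is the within-stratum sum of squares and SST the total sum of squares. A minimal sketch of that statistic for a single factor follows; the function name and the assumption that the factor has already been discretised into strata are illustrative, and the study's own data are not reproduced here.

```python
def geodetector_q(values, strata):
    """Return q = 1 - SSW / SST for one stratified factor.

    values: response values, e.g. monthly positive rates (hypothetical input)
    strata: stratum label for each value, e.g. a binned temperature class
    """
    groups = {}
    for y, h in zip(values, strata):
        groups.setdefault(h, []).append(y)

    grand_mean = sum(values) / len(values)
    sst = sum((y - grand_mean) ** 2 for y in values)  # total sum of squares
    if sst == 0:
        raise ValueError("response has zero variance; q is undefined")

    ssw = 0.0  # within-stratum sum of squares
    for ys in groups.values():
        stratum_mean = sum(ys) / len(ys)
        ssw += sum((y - stratum_mean) ** 2 for y in ys)

    return 1.0 - ssw / sst
```

A value of q near 1 means the strata of the factor account for almost all of the variation in the response, while a value near 0 means the factor explains essentially none of it.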
-
A partition-based similarity for classification distributions
Authors:
Hayden S. Helm,
Ronak D. Mehta,
Brandon Duderstadt,
Weiwei Yang,
Christoper M. White,
Ali Geisa,
Joshua T. Vogelstein,
Carey E. Priebe
Abstract:
Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners. In particular, we propose a novel similarity on classification distributions, dubbed task similarity, that quantifies how an optimally-transformed optimal representation for a source distribution performs when applied to inference related to a target distribution. The definition of task similarity allows for natural definitions of adversarial and orthogonal distributions. We highlight limiting properties of representations induced by (universally) consistent decision rules and demonstrate in simulation that an empirical estimate of task similarity is a function of the decision rule deployed for inference. We demonstrate that for a given target distribution, both transfer efficiency and semantic similarity of candidate source distributions correlate with empirical task similarity.
Submitted 12 November, 2020;
originally announced November 2020.
-
Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs
Authors:
Wenhao Yang,
Xiang Li,
Guangzeng Xie,
Zhihua Zhang
Abstract:
Regularized MDPs serve as a smooth version of original MDPs. However, a biased optimal policy always exists for regularized MDPs. Instead of making the coefficient λ of the regularization term sufficiently small, we propose an adaptive reduction scheme for λ to approximate the optimal policy of the original MDP. It is shown that the iteration complexity for obtaining an ε-optimal policy could be reduced in comparison with setting a sufficiently small λ. In addition, there exists a strong duality connection between the reduction method and solving the original MDP directly, from which we can derive more adaptive reduction methods for certain algorithms.
Submitted 31 October, 2020;
originally announced November 2020.
-
Incorporating Interpretable Output Constraints in Bayesian Neural Networks
Authors:
Wanqian Yang,
Lars Lorch,
Moritz A. Graule,
Himabindu Lakkaraju,
Finale Doshi-Velez
Abstract:
Domains where supervised models are deployed often come with task-specific constraints, such as prior expert knowledge on the ground-truth function, or desiderata like safety and fairness. We introduce a novel probabilistic framework for reasoning with such constraints and formulate a prior that enables us to effectively incorporate them into Bayesian neural networks (BNNs), including a variant that can be amortized over tasks. The resulting Output-Constrained BNN (OC-BNN) is fully consistent with the Bayesian framework for uncertainty quantification and is amenable to black-box inference. Unlike typical BNN inference in uninterpretable parameter space, OC-BNNs widen the range of functional knowledge that can be incorporated, especially for model users without expertise in machine learning. We demonstrate the efficacy of OC-BNNs on real-world datasets, spanning multiple domains such as healthcare, criminal justice, and credit scoring.
Submitted 6 January, 2021; v1 submitted 21 October, 2020;
originally announced October 2020.
-
Interactive Steering of Hierarchical Clustering
Authors:
Weikai Yang,
Xiting Wang,
Jie Lu,
Wenwen Dou,
Shixia Liu
Abstract:
Hierarchical clustering is an important technique to organize big data for exploratory data analysis. However, existing one-size-fits-all hierarchical clustering methods often fail to meet the diverse needs of different users. To address this challenge, we present an interactive steering method to visually supervise constrained hierarchical clustering by utilizing both public knowledge (e.g., Wikipedia) and private knowledge from users. The novelty of our approach includes 1) automatically constructing constraints for hierarchical clustering using knowledge (knowledge-driven) and intrinsic data distribution (data-driven), and 2) enabling the interactive steering of clustering through a visual interface (user-driven). Our method first maps each data item to the most relevant items in a knowledge base. An initial constraint tree is then extracted using the ant colony optimization algorithm. The algorithm balances the tree width and depth and covers the data items with high confidence. Given the constraint tree, the data items are hierarchically clustered using evolutionary Bayesian rose tree. To clearly convey the hierarchical clustering results, an uncertainty-aware tree visualization has been developed to enable users to quickly locate the most uncertain sub-hierarchies and interactively improve them. The quantitative evaluation and case study demonstrate that the proposed approach facilitates the building of customized clustering trees in an efficient and effective manner.
Submitted 21 September, 2020;
originally announced September 2020.
-
Multiple Network Embedding for Anomaly Detection in Time Series of Graphs
Authors:
Guodong Chen,
Jesús Arroyo,
Avanti Athreya,
Joshua Cape,
Joshua T. Vogelstein,
Youngser Park,
Chris White,
Jonathan Larson,
Weiwei Yang,
Carey E. Priebe
Abstract:
This paper considers the graph signal processing problem of anomaly detection in time series of graphs. We examine two related, complementary inference tasks: the detection of anomalous graphs within a time series, and the detection of temporally anomalous vertices. We approach these tasks via the adaptation of statistically principled methods for joint graph inference, specifically multiple adjacency spectral embedding (MASE). We demonstrate that our method is effective for our inference tasks. Moreover, we assess the performance of our method in terms of the underlying nature of detectable anomalies. We further provide the theoretical justification for our method and insight into its use. Applied to the Enron communication graph, a large-scale commercial search engine time series of graphs, and larval Drosophila connectome data, our approaches demonstrate their applicability and identify anomalous vertices beyond just those with a large change in degree.
Submitted 15 July, 2024; v1 submitted 23 August, 2020;
originally announced August 2020.