-
Exploring Multi-Modal Data with Tool-Augmented LLM Agents for Precise Causal Discovery
Authors:
ChengAo Shen,
Zhengzhang Chen,
Dongsheng Luo,
Dongkuan Xu,
Haifeng Chen,
Jingchao Ni
Abstract:
Causal discovery is an imperative foundation for decision-making across domains, such as smart health, AI for drug discovery and AIOps. Traditional statistical causal discovery methods, while well-established, predominantly rely on observational data and often overlook the semantic cues inherent in cause-and-effect relationships. The advent of Large Language Models (LLMs) has ushered in an affordable way of leveraging the semantic cues for knowledge-driven causal discovery, but the development of LLMs for causal discovery lags behind other areas, particularly in the exploration of multi-modal data. To bridge the gap, we introduce MATMCD, a multi-agent system powered by tool-augmented LLMs. MATMCD has two key agents: a Data Augmentation agent that retrieves and processes modality-augmented data, and a Causal Constraint agent that integrates multi-modal data for knowledge-driven reasoning. The proposed design of the inner workings ensures successful cooperation of the agents. Our empirical study across seven datasets suggests the significant potential of multi-modality enhanced causal discovery.
Submitted 31 May, 2025; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Gaussian mixture Taylor approximations of risk measures constrained by PDEs with Gaussian random field inputs
Authors:
Dingcheng Luo,
Joshua Chen,
Peng Chen,
Omar Ghattas
Abstract:
This work considers the computation of risk measures for quantities of interest governed by PDEs with Gaussian random field parameters using Taylor approximations. While efficient, Taylor approximations are local to the point of expansion, and hence may degrade in accuracy when the variances of the input parameters are large. To address this challenge, we approximate the underlying Gaussian measure by a mixture of Gaussians with reduced variance in a dominant direction of parameter space. Taylor approximations are constructed at the means of each Gaussian mixture component, which are then combined to approximate the risk measures. The formulation is presented in the setting of infinite-dimensional Gaussian random parameters for risk measures including the mean, variance, and conditional value-at-risk (CVaR). We also provide detailed analysis of the approximation errors arising from two sources: the Gaussian mixture approximation and the Taylor approximations. Numerical experiments are conducted for a semilinear advection-diffusion-reaction equation with a random diffusion coefficient field and for the Helmholtz equation with a random wave speed field. For these examples, the proposed approximation strategy can achieve less than $1\%$ relative error in estimating CVaR with only $\mathcal{O}(10)$ state PDE solves, which is comparable to a standard Monte Carlo estimate with $\mathcal{O}(10^4)$ samples, thus achieving significant reduction in computational cost. The proposed method can therefore serve as a way to rapidly and accurately estimate risk measures under limited computational budgets.
Submitted 13 August, 2024;
originally announced August 2024.
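For orientation, the standard Monte Carlo baseline mentioned in the abstract above can be sketched as a tail average over samples of the quantity of interest. This is only an illustrative sketch of sample-based CVaR, not the Gaussian mixture Taylor method itself; the function name `empirical_cvar`, the upper-tail convention, and the toy loss values are assumptions made for the example.

```python
import math

def empirical_cvar(samples, beta):
    """Estimate CVaR at level beta as the mean of the worst (largest) samples.

    Assumption: larger values are worse, and the tail holds the largest
    ceil((1 - beta) * n) samples. Other conventions exist.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    ordered = sorted(samples, reverse=True)
    tail_size = max(1, math.ceil((1.0 - beta) * len(ordered)))
    tail = ordered[:tail_size]
    return sum(tail) / tail_size

# Toy losses standing in for PDE-based quantities of interest (hypothetical data).
losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cvar_half = empirical_cvar(losses, 0.5)   # mean of the five largest losses
cvar_zero = empirical_cvar(losses, 0.0)   # reduces to the plain sample mean
```

With `beta = 0` the estimator reduces to the sample mean, which is one way to sanity-check an implementation.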
-
Hierarchically Coherent Multivariate Mixture Networks
Authors:
Kin G. Olivares,
David Luo,
Cristian Challu,
Stefania La Vattiata,
Max Mergenthaler,
Artur Dubrawski
Abstract:
Large collections of time series data are often organized into hierarchies with different levels of aggregation; examples include product and geographical groupings. Probabilistic coherent forecasting is tasked to produce forecasts consistent across levels of aggregation. In this study, we propose to augment neural forecasting architectures with a coherent multivariate mixture output. We optimize the networks with a composite likelihood objective, allowing us to capture time series' relationships while maintaining high computational efficiency. Our approach demonstrates 13.2% average accuracy improvements on most datasets compared to state-of-the-art baselines. We conduct ablation studies of the framework components and provide theoretical foundations for them. To assist related work, the code is available at https://github.com/Nixtla/neuralforecast.
Submitted 16 October, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Interpretable machine learning-accelerated seed treatment by nanomaterials for environmental stress alleviation
Authors:
Hengjie Yu,
Dan Luo,
Sam F. Y. Li,
Maozhen Qu,
Da Liu,
Yingchao He,
Fang Cheng
Abstract:
Crops are constantly challenged by different environmental conditions. Seed treatment by nanomaterials is a cost-effective and environmentally-friendly solution for environmental stress mitigation in crop plants. Here, 56 seed nanopriming treatments are used to alleviate environmental stresses in maize. Seven selected nanopriming treatments significantly increase the stress resistance index (SRI) by 13.9% and 12.6% under salinity stress and combined heat-drought stress, respectively. Metabolomics data reveals that ZnO nanopriming treatment, with the highest SRI value, mainly regulates the pathways of amino acid metabolism, secondary metabolite synthesis, carbohydrate metabolism, and translation. Understanding the mechanism of seed nanopriming is still difficult due to the variety of nanomaterials and the complexity of interactions between nanomaterials and plants. Using the nanopriming data, we present an interpretable structure-activity relationship (ISAR) approach based on interpretable machine learning for predicting and understanding its stress mitigation effects. The post hoc and model-based interpretation approaches of machine learning are combined to provide complementary benefits and give researchers or policymakers more illuminating or trustworthy results. The concentration, size, and zeta potential of nanoparticles are identified as dominant factors for correlating root dry weight under salinity stress, and their effects and interactions are explained. Additionally, a web-based interactive tool is developed for offering prediction-level interpretation and gathering more details about specific nanopriming treatments. This work offers a promising framework for accelerating the agricultural applications of nanomaterials and may profoundly contribute to nanosafety assessment.
Submitted 8 April, 2023;
originally announced April 2023.
-
Bounding the FDP in competition-based control of the FDR
Authors:
Arya Ebadi,
Dong Luo,
Jack Freestone,
William Stafford Noble,
Uri Keich
Abstract:
The competition-based approach to controlling the false discovery rate (FDR) recently rose to prominence when, generalizing it to sequential hypothesis testing, Barber and Candès used it as part of their knockoff filter. Control of the FDR implies that the arguably more important false discovery proportion (FDP) is only controlled in an average sense. We present TDC-SB and TDC-UB, which provide upper prediction bounds on the FDP in the list of discoveries generated when controlling the FDR using competition. Using simulated and real data we show that, overall, our new procedures offer significantly tighter upper bounds than ones obtained using the recently published approach of Katsevich and Ramdas, even when the latter is further improved using the interpolation concept of Goeman et al.
Submitted 23 February, 2023;
originally announced February 2023.
-
HierarchicalForecast: A Reference Framework for Hierarchical Forecasting in Python
Authors:
Kin G. Olivares,
Azul Garza,
David Luo,
Cristian Challú,
Max Mergenthaler,
Souhaib Ben Taieb,
Shanika L. Wickramasuriya,
Artur Dubrawski
Abstract:
Large collections of time series data are commonly organized into structures with different levels of aggregation; examples include product and geographical groupings. It is often important to ensure that the forecasts are coherent so that the predicted values at disaggregate levels add up to the aggregate forecast. The growing interest of the Machine Learning community in hierarchical forecasting systems indicates that we are in a propitious moment to ensure that scientific endeavors are grounded on sound baselines. For this reason, we put forward the HierarchicalForecast library, which contains preprocessed publicly available datasets, evaluation metrics, and a compiled set of statistical baseline models. Our Python-based reference framework aims to bridge the gap between statistical and econometric modeling, and Machine Learning forecasting research. Code and documentation are available at https://github.com/Nixtla/hierarchicalforecast.
Submitted 10 October, 2024; v1 submitted 7 July, 2022;
originally announced July 2022.
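The coherence requirement described in the abstract above (disaggregate forecasts adding up to the aggregate forecast) can be illustrated with a bottom-up aggregation in plain Python. This is a minimal sketch and does not use the HierarchicalForecast API; the function `bottom_up`, the series names, and the forecast values are hypothetical.

```python
def bottom_up(bottom_forecasts, hierarchy):
    """Build coherent forecasts by summing bottom-level series into each aggregate.

    bottom_forecasts: dict mapping a bottom-level series name to its forecast.
    hierarchy: dict mapping an aggregate name to the bottom-level series it contains.
    """
    coherent = dict(bottom_forecasts)
    for aggregate, children in hierarchy.items():
        coherent[aggregate] = sum(bottom_forecasts[child] for child in children)
    return coherent

# Hypothetical two-region hierarchy with three bottom-level series.
bottom = {"north/A": 10.0, "north/B": 5.0, "south/A": 7.0}
hierarchy = {
    "north": ["north/A", "north/B"],
    "south": ["south/A"],
    "total": ["north/A", "north/B", "south/A"],
}
forecasts = bottom_up(bottom, hierarchy)
```

Because every aggregate is computed from the same bottom-level values, the regional forecasts sum to the total by construction. Bottom-up is only one of several reconciliation strategies.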
-
Robust Imitation Learning from Corrupted Demonstrations
Authors:
Liu Liu,
Ziyang Tang,
Lanqing Li,
Dijun Luo
Abstract:
We consider offline Imitation Learning from corrupted demonstrations where a constant fraction of data can be noise or even arbitrary outliers. Classical approaches such as Behavior Cloning assume that demonstrations are collected by a presumably optimal expert, and hence may fail drastically when learning from corrupted demonstrations. We propose a novel robust algorithm by minimizing a Median-of-Means (MOM) objective which guarantees the accurate estimation of policy, even in the presence of a constant fraction of outliers. Our theoretical analysis shows that our robust method in the corrupted setting enjoys nearly the same error scaling and sample complexity guarantees as the classical Behavior Cloning in the expert demonstration setting. Our experiments on continuous-control benchmarks validate that our method exhibits the predicted robustness and effectiveness, and achieves competitive results compared to existing imitation learning methods.
Submitted 29 January, 2022;
originally announced January 2022.
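The Median-of-Means idea named in the abstract above can be sketched for a scalar sample: split the data into blocks, average each block, and take the median of the block averages. This is only the basic estimator, not the paper's policy-learning objective; the function name, the contiguous blocking, and the toy data are assumptions for illustration.

```python
import statistics

def median_of_means(values, num_blocks):
    """Median of per-block means over num_blocks equal-size contiguous blocks.

    Assumption: values are already in random order. Trailing values that do
    not fill a complete block are ignored.
    """
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    block_size = len(values) // num_blocks
    block_means = [
        sum(values[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)

# Hypothetical sample near 1.0 with a single gross outlier.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1000.0, 1.0, 0.9]
robust_estimate = median_of_means(data, 3)
plain_mean = sum(data) / len(data)
```

The outlier corrupts only one block, so the median of the block means stays near 1.0 while the plain mean is pulled far away.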
-
Learning Graphon Autoencoders for Generative Graph Modeling
Authors:
Hongteng Xu,
Peilin Zhao,
Junzhou Huang,
Dixin Luo
Abstract:
A graphon is a nonparametric model that generates graphs with arbitrary sizes and can be induced from graphs easily. Based on this model, we propose a novel algorithmic framework called \textit{graphon autoencoder} to build an interpretable and scalable graph generative model. This framework treats observed graphs as induced graphons in functional space and derives their latent representations by an encoder that aggregates Chebyshev graphon filters. A linear graphon factorization model works as a decoder, leveraging the latent representations to reconstruct the induced graphons (and the corresponding observed graphs). We develop an efficient learning algorithm to learn the encoder and the decoder, minimizing the Wasserstein distance between the model and data distributions. This algorithm takes the KL divergence of the graph distributions conditioned on different graphons as the underlying distance and leads to a reward-augmented maximum likelihood estimation. The graphon autoencoder provides a new paradigm to represent and generate graphs, which has good generalizability and transferability.
Submitted 29 May, 2021;
originally announced May 2021.
-
Hawkes Processes on Graphons
Authors:
Hongteng Xu,
Dixin Luo,
Hongyuan Zha
Abstract:
We propose a novel framework for modeling multiple multivariate point processes, each with heterogeneous event types that share an underlying space and obey the same generative mechanism. Focusing on Hawkes processes and their variants that are associated with Granger causality graphs, our model leverages an uncountable event type space and samples the graphs with different sizes from a nonparametric model called {\it graphon}. Given those graphs, we can generate the corresponding Hawkes processes and simulate event sequences. Learning this graphon-based Hawkes process model helps to 1) infer the underlying relations shared by different Hawkes processes; and 2) simulate event sequences with different event types but similar dynamics. We learn the proposed model by minimizing the hierarchical optimal transport distance between the generated event sequences and the observed ones, leading to a novel reward-augmented maximum likelihood estimation method. We analyze the properties of our model in-depth and demonstrate its rationality and effectiveness in both theory and experiments.
Submitted 4 February, 2021;
originally announced February 2021.
-
Learning Graphons via Structured Gromov-Wasserstein Barycenters
Authors:
Hongteng Xu,
Dixin Luo,
Lawrence Carin,
Hongyuan Zha
Abstract:
We propose a novel and principled method to learn a nonparametric graph model called graphon, which is defined in an infinite-dimensional space and represents arbitrary-size graphs. Based on the weak regularity lemma from the theory of graphons, we leverage a step function to approximate a graphon. We show that the cut distance of graphons can be relaxed to the Gromov-Wasserstein distance of their step functions. Accordingly, given a set of graphs generated by an underlying graphon, we learn the corresponding step function as the Gromov-Wasserstein barycenter of the given graphs. Furthermore, we develop several enhancements and extensions of the basic algorithm, $e.g.$, the smoothed Gromov-Wasserstein barycenter for guaranteeing the continuity of the learned graphons and the mixed Gromov-Wasserstein barycenters for learning multiple structured graphons. The proposed approach overcomes drawbacks of prior state-of-the-art methods, and outperforms them on both synthetic and real-world data. The code is available at https://github.com/HongtengXu/SGWB-Graphon.
Submitted 17 December, 2020; v1 submitted 10 December, 2020;
originally announced December 2020.
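To make the step-function view of a graphon in the abstract above concrete, the sketch below samples a graph of arbitrary size from a K-by-K step function. It illustrates only the generative direction (graphon to graph), not the Gromov-Wasserstein barycenter learning procedure; the function name, the two-block graphon, and the seed are hypothetical.

```python
import random

def sample_graph(step_graphon, num_nodes, rng):
    """Sample an undirected simple graph from a K-by-K step-function graphon.

    Each node draws a latent position u in [0, 1); the position selects one of
    K equal-width blocks, and each edge (i, j) is included independently with
    probability step_graphon[block_i][block_j].
    """
    k = len(step_graphon)
    latent = [rng.random() for _ in range(num_nodes)]
    blocks = [min(int(u * k), k - 1) for u in latent]
    adjacency = [[0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if rng.random() < step_graphon[blocks[i]][blocks[j]]:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency

# Hypothetical two-block graphon: dense within blocks, sparse across them.
graphon = [[0.9, 0.1],
           [0.1, 0.8]]
graph = sample_graph(graphon, 20, random.Random(0))
```

The same step function can generate graphs of any size by changing `num_nodes`, which is the arbitrary-size property the abstract refers to.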
-
Competition-based control of the false discovery proportion
Authors:
Dong Luo,
Arya Ebadi,
Yilun He,
Kristen Emery,
William Stafford Noble,
Uri Keich
Abstract:
Recently, Barber and Candès laid the theoretical foundation for a general framework for false discovery rate (FDR) control based on the notion of "knockoffs." A closely related FDR control methodology has long been employed in the analysis of mass spectrometry data, referred to there as "target-decoy competition" (TDC). However, any approach that aims to control the FDR, which is defined as the expected value of the false discovery proportion (FDP), suffers from a problem. Specifically, even when successfully controlling the FDR at level $α$, the FDP in the list of discoveries can significantly exceed $α$. We offer FDP-SD, a new procedure that rigorously controls the FDP in the competition (knockoff / TDC) setup by guaranteeing that the FDP is bounded by $α$ at any desired confidence level. Compared with the just-published general framework of Katsevich and Ramdas, FDP-SD generally delivers more power and often substantially so in simulated as well as real data.
Submitted 14 March, 2022; v1 submitted 24 November, 2020;
originally announced November 2020.
-
Detecting the skewness of data from the five-number summary and its application in meta-analysis
Authors:
Jiandong Shi,
Dehui Luo,
Xiang Wan,
Yue Liu,
Jiming Liu,
Zhaoxiang Bian,
Tiejun Tong
Abstract:
For clinical studies with continuous outcomes, when the data are potentially skewed, researchers may choose to report the whole or part of the five-number summary (the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation. In the recent literature, it is often suggested to transform the five-number summary back to the sample mean and standard deviation, which can be subsequently used in a meta-analysis. However, if a study contains skewed data, this transformation and hence the conclusions from the meta-analysis are unreliable. Therefore, we introduce a novel method for detecting the skewness of data using only the five-number summary and the sample size, and meanwhile propose a new flow chart to handle the skewed studies in a different manner. We further show by simulations that our skewness tests are able to control the type I error rates and provide good statistical power, followed by a simulated meta-analysis and a real data example that illustrate the usefulness of our new method in meta-analysis and evidence-based medicine.
Submitted 5 May, 2023; v1 submitted 12 October, 2020;
originally announced October 2020.
-
FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
Authors:
Lanqing Li,
Rui Yang,
Dijun Luo
Abstract:
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks without any interactions with the environments, making RL truly practical in many real-world applications. This problem is still not fully understood, and two major challenges need to be addressed. First, offline RL usually suffers from bootstrapping errors of out-of-distribution state-actions, which lead to divergence of value functions. Second, meta-RL requires efficient and robust task inference learned jointly with the control policy. In this work, we enforce behavior regularization on the learned policy as a general approach to offline RL, combined with a deterministic context encoder for efficient task inference. We propose a novel negative-power distance metric on a bounded context embedding space, whose gradient propagation is detached from the Bellman backup. We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches involving meta-RL and distance metric learning. To the best of our knowledge, our method is the first model-free and end-to-end OMRL algorithm, which is computationally efficient and demonstrated to outperform prior algorithms on several meta-RL benchmarks.
Submitted 6 May, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Testing error distribution by kernelized Stein discrepancy in multivariate time series models
Authors:
Donghang Luo,
Ke Zhu,
Huan Gong,
Dong Li
Abstract:
Knowing the error distribution is important in many multivariate time series applications. To alleviate the risk of error distribution mis-specification, testing methodologies are needed to detect whether the chosen error distribution is correct. However, the majority of the existing tests only deal with the multivariate normal distribution for some special multivariate time series models, and thus they cannot be used to test for the heavy-tailed and skewed error distributions often observed in applications. In this paper, we construct a new consistent test for general multivariate time series models, based on the kernelized Stein discrepancy. To account for the estimation uncertainty and unobserved initial values, a bootstrap method is provided to calculate the critical values. Our new test is easy to implement for a large scope of multivariate error distributions, and its importance is illustrated by simulated and real data.
Submitted 3 August, 2020;
originally announced August 2020.
-
Hierarchical Optimal Transport for Robust Multi-View Learning
Authors:
Dixin Luo,
Hongteng Xu,
Lawrence Carin
Abstract:
Traditional multi-view learning methods often rely on two assumptions: ($i$) the samples in different views are well-aligned, and ($ii$) their representations in latent space obey the same distribution. Unfortunately, these two assumptions may be questionable in practice, which limits the application of multi-view learning. In this work, we propose a hierarchical optimal transport (HOT) method to mitigate the dependency on these two assumptions. Given unaligned multi-view data, the HOT method penalizes the sliced Wasserstein distance between the distributions of different views. These sliced Wasserstein distances are used as the ground distance to calculate the entropic optimal transport across different views, which explicitly indicates the clustering structure of the views. The HOT method is applicable to both unsupervised and semi-supervised learning, and experimental results show that it performs robustly on both synthetic and real-world tasks.
Submitted 8 June, 2020; v1 submitted 4 June, 2020;
originally announced June 2020.
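The sliced Wasserstein distance that the HOT method above uses as a ground distance can be approximated by projecting both samples onto random directions and comparing the sorted projections. The sketch below is a generic Monte Carlo estimator for two equal-size samples, not the authors' implementation; the function name, the number of projections, and the toy point sets are assumptions.

```python
import math
import random

def sliced_wasserstein(xs, ys, num_projections, rng):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    Assumption: xs and ys are equal-size lists of points with the same
    dimension, so each 1-D transport problem is solved by sorting.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch requires samples of equal size")
    dim = len(xs[0])
    total = 0.0
    for _ in range(num_projections):
        # Draw a random unit direction by normalizing a Gaussian vector.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        px = sorted(sum(a * b for a, b in zip(p, direction)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, direction)) for p in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / num_projections)

# Hypothetical 2-D views: the second is the first shifted by 3 along the x-axis.
view_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
view_b = [(x + 3.0, y) for x, y in view_a]
same = sliced_wasserstein(view_a, view_a, 50, random.Random(0))
shifted = sliced_wasserstein(view_a, view_b, 50, random.Random(0))
```

Identical samples give a distance of zero, and a pure shift gives a positive distance no larger than the shift length, since each projection can only shorten the displacement.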
-
Optimally estimating the sample standard deviation from the five-number summary
Authors:
Jiandong Shi,
Dehui Luo,
Hong Weng,
Xian-Tao Zeng,
Lu Lin,
Haitao Chu,
Tiejun Tong
Abstract:
When reporting the results of clinical studies, some researchers may choose the five-number summary (including the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation, particularly for skewed data. For these studies, when included in a meta-analysis, it is often desired to convert the five-number summary back to the sample mean and standard deviation. For this purpose, several methods have been proposed in the recent literature and they are increasingly used nowadays. In this paper, we propose to further advance the literature by developing a smoothly weighted estimator for the sample standard deviation that fully utilizes the sample size information. For ease of implementation, we also derive an approximation formula for the optimal weight, as well as a shortcut formula for the sample standard deviation. Numerical results show that our new estimator provides a more accurate estimate for normal data and also performs favorably for non-normal data. Together with the optimal sample mean estimator in Luo et al., our new methods have dramatically improved the existing methods for data transformation, and they are capable of serving as "rules of thumb" in meta-analysis for studies reported with the five-number summary. Finally, for practical use, an Excel spreadsheet and an online calculator are also provided for implementing our optimal estimators.
Submitted 17 June, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
-
Learning Autoencoders with Relational Regularization
Authors:
Hongteng Xu,
Dixin Luo,
Ricardo Henao,
Svati Shah,
Lawrence Carin
Abstract:
A new algorithmic framework is proposed for learning autoencoders of data distributions. We minimize the discrepancy between the model and target distributions, with a \emph{relational regularization} on the learnable latent prior. This regularization penalizes the fused Gromov-Wasserstein (FGW) distance between the latent prior and its corresponding posterior, allowing one to flexibly learn a structured prior distribution associated with the generative model. Moreover, it helps co-training of multiple autoencoders even if they have heterogeneous architectures and incomparable latent spaces. We implement the framework with two scalable algorithms, making it applicable for both probabilistic and deterministic autoencoders. Our relational regularized autoencoder (RAE) outperforms existing methods, $e.g.$, the variational autoencoder, Wasserstein autoencoder, and their variants, on generating images. Additionally, our relational co-training strategy for autoencoders achieves encouraging results in both synthesis and real-world multi-view learning tasks. The code is at https://github.com/HongtengXu/Relational-AutoEncoders.
Submitted 25 June, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Fused Gromov-Wasserstein Alignment for Hawkes Processes
Authors:
Dixin Luo,
Hongteng Xu,
Lawrence Carin
Abstract:
We propose a novel fused Gromov-Wasserstein alignment method to jointly learn the Hawkes processes in different event spaces, and align their event types. Given two Hawkes processes, we use fused Gromov-Wasserstein discrepancy to measure their dissimilarity, which considers both the Wasserstein discrepancy based on their base intensities and the Gromov-Wasserstein discrepancy based on their infectivity matrices. Accordingly, the learned optimal transport reflects the correspondence between the event types of these two Hawkes processes. The Hawkes processes and their optimal transport are learned jointly via maximum likelihood estimation, with a fused Gromov-Wasserstein regularizer. Experimental results show that the proposed method works well on synthetic and real-world data.
Submitted 4 October, 2019;
originally announced October 2019.
-
Adversarial Self-Paced Learning for Mixture Models of Hawkes Processes
Authors:
Dixin Luo,
Hongteng Xu,
Lawrence Carin
Abstract:
We propose a novel adversarial learning strategy for mixture models of Hawkes processes, leveraging data augmentation techniques for Hawkes processes in the framework of self-paced learning. Instead of learning a mixture model directly from a set of event sequences drawn from different Hawkes processes, the proposed method learns the target model iteratively, generating "easy" sequences and using them in an adversarial and self-paced manner. In each iteration, we first generate a set of augmented sequences from the original observed sequences. Based on the fact that an easy sample of the target model can be an adversarial sample of a misspecified model, we apply maximum likelihood estimation with an adversarial self-paced mechanism. In this manner, the target model is updated, and the augmented sequences that obey it are employed for the next learning iteration. Experimental results show that the proposed method consistently outperforms traditional methods.
Submitted 19 June, 2019;
originally announced June 2019.
-
Interpretable ICD Code Embeddings with Self- and Mutual-Attention Mechanisms
Authors:
Dixin Luo,
Hongteng Xu,
Lawrence Carin
Abstract:
We propose a novel and interpretable embedding method to represent the international statistical classification codes of diseases and related health problems (i.e., ICD codes). This method considers a self-attention mechanism within the disease domain and a mutual-attention mechanism jointly between diseases and procedures. This framework captures the clinical relationships between the disease codes and procedures associated with hospital admissions, and it predicts procedures according to diagnosed diseases. A self-attention network is learned to fuse the embeddings of the diseases for each admission. The similarities between the fused disease embedding and the procedure embeddings indicate which procedure should potentially be recommended. Additionally, when learning the embeddings of the ICD codes, the optimal transport between the diseases and the procedures within each admission is calculated as a regularizer of the embeddings. The optimal transport provides a mutual-attention map between diseases and the procedures, which suppresses the ambiguity within their clinical relationships. The proposed method achieves clinically-interpretable embeddings of ICD codes, and outperforms state-of-the-art embedding methods in procedure recommendation.
Submitted 13 June, 2019;
originally announced June 2019.
-
Scalable Gromov-Wasserstein Learning for Graph Partitioning and Matching
Authors:
Hongteng Xu,
Dixin Luo,
Lawrence Carin
Abstract:
We propose a scalable Gromov-Wasserstein learning (S-GWL) method and establish a novel and theoretically-supported paradigm for large-scale graph analysis. The proposed method is based on the fact that Gromov-Wasserstein discrepancy is a pseudometric on graphs. Given two graphs, the optimal transport associated with their Gromov-Wasserstein discrepancy provides the correspondence between their nodes and achieves graph matching. When one of the graphs has isolated but self-connected nodes (i.e., a disconnected graph), the optimal transport indicates the clustering structure of the other graph and achieves graph partitioning. Using this concept, we extend our method to multi-graph partitioning and matching by learning a Gromov-Wasserstein barycenter graph for multiple observed graphs; the barycenter graph plays the role of the disconnected graph, and since it is learned, so is the clustering. Our method combines a recursive $K$-partition mechanism with a regularized proximal gradient algorithm, whose time complexity is $\mathcal{O}(K(E+V)\log_K V)$ for graphs with $V$ nodes and $E$ edges. To our knowledge, our method is the first attempt to make Gromov-Wasserstein discrepancy applicable to large-scale graph analysis and unify graph partitioning and matching into the same framework. It outperforms state-of-the-art graph partitioning and matching methods, achieving a trade-off between accuracy and efficiency.
Submitted 9 October, 2019; v1 submitted 18 May, 2019;
originally announced May 2019.
-
POP-CNN: Predicting Odor's Pleasantness with Convolutional Neural Network
Authors:
Danli Wu,
Yu Cheng,
Dehan Luo,
Kin-Yeung Wong,
Kevin Hung,
Zhijing Yang
Abstract:
Predicting an odor's pleasantness simplifies the evaluation of odors and has potential applications in the perfume and environmental monitoring industries. Classical algorithms for predicting an odor's pleasantness generally use a manual feature extractor and an independent classifier. Manually designing a good feature extractor, which is key to the accuracy of these algorithms, depends on expert knowledge and experience. To circumvent this difficulty, we propose a model for predicting an odor's pleasantness using a convolutional neural network. In our model, the convolutional layers replace the manual feature extractor and show better performance. The experiments show that the correlation between our model's predictions and human pleasantness ratings is over 90%. Our model also achieves 99.9% accuracy in distinguishing between absolutely pleasant and unpleasant odors.
Submitted 19 March, 2019;
originally announced March 2019.
-
Gromov-Wasserstein Learning for Graph Matching and Node Embedding
Authors:
Hongteng Xu,
Dixin Luo,
Hongyuan Zha,
Lawrence Carin
Abstract:
A novel Gromov-Wasserstein learning framework is proposed to jointly match (align) graphs and learn embedding vectors for the associated graph nodes. Using Gromov-Wasserstein discrepancy, we measure the dissimilarity between two graphs and find their correspondence, according to the learned optimal transport. The node embeddings associated with the two graphs are learned under the guidance of the optimal transport, the distance of which not only reflects the topological structure of each graph but also yields the correspondence across the graphs. These two learning steps are mutually-beneficial, and are unified here by minimizing the Gromov-Wasserstein discrepancy with structural regularizers. This framework leads to an optimization problem that is solved by a proximal point method. We apply the proposed method to matching problems in real-world networks, and demonstrate its superior performance compared to alternative approaches.
Submitted 7 May, 2019; v1 submitted 17 January, 2019;
originally announced January 2019.
-
Testing normality using the summary statistics with application to meta-analysis
Authors:
Dehui Luo,
Xiang Wan,
Jiming Liu,
Tiejun Tong
Abstract:
Meta-analysis is the most important tool for providing high-level evidence in evidence-based medicine, allowing researchers to statistically summarize and combine data from multiple studies. In meta-analysis, mean differences, such as Cohen's d and Hedges' g, are frequently used effect size measures for continuous data. To calculate mean-difference-based effect sizes, the sample mean and standard deviation are two essential summary measures. However, many clinical reports do not directly record the sample mean and standard deviation. Instead, the sample size, median, minimum and maximum values, and/or the first and third quartiles are reported. As a result, researchers have to transform the reported information into the sample mean and standard deviation in order to compute the effect size. Since most of the popular transformation methods were developed under the normality assumption on the underlying data, it is necessary to perform a pre-test before transforming the summary statistics. In this article, we introduce test statistics for three popular scenarios in meta-analysis. We suggest that medical researchers perform a normality test on the selected studies before using them in further analysis. Moreover, we use three case studies to demonstrate the usage of the newly proposed test statistics. The real data case studies indicate that the new test statistics are easy to apply in practice and that, by following the recommended path to conduct the meta-analysis, researchers can obtain more reliable conclusions.
Submitted 29 January, 2018;
originally announced January 2018.
-
How to estimate the sample mean and standard deviation from the five number summary?
Authors:
Jiandong Shi,
Dehui Luo,
Hong Weng,
Xian-Tao Zeng,
Lu Lin,
Tiejun Tong
Abstract:
In some clinical studies, researchers may report the five number summary (including the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation. To conduct meta-analysis for pooling studies, one needs to first estimate the sample mean and standard deviation from the five number summary. A number of methods have been proposed in the recent literature to solve this problem. However, none of the existing estimators for the standard deviation is satisfactory for practical use. After a brief review of the existing literature, we point out that Wan et al.'s method (BMC Med Res Methodol 14:135, 2014) has a serious limitation in estimating the standard deviation from the five number summary. To improve it, we propose a smoothly weighted estimator by incorporating the sample size information and derive the optimal weight for the new estimator. For ease of implementation, we also provide an approximation formula of the optimal weight and a shortcut formula for estimating the standard deviation from the five number summary. The performance of the proposed estimator is evaluated through two simulation studies. In comparison with Wan et al.'s estimator, our new estimator provides a more accurate estimate for normal data and performs favorably for non-normal data. In real data analysis, our new method is also able to provide a more accurate estimate of the true sample standard deviation than the existing method. In this paper, we propose an optimal estimator of the standard deviation from the five number summary. Together with the optimal mean estimator in Luo et al. (Stat Methods Med Res, in press, 2017), our new methods improve on the existing literature and will make a solid contribution to meta-analysis and evidence-based medicine.
Submitted 4 January, 2018;
originally announced January 2018.
-
Benefits from Superposed Hawkes Processes
Authors:
Hongteng Xu,
Dixin Luo,
Xu Chen,
Lawrence Carin
Abstract:
The superposition of temporal point processes has been studied for many years, although the usefulness of such models for practical applications has not been fully developed. We investigate superposed Hawkes processes as an important class of such models, with properties studied in the framework of least squares estimation. The superposition of Hawkes processes is demonstrated to be beneficial for tightening the upper bound of excess risk under certain conditions, and we show the feasibility of the benefit in typical situations. The usefulness of superposed Hawkes processes is verified on synthetic data, and their potential to solve the cold-start problem of recommendation systems is demonstrated on real-world data.
Submitted 14 February, 2018; v1 submitted 13 October, 2017;
originally announced October 2017.
-
Learning Hawkes Processes from Short Doubly-Censored Event Sequences
Authors:
Hongteng Xu,
Dixin Luo,
Hongyuan Zha
Abstract:
Many real-world applications require robust algorithms to learn point processes based on a type of incomplete data, the so-called short doubly-censored (SDC) event sequences. We study this critical problem of quantitative asynchronous event sequence analysis under the framework of Hawkes processes by leveraging the idea of data synthesis. Given SDC event sequences observed in a variety of time intervals, we propose a sampling-stitching data synthesis method: sampling predecessors and successors for each SDC event sequence from potential candidates and stitching them together to synthesize long training sequences. The rationality and the feasibility of our method are discussed in terms of arguments based on likelihood. Experiments on both synthetic and real-world data demonstrate that the proposed data synthesis method indeed improves learning results for both time-invariant and time-varying Hawkes processes.
Submitted 7 June, 2017; v1 submitted 22 February, 2017;
originally announced February 2017.
-
Optimally estimating the sample mean from the sample size, median, mid-range and/or mid-quartile range
Authors:
Dehui Luo,
Xiang Wan,
Jiming Liu,
Tiejun Tong
Abstract:
The era of big data is coming, and evidence-based medicine is attracting increasing attention as a way to improve decision making in medical practice by integrating evidence from well-designed and well-conducted clinical research. Meta-analysis is a statistical technique widely used in evidence-based medicine for analytically combining the findings from independent clinical trials to provide an overall estimate of a treatment's effectiveness. The sample mean and standard deviation are two commonly used statistics in meta-analysis, but some trials use the median, the minimum and maximum values, or sometimes the first and third quartiles to report the results. Thus, to pool results in a consistent format, researchers need to transform that information back to the sample mean and standard deviation. In this paper, we investigate the optimal estimation of the sample mean for meta-analysis from both theoretical and empirical perspectives. A major drawback in the literature is that the sample size, despite its obvious importance, is either ignored or used in a stepwise but somewhat arbitrary manner, e.g., in the famous method proposed by Hozo et al. We solve this issue by incorporating the sample size in a smoothly changing weight in the estimators to reach the optimal estimation. Our proposed estimators not only improve on the existing ones significantly but also share the same virtue of simplicity. The real data application indicates that our proposed estimators are capable of serving as "rules of thumb" and will be widely applied in evidence-based medicine.
Submitted 4 October, 2016; v1 submitted 21 May, 2015;
originally announced May 2015.
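The contrast drawn in the abstract above, between a stepwise use of the sample size and a smoothly changing weight, can be illustrated with a small sketch. The first function follows the rule commonly attributed to Hozo et al. for estimating the mean from the minimum `a`, median `m`, and maximum `b`, with the usual cutoff of 25. The second is a generic weighted combination of the mid-range and the median; its weight `w` is a placeholder argument, since the abstract does not state the authors' optimal weight, and all names here are hypothetical.

```python
def hozo_mean(a, m, b, n):
    """Stepwise rule: use (a + 2m + b) / 4 for small samples,
    and the median itself once n exceeds 25."""
    if n <= 25:
        return (a + 2 * m + b) / 4
    return m

def weighted_mean(a, m, b, w):
    """Convex combination of the mid-range (a + b) / 2 and the median m.

    w is a weight in [0, 1]; it is a placeholder here and is NOT the
    optimal weight derived in the paper.
    """
    return w * (a + b) / 2 + (1 - w) * m
```

Setting `w = 0.5` reproduces the small-sample branch of the stepwise rule and `w = 0` reproduces the median, so the stepwise rule amounts to a weight that jumps from 0.5 to 0 at the cutoff. A weight that decreases smoothly with the sample size removes that jump.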
-
Poisson-type Multivariate Transfer Function Model Reveals Short-term Effects of Ambient Air Pollutants on Hospital Emergency room Visits for Cerebro-cardiovascular Diseases
Authors:
Menghui Li,
Dasheng Luo,
Chenghua Cao,
Xiaochuan Pan,
Qixin Wang
Abstract:
Laboratory experiments have shown that cardiovascular diseases are positively correlated with the concentration of ambient air pollutants, such as SO2, NO2, and PM10. It has also been repeatedly reported in many countries that increased concentrations of ambient air pollutants lead to a rise in hospital emergency room visits for these diseases. These studies mainly adopt either regression analysis or preliminary time series models, while the multivariable transfer function model, a relatively new model, has multiple advantages over conventional linear regression for analyzing time series. This study attempts to quantify the association between concentrations of ambient air pollutants and hospital emergency room visits for cerebro-cardiovascular diseases in Beijing using a Poisson-type multivariate transfer function model. The results show that the relative risk (RR) values of SO2, NO2 and PM10 for a 50 μg/m3 increase are 1.129, 1.092 and 1.069, respectively. The lags for the three pollutants are estimated to be 2 days, 1 day and 1 day, respectively. Compared with the ambient pollutants, daily average temperature and relative humidity do not significantly influence the daily count of hospital emergency room visits.
Submitted 7 August, 2013;
originally announced August 2013.
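The RR values in the abstract above are quoted for a fixed concentration increment. As a rough sketch, assuming a log-linear Poisson relationship in which the log of the expected count is linear in concentration (an assumption made here for illustration; the paper's transfer function model may be structured differently), an RR can be converted to a per-unit coefficient or rescaled to another increment. The function names are hypothetical.

```python
import math

def per_unit_log_rr(rr, increment):
    """Per-unit log relative risk, assuming RR = exp(beta * increment)."""
    return math.log(rr) / increment

def rescale_rr(rr, from_increment, to_increment):
    """Relative risk for a different increment under the same
    log-linear assumption: RR_new = RR ** (to / from)."""
    return rr ** (to_increment / from_increment)
```

For example, `rescale_rr(1.129, 50, 10)` gives the relative risk implied for a 10-unit increase when the reported value is for a 50-unit increase, under the stated assumption only.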