-
Causal Discovery from Sparse Time-Series Data Using Echo State Network
Authors:
Haonan Chen,
Bo Yuan Chang,
Mohamed A. Naiel,
Georges Younes,
Steven Wardell,
Stan Kleinikkink,
John S. Zelek
Abstract:
Causal discovery between collections of time-series data can help diagnose the causes of symptoms and, ideally, prevent faults before they occur. However, reliable causal discovery can be very challenging, especially when the data acquisition rate varies (i.e., non-uniform data sampling) or when data points are missing (e.g., sparse data sampling). To address these issues, we propose a new system comprising two parts: the first fills in missing data with Gaussian Process Regression, and the second leverages an Echo State Network, a type of reservoir computer (i.e., used for chaotic system modelling), for causal discovery. We evaluate the performance of our proposed system against three off-the-shelf causal discovery algorithms, namely structural expectation-maximization, sub-sampled linear auto-regression absolute coefficients, and multivariate Granger causality with vector auto-regression, using the Tennessee Eastman chemical dataset. We report their Matthews Correlation Coefficient (MCC) and Receiver Operating Characteristic (ROC) curves and show that the proposed system outperforms the existing algorithms, demonstrating the viability of our approach for discovering causal relationships in a complex system with missing entries.
Submitted 10 January, 2023; v1 submitted 9 January, 2022;
originally announced January 2022.
-
CopulaGNN: Towards Integrating Representational and Correlational Roles of Graphs in Graph Neural Networks
Authors:
Jiaqi Ma,
Bo Chang,
Xuefei Zhang,
Qiaozhu Mei
Abstract:
Graph-structured data are ubiquitous. However, graphs encode diverse types of information and thus play different roles in data representation. In this paper, we distinguish the representational and the correlational roles played by the graphs in node-level prediction tasks, and we investigate how Graph Neural Network (GNN) models can effectively leverage both types of information. Conceptually, the representational information provides guidance for the model to construct better node features, while the correlational information indicates the correlation between node outcomes conditional on node features. Through a simulation study, we find that many popular GNN models are incapable of effectively utilizing the correlational information. By leveraging the idea of the copula, a principled way to describe the dependence among multivariate random variables, we offer a general solution. The proposed Copula Graph Neural Network (CopulaGNN) can take a wide range of GNN models as base models and utilize both representational and correlational information stored in the graphs. Experimental results on two types of regression tasks verify the effectiveness of the proposed method.
Submitted 18 March, 2021; v1 submitted 5 October, 2020;
originally announced October 2020.
-
Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows
Authors:
Ruizhi Deng,
Bo Chang,
Marcus A. Brubaker,
Greg Mori,
Andreas Lehrmann
Abstract:
Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation. In this work, we propose a novel type of normalizing flow driven by a differential deformation of the Wiener process. As a result, we obtain a rich time series model whose observable process inherits many of the appealing properties of its base process, such as efficient computation of likelihoods and marginals. Furthermore, our continuous treatment provides a natural framework for irregular time series with an independent arrival process, including straightforward interpolation. We illustrate the desirable properties of the proposed model on popular stochastic processes and demonstrate its superior flexibility to variational RNN and latent ODE baselines in a series of experiments on synthetic and real-world data.
Submitted 13 July, 2021; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Variational Hyper RNN for Sequence Modeling
Authors:
Ruizhi Deng,
Yanshuai Cao,
Bo Chang,
Leonid Sigal,
Greg Mori,
Marcus A. Brubaker
Abstract:
In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence. Our method uses temporal latent variables to capture information about the underlying data pattern and dynamically decodes the latent information into modifications of weights of the base decoder and recurrent model. The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data that exhibit large scale variations, regime shifts, and complex dynamics.
Submitted 24 February, 2020;
originally announced February 2020.
-
Point Process Flows
Authors:
Nazanin Mehrasa,
Ruizhi Deng,
Mohamed Osama Ahmed,
Bo Chang,
Jiawei He,
Thibaut Durand,
Marcus Brubaker,
Greg Mori
Abstract:
Event sequences can be modeled by temporal point processes (TPPs) to capture their asynchronous and probabilistic nature. We propose an intensity-free framework that directly models the point process distribution by utilizing normalizing flows. This approach is capable of capturing highly complex temporal distributions and does not rely on restrictive parametric forms. Comparisons with state-of-the-art baseline models on both synthetic and challenging real-life datasets show that the proposed framework is effective at modeling the stochasticity of discrete event sequences.
Submitted 22 December, 2019; v1 submitted 18 October, 2019;
originally announced October 2019.
-
AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks
Authors:
Bo Chang,
Minmin Chen,
Eldad Haber,
Ed H. Chi
Abstract:
Recurrent neural networks have gained widespread use in modeling sequential data. Learning long-term dependencies using these models remains difficult though, due to exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. A special form of recurrent networks called the AntisymmetricRNN is proposed under this theoretical framework, which is able to capture long-term dependencies thanks to the stability property of its underlying differential equation. Existing approaches to improving RNN trainability often incur significant computation overhead. In comparison, AntisymmetricRNN achieves the same goal by design. We showcase the advantage of this new architecture through extensive simulations and experiments. AntisymmetricRNN exhibits much more predictable dynamics. It outperforms regular LSTM models on tasks requiring long-term memory and matches the performance on tasks where short-term dependencies dominate despite being much simpler.
Submitted 25 February, 2019;
originally announced February 2019.
-
Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs
Authors:
Dar Gilboa,
Bo Chang,
Minmin Chen,
Greg Yang,
Samuel S. Schoenholz,
Ed H. Chi,
Jeffrey Pennington
Abstract:
Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and the GRU, do exhibit modest improvements over vanilla RNN cells, but they still suffer from instabilities when trained on very long sequences. In this work, we develop a mean field theory of signal propagation in LSTMs and GRUs that enables us to calculate the time scales for signal propagation as well as the spectral properties of the state-to-state Jacobians. By optimizing these quantities in terms of the initialization hyperparameters, we derive a novel initialization scheme that eliminates or reduces training instabilities. We demonstrate the efficacy of our initialization scheme on multiple sequence tasks, on which it enables successful training while a standard initialization either fails completely or is orders of magnitude slower. We also observe a beneficial effect on generalization performance using this new initialization.
Submitted 23 May, 2019; v1 submitted 25 January, 2019;
originally announced January 2019.
-
Interpretable Spatio-temporal Attention for Video Action Recognition
Authors:
Lili Meng,
Bo Zhao,
Bo Chang,
Gao Huang,
Wei Sun,
Frederick Tung,
Leonid Sigal
Abstract:
Inspired by the observation that humans are able to process videos efficiently by only paying attention where and when it is needed, we propose an interpretable and easy plug-in spatial-temporal attention mechanism for video action recognition. For spatial attention, we learn a saliency mask to allow the model to focus on the most salient parts of the feature maps. For temporal attention, we employ a convolutional LSTM based attention mechanism to identify the most relevant frames from an input video. Further, we propose a set of regularizers to ensure that our attention mechanism attends to coherent regions in space and time. Our model not only improves video action recognition accuracy, but also localizes discriminative regions both spatially and temporally, despite being trained in a weakly-supervised manner with only classification labels (no bounding box labels or time frame temporal labels). We evaluate our approach on several public video action recognition datasets with ablation studies. Furthermore, we quantitatively and qualitatively evaluate our model's ability to localize discriminative regions spatially and critical frames temporally. Experimental results demonstrate the efficacy of our approach, showing superior or comparable accuracy with the state-of-the-art methods while increasing model interpretability.
Submitted 2 June, 2019; v1 submitted 1 October, 2018;
originally announced October 2018.
-
Prediction based on conditional distributions of vine copulas
Authors:
Bo Chang,
Harry Joe
Abstract:
Vine copulas are a flexible tool for multivariate non-Gaussian distributions. For data from an observational study where the explanatory variables and response variables are measured together, a proposed vine copula regression method uses regular vines and handles mixed continuous and discrete variables. This method can efficiently compute the conditional distribution of the response variable given the explanatory variables. The performance of the proposed method is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula regression method is superior to linear regression in making inferences with conditional heteroscedasticity.
Submitted 29 April, 2019; v1 submitted 23 July, 2018;
originally announced July 2018.
-
Automatically Composing Representation Transformations as a Means for Generalization
Authors:
Michael B. Chang,
Abhishek Gupta,
Sergey Levine,
Thomas L. Griffiths
Abstract:
A generally intelligent learner should generalize to more complex tasks than it has previously encountered, but the two common paradigms in machine learning -- either training a separate learner per task or training a single learner for all tasks -- both have difficulty with such generalization because they do not leverage the compositional structure of the task distribution. This paper introduces the compositional problem graph as a broadly applicable formalism to relate tasks of different complexity in terms of problems with shared subproblems. We propose the compositional generalization problem for measuring how readily old knowledge can be reused and hence built upon. As a first step for tackling compositional generalization, we introduce the compositional recursive learner, a domain-general framework for learning algorithmic procedures for composing representation transformations, producing a learner that reasons about what computation to execute by making analogies to previously seen problems. We show on a symbolic and a high-dimensional domain that our compositional approach can generalize to more complex problems than the learner has previously encountered, whereas baselines that are not explicitly compositional do not.
Submitted 8 May, 2019; v1 submitted 12 July, 2018;
originally announced July 2018.
-
Multi-level Residual Networks from Dynamical Systems View
Authors:
Bo Chang,
Lili Meng,
Eldad Haber,
Frederick Tung,
David Begert
Abstract:
Deep residual networks (ResNets) and their variants are widely used in many computer vision applications and natural language processing tasks. However, the theoretical principles for designing and training ResNets are still not fully understood. Recently, several points of view have emerged to try to interpret ResNets theoretically, such as the unraveled view, unrolled iterative estimation, and the dynamical systems view. In this paper, we adopt the dynamical systems point of view and analyze the lesioning properties of ResNets both theoretically and experimentally. Based on these analyses, we additionally propose a novel method for accelerating ResNet training. We apply the proposed method to train ResNets and Wide ResNets for three image classification benchmarks, reducing training time by more than 40% with superior or on-par accuracy.
Submitted 1 February, 2018; v1 submitted 27 October, 2017;
originally announced October 2017.
-
Reversible Architectures for Arbitrarily Deep Residual Neural Networks
Authors:
Bo Chang,
Lili Meng,
Eldad Haber,
Lars Ruthotto,
David Begert,
Elliot Holtham
Abstract:
Recently, deep residual networks have been successfully applied in many computer vision and natural language processing tasks, pushing the state-of-the-art performance with deeper and wider architectures. In this work, we interpret deep residual networks as ordinary differential equations (ODEs), which have long been studied in mathematics and physics with rich theoretical and empirical success. From this interpretation, we develop a theoretical framework on stability and reversibility of deep neural networks, and derive three reversible neural network architectures that can go arbitrarily deep in theory. The reversibility property allows a memory-efficient implementation, which does not need to store the activations for most hidden layers. Together with the stability of our architectures, this enables training deeper networks using only modest computational resources. We provide both theoretical analyses and empirical results. Experimental results demonstrate the efficacy of our architectures against several strong baselines on CIFAR-10, CIFAR-100 and STL-10 with superior or on-par state-of-the-art performance. Furthermore, we show our architectures yield superior results when trained using less training data.
Submitted 18 November, 2017; v1 submitted 12 September, 2017;
originally announced September 2017.
-
wtest: an R Package for Testing Main and Interaction Effect in Genotype Data with Binary Traits
Authors:
Rui Sun,
Billy Chang,
Benny Chung-Ying Zee,
Maggie Haitian Wang
Abstract:
This R package evaluates main and pair-wise interaction effects of single nucleotide polymorphisms (SNPs) via the W-test, and is scalable to genome-wide data sets. The package provides fast and accurate p-value estimation of genetic markers, as well as diagnostic checking on the probability distributions. It allows flexible stage-wise or exhaustive association testing in a user-friendly interface. Availability: The package is available on CRAN or from the website: http://www2.ccrb.cuhk.edu.hk/wtest
Submitted 11 October, 2016;
originally announced October 2016.