Search | arXiv e-print repository

Causal Discovery from Sparse Time-Series Data Using Echo State Network

Authors: Haonan Chen, Bo Yuan Chang, Mohamed A. Naiel, Georges Younes, Steven Wardell, Stan Kleinikkink, John S. Zelek

Abstract: Causal discovery between collections of time-series data can help diagnose causes of symptoms and hopefully prevent faults before they occur. However, reliable causal discovery can be very challenging, especially when the data acquisition rate varies (i.e., non-uniform data sampling), or in the presence of missing data points (e.g., sparse data sampling). To address these issues, we proposed a new… ▽ More Causal discovery between collections of time-series data can help diagnose causes of symptoms and hopefully prevent faults before they occur. However, reliable causal discovery can be very challenging, especially when the data acquisition rate varies (i.e., non-uniform data sampling), or in the presence of missing data points (e.g., sparse data sampling). To address these issues, we proposed a new system comprised of two parts, the first part fills missing data with a Gaussian Process Regression, and the second part leverages an Echo State Network, which is a type of reservoir computer (i.e., used for chaotic system modelling) for Causal discovery. We evaluate the performance of our proposed system against three other off-the-shelf causal discovery algorithms, namely, structural expectation-maximization, sub-sampled linear auto-regression absolute coefficients, and multivariate Granger Causality with vector auto-regressive using the Tennessee Eastman chemical dataset; we report on their corresponding Matthews Correlation Coefficient(MCC) and Receiver Operating Characteristic curves (ROC) and show that the proposed system outperforms existing algorithms, demonstrating the viability of our approach to discover causal relationships in a complex system with missing entries. △ Less

Submitted 10 January, 2023; v1 submitted 9 January, 2022; originally announced January 2022.

arXiv:2010.02089 [pdf, other]

CopulaGNN: Towards Integrating Representational and Correlational Roles of Graphs in Graph Neural Networks

Authors: Jiaqi Ma, Bo Chang, Xuefei Zhang, Qiaozhu Mei

Abstract: Graph-structured data are ubiquitous. However, graphs encode diverse types of information and thus play different roles in data representation. In this paper, we distinguish the \textit{representational} and the \textit{correlational} roles played by the graphs in node-level prediction tasks, and we investigate how Graph Neural Network (GNN) models can effectively leverage both types of informatio… ▽ More Graph-structured data are ubiquitous. However, graphs encode diverse types of information and thus play different roles in data representation. In this paper, we distinguish the \textit{representational} and the \textit{correlational} roles played by the graphs in node-level prediction tasks, and we investigate how Graph Neural Network (GNN) models can effectively leverage both types of information. Conceptually, the representational information provides guidance for the model to construct better node features; while the correlational information indicates the correlation between node outcomes conditional on node features. Through a simulation study, we find that many popular GNN models are incapable of effectively utilizing the correlational information. By leveraging the idea of the copula, a principled way to describe the dependence among multivariate random variables, we offer a general solution. The proposed Copula Graph Neural Network (CopulaGNN) can take a wide range of GNN models as base models and utilize both representational and correlational information stored in the graphs. Experimental results on two types of regression tasks verify the effectiveness of the proposed method. △ Less

Submitted 18 March, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: ICLR 2021

arXiv:2002.10516 [pdf, other]

Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows

Authors: Ruizhi Deng, Bo Chang, Marcus A. Brubaker, Greg Mori, Andreas Lehrmann

Abstract: Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation. In this work, we propose a novel type of normalizing flow driven by a differential deformation of the Wiener process. As a result, we obtain a rich time series model whose observable process inherits many of the appealing proper… ▽ More Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation. In this work, we propose a novel type of normalizing flow driven by a differential deformation of the Wiener process. As a result, we obtain a rich time series model whose observable process inherits many of the appealing properties of its base process, such as efficient computation of likelihoods and marginals. Furthermore, our continuous treatment provides a natural framework for irregular time series with an independent arrival process, including straightforward interpolation. We illustrate the desirable properties of the proposed model on popular stochastic processes and demonstrate its superior flexibility to variational RNN and latent ODE baselines in a series of experiments on synthetic and real-world data. △ Less

Submitted 13 July, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: Accepted to NeurIPS 2020

arXiv:2002.10501 [pdf, other]

Variational Hyper RNN for Sequence Modeling

Authors: Ruizhi Deng, Yanshuai Cao, Bo Chang, Leonid Sigal, Greg Mori, Marcus A. Brubaker

Abstract: In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence. Our method uses temporal latent variables to capture information about the underlying data pattern and dynamically decodes the latent information into modifications of weights of the base decoder and recurrent model. T… ▽ More In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence. Our method uses temporal latent variables to capture information about the underlying data pattern and dynamically decodes the latent information into modifications of weights of the base decoder and recurrent model. The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data that exhibit large scale variations, regime shifts, and complex dynamics. △ Less

Submitted 24 February, 2020; originally announced February 2020.

arXiv:1910.08281 [pdf, other]

Point Process Flows

Authors: Nazanin Mehrasa, Ruizhi Deng, Mohamed Osama Ahmed, Bo Chang, Jiawei He, Thibaut Durand, Marcus Brubaker, Greg Mori

Abstract: Event sequences can be modeled by temporal point processes (TPPs) to capture their asynchronous and probabilistic nature. We propose an intensity-free framework that directly models the point process distribution by utilizing normalizing flows. This approach is capable of capturing highly complex temporal distributions and does not rely on restrictive parametric forms. Comparisons with state-of-th… ▽ More Event sequences can be modeled by temporal point processes (TPPs) to capture their asynchronous and probabilistic nature. We propose an intensity-free framework that directly models the point process distribution by utilizing normalizing flows. This approach is capable of capturing highly complex temporal distributions and does not rely on restrictive parametric forms. Comparisons with state-of-the-art baseline models on both synthetic and challenging real-life datasets show that the proposed framework is effective at modeling the stochasticity of discrete event sequences. △ Less

Submitted 22 December, 2019; v1 submitted 18 October, 2019; originally announced October 2019.

arXiv:1902.09689 [pdf, other]

AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks

Authors: Bo Chang, Minmin Chen, Eldad Haber, Ed H. Chi

Abstract: Recurrent neural networks have gained widespread use in modeling sequential data. Learning long-term dependencies using these models remains difficult though, due to exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. A special form of recurrent networks called the AntisymmetricRNN is proposed under this theoretical f… ▽ More Recurrent neural networks have gained widespread use in modeling sequential data. Learning long-term dependencies using these models remains difficult though, due to exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. A special form of recurrent networks called the AntisymmetricRNN is proposed under this theoretical framework, which is able to capture long-term dependencies thanks to the stability property of its underlying differential equation. Existing approaches to improving RNN trainability often incur significant computation overhead. In comparison, AntisymmetricRNN achieves the same goal by design. We showcase the advantage of this new architecture through extensive simulations and experiments. AntisymmetricRNN exhibits much more predictable dynamics. It outperforms regular LSTM models on tasks requiring long-term memory and matches the performance on tasks where short-term dependencies dominate despite being much simpler. △ Less

Submitted 25 February, 2019; originally announced February 2019.

Comments: Published as a conference paper at ICLR 2019

arXiv:1901.08987 [pdf, other]

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

Authors: Dar Gilboa, Bo Chang, Minmin Chen, Greg Yang, Samuel S. Schoenholz, Ed H. Chi, Jeffrey Pennington

Abstract: Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and… ▽ More Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and the GRU, do exhibit modest improvements over vanilla RNN cells, but they still suffer from instabilities when trained on very long sequences. In this work, we develop a mean field theory of signal propagation in LSTMs and GRUs that enables us to calculate the time scales for signal propagation as well as the spectral properties of the state-to-state Jacobians. By optimizing these quantities in terms of the initialization hyperparameters, we derive a novel initialization scheme that eliminates or reduces training instabilities. We demonstrate the efficacy of our initialization scheme on multiple sequence tasks, on which it enables successful training while a standard initialization either fails completely or is orders of magnitude slower. We also observe a beneficial effect on generalization performance using this new initialization. △ Less

Submitted 23 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

arXiv:1810.04511 [pdf, other]

Interpretable Spatio-temporal Attention for Video Action Recognition

Authors: Lili Meng, Bo Zhao, Bo Chang, Gao Huang, Wei Sun, Frederich Tung, Leonid Sigal

Abstract: Inspired by the observation that humans are able to process videos efficiently by only paying attention where and when it is needed, we propose an interpretable and easy plug-in spatial-temporal attention mechanism for video action recognition. For spatial attention, we learn a saliency mask to allow the model to focus on the most salient parts of the feature maps. For temporal attention, we emplo… ▽ More Inspired by the observation that humans are able to process videos efficiently by only paying attention where and when it is needed, we propose an interpretable and easy plug-in spatial-temporal attention mechanism for video action recognition. For spatial attention, we learn a saliency mask to allow the model to focus on the most salient parts of the feature maps. For temporal attention, we employ a convolutional LSTM based attention mechanism to identify the most relevant frames from an input video. Further, we propose a set of regularizers to ensure that our attention mechanism attends to coherent regions in space and time. Our model not only improves video action recognition accuracy, but also localizes discriminative regions both spatially and temporally, despite being trained in a weakly-supervised manner with only classification labels (no bounding box labels or time frame temporal labels). We evaluate our approach on several public video action recognition datasets with ablation studies. Furthermore, we quantitatively and qualitatively evaluate our model's ability to localize discriminative regions spatially and critical frames temporally. Experimental results demonstrate the efficacy of our approach, showing superior or comparable accuracy with the state-of-the-art methods while increasing model interpretability. △ Less

Submitted 2 June, 2019; v1 submitted 1 October, 2018; originally announced October 2018.

arXiv:1807.08429 [pdf, other]

doi 10.1016/j.csda.2019.04.015

Prediction based on conditional distributions of vine copulas

Authors: Bo Chang, Harry Joe

Abstract: Vine copulas are a flexible tool for multivariate non-Gaussian distributions. For data from an observational study where the explanatory variables and response variables are measured together, a proposed vine copula regression method uses regular vines and handles mixed continuous and discrete variables. This method can efficiently compute the conditional distribution of the response variable give… ▽ More Vine copulas are a flexible tool for multivariate non-Gaussian distributions. For data from an observational study where the explanatory variables and response variables are measured together, a proposed vine copula regression method uses regular vines and handles mixed continuous and discrete variables. This method can efficiently compute the conditional distribution of the response variable given the explanatory variables. The performance of the proposed method is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula regression method is superior to linear regression in making inferences with conditional heteroscedasticity. △ Less

Submitted 29 April, 2019; v1 submitted 23 July, 2018; originally announced July 2018.

arXiv:1807.04640 [pdf, other]

Automatically Composing Representation Transformations as a Means for Generalization

Authors: Michael B. Chang, Abhishek Gupta, Sergey Levine, Thomas L. Griffiths

Abstract: A generally intelligent learner should generalize to more complex tasks than it has previously encountered, but the two common paradigms in machine learning -- either training a separate learner per task or training a single learner for all tasks -- both have difficulty with such generalization because they do not leverage the compositional structure of the task distribution. This paper introduces… ▽ More A generally intelligent learner should generalize to more complex tasks than it has previously encountered, but the two common paradigms in machine learning -- either training a separate learner per task or training a single learner for all tasks -- both have difficulty with such generalization because they do not leverage the compositional structure of the task distribution. This paper introduces the compositional problem graph as a broadly applicable formalism to relate tasks of different complexity in terms of problems with shared subproblems. We propose the compositional generalization problem for measuring how readily old knowledge can be reused and hence built upon. As a first step for tackling compositional generalization, we introduce the compositional recursive learner, a domain-general framework for learning algorithmic procedures for composing representation transformations, producing a learner that reasons about what computation to execute by making analogies to previously seen problems. We show on a symbolic and a high-dimensional domain that our compositional approach can generalize to more complex problems than the learner has previously encountered, whereas baselines that are not explicitly compositional do not. △ Less

Submitted 8 May, 2019; v1 submitted 12 July, 2018; originally announced July 2018.

Comments: Accepted to the International Conference on Learning Representations 2019

arXiv:1710.10348 [pdf, other]

Multi-level Residual Networks from Dynamical Systems View

Authors: Bo Chang, Lili Meng, Eldad Haber, Frederick Tung, David Begert

Abstract: Deep residual networks (ResNets) and their variants are widely used in many computer vision applications and natural language processing tasks. However, the theoretical principles for designing and training ResNets are still not fully understood. Recently, several points of view have emerged to try to interpret ResNet theoretically, such as unraveled view, unrolled iterative estimation and dynamic… ▽ More Deep residual networks (ResNets) and their variants are widely used in many computer vision applications and natural language processing tasks. However, the theoretical principles for designing and training ResNets are still not fully understood. Recently, several points of view have emerged to try to interpret ResNet theoretically, such as unraveled view, unrolled iterative estimation and dynamical systems view. In this paper, we adopt the dynamical systems point of view, and analyze the lesioning properties of ResNet both theoretically and experimentally. Based on these analyses, we additionally propose a novel method for accelerating ResNet training. We apply the proposed method to train ResNets and Wide ResNets for three image classification benchmarks, reducing training time by more than 40% with superior or on-par accuracy. △ Less

Submitted 1 February, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

Comments: Published as a conference paper at ICLR 2018

arXiv:1709.03698 [pdf, other]

Reversible Architectures for Arbitrarily Deep Residual Neural Networks

Authors: Bo Chang, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, Elliot Holtham

Abstract: Recently, deep residual networks have been successfully applied in many computer vision and natural language processing tasks, pushing the state-of-the-art performance with deeper and wider architectures. In this work, we interpret deep residual networks as ordinary differential equations (ODEs), which have long been studied in mathematics and physics with rich theoretical and empirical success. F… ▽ More Recently, deep residual networks have been successfully applied in many computer vision and natural language processing tasks, pushing the state-of-the-art performance with deeper and wider architectures. In this work, we interpret deep residual networks as ordinary differential equations (ODEs), which have long been studied in mathematics and physics with rich theoretical and empirical success. From this interpretation, we develop a theoretical framework on stability and reversibility of deep neural networks, and derive three reversible neural network architectures that can go arbitrarily deep in theory. The reversibility property allows a memory-efficient implementation, which does not need to store the activations for most hidden layers. Together with the stability of our architectures, this enables training deeper networks using only modest computational resources. We provide both theoretical analyses and empirical results. Experimental results demonstrate the efficacy of our architectures against several strong baselines on CIFAR-10, CIFAR-100 and STL-10 with superior or on-par state-of-the-art performance. Furthermore, we show our architectures yield superior results when trained using fewer training data. △ Less

Submitted 18 November, 2017; v1 submitted 12 September, 2017; originally announced September 2017.

Comments: Accepted at AAAI 2018

arXiv:1610.03182 [pdf]

wtest: an R Package for Testing Main and Interaction Effect in Genotype Data with Binary Traits

Authors: Rui Sun, Billy Chang, Benny Chung-Ying Zee, Maggie Haitian Wang

Abstract: This R package evaluates main and pair-wise interaction effect of single nucleotide polymorphisms (SNPs) via the W-test, scalable to whole genome-wide data sets. The package provides fast and accurate p-value estimation of genetic markers, as well as diagnostic checking on the probability distributions. It allows flexible stage-wise or exhaustive association testing in a user-friendly interface. A… ▽ More This R package evaluates main and pair-wise interaction effect of single nucleotide polymorphisms (SNPs) via the W-test, scalable to whole genome-wide data sets. The package provides fast and accurate p-value estimation of genetic markers, as well as diagnostic checking on the probability distributions. It allows flexible stage-wise or exhaustive association testing in a user-friendly interface. Availability: The package is available in CRAN, or from website: http://www2.ccrb.cuhk.edu.hk/wtest △ Less

Submitted 11 October, 2016; originally announced October 2016.

Comments: 7 pages, 1 figure

Showing 1–13 of 13 results for author: Chang, B