-
Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data
Authors:
Huawei Lin,
Jun Woo Chung,
Yingjie Lao,
Weijie Zhao
Abstract:
Gradient Boosting Decision Tree (GBDT) is one of the most popular machine learning models in various applications. However, in the traditional settings, all data should be simultaneously accessed in the training procedure: it does not allow to add or delete any data instances after training. In this paper, we propose an efficient online learning framework for GBDT supporting both incremental and decremental learning. To the best of our knowledge, this is the first work that considers an in-place unified incremental and decremental learning on GBDT. To reduce the learning cost, we present a collection of optimizations for our framework, so that it can add or delete a small fraction of data on the fly. We theoretically show the relationship between the hyper-parameters of the proposed optimizations, which enables trading off accuracy and cost on incremental and decremental learning. The backdoor attack results show that our framework can successfully inject and remove backdoor in a well-trained model using incremental and decremental learning, and the empirical results on public datasets confirm the effectiveness and efficiency of our proposed online learning framework and optimizations.
Submitted 3 February, 2025;
originally announced February 2025.
-
A Sparse Bayesian Learning for Diagnosis of Nonstationary and Spatially Correlated Faults with Application to Multistation Assembly Systems
Authors:
Jihoon Chung,
Zhenyu Kong
Abstract:
Sensor technology developments provide a basis for effective fault diagnosis in manufacturing systems. However, the limited number of sensors due to physical constraints or undue costs hinders the accurate diagnosis in the actual process. In addition, time-varying operational conditions that generate nonstationary process faults and the correlation information in the process require to consider for accurate fault diagnosis in the manufacturing systems. This article proposes a novel fault diagnosis method: clustering spatially correlated sparse Bayesian learning (CSSBL), and explicitly demonstrates its applicability in a multistation assembly system that is vulnerable to the above challenges. Specifically, the method is based on a practical assumption that it will likely have a few process faults (sparse). In addition, the hierarchical structure of CSSBL has several parameterized prior distributions to address the above challenges. As posterior distributions of process faults do not have closed form, this paper derives approximate posterior distributions through Variational Bayes inference. The proposed method's efficacy is provided through numerical and real-world case studies utilizing an actual autobody assembly system. The generalizability of the proposed method allows the technique to be applied in fault diagnosis in other domains, including communication and healthcare systems.
Submitted 20 October, 2023;
originally announced October 2023.
-
Learning sources of variability from high-dimensional observational studies
Authors:
Eric W. Bridgeford,
Jaewon Chung,
Brian Gilbert,
Sambit Panda,
Adam Li,
Cencheng Shen,
Alexandra Badea,
Brian Caffo,
Joshua T. Vogelstein
Abstract:
Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the "average treatment effect," this paradigm is employed across numerous biological fields, from vaccine and drug development to policy interventions. Unfortunately, the majority of these methods are often limited to univariate outcomes. Our work generalizes causal estimands to outcomes with any number of dimensions or any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests. We propose a simple technique for adjusting universally consistent conditional independence tests and prove that these tests are universally consistent causal discrepancy tests. Numerical experiments illustrate that our method, Causal CDcorr, leads to improvements in both finite sample validity and power when compared to existing strategies. Our methods are all open source and available at github.com/ebridge2/cdcorr.
Submitted 28 November, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee
Authors:
Chenyang Li,
Jihoon Chung,
Mengnan Du,
Haimin Wang,
Xianlian Zhou,
Bo Shen
Abstract:
Model compression is a crucial part of deploying neural networks (NNs), especially when the memory and storage of computing devices are limited in many applications. This paper focuses on two model compression techniques: low-rank approximation and weight pruning in neural networks, which are very popular nowadays. However, training NN with low-rank approximation and weight pruning always suffers significant accuracy loss and convergence issues. In this paper, a holistic framework is proposed for model compression from a novel perspective of nonconvex optimization by designing an appropriate objective function. Then, we introduce NN-BCD, a block coordinate descent (BCD) algorithm to solve the nonconvex optimization. One advantage of our algorithm is that an efficient iteration scheme can be derived with closed-form, which is gradient-free. Therefore, our algorithm will not suffer from vanishing/exploding gradient problems. Furthermore, with the Kurdyka-Łojasiewicz (KŁ) property of our objective function, we show that our algorithm globally converges to a critical point at the rate of O(1/k), where k denotes the number of iterations. Lastly, extensive experiments with tensor train decomposition and weight pruning demonstrate the efficiency and superior performance of the proposed framework. Our code implementation is available at https://github.com/ChenyangLi-97/NN-BCD
Submitted 15 August, 2024; v1 submitted 12 March, 2023;
originally announced March 2023.
-
Anomaly Detection in Additive Manufacturing Processes using Supervised Classification with Imbalanced Sensor Data based on Generative Adversarial Network
Authors:
Jihoon Chung,
Bo Shen,
Zhenyu Kong
Abstract:
Supervised classification methods have been widely utilized for the quality assurance of the advanced manufacturing process, such as additive manufacturing (AM) for anomaly (defects) detection. However, since abnormal states (with defects) occur much less frequently than normal ones (without defects) in a manufacturing process, the number of sensor data samples collected from a normal state is usually much more than that from an abnormal state. This issue causes imbalanced training data for classification analysis, thus deteriorating the performance of detecting abnormal states in the process. It is beneficial to generate effective artificial sample data for the abnormal states to make a more balanced training set. To achieve this goal, this paper proposes a novel data augmentation method based on a generative adversarial network (GAN) using additive manufacturing process image sensor data. The novelty of our approach is that a standard GAN and classifier are jointly optimized with techniques to stabilize the learning process of standard GAN. The diverse and high-quality generated samples provide balanced training data to the classifier. The iterative optimization between GAN and classifier provides the high-performance classifier. The effectiveness of the proposed method is validated by both open-source data and real-world case studies in polymer and metal AM processes.
Submitted 25 November, 2022; v1 submitted 28 October, 2022;
originally announced October 2022.
-
Reinforcement Learning-based Defect Mitigation for Quality Assurance of Additive Manufacturing
Authors:
Jihoon Chung,
Bo Shen,
Andrew Chung Chee Law,
Zhenyu Kong
Abstract:
Additive Manufacturing (AM) is a powerful technology that produces complex 3D geometries using various materials in a layer-by-layer fashion. However, quality assurance is the main challenge in AM industry due to the possible time-varying processing conditions during AM process. Notably, new defects may occur during printing, which cannot be mitigated by offline analysis tools that focus on existing defects. This challenge motivates this work to develop online learning-based methods to deal with the new defects during printing. Since AM typically fabricates a small number of customized products, this paper aims to create an online learning-based strategy to mitigate the new defects in AM process while minimizing the number of samples needed. The proposed method is based on model-free Reinforcement Learning (RL). It is called Continual G-learning since it transfers several sources of prior knowledge to reduce the needed training samples in the AM process. Offline knowledge is obtained from literature, while online knowledge is learned during printing. The proposed method develops a new algorithm for learning the optimal defect mitigation strategies proven the best performance when utilizing both knowledge sources. Numerical and real-world case studies in a fused filament fabrication (FFF) platform are performed and demonstrate the effectiveness of the proposed method.
Submitted 28 October, 2022;
originally announced October 2022.
-
A Novel Sparse Bayesian Learning and Its Application to Fault Diagnosis for Multistation Assembly Systems
Authors:
Jihoon Chung,
Bo Shen,
Zhenyu Kong
Abstract:
This paper addresses the problem of fault diagnosis in multistation assembly systems. Fault diagnosis is to identify process faults that cause the excessive dimensional variation of the product using dimensional measurements. For such problems, the challenge is solving an underdetermined system caused by a common phenomenon in practice; namely, the number of measurements is less than that of the process errors. To address this challenge, this paper attempts to solve the following two problems: (1) how to utilize the temporal correlation in the time series data of each process error and (2) how to apply prior knowledge regarding which process errors are more likely to be process faults. A novel sparse Bayesian learning method is proposed to achieve the above objectives. The method consists of three hierarchical layers. The first layer has parameterized prior distribution that exploits the temporal correlation of each process error. Furthermore, the second and third layers achieve the prior distribution representing the prior knowledge of process faults. Then, these prior distributions are updated with the likelihood function of the measurement samples from the process, resulting in the accurate posterior distribution of process faults from an underdetermined system. Since posterior distributions of process faults are intractable, this paper derives approximate posterior distributions via Variational Bayes inference. Numerical and simulation case studies using an actual autobody assembly process are performed to demonstrate the effectiveness of the proposed method.
Submitted 28 October, 2022;
originally announced October 2022.
-
slimTrain -- A Stochastic Approximation Method for Training Separable Deep Neural Networks
Authors:
Elizabeth Newman,
Julianne Chung,
Matthias Chung,
Lars Ruthotto
Abstract:
Deep neural networks (DNNs) have shown their success as high-dimensional function approximators in many applications; however, training DNNs can be challenging in general. DNN training is commonly phrased as a stochastic optimization problem whose challenges include non-convexity, non-smoothness, insufficient regularization, and complicated data distributions. Hence, the performance of DNNs on a given task depends crucially on tuning hyperparameters, especially learning rates and regularization parameters. In the absence of theoretical guidelines or prior experience on similar tasks, this requires solving many training problems, which can be time-consuming and demanding on computational resources. This can limit the applicability of DNNs to problems with non-standard, complex, and scarce datasets, e.g., those arising in many scientific applications. To remedy the challenges of DNN training, we propose slimTrain, a stochastic optimization method for training DNNs with reduced sensitivity to the choice of hyperparameters and fast initial convergence. The central idea of slimTrain is to exploit the separability inherent in many DNN architectures; that is, we separate the DNN into a nonlinear feature extractor followed by a linear model. This separability allows us to leverage recent advances made for solving large-scale, linear, ill-posed inverse problems. Crucially, for the linear weights, slimTrain does not require a learning rate and automatically adapts the regularization parameter. Since our method operates on mini-batches, its computational overhead per iteration is modest. In our numerical experiments, slimTrain outperforms existing DNN training methods with the recommended hyperparameter settings and reduces the sensitivity of DNN training to the remaining hyperparameters.
Submitted 28 September, 2021;
originally announced September 2021.
-
Multiscale Comparative Connectomics
Authors:
Vivek Gopalakrishnan,
Jaewon Chung,
Eric Bridgeford,
Benjamin D. Pedigo,
Jesús Arroyo,
Lucy Upchurch,
G. Allan Johnson,
Nian Wang,
Youngser Park,
Carey E. Priebe,
Joshua T. Vogelstein
Abstract:
The connectome, a map of the structural and/or functional connections in the brain, provides a complex representation of the neurobiological phenotypes on which it supervenes. This information-rich data modality has the potential to transform our understanding of the relationship between patterns in brain connectivity and neurological processes, disorders, and diseases. However, existing computational techniques used to analyze connectomes are oftentimes insufficient for interrogating multi-subject connectomics datasets: many current methods are either solely designed to analyze single connectomes or leverage heuristic graph statistics that are unable to capture the complete topology of multiscale connections between brain regions. To enable more rigorous connectomics analysis, we introduce a set of robust and interpretable effect size measures motivated by recent theoretical advances in random graph models. These measures facilitate simultaneous analysis of multiple connectomes across different scales of network topology, enabling the robust and reproducible discovery of hierarchical brain structures that vary in relation to phenotypic profiles. In addition to explaining the theoretical foundations and guarantees of our algorithms, we demonstrate their superiority over current state-of-the-art connectomics methods through extensive simulation studies and real-data experiments. Using a set of high-resolution connectomes obtained from genetically distinct mouse strains (including the BTBR mouse -- a standard model of autism -- and three behavioral wild-types), we illustrate how our methods successfully uncover latent information in multi-subject connectomics data and yield valuable insights into the connective correlates of neurological phenotypes that other methods do not capture. The data and code necessary to reproduce our analyses are available at https://github.com/neurodata/MCC.
Submitted 2 December, 2024; v1 submitted 30 November, 2020;
originally announced November 2020.
-
Battlesnake Challenge: A Multi-agent Reinforcement Learning Playground with Human-in-the-loop
Authors:
Jonathan Chung,
Anna Luo,
Xavier Raffin,
Scott Perry
Abstract:
We present the Battlesnake Challenge, a framework for multi-agent reinforcement learning with Human-In-the-Loop Learning (HILL). It is developed upon Battlesnake, a multiplayer extension of the traditional Snake game in which 2 or more snakes compete for the final survival. The Battlesnake Challenge consists of an offline module for model training and an online module for live competitions. We develop a simulated game environment for the offline multi-agent model training and identify a set of baseline heuristics that can be instilled to improve learning. Our framework is agent-agnostic and heuristics-agnostic such that researchers can design their own algorithms, train their models, and demonstrate in the online Battlesnake competition. We validate the framework and baseline heuristics with our preliminary experiments. Our results show that agents with the proposed HILL methods consistently outperform agents without HILL. Besides, heuristics of reward manipulation had the best performance in the online competition. We open source our framework at https://github.com/awslabs/sagemaker-battlesnake-ai.
Submitted 20 July, 2020;
originally announced July 2020.
-
An Efficient Framework for Clustered Federated Learning
Authors:
Avishek Ghosh,
Jichan Chung,
Dong Yin,
Kannan Ramchandran
Abstract:
We address the problem of federated learning (FL) where users are distributed and partitioned into clusters. This setup captures settings where different groups of users have their own objectives (learning tasks) but by aggregating their data with others in the same cluster (same learning task), they can leverage the strength in numbers in order to perform more efficient federated learning. For this new framework of clustered federated learning, we propose the Iterative Federated Clustering Algorithm (IFCA), which alternately estimates the cluster identities of the users and optimizes model parameters for the user clusters via gradient descent. We analyze the convergence rate of this algorithm first in a linear model with squared loss and then for generic strongly convex and smooth loss functions. We show that in both settings, with good initialization, IFCA is guaranteed to converge, and discuss the optimality of the statistical error rate. In particular, for the linear model with two clusters, we can guarantee that our algorithm converges as long as the initialization is slightly better than random. When the clustering structure is ambiguous, we propose to train the models by combining IFCA with the weight sharing technique in multi-task learning. In the experiments, we show that our algorithm can succeed even if we relax the requirements on initialization with random initialization and multiple restarts. We also present experimental results showing that our algorithm is efficient in non-convex problems such as neural networks. We demonstrate the benefits of IFCA over the baselines on several clustered FL benchmarks.
Submitted 8 June, 2021; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation
Authors:
Woosung Choi,
Minseok Kim,
Jaehwa Chung,
Daewon Lee,
Soonyoung Jung
Abstract:
Singing Voice Separation (SVS) tries to separate singing voice from a given mixed musical signal. Recently, many U-Net-based models have been proposed for the SVS task, but there were no existing works that evaluate and compare various types of intermediate blocks that can be used in the U-Net architecture. In this paper, we introduce a variety of intermediate spectrogram transformation blocks. We implement U-nets based on these blocks and train them on complex-valued spectrograms to consider both magnitude and phase. These networks are then compared on the SDR metric. When using a particular block composed of convolutional and fully-connected layers, it achieves state-of-the-art SDR on the MUSDB singing voice separation task by a large margin of 0.9 dB. Our code and models are available online.
Submitted 8 October, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
-
VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge
Authors:
Joon Son Chung,
Arsha Nagrani,
Ernesto Coto,
Weidi Xie,
Mitchell McLaren,
Douglas A Reynolds,
Andrew Zisserman
Abstract:
The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or `in the wild' data. It consisted of: (i) a publicly available speaker recognition dataset from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a public challenge and workshop held at Interspeech 2019 in Graz, Austria. This paper outlines the challenge and provides its baselines, results and discussions.
Submitted 5 December, 2019;
originally announced December 2019.
-
Valid Two-Sample Graph Testing via Optimal Transport Procrustes and Multiscale Graph Correlation with Applications in Connectomics
Authors:
Jaewon Chung,
Bijan Varjavand,
Jesus Arroyo,
Anton Alyakin,
Joshua Agterberg,
Minh Tang,
Joshua T. Vogelstein,
Carey E. Priebe
Abstract:
Testing whether two graphs come from the same distribution is of interest in many real world scenarios, including brain network analysis. Under the random dot product graph model, the nonparametric hypothesis testing framework consists of embedding the graphs using the adjacency spectral embedding (ASE), followed by aligning the embeddings using the median flip heuristic, and finally applying the nonparametric maximum mean discrepancy (MMD) test to obtain a p-value. Using synthetic data generated from Drosophila brain networks, we show that the median flip heuristic results in an invalid test, and demonstrate that optimal transport Procrustes (OTP) for alignment resolves the invalidity. We further demonstrate that substituting the MMD test with multiscale graph correlation (MGC) test leads to a more powerful test both in synthetic and in simulated data. Lastly, we apply this powerful test to the right and left hemispheres of the larval Drosophila mushroom body brain networks, and conclude that there is not sufficient evidence to reject the null hypothesis that the two hemispheres are equally distributed.
Submitted 13 September, 2021; v1 submitted 6 November, 2019;
originally announced November 2019.
-
Dual-domain Cascade of U-nets for Multi-channel Magnetic Resonance Image Reconstruction
Authors:
Roberto Souza,
Mariana Bento,
Nikita Nogovitsyn,
Kevin J. Chung,
R. Marc Lebel,
Richard Frayne
Abstract:
The U-net is a deep-learning network model that has been used to solve a number of inverse problems. In this work, the concatenation of two-element U-nets, termed the W-net, operating in k-space (K) and image (I) domains, was evaluated for multi-channel magnetic resonance (MR) image reconstruction. The two-element network combinations were evaluated for the four possible image-k-space domain configurations: a) W-net II, b) W-net KK, c) W-net IK, and d) W-net KI. Selected promising four-element networks (WW-nets) were also examined. Two configurations of each network were compared: 1) each coil channel processed independently, and 2) all channels processed simultaneously. One hundred and eleven volumetric, T1-weighted, 12-channel coil k-space datasets were used in the experiments. Normalized root mean squared error, peak signal to noise ratio, visual information fidelity and visual inspection were used to assess the reconstructed images against the fully sampled reference images. Our results indicated that networks that operate solely in the image domain are better suited when processing individual channels of multi-channel data independently. Dual domain methods are more advantageous when simultaneously reconstructing all channels of multi-channel data. Also, the appropriate cascade of U-nets compared favorably (p < 0.01) to the previously published, state-of-the-art Deep Cascade model in three out of four experiments.
Submitted 4 November, 2019;
originally announced November 2019.
-
Independence Testing for Temporal Data
Authors:
Cencheng Shen,
Jaewon Chung,
Ronak Mehta,
Ting Xu,
Joshua T. Vogelstein
Abstract:
Temporal data are increasingly prevalent in modern data science. A fundamental question is whether two time series are related or not. Existing approaches often have limitations, such as relying on parametric assumptions, detecting only linear associations, and requiring multiple tests and corrections. While many non-parametric and universally consistent dependence measures have recently been proposed, directly applying them to temporal data can inflate the p-value and result in an invalid test. To address these challenges, this paper introduces the temporal dependence statistic with block permutation to test independence between temporal data. Under proper assumptions, the proposed procedure is asymptotically valid and universally consistent for testing independence between stationary time series, and capable of estimating the optimal dependence lag that maximizes the dependence. Moreover, it is compatible with a rich family of distance and kernel based dependence measures, eliminates the need for multiple testing, and exhibits excellent testing power in various simulation settings.
Submitted 27 May, 2024; v1 submitted 18 August, 2019;
originally announced August 2019.
-
GraSPy: Graph Statistics in Python
Authors:
Jaewon Chung,
Benjamin D. Pedigo,
Eric W. Bridgeford,
Bijan K. Varjavand,
Hayden S. Helm,
Joshua T. Vogelstein
Abstract:
We introduce GraSPy, a Python library devoted to statistical inference, machine learning, and visualization of random graphs and graph populations. This package provides flexible and easy-to-use algorithms for analyzing and understanding graphs with a scikit-learn compliant API. GraSPy can be downloaded from the Python Package Index (PyPI) and is released under the Apache 2.0 open-source license. The documentation and all releases are available at https://neurodata.io/graspy.
Submitted 14 August, 2019; v1 submitted 29 March, 2019;
originally announced April 2019.
-
Stochastic Newton and Quasi-Newton Methods for Large Linear Least-squares Problems
Authors:
Julianne Chung,
Matthias Chung,
J. Tanner Slagel,
Luis Tenorio
Abstract:
We describe stochastic Newton and stochastic quasi-Newton approaches to efficiently solve large linear least-squares problems where the very large data sets present a significant computational burden (e.g., the size may exceed computer memory or data are collected in real-time). In our proposed framework, stochasticity is introduced in two different frameworks as a means to overcome these computational limitations, and probability distributions that can exploit structure and/or sparsity are considered. Theoretical results on consistency of the approximations for both the stochastic Newton and the stochastic quasi-Newton methods are provided. The results show, in particular, that stochastic Newton iterates, in contrast to stochastic quasi-Newton iterates, may not converge to the desired least-squares solution. Numerical examples, including an example from extreme learning machines, demonstrate the potential applications of these methods.
Submitted 23 February, 2017;
originally announced February 2017.
-
Iterative Refinement of the Approximate Posterior for Directed Belief Networks
Authors:
R Devon Hjelm,
Kyunghyun Cho,
Junyoung Chung,
Russ Salakhutdinov,
Vince Calhoun,
Nebojsa Jojic
Abstract:
Variational methods that rely on a recognition network to approximate the posterior of directed graphical models offer better inference and learning than previous methods. Recent advances that exploit the capacity and flexibility in this approach have expanded what kinds of models can be trained. However, as a proposal for the posterior, the capacity of the recognition network is limited, which can constrain the representational power of the generative model and increase the variance of Monte Carlo estimates. To address these issues, we introduce an iterative refinement procedure for improving the approximate posterior of the recognition network and show that training with the refined posterior is competitive with state-of-the-art methods. The advantages of refinement are further evident in an increased effective sample size, which implies a lower variance of gradient estimates.
Submitted 20 February, 2018; v1 submitted 19 November, 2015;
originally announced November 2015.
-
Sparse Projection Oblique Randomer Forests
Authors:
Tyler M. Tomita,
James Browne,
Cencheng Shen,
Jaewon Chung,
Jesse L. Patsolic,
Benjamin Falk,
Jason Yim,
Carey E. Priebe,
Randal Burns,
Mauro Maggioni,
Joshua T. Vogelstein
Abstract:
Decision forests, including Random Forests and Gradient Boosting Trees, have recently demonstrated state-of-the-art performance in a variety of machine learning settings. Decision forests are typically ensembles of axis-aligned decision trees; that is, trees that split only along feature dimensions. In contrast, many recent extensions to decision forests are based on axis-oblique splits. Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency. We introduce yet another decision forest, called "Sparse Projection Oblique Randomer Forests" (SPORF). SPORF uses very sparse random projections, i.e., linear combinations of a small subset of features. SPORF significantly improves accuracy over existing state-of-the-art algorithms on a standard benchmark suite for classification with >100 problems of varying dimension, sample size, and number of classes. To illustrate how SPORF addresses the limitations of both axis-aligned and existing oblique decision forest methods, we conduct extensive simulated experiments. SPORF typically yields improved performance over existing decision forests, while mitigating computational efficiency and scalability issues and maintaining interpretability. SPORF can easily be incorporated into other ensemble methods such as boosting to obtain potentially similar gains.
Submitted 3 October, 2019; v1 submitted 10 June, 2015;
originally announced June 2015.
-
Gated Feedback Recurrent Neural Networks
Authors:
Junyoung Chung,
Caglar Gulcehre,
Kyunghyun Cho,
Yoshua Bengio
Abstract:
In this work, we propose a novel recurrent neural network (RNN) architecture. The proposed RNN, gated-feedback RNN (GF-RNN), extends the existing approach of stacking multiple recurrent layers by allowing and controlling signals flowing from upper recurrent layers to lower layers using a global gating unit for each pair of layers. The recurrent signals exchanged between layers are gated adaptively based on the previous hidden states and the current input. We evaluated the proposed GF-RNN with different types of recurrent units, such as tanh, long short-term memory and gated recurrent units, on the tasks of character-level language modeling and Python program evaluation. Our empirical evaluation of different RNN units revealed that in both tasks, the GF-RNN outperforms the conventional approaches to building deep stacked RNNs. We suggest that the improvement arises because the GF-RNN can adaptively assign different layers to different timescales and layer-to-layer interactions (including the top-down ones which are not usually present in a stacked RNN) by learning to gate these interactions.
Submitted 17 June, 2015; v1 submitted 9 February, 2015;
originally announced February 2015.
-
Deep Attribute Networks
Authors:
Junyoung Chung,
Donghoon Lee,
Youngjoo Seo,
Chang D. Yoo
Abstract:
Obtaining compact and discriminative features is one of the major challenges in many real-world image classification tasks such as face verification and object recognition. One possible approach is to represent the input image on the basis of high-level features that carry semantic meaning which humans can understand. In this paper, a model coined deep attribute network (DAN) is proposed to address this issue. For an input image, the model outputs the attributes of the input image without performing any classification. The efficacy of the proposed model is evaluated on unconstrained face verification and real-world object recognition tasks using the LFW and the a-PASCAL datasets. We demonstrate the potential of deep learning for attribute-based classification by showing comparable results with existing state-of-the-art results. Once properly trained, the DAN is fast and does away with calculating low-level features, which may be unreliable and computationally expensive.
Submitted 28 November, 2012; v1 submitted 12 November, 2012;
originally announced November 2012.