-
Fed-Joint: Joint Modeling of Nonlinear Degradation Signals and Failure Events for Remaining Useful Life Prediction using Federated Learning
Authors:
Cheoljoon Jeong,
Xubo Yue,
Seokhyun Chung
Abstract:
Many failure mechanisms of machinery are closely related to the behavior of condition monitoring (CM) signals. To achieve a cost-effective preventive maintenance strategy, accurate remaining useful life (RUL) prediction based on the signals is of paramount importance. However, the CM signals are often recorded at different factories and production lines, with limited amounts of data. Unfortunately…
▽ More
Many failure mechanisms of machinery are closely related to the behavior of condition monitoring (CM) signals. To achieve a cost-effective preventive maintenance strategy, accurate remaining useful life (RUL) prediction based on the signals is of paramount importance. However, the CM signals are often recorded at different factories and production lines, with limited amounts of data. Unfortunately, these datasets have rarely been shared between the sites due to data confidentiality and ownership issues, a lack of computing and storage power, and high communication costs associated with data transfer between sites and a data center. Another challenge in real applications is that the CM signals are often not explicitly specified \textit{a priori}, meaning that existing methods, which often usually a parametric form, may not be applicable. To address these challenges, we propose a new prognostic framework for RUL prediction using the joint modeling of nonlinear degradation signals and time-to-failure data within a federated learning scheme. The proposed method constructs a nonparametric degradation model using a federated multi-output Gaussian process and then employs a federated survival model to predict failure times and probabilities for in-service machinery. The superiority of the proposed method over other alternatives is demonstrated through comprehensive simulation studies and a case study using turbofan engine degradation signal data that include run-to-failure events.
△ Less
Submitted 17 March, 2025;
originally announced March 2025.
-
Estimating Neural Representation Alignment from Sparsely Sampled Inputs and Features
Authors:
Chanwoo Chun,
Abdulkadir Canatar,
SueYeon Chung,
Daniel D. Lee
Abstract:
In both artificial and biological systems, the centered kernel alignment (CKA) has become a widely used tool for quantifying neural representation similarity. While current CKA estimators typically correct for the effects of finite stimuli sampling, the effects of sampling a subset of neurons are overlooked, introducing notable bias in standard experimental scenarios. Here, we provide a theoretica…
▽ More
In both artificial and biological systems, the centered kernel alignment (CKA) has become a widely used tool for quantifying neural representation similarity. While current CKA estimators typically correct for the effects of finite stimuli sampling, the effects of sampling a subset of neurons are overlooked, introducing notable bias in standard experimental scenarios. Here, we provide a theoretical analysis showing how this bias is affected by the representation geometry. We then introduce a novel estimator that corrects for both input and feature sampling. We use our method for evaluating both brain-to-brain and model-to-brain alignments and show that it delivers reliable comparisons even with very sparsely sampled neurons. We perform within-animal and across-animal comparisons on electrophysiological data from visual cortical areas V1, V4, and IT data, and use these as benchmarks to evaluate model-to-brain alignment. We also apply our method to reveal how object representations become progressively disentangled across layers in both biological and artificial systems. These findings underscore the importance of correcting feature-sampling biases in CKA and demonstrate that our bias-corrected estimator provides a more faithful measure of representation alignment. The improved estimates increase our understanding of how neural activity is structured across both biological and artificial systems.
△ Less
Submitted 24 February, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
Statistical Mechanics of Support Vector Regression
Authors:
Abdulkadir Canatar,
SueYeon Chung
Abstract:
A key problem in deep learning and computational neuroscience is relating the geometrical properties of neural representations to task performance. Here, we consider this problem for continuous decoding tasks where neural variability may affect task precision. Using methods from statistical mechanics, we study the average-case learning curves for $\varepsilon$-insensitive Support Vector Regression…
▽ More
A key problem in deep learning and computational neuroscience is relating the geometrical properties of neural representations to task performance. Here, we consider this problem for continuous decoding tasks where neural variability may affect task precision. Using methods from statistical mechanics, we study the average-case learning curves for $\varepsilon$-insensitive Support Vector Regression ($\varepsilon$-SVR) and discuss its capacity as a measure of linear decodability. Our analysis reveals a phase transition in the training error at a critical load, capturing the interplay between the tolerance parameter $\varepsilon$ and neural variability. We uncover a double-descent phenomenon in the generalization error, showing that $\varepsilon$ acts as a regularizer, both suppressing and shifting these peaks. Theoretical predictions are validated both on toy models and deep neural networks, extending the theory of Support Vector Machines to continuous tasks with inherent neural variability.
△ Less
Submitted 6 December, 2024;
originally announced December 2024.
-
Estimating the Spectral Moments of the Kernel Integral Operator from Finite Sample Matrices
Authors:
Chanwoo Chun,
SueYeon Chung,
Daniel D. Lee
Abstract:
Analyzing the structure of sampled features from an input data distribution is challenging when constrained by limited measurements in both the number of inputs and features. Traditional approaches often rely on the eigenvalue spectrum of the sample covariance matrix derived from finite measurement matrices; however, these spectra are sensitive to the size of the measurement matrix, leading to bia…
▽ More
Analyzing the structure of sampled features from an input data distribution is challenging when constrained by limited measurements in both the number of inputs and features. Traditional approaches often rely on the eigenvalue spectrum of the sample covariance matrix derived from finite measurement matrices; however, these spectra are sensitive to the size of the measurement matrix, leading to biased insights. In this paper, we introduce a novel algorithm that provides unbiased estimates of the spectral moments of the kernel integral operator in the limit of infinite inputs and features from finitely sampled measurement matrices. Our method, based on dynamic programming, is efficient and capable of estimating the moments of the operator spectrum. We demonstrate the accuracy of our estimator on radial basis function (RBF) kernels, highlighting its consistency with the theoretical spectra. Furthermore, we showcase the practical utility and robustness of our method in understanding the geometry of learned representations in neural networks.
△ Less
Submitted 8 February, 2025; v1 submitted 23 October, 2024;
originally announced October 2024.
-
Federated Automatic Latent Variable Selection in Multi-output Gaussian Processes
Authors:
Jingyi Gao,
Seokhyun Chung
Abstract:
This paper explores a federated learning approach that automatically selects the number of latent processes in multi-output Gaussian processes (MGPs). The MGP has seen great success as a transfer learning tool when data is generated from multiple sources/units/entities. A common approach in MGPs to transfer knowledge across units involves gathering all data from each unit to a central server and e…
▽ More
This paper explores a federated learning approach that automatically selects the number of latent processes in multi-output Gaussian processes (MGPs). The MGP has seen great success as a transfer learning tool when data is generated from multiple sources/units/entities. A common approach in MGPs to transfer knowledge across units involves gathering all data from each unit to a central server and extracting common independent latent processes to express each unit as a linear combination of the shared latent patterns. However, this approach poses key challenges in (i) determining the adequate number of latent processes and (ii) relying on centralized learning which leads to potential privacy risks and significant computational burdens on the central server. To address these issues, we propose a hierarchical model that places spike-and-slab priors on the coefficients of each latent process. These priors help automatically select only needed latent processes by shrinking the coefficients of unnecessary ones to zero. To estimate the model while avoiding the drawbacks of centralized learning, we propose a variational inference-based approach, that formulates model inference as an optimization problem compatible with federated settings. We then design a federated learning algorithm that allows units to jointly select and infer the common latent processes without sharing their data. We also discuss an efficient learning approach for a new unit within our proposed federated framework. Simulation and case studies on Li-ion battery degradation and air temperature data demonstrate the advantageous features of our proposed approach.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Nonlinear classification of neural manifolds with contextual information
Authors:
Francesca Mignacco,
Chi-Ning Chou,
SueYeon Chung
Abstract:
Understanding how neural systems efficiently process information through distributed representations is a fundamental challenge at the interface of neuroscience and machine learning. Recent approaches analyze the statistical and geometrical attributes of neural representations as population-level mechanistic descriptors of task implementation. In particular, manifold capacity has emerged as a prom…
▽ More
Understanding how neural systems efficiently process information through distributed representations is a fundamental challenge at the interface of neuroscience and machine learning. Recent approaches analyze the statistical and geometrical attributes of neural representations as population-level mechanistic descriptors of task implementation. In particular, manifold capacity has emerged as a promising framework linking population geometry to the separability of neural manifolds. However, this metric has been limited to linear readouts. To address this limitation, we introduce a theoretical framework that leverages latent directions in input space, which can be related to contextual information. We derive an exact formula for the context-dependent manifold capacity that depends on manifold geometry and context correlations, and validate it on synthetic and real data. Our framework's increased expressivity captures representation reformatting in deep networks at early stages of the layer hierarchy, previously inaccessible to analysis. As context-dependent nonlinearity is ubiquitous in neural systems, our data-driven and theoretically grounded approach promises to elucidate context-dependent computation across scales, datasets, and models.
△ Less
Submitted 30 March, 2025; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Real-time Adaptation for Condition Monitoring Signal Prediction using Label-aware Neural Processes
Authors:
Seokhyun Chung,
Raed Al Kontar
Abstract:
Building a predictive model that rapidly adapts to real-time condition monitoring (CM) signals is critical for engineering systems/units. Unfortunately, many current methods suffer from a trade-off between representation power and agility in online settings. For instance, parametric methods that assume an underlying functional form for CM signals facilitate efficient online prediction updates. How…
▽ More
Building a predictive model that rapidly adapts to real-time condition monitoring (CM) signals is critical for engineering systems/units. Unfortunately, many current methods suffer from a trade-off between representation power and agility in online settings. For instance, parametric methods that assume an underlying functional form for CM signals facilitate efficient online prediction updates. However, this simplification leads to vulnerability to model specifications and an inability to capture complex signals. On the other hand, approaches based on over-parameterized or non-parametric models can excel at explaining complex nonlinear signals, but real-time updates for such models pose a challenging task. In this paper, we propose a neural process-based approach that addresses this trade-off. It encodes available observations within a CM signal into a representation space and then reconstructs the signal's history and evolution for prediction. Once trained, the model can encode an arbitrary number of observations without requiring retraining, enabling on-the-spot real-time predictions along with quantified uncertainty and can be readily updated as more online data is gathered. Furthermore, our model is designed to incorporate qualitative information (i.e., labels) from individual units. This integration not only enhances individualized predictions for each unit but also enables joint inference for both signals and their associated labels. Numerical studies on both synthetic and real-world data in reliability engineering highlight the advantageous features of our model in real-time adaptation, enhanced signal prediction with uncertainty quantification, and joint prediction for labels and signals.
△ Less
Submitted 24 March, 2024;
originally announced March 2024.
-
Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk
Authors:
Colleen Chan,
Kisung You,
Sunny Chung,
Mauro Giuffrè,
Theo Saarinen,
Niroop Rajashekar,
Yuan Pu,
Yeo Eun Shin,
Loren Laine,
Ambrose Wong,
René Kizilcec,
Jasjeet Sekhon,
Dennis Shung
Abstract:
Applications of large language models (LLMs) like ChatGPT have potential to enhance clinical decision support through conversational interfaces. However, challenges of human-algorithmic interaction and clinician trust are poorly understood. GutGPT, a LLM for gastrointestinal (GI) bleeding risk prediction and management guidance, was deployed in clinical simulation scenarios alongside the electroni…
▽ More
Applications of large language models (LLMs) like ChatGPT have potential to enhance clinical decision support through conversational interfaces. However, challenges of human-algorithmic interaction and clinician trust are poorly understood. GutGPT, a LLM for gastrointestinal (GI) bleeding risk prediction and management guidance, was deployed in clinical simulation scenarios alongside the electronic health record (EHR) with emergency medicine physicians, internal medicine physicians, and medical students to evaluate its effect on physician acceptance and trust in AI clinical decision support systems (AI-CDSS). GutGPT provides risk predictions from a validated machine learning model and evidence-based answers by querying extracted clinical guidelines. Participants were randomized to GutGPT and an interactive dashboard, or the interactive dashboard and a search engine. Surveys and educational assessments taken before and after measured technology acceptance and content mastery. Preliminary results showed mixed effects on acceptance after using GutGPT compared to the dashboard or search engine but appeared to improve content mastery based on simulation performance. Overall, this study demonstrates LLMs like GutGPT could enhance effective AI-CDSS if implemented optimally and paired with interactive interfaces.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Barron Space for Graph Convolution Neural Networks
Authors:
Seok-Young Chung,
Qiyu Sun
Abstract:
Graph convolutional neural network (GCNN) operates on graph domain and it has achieved a superior performance to accomplish a wide range of tasks. In this paper, we introduce a Barron space of functions on a compact domain of graph signals. We prove that the proposed Barron space is a reproducing kernel Banach space, it can be decomposed into the union of a family of reproducing kernel Hilbert spa…
▽ More
Graph convolutional neural network (GCNN) operates on graph domain and it has achieved a superior performance to accomplish a wide range of tasks. In this paper, we introduce a Barron space of functions on a compact domain of graph signals. We prove that the proposed Barron space is a reproducing kernel Banach space, it can be decomposed into the union of a family of reproducing kernel Hilbert spaces with neuron kernels, and it could be dense in the space of continuous functions on the domain. Approximation property is one of the main principles to design neural networks. In this paper, we show that outputs of GCNNs are contained in the Barron space and functions in the Barron space can be well approximated by outputs of some GCNNs in the integrated square and uniform measurements. We also estimate the Rademacher complexity of functions with bounded Barron norm and conclude that functions in the Barron space could be learnt from their random samples efficiently.
△ Less
Submitted 5 November, 2023;
originally announced November 2023.
-
Inside the black box: Neural network-based real-time prediction of US recessions
Authors:
Seulki Chung
Abstract:
Long short-term memory (LSTM) and gated recurrent unit (GRU) are used to model US recessions from 1967 to 2021. Their predictive performances are compared to those of the traditional linear models. The out-of-sample performance suggests the application of LSTM and GRU in recession forecasting, especially for longer-term forecasts. The Shapley additive explanations (SHAP) method is applied to both…
▽ More
Long short-term memory (LSTM) and gated recurrent unit (GRU) are used to model US recessions from 1967 to 2021. Their predictive performances are compared to those of the traditional linear models. The out-of-sample performance suggests the application of LSTM and GRU in recession forecasting, especially for longer-term forecasts. The Shapley additive explanations (SHAP) method is applied to both groups of models. The SHAP-based different weight assignments imply the capability of these types of neural networks to capture the business cycle asymmetries and nonlinearities. The SHAP method delivers key recession indicators, such as the S&P 500 index for short-term forecasting up to 3 months and the term spread for longer-term forecasting up to 12 months. These findings are robust against other interpretation methods, such as the local interpretable model-agnostic explanations (LIME) and the marginal effects.
△ Less
Submitted 23 May, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
-
Linear Classification of Neural Manifolds with Correlated Variability
Authors:
Albert J. Wakhloo,
Tamara J. Sussman,
SueYeon Chung
Abstract:
Understanding how the statistical and geometric properties of neural activity relate to performance is a key problem in theoretical neuroscience and deep learning. Here, we calculate how correlations between object representations affect the capacity, a measure of linear separability. We show that for spherical object manifolds, introducing correlations between centroids effectively pushes the sph…
▽ More
Understanding how the statistical and geometric properties of neural activity relate to performance is a key problem in theoretical neuroscience and deep learning. Here, we calculate how correlations between object representations affect the capacity, a measure of linear separability. We show that for spherical object manifolds, introducing correlations between centroids effectively pushes the spheres closer together, while introducing correlations between the axes effectively shrinks their radii, revealing a duality between correlations and geometry with respect to the problem of classification. We then apply our results to accurately estimate the capacity of deep network data.
△ Less
Submitted 13 July, 2023; v1 submitted 27 November, 2022;
originally announced November 2022.
-
Domain Generalization for Robust Model-Based Offline Reinforcement Learning
Authors:
Alan Clark,
Shoaib Ahmed Siddiqui,
Robert Kirk,
Usman Anwar,
Stephen Chung,
David Krueger
Abstract:
Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin. We consider multi-demonstrator offline RL, a middle ground where we know which demonstrators generated each dataset, but make no assumptions about the underlying policies of the demonstrators. This is the most natural setting when…
▽ More
Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin. We consider multi-demonstrator offline RL, a middle ground where we know which demonstrators generated each dataset, but make no assumptions about the underlying policies of the demonstrators. This is the most natural setting when collecting data from multiple human operators, yet remains unexplored. Since different demonstrators induce different data distributions, we show that this can be naturally framed as a domain generalization problem, with each demonstrator corresponding to a different domain. Specifically, we propose Domain-Invariant Model-based Offline RL (DIMORL), where we apply Risk Extrapolation (REx) (Krueger et al., 2020) to the process of learning dynamics and rewards models. Our results show that models trained with REx exhibit improved domain generalization performance when compared with the natural baseline of pooling all demonstrators' data. We observe that the resulting models frequently enable the learning of superior policies in the offline model-based RL setting, can improve the stability of the policy learning process, and potentially enable increased exploration.
△ Less
Submitted 27 November, 2022;
originally announced November 2022.
-
The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks
Authors:
Samuel Lippl,
L. F. Abbott,
SueYeon Chung
Abstract:
Understanding the asymptotic behavior of gradient-descent training of deep neural networks is essential for revealing inductive biases and improving network performance. We derive the infinite-time training limit of a mathematically tractable class of deep nonlinear neural networks, gated linear networks (GLNs), and generalize these results to gated networks described by general homogeneous polyno…
▽ More
Understanding the asymptotic behavior of gradient-descent training of deep neural networks is essential for revealing inductive biases and improving network performance. We derive the infinite-time training limit of a mathematically tractable class of deep nonlinear neural networks, gated linear networks (GLNs), and generalize these results to gated networks described by general homogeneous polynomials. We study the implications of our results, focusing first on two-layer GLNs. We then apply our theoretical predictions to GLNs trained on MNIST and show how architectural constraints and the implicit bias of gradient descent affect performance. Finally, we show that our theory captures a substantial portion of the inductive bias of ReLU networks. By making the inductive bias explicit, our framework is poised to inform the development of more efficient, biologically plausible, and robust learning algorithms.
△ Less
Submitted 5 February, 2022;
originally announced February 2022.
-
On the geometry of generalization and memorization in deep neural networks
Authors:
Cory Stephenson,
Suchismita Padhy,
Abhinav Ganesh,
Yue Hui,
Hanlin Tang,
SueYeon Chung
Abstract:
Understanding how large neural networks avoid memorizing training data is key to explaining their high generalization performance. To examine the structure of when and where memorization occurs in a deep network, we use a recently developed replica-based mean field theoretic geometric analysis method. We find that all layers preferentially learn from examples which share features, and link this be…
▽ More
Understanding how large neural networks avoid memorizing training data is key to explaining their high generalization performance. To examine the structure of when and where memorization occurs in a deep network, we use a recently developed replica-based mean field theoretic geometric analysis method. We find that all layers preferentially learn from examples which share features, and link this behavior to generalization performance. Memorization predominately occurs in the deeper layers, due to decreasing object manifolds' radius and dimension, whereas early layers are minimally affected. This predicts that generalization can be restored by reverting the final few layer weights to earlier epochs before significant memorization occurred, which is confirmed by the experiments. Additionally, by studying generalization under different model sizes, we reveal the connection between the double descent phenomenon and the underlying model geometry. Finally, analytical analysis shows that networks avoid memorization early in training because close to initialization, the gradient contribution from permuted examples are small. These findings provide quantitative evidence for the structure of memorization across layers of a deep neural network, the drivers for such structure, and its connection to manifold geometric properties.
△ Less
Submitted 30 May, 2021;
originally announced May 2021.
-
Reinforcement Learning with Feedback-modulated TD-STDP
Authors:
Stephen Chung,
Robert Kozma
Abstract:
Spiking neuron networks have been used successfully to solve simple reinforcement learning tasks with continuous action set applying learning rules based on spike-timing-dependent plasticity (STDP). However, most of these models cannot be applied to reinforcement learning tasks with discrete action set since they assume that the selected action is a deterministic function of firing rate of neurons…
▽ More
Spiking neuron networks have been used successfully to solve simple reinforcement learning tasks with continuous action set applying learning rules based on spike-timing-dependent plasticity (STDP). However, most of these models cannot be applied to reinforcement learning tasks with discrete action set since they assume that the selected action is a deterministic function of firing rate of neurons, which is continuous. In this paper, we propose a new STDP-based learning rule for spiking neuron networks which contains feedback modulation. We show that the STDP-based learning rule can be used to solve reinforcement learning tasks with discrete action set at a speed similar to standard reinforcement learning algorithms when applied to the CartPole and LunarLander tasks. Moreover, we demonstrate that the agent is unable to solve these tasks if feedback modulation is omitted from the learning rule. We conclude that feedback modulation allows better credit assignment when only the units contributing to the executed action and TD error participate in learning.
△ Less
Submitted 29 August, 2020;
originally announced August 2020.
-
Weakly-supervised Multi-output Regression via Correlated Gaussian Processes
Authors:
Seokhyun Chung,
Raed Al Kontar,
Zhenke Wu
Abstract:
Multi-output regression seeks to borrow strength and leverage commonalities across different but related outputs in order to enhance learning and prediction accuracy. A fundamental assumption is that the output/group membership labels for all observations are known. This assumption is often violated in real applications. For instance, in healthcare datasets, sensitive attributes such as ethnicity…
▽ More
Multi-output regression seeks to borrow strength and leverage commonalities across different but related outputs in order to enhance learning and prediction accuracy. A fundamental assumption is that the output/group membership labels for all observations are known. This assumption is often violated in real applications. For instance, in healthcare datasets, sensitive attributes such as ethnicity are often missing or unreported. To this end, we introduce a weakly-supervised multi-output model based on dependent Gaussian processes. Our approach is able to leverage data without complete group labels or possibly only prior belief on group memberships to enhance accuracy across all outputs. Through intensive simulations and case studies on an Insulin, Testosterone and Bodyfat dataset, we show that our model excels in multi-output settings with missing labels, while being competitive in traditional fully labeled settings. We end by highlighting the possible use of our approach in fair inference and sequential decision-making.
△ Less
Submitted 23 May, 2022; v1 submitted 19 February, 2020;
originally announced February 2020.
-
Online Optimization with Memory and Competitive Control
Authors:
Guanya Shi,
Yiheng Lin,
Soon-Jo Chung,
Yisong Yue,
Adam Wierman
Abstract:
This paper presents competitive algorithms for a novel class of online optimization problems with memory. We consider a setting where the learner seeks to minimize the sum of a hitting cost and a switching cost that depends on the previous $p$ decisions. This setting generalizes Smoothed Online Convex Optimization. The proposed approach, Optimistic Regularized Online Balanced Descent, achieves a c…
▽ More
This paper presents competitive algorithms for a novel class of online optimization problems with memory. We consider a setting where the learner seeks to minimize the sum of a hitting cost and a switching cost that depends on the previous $p$ decisions. This setting generalizes Smoothed Online Convex Optimization. The proposed approach, Optimistic Regularized Online Balanced Descent, achieves a constant, dimension-free competitive ratio. Further, we show a connection between online optimization with memory and online control with adversarial disturbances. This connection, in turn, leads to a new constant-competitive policy for a rich class of online control problems.
△ Less
Submitted 8 January, 2021; v1 submitted 12 February, 2020;
originally announced February 2020.
-
VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge
Authors:
Joon Son Chung,
Arsha Nagrani,
Ernesto Coto,
Weidi Xie,
Mitchell McLaren,
Douglas A Reynolds,
Andrew Zisserman
Abstract:
The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or `in the wild' data. It consisted of: (i) a publicly available speaker recognition dataset from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a public challenge and workshop held at Inte…
▽ More
The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or `in the wild' data. It consisted of: (i) a publicly available speaker recognition dataset from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a public challenge and workshop held at Interspeech 2019 in Graz, Austria. This paper outlines the challenge and provides its baselines, results and discussions.
△ Less
Submitted 5 December, 2019;
originally announced December 2019.
-
Novelty Detection Via Blurring
Authors:
Sungik Choi,
Sae-Young Chung
Abstract:
Conventional out-of-distribution (OOD) detection schemes based on variational autoencoder or Random Network Distillation (RND) have been observed to assign lower uncertainty to the OOD than the target distribution. In this work, we discover that such conventional novelty detection schemes are also vulnerable to the blurred images. Based on the observation, we construct a novel RND-based OOD detect…
▽ More
Conventional out-of-distribution (OOD) detection schemes based on variational autoencoder or Random Network Distillation (RND) have been observed to assign lower uncertainty to the OOD than the target distribution. In this work, we discover that such conventional novelty detection schemes are also vulnerable to the blurred images. Based on the observation, we construct a novel RND-based OOD detector, SVD-RND, that utilizes blurred images during training. Our detector is simple, efficient at test time, and outperforms baseline OOD detectors in various domains. Further results show that SVD-RND learns better target distribution representation than the baseline RND algorithm. Finally, SVD-RND combined with geometric transform achieves near-perfect detection accuracy on the CelebA dataset.
△ Less
Submitted 3 March, 2020; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Two-stage dimension reduction for noisy high-dimensional images and application to Cryogenic Electron Microscopy
Authors:
Szu-Chi Chung,
Shao-Hsuan Wang,
Po-Yao Niu,
Su-Yun Huang,
Wei-Hau Chang,
I-Ping Tu
Abstract:
Principal component analysis (PCA) is arguably the most widely used dimension-reduction method for vector-type data. When applied to a sample of images, PCA requires vectorization of the image data, which in turn entails solving an eigenvalue problem for the sample covariance matrix. We propose herein a two-stage dimension reduction (2SDR) method for image reconstruction from high-dimensional nois…
▽ More
Principal component analysis (PCA) is arguably the most widely used dimension-reduction method for vector-type data. When applied to a sample of images, PCA requires vectorization of the image data, which in turn entails solving an eigenvalue problem for the sample covariance matrix. We propose herein a two-stage dimension reduction (2SDR) method for image reconstruction from high-dimensional noisy image data. The first stage treats the image as a matrix, which is a tensor of order 2, and uses multilinear principal component analysis (MPCA) for matrix rank reduction and image denoising. The second stage vectorizes the reduced-rank matrix and achieves further dimension and noise reduction. Simulation studies demonstrate excellent performance of 2SDR, for which we also develop an asymptotic theory that establishes consistency of its rank selection. Applications to cryo-EM (cryogenic electronic microscopy), which has revolutionized structural biology, organic and medical chemistry, cellular and molecular physiology in the past decade, are also provided and illustrated with benchmark cryo-EM datasets. Connections to other contemporaneous developments in image reconstruction and high-dimensional statistical inference are also discussed.
△ Less
Submitted 27 February, 2021; v1 submitted 21 November, 2019;
originally announced November 2019.
-
Robust Training with Ensemble Consensus
Authors:
Jisoo Lee,
Sae-Young Chung
Abstract:
Since deep neural networks are over-parameterized, they can memorize noisy examples. We address such a memorization issue in the presence of label noise. From the fact that deep neural networks cannot generalize to neighborhoods of memorized features, we hypothesize that noisy examples do not consistently incur small losses on the network under a certain perturbation. Based on this, we propose a n…
▽ More
Since deep neural networks are over-parameterized, they can memorize noisy examples. We address such a memorization issue in the presence of label noise. From the fact that deep neural networks cannot generalize to neighborhoods of memorized features, we hypothesize that noisy examples do not consistently incur small losses on the network under a certain perturbation. Based on this, we propose a novel training method called Learning with Ensemble Consensus (LEC) that prevents overfitting to noisy examples by removing them based on the consensus of an ensemble of perturbed networks. One of the proposed LECs, LTEC outperforms the current state-of-the-art methods on noisy MNIST, CIFAR-10, and CIFAR-100 in an efficient manner.
△ Less
Submitted 11 November, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Identification of relevant diffusion MRI metrics impacting cognitive functions using a novel feature selection method
Authors:
Tongda Xu,
Xiyan Cai,
Yao Wang,
Xiuyuan Wang,
Sohae Chung,
Els Fieremans,
Joseph Rath,
Steven Flanagan,
Yvonne W Lui
Abstract:
Mild Traumatic Brain Injury (mTBI) is a significant public health problem. The most troubling symptoms after mTBI are cognitive complaints. Studies show measurable differences between patients with mTBI and healthy controls with respect to tissue microstructure using diffusion MRI. However, it remains unclear which diffusion measures are the most informative with regard to cognitive functions in b…
▽ More
Mild Traumatic Brain Injury (mTBI) is a significant public health problem. The most troubling symptoms after mTBI are cognitive complaints. Studies show measurable differences between patients with mTBI and healthy controls with respect to tissue microstructure using diffusion MRI. However, it remains unclear which diffusion measures are the most informative with regard to cognitive functions in both the healthy state as well as after injury. In this study, we use diffusion MRI to formulate a predictive model for performance on working memory based on the most relevant MRI features. The key challenge is to identify relevant features over a large feature space with high accuracy in an efficient manner. To tackle this challenge, we propose a novel improvement of the best first search approach with crossover operators inspired by genetic algorithm. Compared against other heuristic feature selection algorithms, the proposed method achieves significantly more accurate predictions and yields clinically interpretable selected features.
△ Less
Submitted 11 November, 2019; v1 submitted 10 August, 2019;
originally announced August 2019.
-
Robust Regression for Safe Exploration in Control
Authors:
Anqi Liu,
Guanya Shi,
Soon-Jo Chung,
Anima Anandkumar,
Yisong Yue
Abstract:
We study the problem of safe learning and exploration in sequential control problems. The goal is to safely collect data samples from operating in an environment, in order to learn to achieve a challenging control goal (e.g., an agile maneuver close to a boundary). A central challenge in this setting is how to quantify uncertainty in order to choose provably-safe actions that allow us to collect i…
▽ More
We study the problem of safe learning and exploration in sequential control problems. The goal is to safely collect data samples from operating in an environment, in order to learn to achieve a challenging control goal (e.g., an agile maneuver close to a boundary). A central challenge in this setting is how to quantify uncertainty in order to choose provably-safe actions that allow us to collect informative data and reduce uncertainty, thereby achieving both improved controller safety and optimality. To address this challenge, we present a deep robust regression model that is trained to directly predict the uncertainty bounds for safe exploration. We derive generalization bounds for learning, and connect them with safety and stability bounds in control. We demonstrate empirically that our robust regression approach can outperform the conventional Gaussian process (GP) based safe exploration in settings where it is difficult to specify a good GP prior.
△ Less
Submitted 26 June, 2020; v1 submitted 13 June, 2019;
originally announced June 2019.
-
Fourier Phase Retrieval with Extended Support Estimation via Deep Neural Network
Authors:
Kyung-Su Kim,
Sae-Young Chung
Abstract:
We consider the problem of sparse phase retrieval from Fourier transform magnitudes to recover the $k$-sparse signal vector and its support $\mathcal{T}$. We exploit extended support estimate $\mathcal{E}$ with size larger than $k$ satisfying $\mathcal{E} \supseteq \mathcal{T}$ and obtained by a trained deep neural network (DNN). To make the DNN learnable, it provides $\mathcal{E}$ as the union of…
▽ More
We consider the problem of sparse phase retrieval from Fourier transform magnitudes to recover the $k$-sparse signal vector and its support $\mathcal{T}$. We exploit extended support estimate $\mathcal{E}$ with size larger than $k$ satisfying $\mathcal{E} \supseteq \mathcal{T}$ and obtained by a trained deep neural network (DNN). To make the DNN learnable, it provides $\mathcal{E}$ as the union of equivalent solutions of $\mathcal{T}$ by utilizing modulo Fourier invariances. Set $\mathcal{E}$ can be estimated with short running time via the DNN, and support $\mathcal{T}$ can be determined from the DNN output rather than from the full index set by applying hard thresholding to $\mathcal{E}$. Thus, the DNN-based extended support estimation improves the reconstruction performance of the signal with a low complexity burden dependent on $k$. Numerical results verify that the proposed scheme has a superior performance with lower complexity compared to local search-based greedy sparse phase retrieval and a state-of-the-art variant of the Fienup method.
△ Less
Submitted 13 August, 2019; v1 submitted 3 April, 2019;
originally announced April 2019.
-
Tree Search Network for Sparse Regression
Authors:
Kyung-Su Kim,
Sae-Young Chung
Abstract:
We consider the classical sparse regression problem of recovering a sparse signal $x_0$ given a measurement vector $y = Φx_0+w$. We propose a tree search algorithm driven by the deep neural network for sparse regression (TSN). TSN improves the signal reconstruction performance of the deep neural network designed for sparse regression by performing a tree search with pruning. It is observed in both…
▽ More
We consider the classical sparse regression problem of recovering a sparse signal $x_0$ given a measurement vector $y = Φx_0+w$. We propose a tree search algorithm driven by the deep neural network for sparse regression (TSN). TSN improves the signal reconstruction performance of the deep neural network designed for sparse regression by performing a tree search with pruning. It is observed in both noiseless and noisy cases, TSN recovers synthetic and real signals with lower complexity than a conventional tree search and is superior to existing algorithms by a large margin for various types of the sensing matrix $Φ$, widely used in sparse regression.
△ Less
Submitted 1 April, 2019;
originally announced April 2019.
-
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
Authors:
Nan Wu,
Jason Phang,
Jungkyu Park,
Yiqiu Shen,
Zhe Huang,
Masha Zorin,
Stanisław Jastrzębski,
Thibault Févry,
Joe Katsnelson,
Eric Kim,
Stacey Wolfson,
Ujas Parikh,
Sushma Gaddam,
Leng Leng Young Lin,
Kara Ho,
Joshua D. Weinstein,
Beatriu Reig,
Yiming Gao,
Hildegard Toth,
Kristine Pysarenko,
Alana Lewin,
Jiyon Lee,
Krystal Airola,
Eralda Mema,
Stephanie Chung
, et al. (7 additional authors not shown)
Abstract:
We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use…
▽ More
We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and find our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately. To better understand our results, we conduct a thorough analysis of our network's performance on different subpopulations of the screening population, model design, training procedure, errors, and properties of its internal representations.
△ Less
Submitted 19 March, 2019;
originally announced March 2019.
-
Functional Principal Component Analysis for Extrapolating Multi-stream Longitudinal Data
Authors:
Seokhyun Chung,
Raed Kontar
Abstract:
The advance of modern sensor technologies enables collection of multi-stream longitudinal data where multiple signals from different units are collected in real-time. In this article, we present a non-parametric approach to predict the evolution of multi-stream longitudinal data for an in-service unit through borrowing strength from other historical units. Our approach first decomposes each stream…
▽ More
The advance of modern sensor technologies enables collection of multi-stream longitudinal data where multiple signals from different units are collected in real-time. In this article, we present a non-parametric approach to predict the evolution of multi-stream longitudinal data for an in-service unit through borrowing strength from other historical units. Our approach first decomposes each stream into a linear combination of eigenfunctions and their corresponding functional principal component (FPC) scores. A Gaussian process prior for the FPC scores is then established based on a functional semi-metric that measures similarities between streams of historical units and the in-service unit. Finally, an empirical Bayesian updating strategy is derived to update the established prior using real-time stream data obtained from the in-service unit. Experiments on synthetic and real world data show that the proposed framework outperforms state-of-the-art approaches and can effectively account for heterogeneity as well as achieve high predictive accuracy.
△ Less
Submitted 9 March, 2019;
originally announced March 2019.
-
Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update
Authors:
Su Young Lee,
Sungik Choi,
Sae-Young Chung
Abstract:
We propose Episodic Backward Update (EBU) - a novel deep reinforcement learning algorithm with a direct value propagation. In contrast to the conventional use of the experience replay with uniform random sampling, our agent samples a whole episode and successively propagates the value of a state to its previous states. Our computationally efficient recursive algorithm allows sparse and delayed rew…
▽ More
We propose Episodic Backward Update (EBU) - a novel deep reinforcement learning algorithm with a direct value propagation. In contrast to the conventional use of the experience replay with uniform random sampling, our agent samples a whole episode and successively propagates the value of a state to its previous states. Our computationally efficient recursive algorithm allows sparse and delayed rewards to propagate directly through all transitions of the sampled episode. We theoretically prove the convergence of the EBU method and experimentally demonstrate its performance in both deterministic and stochastic environments. Especially in 49 games of Atari 2600 domain, EBU achieves the same mean and median human normalized performance of DQN by using only 5% and 10% of samples, respectively.
△ Less
Submitted 11 November, 2019; v1 submitted 31 May, 2018;
originally announced May 2018.
-
A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation
Authors:
Seokhyun Chung,
Young Woong Park,
Taesu Cheong
Abstract:
Subset selection for multiple linear regression aims to construct a regression model that minimizes errors by selecting a small number of explanatory variables. Once a model is built, various statistical tests and diagnostics are conducted to validate the model and to determine whether the regression assumptions are met. Most traditional approaches require human decisions at this step. For example…
▽ More
Subset selection for multiple linear regression aims to construct a regression model that minimizes errors by selecting a small number of explanatory variables. Once a model is built, various statistical tests and diagnostics are conducted to validate the model and to determine whether the regression assumptions are met. Most traditional approaches require human decisions at this step. For example, the user adding or removing a variable until a satisfactory model is obtained. However, this trial-and-error strategy cannot guarantee that a subset that minimizes the errors while satisfying all regression assumptions will be found. In this paper, we propose a fully automated model building procedure for multiple linear regression subset selection that integrates model building and validation based on mathematical programming. The proposed model minimizes mean squared errors while ensuring that the majority of the important regression assumptions are met. We also propose an efficient constraint to approximate the constraint for the coefficient t-test. When no subset satisfies all of the considered regression assumptions, our model provides an alternative subset that satisfies most of these assumptions. Computational results show that our model yields better solutions (i.e., satisfying more regression assumptions) compared to the state-of-the-art benchmark models while maintaining similar explanatory power.
△ Less
Submitted 26 July, 2020; v1 submitted 12 December, 2017;
originally announced December 2017.
-
Classification and Geometry of General Perceptual Manifolds
Authors:
SueYeon Chung,
Daniel D. Lee,
Haim Sompolinsky
Abstract:
Perceptual manifolds arise when a neural population responds to an ensemble of sensory signals associated with different physical features (e.g., orientation, pose, scale, location, and intensity) of the same perceptual object. Object recognition and discrimination requires classifying the manifolds in a manner that is insensitive to variability within a manifold. How neuronal systems give rise to…
▽ More
Perceptual manifolds arise when a neural population responds to an ensemble of sensory signals associated with different physical features (e.g., orientation, pose, scale, location, and intensity) of the same perceptual object. Object recognition and discrimination requires classifying the manifolds in a manner that is insensitive to variability within a manifold. How neuronal systems give rise to invariant object classification and recognition is a fundamental problem in brain theory as well as in machine learning. Here we study the ability of a readout network to classify objects from their perceptual manifold representations. We develop a statistical mechanical theory for the linear classification of manifolds with arbitrary geometry revealing a remarkable relation to the mathematics of conic decomposition. Novel geometrical measures of manifold radius and manifold dimension are introduced which can explain the classification capacity for manifolds of various geometries. The general theory is demonstrated on a number of representative manifolds, including L2 ellipsoids prototypical of strictly convex manifolds, L1 balls representing polytopes consisting of finite sample points, and orientation manifolds which arise from neurons tuned to respond to a continuous angle variable, such as object orientation. The effects of label sparsity on the classification capacity of manifolds are elucidated, revealing a scaling relation between label sparsity and manifold radius. Theoretical predictions are corroborated by numerical simulations using recently developed algorithms to compute maximum margin solutions for manifold dichotomies. Our theory and its extensions provide a powerful and rich framework for applying statistical mechanics of linear classification to data arising from neuronal responses to object stimuli, as well as to artificial deep networks trained for object recognition tasks.
△ Less
Submitted 24 June, 2018; v1 submitted 17 October, 2017;
originally announced October 2017.
-
Learning Data Manifolds with a Cutting Plane Method
Authors:
SueYeon Chung,
Uri Cohen,
Haim Sompolinsky,
Daniel D. Lee
Abstract:
We consider the problem of classifying data manifolds where each manifold represents invariances that are parameterized by continuous degrees of freedom. Conventional data augmentation methods rely upon sampling large numbers of training examples from these manifolds; instead, we propose an iterative algorithm called M_{CP} based upon a cutting-plane approach that efficiently solves a quadratic se…
▽ More
We consider the problem of classifying data manifolds where each manifold represents invariances that are parameterized by continuous degrees of freedom. Conventional data augmentation methods rely upon sampling large numbers of training examples from these manifolds; instead, we propose an iterative algorithm called M_{CP} based upon a cutting-plane approach that efficiently solves a quadratic semi-infinite programming problem to find the maximum margin solution. We provide a proof of convergence as well as a polynomial bound on the number of iterations required for a desired tolerance in the objective function. The efficiency and performance of M_{CP} are demonstrated in high-dimensional simulations and on image manifolds generated from the ImageNet dataset. Our results indicate that M_{CP} is able to rapidly learn good classifiers and shows superior generalization performance compared with conventional maximum margin methods using data augmentation methods.
△ Less
Submitted 28 May, 2017;
originally announced May 2017.
-
Linear Readout of Object Manifolds
Authors:
SueYeon Chung,
Daniel D. Lee,
Haim Sompolinsky
Abstract:
Objects are represented in sensory systems by continuous manifolds due to sensitivity of neuronal responses to changes in physical features such as location, orientation, and intensity. What makes certain sensory representations better suited for invariant decoding of objects by downstream networks? We present a theory that characterizes the ability of a linear readout network, the perceptron, to…
▽ More
Objects are represented in sensory systems by continuous manifolds due to sensitivity of neuronal responses to changes in physical features such as location, orientation, and intensity. What makes certain sensory representations better suited for invariant decoding of objects by downstream networks? We present a theory that characterizes the ability of a linear readout network, the perceptron, to classify objects from variable neural responses. We show how the readout perceptron capacity depends on the dimensionality, size, and shape of the object manifolds in its input neural representation.
△ Less
Submitted 21 August, 2016; v1 submitted 6 December, 2015;
originally announced December 2015.