-
A False Discovery Rate Control Method Using a Fully Connected Hidden Markov Random Field for Neuroimaging Data
Authors:
Taehyo Kim,
Qiran Jia,
Mony J. de Leon,
Hai Shu
Abstract:
False discovery rate (FDR) control methods are essential for voxel-wise multiple testing in neuroimaging data analysis, where hundreds of thousands or even millions of tests are conducted to detect brain regions associated with disease-related changes. Classical FDR control methods (e.g., BH, q-value, and LocalFDR) assume independence among tests and often lead to high false non-discovery rates (FNR). Although various spatial FDR control methods have been developed to improve power, they still fall short of jointly addressing three major challenges in neuroimaging applications: capturing complex spatial dependencies, maintaining low variability in both false discovery proportion (FDP) and false non-discovery proportion (FNP) across replications, and achieving computational scalability for high-resolution data. To address these challenges, we propose fcHMRF-LIS, a powerful, stable, and scalable spatial FDR control method for voxel-wise multiple testing. It integrates the local index of significance (LIS)-based testing procedure with a novel fully connected hidden Markov random field (fcHMRF) designed to model complex spatial structures using a parsimonious parameterization. We develop an efficient expectation-maximization algorithm incorporating mean-field approximation, the Conditional Random Fields as Recurrent Neural Networks (CRF-RNN) technique, and permutohedral lattice filtering, reducing the time complexity from quadratic to linear in the number of tests. Extensive simulations demonstrate that fcHMRF-LIS achieves accurate FDR control, lower FNR, reduced variability in FDP and FNP, and a higher number of true positives compared to existing methods. Applied to an FDG-PET dataset from the Alzheimer's Disease Neuroimaging Initiative, fcHMRF-LIS identifies neurobiologically relevant brain regions and offers notable advantages in computational efficiency.
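The LIS-based testing step can be illustrated with the standard step-up rule used with local indices of significance. This is a minimal sketch that assumes the LIS values, i.e. the posterior probabilities that each voxel is null, have already been computed. Fitting the fcHMRF by EM, which is the substance of the method, is not shown, and the function name is hypothetical.

```python
def lis_procedure(lis, alpha):
    """Return the sorted indices of rejected hypotheses.

    lis[i] is assumed to be P(voxel i is null | data). The rule rejects the
    k hypotheses with the smallest LIS values, where k is the largest number
    whose running average of sorted LIS values stays at or below alpha.
    """
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    running_sum = 0.0
    k = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lis[idx]
        # The average of the smallest `rank` LIS values estimates the FDR
        # of rejecting exactly those hypotheses.
        if running_sum / rank <= alpha:
            k = rank
    return sorted(order[:k])
```

For example, with LIS values `[0.01, 0.2, 0.03, 0.9, 0.05]` and `alpha = 0.05`, the running averages of the sorted values are 0.01, 0.02, 0.03 and 0.0725, so voxels 0, 2 and 4 are rejected.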
Submitted 29 May, 2025; v1 submitted 26 May, 2025;
originally announced May 2025.
-
Data-Driven Sequential Sampling for Tail Risk Mitigation
Authors:
Dohyun Ahn,
Taeho Kim
Abstract:
Given a finite collection of stochastic alternatives, we study the problem of sequentially allocating a fixed sampling budget to identify the optimal alternative with high probability, where the optimal alternative is defined as the one with the smallest extreme tail risk. We particularly consider a situation where these alternatives generate heavy-tailed losses whose probability distributions are unknown and may not admit any specific parametric representation. In this setup, we propose data-driven sequential sampling policies that maximize the rate at which the probability of falsely selecting a suboptimal alternative decays to zero. We rigorously demonstrate the superiority of the proposed methods over existing approaches, which is further validated via numerical studies.
Submitted 10 March, 2025;
originally announced March 2025.
-
A Generalized Theory of Mixup for Structure-Preserving Synthetic Data
Authors:
Chungpa Lee,
Jongho Im,
Joseph H. T. Kim
Abstract:
Mixup is a widely adopted data augmentation technique known for enhancing the generalization of machine learning models by interpolating between data points. Despite its success and popularity, limited attention has been given to understanding the statistical properties of the synthetic data it generates. In this paper, we delve into the theoretical underpinnings of mixup, specifically its effects on the statistical structure of synthesized data. We demonstrate that while mixup improves model performance, it can distort key statistical properties such as variance, potentially leading to unintended consequences in data synthesis. To address this, we propose a novel mixup method that incorporates a generalized and flexible weighting scheme, better preserving the original data's structure. Through theoretical developments, we provide conditions under which our proposed method maintains the (co)variance and distributional properties of the original dataset. Numerical experiments confirm that the new approach not only preserves the statistical characteristics of the original data but also sustains model performance across repeated synthesis, alleviating concerns of model collapse identified in previous research.
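The variance distortion mentioned above can be seen already for standard (vanilla) mixup on one-dimensional data. If x_i and x_j are drawn independently with variance σ² and λ ~ Beta(α, α) is independent of both, the cross term vanishes and Var(λ x_i + (1 − λ) x_j) = E[λ² + (1 − λ)²] σ² = (α + 1)/(2α + 1) · σ², which is 2/3 of the original variance for α = 1. The sketch below checks this by simulation. It shows only the classical scheme, not the generalized weighting proposed in the paper, and the function names are hypothetical.

```python
import random
import statistics


def mixup_1d(xs, alpha, rng):
    """Vanilla mixup: mix each point with a random partner using lam ~ Beta(alpha, alpha)."""
    out = []
    for x in xs:
        partner = rng.choice(xs)
        lam = rng.betavariate(alpha, alpha)
        out.append(lam * x + (1.0 - lam) * partner)
    return out


def expected_variance_ratio(alpha):
    """E[lam^2 + (1 - lam)^2] for lam ~ Beta(alpha, alpha): the variance shrinkage factor."""
    return (alpha + 1.0) / (2.0 * alpha + 1.0)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
mixed = mixup_1d(xs, 1.0, rng)
ratio = statistics.pvariance(mixed) / statistics.pvariance(xs)
# ratio should be close to expected_variance_ratio(1.0) == 2/3
```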
Submitted 3 March, 2025;
originally announced March 2025.
-
Constructing balanced datasets for predicting failure modes in structural systems under seismic hazards
Authors:
Jungho Kim,
Taeyong Kim
Abstract:
Accurate prediction of structural failure modes under seismic excitations is essential for seismic risk and resilience assessment. Traditional simulation-based approaches often result in imbalanced datasets dominated by non-failure or frequently observed failure scenarios, limiting their effectiveness for machine learning-based prediction. To address this challenge, this study proposes a framework for constructing balanced datasets that include distinct failure modes. The framework consists of three key steps. First, critical ground motion features (GMFs) are identified to effectively represent ground motion time histories. Second, an adaptive algorithm is employed to estimate the probability densities of various failure domains in the space of critical GMFs and structural parameters. Third, samples generated from these probability densities are transformed into ground motion time histories by using a scaling factor optimization process. A balanced dataset is constructed by performing nonlinear response history analyses on structural systems with parameters matching the generated samples, subjected to the corresponding transformed ground motion time histories. Deep neural network models are trained on balanced and imbalanced datasets to highlight the importance of dataset balancing. To further evaluate the framework's applicability, numerical investigations are conducted using two different structural models subjected to recorded and synthetic ground motions. The results demonstrate the framework's robustness and effectiveness in addressing dataset imbalance and improving machine learning performance in seismic failure mode prediction.
Submitted 26 February, 2025;
originally announced March 2025.
-
Optimizing Input Data Collection for Ranking and Selection
Authors:
Eunhye Song,
Taeho Kim
Abstract:
We study a ranking and selection (R&S) problem in which all solutions share common parametric Bayesian input models updated with data collected from multiple independent data-generating sources. Our objective is to identify the best system by designing a sequential sampling algorithm that collects input and simulation data given a budget. We adopt the most probable best (MPB) as the estimator of the optimum and show that its posterior probability of optimality converges to one at an exponential rate as the sampling budget increases. Assuming that the input parameters belong to a finite set, we characterize the $ε$-optimal static sampling ratios for input and simulation data that maximize the convergence rate. Using these ratios as guidance, we propose the optimal sampling algorithm for R&S (OSAR) that achieves the $ε$-optimal ratios almost surely in the limit. We further extend OSAR by adopting kernel ridge regression to improve the prediction of simulation output means. This not only improves OSAR's finite-sample performance, but also lets us tackle the case where the input parameters lie in a continuous space with a strong consistency guarantee for finding the optimum. We numerically demonstrate that OSAR outperforms a state-of-the-art competitor.
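To make the MPB estimator concrete, the sketch below estimates each system's posterior probability of optimality by counting how often it is best across posterior draws, then returns the system with the highest frequency. It assumes a maximization objective and assumes that, for each posterior draw of the input parameters, the mean performance of every system is available. OSAR's sampling allocation is not shown, and the function name is hypothetical.

```python
def most_probable_best(mean_draws):
    """Estimate the most probable best (MPB) from posterior draws.

    mean_draws: one list per posterior draw of the input parameters, holding
    the mean performance of every system under that draw (larger is better).
    Returns the MPB index and the estimated posterior probability of
    optimality of each system.
    """
    k = len(mean_draws[0])
    wins = [0] * k
    for draw in mean_draws:
        # Index of the best system under this particular posterior draw.
        best = max(range(k), key=lambda i: draw[i])
        wins[best] += 1
    probs = [w / len(mean_draws) for w in wins]
    return max(range(k), key=lambda i: probs[i]), probs
```

With the four draws `[[1, 2, 3], [1, 4, 3], [5, 0, 0], [0, 1, 2]]`, systems 2, 1, 0 and 2 win, so system 2 is the MPB with estimated probability 0.5.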
Submitted 23 February, 2025;
originally announced February 2025.
-
Matrix factorization and prediction for high dimensional co-occurrence count data via shared parameter alternating zero inflated Gamma model
Authors:
Taejoon Kim,
Haiyan Wang
Abstract:
High-dimensional sparse matrix data frequently arise in various applications. A notable example is the weighted word-word co-occurrence count data, which summarizes the weighted frequency of word pairs appearing within the same context window. This type of data typically contains highly skewed non-negative values with an abundance of zeros. Another example is the co-occurrence of item-item or user-item pairs in e-commerce, which also generates high-dimensional data. The objective is to use these data to predict the relevance between items or users. In this paper, we assume that items or users can be represented by unknown dense vectors. The model treats the co-occurrence counts as arising from zero-inflated Gamma random variables and employs cosine similarity between the unknown vectors to summarize item-item relevance. The unknown values are estimated using the shared parameter alternating zero-inflated Gamma regression models (SA-ZIG). Both canonical link and log link models are considered. Two parameter updating schemes are proposed, along with an algorithm to estimate the unknown parameters. An analytical convergence analysis is presented. Numerical studies demonstrate that the SA-ZIG using Fisher scoring without learning rate adjustment may fail to find the maximum likelihood estimate. However, the SA-ZIG with learning rate adjustment performs satisfactorily in our simulation studies.
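A per-pair log-likelihood for the zero-inflated Gamma model might look like the sketch below: a zero count has probability p, and a positive count follows a Gamma density with mean μ and shape k, scaled by 1 − p. The log-link form log μ = β0 + β1 · cos(u_i, u_j) is an assumption made for illustration; the paper's exact parameterization, its canonical-link variant, and the alternating updates are not reproduced, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def zig_loglik(y, p_zero, mean, shape):
    """Log-likelihood of one observation under a zero-inflated Gamma model."""
    if y == 0:
        return math.log(p_zero)
    rate = shape / mean  # Gamma(shape, rate) has mean shape / rate
    gamma_logpdf = (shape * math.log(rate) + (shape - 1.0) * math.log(y)
                    - rate * y - math.lgamma(shape))
    return math.log(1.0 - p_zero) + gamma_logpdf


def pair_loglik(y, u, v, beta0, beta1, p_zero, shape):
    """Hypothetical log link: log(mean) = beta0 + beta1 * cos(u, v)."""
    mean = math.exp(beta0 + beta1 * cosine(u, v))
    return zig_loglik(y, p_zero, mean, shape)
```

With shape 1 the positive part reduces to an exponential density, which makes the function easy to check by hand.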
Submitted 31 December, 2024;
originally announced January 2025.
-
Global dense vector representations for words or items using shared parameter alternating Tweedie model
Authors:
Taejoon Kim,
Haiyan Wang
Abstract:
In this article, we present a model for analyzing co-occurrence count data arising in practical fields, such as user-item or item-item data from online shopping platforms and co-occurring word-word pairs in text sequences. Such data contain important information for developing recommender systems or studying the relevance of items or words from non-numerical sources. Unlike in traditional regression models, no covariates are observed. Additionally, the co-occurrence matrix is typically so high-dimensional that it does not fit into a computer's memory for modeling. We extract numerical data by defining co-occurrence windows and using weighted counts on a continuous scale. Positive probability mass is allowed for zero observations. We present the Shared parameter Alternating Tweedie (SA-Tweedie) model and an algorithm to estimate its parameters. We introduce a learning rate adjustment, used along with the Fisher scoring method in the inner loop, to help the algorithm stay on track in the optimizing direction. Gradient descent with the Adam update was also considered as an alternative estimation method. Simulation studies and an application showed that our algorithm with Fisher scoring and learning rate adjustment outperforms the other two methods. A pseudo-likelihood approach with alternating parameter updates was also studied. Numerical studies showed that the pseudo-likelihood approach is not suitable for our shared parameter alternating regression models with unobserved covariates.
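The role of a learning rate adjustment inside Fisher scoring can be illustrated with step halving: take the full Fisher scoring step, and halve it until the log-likelihood no longer decreases. The sketch below uses a one-parameter Poisson log-link regression as a stand-in, because the Tweedie likelihood is considerably more involved. Whether the paper's adjustment is exactly step halving is an assumption, and the function names are hypothetical.

```python
import math


def poisson_loglik(beta, xs, ys):
    """Poisson log-likelihood for the log-link model mu = exp(beta * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta * x
        total += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return total


def fisher_scoring_step_halving(xs, ys, beta=0.0, max_iter=50, tol=1e-10):
    """Fisher scoring with a step-halving learning-rate adjustment."""
    for _ in range(max_iter):
        mus = [math.exp(beta * x) for x in xs]
        score = sum(x * (y - mu) for x, y, mu in zip(xs, ys, mus))
        info = sum(x * x * mu for x, mu in zip(xs, mus))
        step = score / info  # full Fisher scoring step
        current = poisson_loglik(beta, xs, ys)
        rate = 1.0
        # Shrink the learning rate until the step no longer lowers the likelihood.
        while poisson_loglik(beta + rate * step, xs, ys) < current and rate > 1e-8:
            rate *= 0.5
        new_beta = beta + rate * step
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta
```

Starting from `beta = 0.0` on `xs = [1, 2, 3]`, `ys = [2, 5, 14]`, the first full step overshoots badly, and halving it twice restores an increase in the likelihood before the iterations converge.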
Submitted 31 December, 2024;
originally announced January 2025.
-
Deep learning-based modularized loading protocol for parameter estimation of Bouc-Wen class models
Authors:
Sebin Oh,
Junho Song,
Taeyong Kim
Abstract:
This study proposes a modularized deep learning-based loading protocol for optimal parameter estimation of Bouc-Wen (BW) class models. The protocol consists of two key components: optimal loading history construction and CNN-based rapid parameter estimation. Each component is decomposed into independent sub-modules tailored to distinct hysteretic behaviors (basic hysteresis, structural degradation, and pinching effect), making the protocol adaptable to diverse hysteresis models. Three independent CNN architectures are developed to capture the path-dependent nature of these hysteretic behaviors. By training these CNN architectures on diverse loading histories, minimal loading sequences, termed loading history modules, are identified and then combined to construct an optimal loading history. The three CNN models, trained on the respective loading history modules, serve as rapid parameter estimators. Numerical evaluation of the protocol, including nonlinear time history analysis of a 3-story steel moment frame and fragility curve construction for a 3-story reinforced concrete frame, demonstrates that the proposed protocol significantly reduces total analysis time while maintaining or improving estimation accuracy. The proposed protocol can be extended to other hysteresis models, suggesting a systematic approach for identifying general hysteresis models.
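For readers unfamiliar with the BW class, the sketch below simulates the basic Bouc-Wen hysteresis (without degradation or pinching) under a prescribed displacement history, using forward Euler on displacement increments: dz = A dx − β|dx||z|^(n−1) z − γ dx |z|^n and F = α k x + (1 − α) k z. The parameter values are hypothetical, and the code only produces the kind of force-displacement response from which the paper's CNNs estimate parameters. The estimation protocol itself is not shown.

```python
def bouc_wen_force(displacements, k=1.0, alpha=0.1, A=1.0, beta=0.5, gamma=0.5, n=1.0):
    """Restoring force of the basic Bouc-Wen model along a displacement history.

    Integrates the hysteretic variable z with forward Euler on displacement
    increments. Assumes n >= 1 so that abs(z) ** (n - 1) is defined at z = 0.
    """
    z = 0.0
    forces = [alpha * k * displacements[0] + (1.0 - alpha) * k * z]
    for prev, curr in zip(displacements, displacements[1:]):
        dx = curr - prev
        dz = A * dx - beta * abs(dx) * abs(z) ** (n - 1.0) * z - gamma * dx * abs(z) ** n
        z += dz
        forces.append(alpha * k * curr + (1.0 - alpha) * k * z)
    return forces


# One loading-unloading cycle: 0 -> 1 -> 0 in steps of 0.001.
path = [i / 1000 for i in range(1001)] + [1 - i / 1000 for i in range(1, 1001)]
forces = bouc_wen_force(path)
```

Because of hysteresis, the force when the displacement returns to zero is negative (about −0.28 for these parameters) rather than zero.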
Submitted 4 November, 2024;
originally announced November 2024.
-
Improved identification of breakpoints in piecewise regression and its applications
Authors:
Taehyeong Kim,
Hyungu Lee,
Hayoung Choi
Abstract:
Identifying breakpoints in piecewise regression is critical for enhancing the reliability and interpretability of data fitting. In this paper, we propose novel algorithms based on the greedy algorithm to accurately and efficiently identify breakpoints in piecewise polynomial regression. The algorithm updates the breakpoints to minimize the error by exploring the neighborhood of each breakpoint. It converges quickly and finds optimal breakpoints stably. Moreover, it can determine the optimal number of breakpoints. The computational results for real and synthetic data show that its accuracy exceeds that of existing methods. The real-world datasets demonstrate that the breakpoints identified by the proposed algorithm provide valuable information about the data.
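A greedy neighborhood search of the kind described above can be sketched as follows. Each breakpoint is repeatedly shifted within a small window of indices, and a move is kept whenever it lowers the total squared error of independent per-segment least-squares fits. This is an illustrative sketch only, not the authors' algorithm. It assumes sorted inputs, degree-1 segments rather than general polynomials, and a fixed number of breakpoints, and the names `radius` and `min_size` are hypothetical.

```python
def segment_sse(xs, ys):
    """Residual sum of squares of a least-squares line fitted to one segment."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def total_sse(xs, ys, bps):
    """Total error when each breakpoint index starts a new segment."""
    edges = [0] + list(bps) + [len(xs)]
    return sum(segment_sse(xs[a:b], ys[a:b]) for a, b in zip(edges, edges[1:]))


def refine_breakpoints(xs, ys, bps, radius=3, min_size=2):
    """Greedily move each breakpoint within +/- radius while the error decreases."""
    bps = list(bps)
    best = total_sse(xs, ys, bps)
    improved = True
    while improved:
        improved = False
        for j in range(len(bps)):
            # Keep at least min_size points in the segments on both sides.
            lo = (bps[j - 1] if j > 0 else 0) + min_size
            hi = (bps[j + 1] if j + 1 < len(bps) else len(xs)) - min_size
            for cand in range(max(lo, bps[j] - radius), min(hi, bps[j] + radius) + 1):
                trial = bps[:j] + [cand] + bps[j + 1:]
                err = total_sse(xs, ys, trial)
                if err < best - 1e-12:
                    best, bps, improved = err, trial, True
    return bps, best
```

On data that are flat up to index 14 and then jump onto a line, starting the search from breakpoint 12 moves it to index 15, where both segments fit exactly.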
Submitted 27 August, 2024; v1 submitted 25 August, 2024;
originally announced August 2024.
-
AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler
Authors:
Changhun Kim,
Taewon Kim,
Seungyeon Woo,
June Yong Yang,
Eunho Yang
Abstract:
In real-world scenarios, tabular data often suffer from distribution shifts that threaten the performance of machine learning models. Despite their prevalence and importance, distribution shifts in the tabular domain remain underexplored due to the inherent challenges of tabular data. In this context, test-time adaptation (TTA) offers a promising solution by adapting models to target data without accessing source data, which is crucial for privacy-sensitive tabular domains. However, existing TTA methods either 1) overlook the nature of tabular distribution shifts, which often involve label distribution shifts, or 2) impose architectural constraints on the model, limiting their applicability. To this end, we propose AdapTable, a novel TTA framework for tabular data. AdapTable operates in two stages: 1) calibrating model predictions using a shift-aware uncertainty calibrator, and 2) adjusting these predictions to match the target label distribution with a label distribution handler. We validate the effectiveness of AdapTable through theoretical analysis and extensive experiments on various distribution shift scenarios. Our results demonstrate AdapTable's ability to handle various real-world distribution shifts, achieving up to a 16% improvement on the HELOC dataset.
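The idea behind adjusting predictions to a target label distribution can be illustrated with the classical Bayes-rule correction for label shift: p_t(y | x) ∝ p_s(y | x) · π_t(y) / π_s(y). This sketch assumes that the source and target label marginals are known. It does not reproduce AdapTable's label distribution handler, which may estimate and apply the adjustment differently, and the function name is hypothetical.

```python
def adjust_to_target_prior(probs, source_prior, target_prior):
    """Reweight class probabilities from a source to a target label distribution."""
    weighted = [p * t / s for p, s, t in zip(probs, source_prior, target_prior)]
    total = sum(weighted)
    # Renormalize so the adjusted probabilities sum to one.
    return [w / total for w in weighted]
```

For example, an uncertain prediction `[0.5, 0.5]` made under a balanced source becomes `[0.8, 0.2]` when the target distribution is `[0.8, 0.2]`.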
Submitted 12 February, 2025; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Accelerated System-Reliability-based Disaster Resilience Analysis for Structural Systems
Authors:
Taeyong Kim,
Sang-ri Yi
Abstract:
Resilience has emerged as a crucial concept for evaluating structural performance under disasters because it extends beyond traditional risk assessments, accounting for a system's ability to minimize disruptions and maintain functionality during recovery. To facilitate a holistic understanding of resilience performance in structural systems, a system-reliability-based disaster resilience analysis framework was developed. The framework describes resilience using three criteria: reliability, redundancy, and recoverability, and the system's internal resilience is evaluated by inspecting the characteristics of reliability and redundancy for different possible progressive failure modes. However, the practical application of this framework to complex structures with numerous sub-components has been limited, as it becomes intractable to evaluate the performance for all possible initial disruption scenarios. To bridge the gap between theory and practical use, especially for evaluating reliability and redundancy, this study centers on the idea that the computational burden can be substantially alleviated by focusing on initial disruption scenarios that are practically significant. To achieve this research goal, we propose three methods to efficiently eliminate insignificant scenarios: the sequential search method, the n-ball sampling method, and the surrogate model-based adaptive sampling algorithm. Three numerical examples, including buildings and a bridge, are introduced to demonstrate the applicability and efficiency of the proposed approaches. The findings of this study are expected to offer practical solutions to the challenges of assessing resilience performance in complex structural systems.
Submitted 20 April, 2024;
originally announced April 2024.
-
An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network
Authors:
Taeyoung Kim,
Hongseok Yang
Abstract:
The recent theoretical analysis of deep neural networks in their infinite-width limits has deepened our understanding of initialisation, feature learning, and training of those networks, and brought new practical techniques for finding appropriate hyperparameters, learning network weights, and performing inference. In this paper, we broaden this line of research by showing that this infinite-width analysis can be extended to the Jacobian of a deep neural network. We show that a multilayer perceptron (MLP) and its Jacobian at initialisation jointly converge to a Gaussian process (GP) as the widths of the MLP's hidden layers go to infinity and characterise this GP. We also prove that in the infinite-width limit, the evolution of the MLP under the so-called robust training (i.e., training with a regulariser on the Jacobian) is described by a linear first-order ordinary differential equation that is determined by a variant of the Neural Tangent Kernel. We experimentally show the relevance of our theoretical claims to wide finite networks, and empirically analyse the properties of the kernel regression solution to gain insight into Jacobian regularisation.
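To make the robust-training objective concrete, the sketch below writes out a one-hidden-layer tanh MLP f(x) = (1/√n) Σ_k v_k tanh(w_k · x), its input Jacobian ∂f/∂x_d = (1/√n) Σ_k v_k (1 − tanh²(w_k · x)) w_{k,d}, and a single-example loss ½(f(x) − y)² + (λ/2)‖∂f/∂x‖². The 1/√n scaling, the squared-error data term, and all names are assumptions chosen for illustration. Neither the deep architecture analysed in the paper nor its infinite-width limit is reproduced.

```python
import math


def mlp(x, W, v):
    """One-hidden-layer tanh network with 1/sqrt(width) output scaling."""
    n = len(v)
    return sum(vk * math.tanh(sum(wkd * xd for wkd, xd in zip(wk, x)))
               for wk, vk in zip(W, v)) / math.sqrt(n)


def input_jacobian(x, W, v):
    """Analytic gradient of mlp(x, W, v) with respect to the input x."""
    n = len(v)
    jac = [0.0] * len(x)
    for wk, vk in zip(W, v):
        pre = sum(wkd * xd for wkd, xd in zip(wk, x))
        slope = vk * (1.0 - math.tanh(pre) ** 2) / math.sqrt(n)
        for d, wkd in enumerate(wk):
            jac[d] += slope * wkd
    return jac


def jacobian_regularised_loss(x, y, W, v, lam):
    """Squared error plus lam/2 times the squared norm of the input Jacobian."""
    residual = mlp(x, W, v) - y
    jac = input_jacobian(x, W, v)
    return 0.5 * residual ** 2 + 0.5 * lam * sum(j * j for j in jac)
```

Setting `lam = 0` recovers ordinary squared-error training, and the analytic Jacobian can be checked against central finite differences.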
Submitted 21 August, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
DeepFDR: A Deep Learning-based False Discovery Rate Control Method for Neuroimaging Data
Authors:
Taehyo Kim,
Hai Shu,
Qiran Jia,
Mony J. de Leon
Abstract:
Voxel-based multiple testing is widely used in neuroimaging data analysis. Traditional false discovery rate (FDR) control methods often ignore the spatial dependence among the voxel-based tests and thus suffer from substantial loss of testing power. While recent spatial FDR control methods have emerged, their validity and optimality remain questionable when handling the complex spatial dependencies of the brain. Concurrently, deep learning methods have revolutionized image segmentation, a task closely related to voxel-based multiple testing. In this paper, we propose DeepFDR, a novel spatial FDR control method that leverages unsupervised deep learning-based image segmentation to address the voxel-based multiple testing problem. Numerical studies, including comprehensive simulations and Alzheimer's disease FDG-PET image analysis, demonstrate DeepFDR's superiority over existing methods. DeepFDR not only excels in FDR control and effectively diminishes the false nondiscovery rate, but also boasts exceptional computational efficiency highly suited for tackling large-scale neuroimaging data.
Submitted 10 March, 2024; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Selective Generation for Controllable Language Models
Authors:
Minjae Lee,
Kyungmin Kim,
Taesoo Kim,
Sangdon Park
Abstract:
Trustworthiness of generative language models (GLMs) is crucial for their deployment in critical decision-making systems. Hence, certified risk control methods such as selective prediction and conformal prediction have been applied to mitigate the hallucination problem in various supervised downstream tasks. However, the lack of an appropriate correctness metric hinders applying such principled methods to language generation tasks. In this paper, we circumvent this problem by leveraging the concept of textual entailment to evaluate the correctness of the generated sequence, and propose two selective generation algorithms which control the false discovery rate with respect to the textual entailment relation (FDR-E) with a theoretical guarantee: $\texttt{SGen}^{\texttt{Sup}}$ and $\texttt{SGen}^{\texttt{Semi}}$. $\texttt{SGen}^{\texttt{Sup}}$, a direct modification of selective prediction, is a supervised learning algorithm that exploits entailment-labeled data annotated by humans. Since human annotation is costly, we further propose a semi-supervised version, $\texttt{SGen}^{\texttt{Semi}}$, which fully utilizes the unlabeled data by pseudo-labeling, leveraging an entailment set function learned via conformal prediction. Furthermore, $\texttt{SGen}^{\texttt{Semi}}$ enables the use of a more general class of selection functions, neuro-selection functions, and provides users with an optimal selection function class given multiple candidates. Finally, we demonstrate the efficacy of the $\texttt{SGen}$ family in achieving a desired FDR-E level with selection efficiency comparable to that of baselines on both open- and closed-source GLMs. Code and datasets are provided at https://github.com/ml-postech/selective-generation.
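The basic mechanics of selective generation can be sketched as follows: the system abstains whenever a confidence score falls below a threshold τ, and τ is chosen on labeled calibration data so that the fraction of incorrect (non-entailed) answers among those selected stays at or below a target. This sketch uses the plain empirical FDR and therefore carries no finite-sample guarantee, unlike the certified SGen algorithms. The function name and inputs are hypothetical.

```python
import math


def select_threshold(scores, correct, target_fdr):
    """Smallest threshold whose empirical FDR among selected answers is <= target.

    scores: confidence score per calibration example (higher means more confident).
    correct: 1 if the generated answer is entailed by the reference, else 0.
    Returns math.inf (always abstain) if no threshold satisfies the target.
    """
    pairs = sorted(zip(scores, correct), key=lambda p: -p[0])
    best_tau = math.inf
    false = 0
    for count, (score, ok) in enumerate(pairs, start=1):
        if not ok:
            false += 1
        # Only evaluate at the end of a group of tied scores, since selecting
        # "score >= tau" includes every tied example.
        at_group_end = count == len(pairs) or pairs[count][0] != score
        if at_group_end and false / count <= target_fdr:
            best_tau = score
    return best_tau
```

With scores `[0.9, 0.8, 0.7, 0.6, 0.5]`, correctness labels `[1, 1, 0, 1, 0]` and a target of 0.25, selecting the top four answers gives an empirical FDR of exactly 0.25, so the threshold is 0.6.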
Submitted 27 January, 2025; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Efficient Treatment Effect Estimation with Out-of-bag Post-stratification
Authors:
Taebin Kim,
Lili Wang,
Randy Lai,
Sangho Yoon
Abstract:
Post-stratification is often used to estimate treatment effects with higher efficiency. However, the majority of existing post-stratification frameworks depend on prior knowledge of the distributions of covariates and assume that the units are classified into post-strata without error. We propose a novel method to determine a proper stratification rule by mapping the covariates into a post-stratification factor (PSF) using predictive regression models. Inspired by the bootstrap aggregating (bagging) method, we utilize the out-of-bag delete-D jackknife to estimate strata boundaries, strata weights, and the variance of the point estimate. Confidence intervals are constructed with these estimators to take into account the additional variability coming from uncertainty in the strata boundaries and weights. Extensive simulations show that our proposed method consistently improves the efficiency of the estimates when the regression models are predictive and tends to be more robust than the regression imputation method.
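The basic post-stratified estimator that such methods build on can be sketched as follows. Units are grouped into strata by cutting a post-stratification factor (PSF) at sample quantiles, and the treatment effect is the stratum-size-weighted average of within-stratum differences in means. In this sketch the PSF values are assumed to come from some predictive model and are given directly as hypothetical numbers. The out-of-bag delete-D jackknife for boundaries, weights, and variance is not shown.

```python
from statistics import mean


def post_stratified_effect(psf, treated, outcome, n_strata=2):
    """Post-stratified difference in means with strata cut at PSF quantiles.

    Assumes every stratum contains at least one treated and one control unit.
    """
    order = sorted(psf)
    n = len(psf)
    cuts = [order[(n * q) // n_strata] for q in range(1, n_strata)]

    def stratum(value):
        # Number of cut points at or below the value gives the stratum index.
        return sum(value >= c for c in cuts)

    groups = {}
    for f, t, y in zip(psf, treated, outcome):
        groups.setdefault(stratum(f), {True: [], False: []})[bool(t)].append(y)
    estimate = 0.0
    for g in groups.values():
        weight = (len(g[True]) + len(g[False])) / n
        estimate += weight * (mean(g[True]) - mean(g[False]))
    return estimate
```

In a toy example where treatment adds 2 to the outcome in both strata, the estimator returns 2.0.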
Submitted 12 September, 2023; v1 submitted 16 July, 2023;
originally announced July 2023.
-
Maximum Agreement Linear Prediction via the Concordance Correlation Coefficient
Authors:
Taeho Kim,
George Luta,
Matteo Bottai,
Pierre Chausse,
Gheorghe Doros,
Edsel A. Pena
Abstract:
This paper examines the distributional properties and predictive performance of the estimated maximum agreement linear predictor (MALP), introduced by Bottai, Kim, Lieberman, Luta, and Pena (2022) in The American Statistician, which is the linear predictor maximizing Lin's concordance correlation coefficient (CCC) between the predictor and the predictand. It is compared and contrasted, theoretically and through computer experiments, with the estimated least-squares linear predictor (LSLP). Finite-sample and asymptotic properties are obtained, and confidence intervals are also presented. The predictors are illustrated using two real data sets: an eye data set and a bodyfat data set. The results indicate that the estimated MALP is a viable alternative to the estimated LSLP if one desires a predictor whose predicted values possess higher agreement with the predictand values, as measured by the CCC.
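In the single-predictor case, a short calculation shows how the MALP differs from the LSLP. For Ŷ = a + bX, CCC = 2bσ_XY / (σ_Y² + b²σ_X² + (μ_Y − a − bμ_X)²). The squared mean term vanishes when a = μ_Y − bμ_X, and maximizing over b gives b = sign(σ_XY) σ_Y/σ_X. The resulting CCC is |ρ|, whereas the LSLP slope ρσ_Y/σ_X yields 2ρ²/(1 + ρ²) ≤ |ρ|. The sketch below checks this with plug-in sample moments on made-up data. It covers only the univariate case, not the general estimator or inference studied in the paper.

```python
import math
from statistics import fmean


def moments(xs, ys):
    """Plug-in means, variances, and covariance (divisor n)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = fmean([(x - mx) ** 2 for x in xs])
    syy = fmean([(y - my) ** 2 for y in ys])
    sxy = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return mx, my, sxx, syy, sxy


def ccc(a, b):
    """Lin's concordance correlation coefficient between two sequences."""
    ma, mb, saa, sbb, sab = moments(a, b)
    return 2.0 * sab / (saa + sbb + (ma - mb) ** 2)


def lslp(xs, ys):
    """Least-squares linear predictions."""
    mx, my, sxx, _, sxy = moments(xs, ys)
    return [my + (sxy / sxx) * (x - mx) for x in xs]


def malp(xs, ys):
    """Single-predictor MALP: slope sign(s_xy) * s_y / s_x through the means."""
    mx, my, sxx, syy, sxy = moments(xs, ys)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return [my + slope * (x - mx) for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]
```

On this sample, r ≈ 0.82. The CCC of the MALP equals |r|, while the LSLP attains 2r²/(1 + r²) ≈ 0.81.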
Submitted 10 February, 2024; v1 submitted 9 April, 2023;
originally announced April 2023.
-
Complementary Domain Adaptation and Generalization for Unsupervised Continual Domain Shift Learning
Authors:
Wonguk Cho,
Jinha Park,
Taesup Kim
Abstract:
Continual domain shift poses a significant challenge in real-world applications, particularly in situations where labeled data is not available for new domains. The challenge of acquiring knowledge in this problem setting is referred to as unsupervised continual domain shift learning. Existing methods for domain adaptation and generalization have limitations in addressing this issue, as they focus either on adapting to a specific domain or generalizing to unseen domains, but not both. In this paper, we propose Complementary Domain Adaptation and Generalization (CoDAG), a simple yet effective learning framework that combines domain adaptation and generalization in a complementary manner to achieve three major goals of unsupervised continual domain shift learning: adapting to a current domain, generalizing to unseen domains, and preventing forgetting of previously seen domains. Our approach is model-agnostic, meaning that it is compatible with any existing domain adaptation and generalization algorithms. We evaluate CoDAG on several benchmark datasets and demonstrate that our model outperforms state-of-the-art models in all datasets and evaluation metrics, highlighting its effectiveness and robustness in handling unsupervised continual domain shift learning.
Submitted 13 October, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Sufficient Invariant Learning for Distribution Shift
Authors:
Taero Kim,
Subeen Park,
Sungjun Lim,
Yonghan Jung,
Krikamol Muandet,
Kyungwoo Song
Abstract:
Learning robust models under distribution shifts between training and test datasets is a fundamental challenge in machine learning. While learning invariant features across environments is a popular approach, it often assumes that these features are fully observed in both training and test sets, a condition frequently violated in practice. When models rely on invariant features absent in the test set, their robustness in new environments can deteriorate. To tackle this problem, we introduce a novel learning principle called the Sufficient Invariant Learning (SIL) framework, which focuses on learning a sufficient subset of invariant features rather than relying on a single feature. After demonstrating the limitation of existing invariant learning methods, we propose a new algorithm, Adaptive Sharpness-aware Group Distributionally Robust Optimization (ASGDRO), to learn diverse invariant features by seeking common flat minima across the environments. We theoretically demonstrate that finding common flat minima enables robust predictions based on diverse invariant features. Empirical evaluations on multiple datasets, including our new benchmark, confirm ASGDRO's robustness against distribution shifts, highlighting the limitations of existing methods.
Submitted 18 November, 2024; v1 submitted 24 October, 2022;
originally announced October 2022.
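As its name suggests, ASGDRO builds on group distributionally robust optimization (Group DRO). As background only, the sketch below shows the standard Group DRO reweighting step (Sagawa et al., 2020): exponentiated-gradient updates that shift weight toward the environments with the highest loss. The sharpness-aware and adaptive parts of ASGDRO are not shown, and the step size `eta`, the per-group losses, and the function names are illustrative assumptions.

```python
import math


def group_dro_step(q, group_losses, eta=0.1):
    """One exponentiated-gradient update of the group weights q.

    Groups with larger loss receive exponentially more weight; the result
    is renormalized to a probability vector.
    """
    unnorm = [qg * math.exp(eta * lg) for qg, lg in zip(q, group_losses)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def robust_objective(q, group_losses):
    """Weighted loss that the model parameters would then minimize."""
    return sum(qg * lg for qg, lg in zip(q, group_losses))


q = [1 / 3] * 3
losses = [0.2, 0.5, 1.4]   # hypothetical per-environment training losses
for _ in range(5):
    q = group_dro_step(q, losses, eta=0.5)
objective = robust_objective(q, losses)
```

Because the weights concentrate on the worst environment, the model update is driven by worst-group rather than average performance.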
-
Bounding the Rademacher Complexity of Fourier neural operators
Authors:
Taeyoung Kim,
Myungjoo Kang
Abstract:
A Fourier neural operator (FNO) is a physics-inspired machine learning method; specifically, it is a neural operator. Several types of neural operators have recently been developed, e.g., deep operator networks, the graph neural operator (GNO), and the multiwavelet-based operator (MWTO). Compared with other models, the FNO is computationally efficient and can learn nonlinear operators between function spaces independently of any particular finite basis. In this study, we bound the Rademacher complexity of the FNO in terms of specific group norms. Using the capacity based on these norms, we bound the generalization error of the model. In addition, we investigated the correlation between the empirical generalization error and the proposed capacity of the FNO. From our results, we inferred that the type of group norm determines which information about the weights and architecture of the FNO model is stored in the capacity, and we then confirmed these inferences through experiments. Based on this, we gained insight into the impact of the number of modes used in the FNO model on the generalization error, and our experimental results were consistent with these insights.
Submitted 26 September, 2022; v1 submitted 12 September, 2022;
originally announced September 2022.
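The weights whose group norms enter these bounds live in the FNO's spectral convolution: transform the input to Fourier space, multiply the lowest `m` modes by learned weights, discard the remaining modes, and transform back. The sketch below is a minimal single-channel version that uses a naive O(n²) DFT so it stays self-contained; a practical FNO layer uses an FFT and per-channel complex weight matrices, and all names here are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_conv_1d(x, weights):
    """Keep the lowest len(weights) Fourier modes, each scaled by its weight.

    Negative-frequency modes are set to the conjugates of the kept modes so
    that the output is real; every other mode is truncated to zero.
    """
    n = len(x)
    X = dft(x)
    Y = [0j] * n
    for k in range(min(len(weights), n // 2 + 1)):
        Y[k] = weights[k] * X[k]
        if 0 < k < n - k:
            Y[n - k] = Y[k].conjugate()
    return [v.real for v in idft(Y)]


n = 8
low = [math.cos(2 * math.pi * t / n) for t in range(n)]
high = [0.5 * math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
x = [a + b for a, b in zip(low, high)]
out = spectral_conv_1d(x, weights=[1, 1])   # m = 2 modes: k = 0 and k = 1
```

With only two modes kept, the frequency-3 component is discarded and `out` recovers the low-frequency signal, which is why the number of modes governs how much of the input the layer can represent.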
-
Selection of the Most Probable Best
Authors:
Taeho Kim,
Kyoung-kuk Kim,
Eunhye Song
Abstract:
We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution. We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution and design an efficient sequential sampling algorithm to learn the MPB when the parameter has a finite support. We derive the large deviations rate of the probability of falsely selecting the MPB and formulate an optimal computing budget allocation problem to find the rate-maximizing static sampling ratios. The problem is then relaxed to obtain a set of optimality conditions that are interpretable and computationally efficient to verify. We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates and prove that the algorithms' sampling ratios achieve the conditions as the simulation budget increases. Furthermore, we show that the empirical performance of the algorithms can be significantly improved by adopting kernel ridge regression for mean estimation while achieving the same asymptotic convergence results. The algorithms are benchmarked against a state-of-the-art contextual R&S algorithm and shown to have superior empirical performance.
Submitted 20 April, 2024; v1 submitted 15 July, 2022;
originally announced July 2022.
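When the means are known, the MPB definition above is straightforward to evaluate: for each parameter value θ in the finite support, find the best solution and credit it with P(θ); the MPB is the solution with the largest total credit. The sketch assumes the means μ_i(θ) are given exactly, whereas the paper's algorithms estimate them by sequential simulation; ties go to the lowest index, an arbitrary choice, and the names and numbers are hypothetical.

```python
def most_probable_best(means, probs):
    """means[j][i]: mean of solution i under parameter value theta_j.
    probs[j]: probability of theta_j. Returns (mpb_index, p_optimal)."""
    k = len(means[0])
    p_opt = [0.0] * k
    for mu_theta, p in zip(means, probs):
        best = max(range(k), key=lambda i: mu_theta[i])  # first max wins ties
        p_opt[best] += p
    mpb = max(range(k), key=lambda i: p_opt[i])
    return mpb, p_opt


means = [
    [1.0, 2.0, 1.5],   # theta_1: solution 1 is best
    [3.0, 1.0, 2.5],   # theta_2: solution 0 is best
    [0.5, 2.2, 2.4],   # theta_3: solution 2 is best
]
probs = [0.3, 0.3, 0.4]
mpb, p_opt = most_probable_best(means, probs)
```

Here solution 2 is the MPB with probability 0.4 of being optimal.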
-
Dynamic Gene Coexpression Analysis with Correlation Modeling
Authors:
Tae Hyun Kim,
Dan Nicolae
Abstract:
In many transcriptomic studies, the correlation of genes might fluctuate with quantitative factors such as genetic ancestry. We propose a method that models the covariance between two variables as varying with a continuous covariate. For the bivariate case, the proposed score test statistic is computationally simple and robust to misspecification of the covariance term. The method is then extended to test relationships between one highly connected gene, such as a transcription factor, and several other genes, for a more global investigation of the dynamics of the coexpression network. Simulations show that the proposed method has higher statistical power than alternatives, can be used in more diverse scenarios, and is computationally cheaper. We apply this method to African American subjects from GTEx to analyze the dynamic behavior of their gene coexpression against genetic ancestry and to identify transcription factors whose coexpression with their target genes changes with genetic ancestry. The proposed method can be applied to a wide array of problems that require covariance modeling.
Submitted 29 April, 2021;
originally announced April 2021.
-
Flexible Model Aggregation for Quantile Regression
Authors:
Rasool Fakoor,
Taesup Kim,
Jonas Mueller,
Alexander J. Smola,
Ryan J. Tibshirani
Abstract:
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for this problem over many years of research in statistics, machine learning, and related fields. Rather than proposing yet another algorithm for quantile regression, we adopt a meta viewpoint: we investigate methods for aggregating any number of conditional quantile models in order to improve accuracy and robustness. We consider weighted ensembles where weights may vary not only over individual models but also over quantile levels and feature values. All of the models we consider in this paper can be fit using modern deep learning toolkits, and hence are widely accessible (from an implementation point of view) and scalable. To improve the accuracy of the predicted quantiles (or equivalently, prediction intervals), we develop tools for ensuring that quantiles remain monotonically ordered, and apply conformal calibration methods. These can be used without any modification of the original library of base models. We also review some basic theory surrounding quantile aggregation and related scoring rules, and contribute a few new results to this literature (for example, the fact that post sorting or post isotonic regression can only improve the weighted interval score). Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories.
Submitted 15 April, 2023; v1 submitted 26 February, 2021;
originally announced March 2021.
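The aggregation and post-sorting ideas can be illustrated at a single test point. The sketch below forms a weighted ensemble whose weights differ by quantile level (fixed by hand here, whereas the paper fits them), sorts the aggregated quantiles to remove crossings, and compares the pinball (quantile) loss averaged over levels before and after sorting. The averaged pinball loss is used as a stand-in closely related to, but not identical to, the weighted interval score; all names and numbers are hypothetical.

```python
def pinball(y, q, tau):
    """Quantile (pinball) loss of prediction q at level tau for outcome y."""
    u = y - q
    return u * (tau - (1.0 if u < 0 else 0.0))


def aggregate(preds, weights):
    """preds[m][k], weights[m][k]: model m's quantile and weight at level k."""
    n_levels = len(preds[0])
    return [sum(p[k] * w[k] for p, w in zip(preds, weights))
            for k in range(n_levels)]


def mean_pinball(y, qs, taus):
    return sum(pinball(y, q, t) for q, t in zip(qs, taus)) / len(taus)


taus = [0.1, 0.5, 0.9]
preds = [
    [2.0, 4.0, 5.0],   # model A
    [6.0, 5.0, 3.0],   # model B (its quantiles cross)
]
weights = [
    [0.3, 0.5, 0.8],   # weight on model A at each level
    [0.7, 0.5, 0.2],   # weight on model B; each column sums to 1
]
agg = aggregate(preds, weights)      # approximately [4.8, 4.5, 4.6]: crossed
agg_sorted = sorted(agg)             # post sorting restores monotonicity
y = 4.5
loss_raw = mean_pinball(y, agg, taus)
loss_sorted = mean_pinball(y, agg_sorted, taus)
```

For y = 4.5 the averaged loss drops from about 0.093 to about 0.027 after sorting; sorting requires no change to the base models.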
-
Density Deconvolution with Non-Standard Error Distributions: Rates of Convergence and Adaptive Estimation
Authors:
Alexander Goldenshluger,
Taeho Kim
Abstract:
It is a standard assumption in the density deconvolution problem that the characteristic function of the measurement error distribution is non-zero on the real line. While this condition is assumed in the majority of existing works on the topic, there are many problem instances of interest where it is violated. In this paper we focus on non-standard settings where the characteristic function of the measurement errors has zeros, and study how the multiplicity of these zeros affects the estimation accuracy. For a prototypical problem of this type we demonstrate that the best achievable estimation accuracy is determined by the multiplicity of zeros, the rate of decay of the error characteristic function, as well as by the smoothness and the tail behavior of the estimated density. We derive lower bounds on the minimax risk and develop estimators that are optimal in the minimax sense. In addition, we consider the problem of adaptive estimation and propose a data-driven estimator that automatically adapts to the unknown smoothness and tail behavior of the density to be estimated.
Submitted 7 January, 2021;
originally announced January 2021.
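A standard illustration of such zeros, chosen here for concreteness (the abstract does not state which prototype the paper studies): if the error ε is uniform on [-θ, θ], its characteristic function has simple zeros, and the sum of m independent copies has zeros of multiplicity m at the same points while decaying like |t|^(-m).

```latex
\varphi_{\varepsilon}(t) = \mathbb{E}\, e^{\mathrm{i} t \varepsilon}
  = \frac{1}{2\theta}\int_{-\theta}^{\theta} e^{\mathrm{i} t x}\, dx
  = \frac{\sin(\theta t)}{\theta t},
\qquad
\varphi_{\varepsilon_1 + \cdots + \varepsilon_m}(t)
  = \left(\frac{\sin(\theta t)}{\theta t}\right)^{m},
\qquad
\text{zeros at } t = \frac{k\pi}{\theta},\ k \in \mathbb{Z}\setminus\{0\}.
```

Since observations Y = X + ε satisfy φ_Y = φ_X φ_ε, the usual Fourier inversion φ_X = φ_Y / φ_ε breaks down exactly at these zeros.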
-
Adaptive Local Bayesian Optimization Over Multiple Discrete Variables
Authors:
Taehyeon Kim,
Jaeyeon Ahn,
Nakyil Kim,
Seyoung Yun
Abstract:
In machine learning algorithms, the choice of hyperparameters is often more art than science, requiring a labor-intensive search guided by expert experience. Automating hyperparameter optimization to eliminate human intervention is therefore highly appealing, especially for black-box functions. Recently, there has been increasing demand for solving such black-box tasks with better generalization, although task dependence is not easy to address. The Black-Box Optimization challenge (NeurIPS 2020) required competitors to build a robust black-box optimizer across different domains of standard machine learning problems. This paper describes the approach of team KAIST OSI step by step; it outperforms the baseline algorithms by up to +20.39%. We first strengthen the local Bayesian search using the concept of region reliability. Then, we design a combinatorial kernel for the Gaussian process. In a similar vein, we combine Bayesian and multi-armed bandit (MAB) approaches to select values according to variable type: real and integer variables are handled with Bayesian optimization, while boolean and categorical variables are handled with MAB. Empirical evaluations demonstrate that our method outperforms the existing methods across different tasks.
Submitted 7 December, 2020;
originally announced December 2020.
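The abstract does not specify which bandit rule handles the boolean and categorical variables. Purely to illustrate the idea, the sketch below uses the classic UCB1 rule, a swapped-in standard choice, to pick a value for one categorical hyperparameter from the rewards observed so far; the class name, the exploration constant `c`, and the toy scores are assumptions.

```python
import math


class UCB1:
    """UCB1 over the discrete values of a single categorical variable."""

    def __init__(self, values, c=math.sqrt(2)):
        self.values = list(values)
        self.c = c
        self.counts = {v: 0 for v in self.values}
        self.sums = {v: 0.0 for v in self.values}

    def select(self):
        # Try every value once before relying on confidence bounds.
        for v in self.values:
            if self.counts[v] == 0:
                return v
        total = sum(self.counts.values())

        def ucb(v):
            mean = self.sums[v] / self.counts[v]
            return mean + self.c * math.sqrt(math.log(total) / self.counts[v])

        return max(self.values, key=ucb)

    def update(self, value, reward):
        self.counts[value] += 1
        self.sums[value] += reward


scores = {"adam": 0.9, "sgd": 0.6, "rmsprop": 0.7}   # hypothetical validation scores
bandit = UCB1(scores)
for _ in range(300):
    choice = bandit.select()
    bandit.update(choice, scores[choice])
```

The optimizer that scores best is selected most often, while the confidence bonus keeps occasionally revisiting the others.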
-
Can we Generalize and Distribute Private Representation Learning?
Authors:
Sheikh Shams Azam,
Taejin Kim,
Seyyedali Hosseinalipour,
Carlee Joe-Wong,
Saurabh Bagchi,
Christopher Brinton
Abstract:
We study the problem of learning representations that are private yet informative, i.e., provide information about intended "ally" targets while hiding sensitive "adversary" attributes. We propose Exclusion-Inclusion Generative Adversarial Network (EIGAN), a generalized private representation learning (PRL) architecture that, unlike existing PRL solutions, accounts for multiple ally and adversary attributes. While a centrally aggregated dataset is a prerequisite for most PRL techniques, real-world data are often siloed across multiple distributed nodes that are unwilling to share raw data because of privacy concerns. We address this practical constraint by developing D-EIGAN, the first distributed PRL method that learns representations at each node without transmitting the source data. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and the impact of dependencies among ally and adversary tasks on the optimization objective. Our experiments on various datasets demonstrate the advantages of EIGAN in terms of performance, robustness, and scalability. In particular, EIGAN outperforms the previous state-of-the-art by a significant accuracy margin (47% improvement), and D-EIGAN's performance is consistently on par with EIGAN under different network settings.
Submitted 30 January, 2022; v1 submitted 5 October, 2020;
originally announced October 2020.
-
Prediction Regions for Poisson and Over-Dispersed Poisson Regression Models with Applications to Forecasting Number of Deaths during the COVID-19 Pandemic
Authors:
T. Kim,
B. Lieberman,
G. Luta,
E. Pena
Abstract:
Motivated by the current Coronavirus Disease (COVID-19) pandemic, which is caused by the SARS-CoV-2 virus, and the important problem of forecasting daily and cumulative deaths, this paper examines the construction of prediction regions or intervals under the Poisson regression model and under an over-dispersed Poisson regression model. For the Poisson regression model, several prediction regions are developed and their performance is compared through simulation studies. The methods are applied to the problem of forecasting daily and cumulative deaths in the United States (US) due to COVID-19. To examine their performance relative to what actually happened, daily deaths data until May 15th were used to forecast cumulative deaths by June 1st. Over-dispersion was observed in the data relative to the Poisson regression model, so an over-dispersed Poisson regression model is proposed. This new model builds on frailty ideas in survival analysis, and over-dispersion is quantified through an additional parameter. The Poisson regression model is nested within this over-dispersed Poisson regression model and is obtained as a limiting case when the over-dispersion parameter increases to infinity. A prediction region for the cumulative number of US deaths due to COVID-19 by July 16th, given the data until July 2nd, is presented. Finally, the paper discusses limitations of the proposed procedures and mentions open research problems, as well as the dangers and pitfalls of forecasting over a long horizon, with a focus on this pandemic, where events, both foreseen and unforeseen, could have huge impacts on point predictions and prediction regions.
Submitted 6 July, 2020; v1 submitted 4 July, 2020;
originally announced July 2020.
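As a simplified illustration of a prediction interval for a future cumulative count, suppose a Poisson regression has already produced fitted daily means λ_t for the forecast days and that daily counts are independent. The future total is then Poisson with mean Σλ_t, and a naive plug-in interval takes its α/2 and 1 - α/2 quantiles. This sketch ignores parameter-estimation uncertainty and over-dispersion, both of which the paper addresses, so it is not one of the paper's procedures; the fitted means and counts are made-up numbers.

```python
import math


def poisson_quantile(p, lam):
    """Smallest k with P(N <= k) >= p for N ~ Poisson(lam); needs lam > 0, p < 1."""
    cdf = 0.0
    k = 0
    while True:
        # Work with the log pmf so exp(-lam) does not underflow for large lam.
        cdf += math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        if cdf >= p:
            return k
        k += 1


def plugin_interval(current_total, future_means, alpha=0.05):
    """Naive plug-in interval for current_total + sum of future Poisson counts."""
    lam = sum(future_means)
    lo = poisson_quantile(alpha / 2, lam)
    hi = poisson_quantile(1 - alpha / 2, lam)
    return current_total + lo, current_total + hi


observed_to_date = 1000              # hypothetical cumulative count so far
fitted = [48.0, 50.5, 53.1, 55.8]    # hypothetical fitted daily means
lo, hi = plugin_interval(observed_to_date, fitted)
```

Because it conditions on the fitted means, this interval is typically too narrow in practice, which is one motivation for the over-dispersed model.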
-
FrostNet: Towards Quantization-Aware Network Architecture Search
Authors:
Taehoon Kim,
YoungJoon Yoo,
Jihoon Yang
Abstract:
INT8 quantization has become one of the standard techniques for deploying convolutional neural networks (CNNs) on edge devices to reduce memory and computational resource usage. By analyzing the quantized performance of existing mobile-target network architectures, we raise the question of how important network architecture is for optimal INT8 quantization. In this paper, we present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performance. We first propose two critical but straightforward optimization methods that enable quantization-aware training (QAT): floating-point statistic assisting (StatAssist) and stochastic gradient boosting (GradBoost). By integrating gradient-based NAS with StatAssist and GradBoost, we discovered a quantization-efficient network building block, the Frost bottleneck. Furthermore, we used the Frost bottleneck as the building block for hardware-aware NAS to obtain quantization-efficient networks, FrostNets, which show improved quantization performance compared to other mobile-target networks while maintaining competitive FLOAT32 performance. Our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency when quantized, due to a higher latency reduction rate (65% on average).
Submitted 30 November, 2020; v1 submitted 17 June, 2020;
originally announced June 2020.
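For context on the INT8 setting only (the abstract gives no details of StatAssist or GradBoost), the sketch below shows symmetric per-tensor INT8 quantization, one common scheme. A scale maps the largest absolute value to 127, values are rounded and clipped to [-127, 127], and dequantization multiplies back by the scale. The per-tensor symmetric choice and the function names are assumptions; deployed toolchains also use asymmetric and per-channel variants.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization. Returns (q, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [qi * scale for qi in q]


w = [0.5, -1.27, 0.003, 1.0]        # hypothetical FLOAT32 weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The rounding error per value is at most half the scale, so small weights such as 0.003 can collapse to zero, one source of the accuracy loss that quantization-aware training tries to mitigate.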
-
Designing a Bonus-Malus system reflecting the claim size under the dependent frequency-severity model
Authors:
Rosy Oh,
Joseph H. T. Kim,
Jae Youn Ahn
Abstract:
In auto insurance, a Bonus-Malus System (BMS) is commonly used as an a posteriori risk classification mechanism to set the premium for the next contract period based on a policyholder's claim history. Even though recent literature reports evidence of a significant dependence between frequency and severity, current BMS practice is to use a frequency-based transition rule while ignoring severity information. Although Oh et al. (2019) claim that the frequency-driven BMS transition rule can accommodate the dependence between frequency and severity, their proposal is only a partial solution, as the transition rule still completely ignores claim severity and is unable to penalize large claims. In this study, we propose a BMS whose transition rule is based on both the frequency and the size of claims, built on a bivariate random effect model that conveniently allows dependence between frequency and severity. We analytically derive the optimal relativities under the proposed BMS framework and show that the proposed BMS outperforms the existing frequency-driven BMS. Numerical experiments using both hypothetical and actual datasets are also provided to assess the effect of various dependencies on BMS risk classification and to confirm our theoretical findings.
Submitted 3 March, 2020;
originally announced March 2020.
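To make the contrast between frequency-only and frequency-and-severity transition rules concrete, here is a hypothetical rule, not the one derived in the paper (which also derives optimal relativities). A claim-free year moves the policyholder down one level, each small claim moves them up `small_penalty` levels, and each claim above `threshold` moves them up `large_penalty` levels, with levels capped at both ends of the scale. All parameter values are illustrative assumptions.

```python
def next_level(level, claim_sizes, threshold=10_000, small_penalty=2,
               large_penalty=4, min_level=1, max_level=10):
    """Hypothetical BMS transition using both claim frequency and severity."""
    if not claim_sizes:
        new = level - 1
    else:
        new = level + sum(large_penalty if s > threshold else small_penalty
                          for s in claim_sizes)
    return max(min_level, min(max_level, new))


def frequency_only_level(level, claim_sizes, penalty=2,
                         min_level=1, max_level=10):
    """Conventional rule: counts claims and ignores their sizes."""
    new = level - 1 if not claim_sizes else level + penalty * len(claim_sizes)
    return max(min_level, min(max_level, new))


# A policyholder at level 5 with one large claim of 25,000:
severity_aware = next_level(5, [25_000])             # moves to level 9
frequency_only = frequency_only_level(5, [25_000])   # moves to level 7
```

Under the frequency-only rule a 500 claim and a 25,000 claim lead to the same level, which is exactly the limitation the abstract describes.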
-
Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image
Authors:
Taewon Kim,
Yeseong Park,
Youngbin Park,
Il Hong Suh
Abstract:
For a robotic grasping task in which diverse unseen target objects exist in a cluttered environment, some deep learning-based methods have achieved state-of-the-art results using visual input directly. In contrast, actor-critic deep reinforcement learning (RL) methods typically perform very poorly when grasping diverse objects, especially when learning from raw images and sparse rewards. To make these RL techniques feasible for vision-based grasping tasks, we employ state representation learning (SRL), where we encode essential information first for subsequent use in RL. However, typical representation learning procedures are unsuitable for extracting pertinent information for learning the grasping skill, because the visual inputs for representation learning, where a robot attempts to grasp a target object in clutter, are extremely complex. We found that preprocessing based on the disentanglement of a raw input image is the key to effectively capturing a compact representation. This enables deep RL to learn robotic grasping skills from highly varied and diverse visual inputs. We demonstrate the effectiveness of this approach with varying levels of disentanglement in a realistic simulated environment.
Submitted 26 February, 2020;
originally announced February 2020.
-
Fast and Accurate Transferability Measurement for Heterogeneous Multivariate Data
Authors:
Seungcheol Park,
Huiwen Xu,
Taehun Kim,
Inhwan Hwang,
Kyung-Jun Kim,
U Kang
Abstract:
Given a set of heterogeneous source datasets with their classifiers, how can we quickly find the most useful source dataset for a specific target task? We address the problem of measuring transferability between source and target datasets, where the source and the target have different feature spaces and distributions. We propose Transmeter, a fast and accurate method to estimate the transferability of two heterogeneous multivariate datasets. We address three challenges in measuring transferability between two heterogeneous multivariate datasets: reducing time, minimizing domain gap, and extracting meaningful homogeneous representations. To overcome the above issues, we utilize a pre-trained source model, an adversarial network, and an encoder-decoder architecture. Extensive experiments on heterogeneous multivariate datasets show that Transmeter gives the most accurate transferability measurement with up to 10.3 times faster performance than its competitor. We also show that selecting the best source data with Transmeter followed by a full transfer leads to the best transfer accuracy and the fastest running time.
Submitted 29 January, 2021; v1 submitted 23 December, 2019;
originally announced December 2019.
-
Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients
Authors:
Brenden K. Petersen,
Mikel Landajuela,
T. Nathan Mundhenk,
Claudio P. Santiago,
Soo K. Kim,
Joanne T. Kim
Abstract:
Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of symbolic regression. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are underexplored. We propose a framework that leverages deep learning for symbolic regression via a simple idea: use a large model to search the space of small models. Specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions and employ a novel risk-seeking policy gradient to train the network to generate better-fitting expressions. Our algorithm outperforms several baseline methods (including Eureqa, the gold standard for symbolic regression) in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate constraints in situ, and a risk-seeking policy gradient formulation that optimizes for best-case performance instead of expected performance.
Submitted 5 April, 2021; v1 submitted 10 December, 2019;
originally announced December 2019.
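The risk-seeking policy gradient can be sketched as follows. Given N sampled expressions with rewards R_i, let R_ε be the empirical (1 - ε)-quantile of the rewards. Only samples with R_i ≥ R_ε contribute, each weighted by (R_i - R_ε)/(εN), so the update targets best-case rather than average performance. The sketch applies these weights to a toy softmax policy over a fixed list of candidate expressions, where the gradient of log p(a) with respect to the logits is onehot(a) - p. The quantile convention, the toy policy, and all names are assumptions, not the paper's RNN setup.

```python
import math


def risk_seeking_weights(rewards, eps):
    """Per-sample weights (R_i - R_eps) / (eps * N) for the top-eps fraction."""
    n = len(rewards)
    ranked = sorted(rewards)
    idx = min(n - 1, math.floor((1 - eps) * n))   # empirical (1 - eps)-quantile
    r_eps = ranked[idx]
    return [(r - r_eps) / (eps * n) if r >= r_eps else 0.0 for r in rewards]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def risk_seeking_gradient(logits, actions, rewards, eps):
    """Risk-seeking gradient estimate w.r.t. the logits of a softmax policy."""
    p = softmax(logits)
    grad = [0.0] * len(logits)
    for a, w in zip(actions, risk_seeking_weights(rewards, eps)):
        for j in range(len(logits)):
            grad[j] += w * ((1.0 if j == a else 0.0) - p[j])
    return grad


logits = [0.0, 0.0, 0.0]                     # three candidate expressions
actions = [0, 1, 2, 0, 1, 2, 0, 1, 2, 2]     # sampled candidates
rewards = [0.1, 0.4, 0.8, 0.2, 0.5, 0.95, 0.15, 0.45, 0.85, 0.9]
grad = risk_seeking_gradient(logits, actions, rewards, eps=0.2)
```

Only the top 20% of samples (here both from candidate 2) move the policy; a standard REINFORCE update with a mean baseline would also be influenced by the mediocre samples.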
-
Improved Multiple Confidence Intervals via Thresholding Informed by Prior Information
Authors:
Taeho Kim,
Edsel A. Pena
Abstract:
Consider a statistical problem where a set of parameters is of interest to a researcher. Multiple confidence intervals can then be constructed to infer the set of parameters simultaneously. The constructed multiple confidence intervals are the realization of a multiple interval estimator (MIE), the main focus of this study. In particular, a thresholding approach is introduced to improve the performance of the MIE. The developed thresholds require additional information, so a prior distribution is assumed for this purpose. The MIE procedure is then evaluated by two performance measures: a global coverage probability and a global expected content, which are averages with respect to the prior distribution. The procedure defined by the performance measures will be called a Bayes MIE with thresholding (BMIE Thres). In this study, a normal-normal model is utilized to build up the BMIE Thres for a set of location parameters. Then, the behaviors of BMIE Thres are investigated in terms of the performance measures, which approach those of the corresponding z-based MIE as the thresholding parameter, C, goes to infinity. In addition, an optimization procedure is introduced to find the best thresholding parameter C. For illustration, in-season baseball batting average data and leukemia gene expression data are used to demonstrate the procedure for the known and unknown standard deviation situations, respectively. In the ensuing simulations, the target parameters are generated from different true generating distributions to consider the misspecified prior situation. The simulations also involve Bayes credible MIEs, and the different MIEs are compared with respect to the performance measures. In general, the thresholding procedure helps to achieve a meaningful reduction in the global expected content while maintaining a nominal level of the global coverage probability.
Submitted 8 December, 2019;
originally announced December 2019.
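The two comparison MIEs mentioned above can be written down directly for the normal-normal model X_i | θ_i ~ N(θ_i, σ²), θ_i ~ N(μ, τ²) with known σ, μ, τ. The z-based interval is X_i ± zσ. The Bayes credible interval is centered at the posterior mean μ + τ²/(τ² + σ²)·(X_i - μ) with posterior standard deviation στ/√(σ² + τ²). The sketch computes both and their average content (length). The thresholding procedure itself is not reproduced, and the data and function names are hypothetical.

```python
from statistics import NormalDist


def z_intervals(xs, sigma, level=0.95):
    """z-based intervals X_i +/- z * sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return [(x - z * sigma, x + z * sigma) for x in xs]


def credible_intervals(xs, sigma, mu, tau, level=0.95):
    """Bayes credible intervals under the normal-normal model."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    shrink = tau ** 2 / (tau ** 2 + sigma ** 2)
    post_sd = sigma * tau / (sigma ** 2 + tau ** 2) ** 0.5
    return [(mu + shrink * (x - mu) - z * post_sd,
             mu + shrink * (x - mu) + z * post_sd) for x in xs]


def mean_length(intervals):
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


xs = [0.3, -1.2, 2.1, 0.8]                 # hypothetical observations
zi = z_intervals(xs, sigma=1.0)
ci = credible_intervals(xs, sigma=1.0, mu=0.0, tau=1.0)
```

With σ = τ = 1, every credible interval is shorter than the z-based one by the factor 1/√2. This gain depends on the prior being correct, which is why the paper's simulations also consider misspecified priors.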
-
Variational Temporal Abstraction
Authors:
Taesup Kim,
Sungjin Ahn,
Yoshua Bengio
Abstract:
We introduce a variational approach to learning and inference of temporally hierarchical structure and representation for sequential data. We propose the Variational Temporal Abstraction (VTA), a hierarchical recurrent state space model that can infer the latent temporal structure and thus perform the stochastic state transition hierarchically. We also propose to apply this model to implement the jumpy-imagination ability in imagination-augmented agent-learning in order to improve the efficiency of the imagination. In experiments, we demonstrate that our proposed method can model 2D and 3D visual sequence datasets with interpretable temporal structure discovery and that its application to jumpy imagination enables more efficient agent-learning in a 3D navigation task.
Submitted 2 October, 2019;
originally announced October 2019.
-
Scalable Neural Architecture Search for 3D Medical Image Segmentation
Authors:
Sungwoong Kim,
Ildoo Kim,
Sungbin Lim,
Woonhyuk Baek,
Chiheon Kim,
Hyungjoo Cho,
Boogeon Yoon,
Taesup Kim
Abstract:
In this paper, a neural architecture search (NAS) framework is proposed for 3D medical image segmentation to automatically optimize a neural architecture from a large design space. Our NAS framework searches the structure of each layer, including neural connectivities and operation types, in both the encoder and the decoder. Since optimizing over a large discrete architecture space is difficult due to high-resolution 3D medical images, a novel stochastic sampling algorithm based on a continuous relaxation is also proposed for scalable gradient-based optimization. On 3D medical image segmentation tasks with a benchmark dataset, an architecture automatically designed by the proposed NAS framework outperforms the human-designed 3D U-Net, and this optimized architecture also transfers well to different tasks.
Submitted 13 June, 2019;
originally announced June 2019.
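The "continuous relaxation" of a discrete architecture space mentioned in the NAS abstract above can be illustrated with the generic softmax-weighted mixed operation popularized by DARTS. The following is a minimal NumPy sketch under that assumption; the candidate operations, `mixed_op`, and the logits `alpha` are hypothetical, and the paper's stochastic sampling algorithm is not reproduced here.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector of architecture logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical candidate operations for one searchable layer; in a real search
# space these would be convolutions with different kernels, pooling, etc.
CANDIDATE_OPS = [
    lambda x: x,                 # identity (skip connection)
    lambda x: np.maximum(x, 0),  # ReLU
    lambda x: 0.5 * x,           # stand-in for a learned linear operation
]

def mixed_op(x, alpha):
    """Relaxed layer output: softmax(alpha)-weighted sum over all candidate ops.

    Because the output is differentiable in alpha, the architecture choice can
    be optimized by gradient descent instead of discrete search.
    """
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS))

def discretize(alpha):
    """After the search, keep only the candidate op with the largest weight."""
    return int(np.argmax(alpha))

x = np.array([-1.0, 2.0])
print(mixed_op(x, np.zeros(3)))                # equal weights: average of the op outputs
print(discretize(np.array([0.1, 2.0, -1.0])))  # index of the selected op
```

The relaxation trades exactness for differentiability: every candidate is evaluated during the search, which is what makes naive relaxations expensive on high-resolution 3D volumes.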
-
On Single Source Robustness in Deep Fusion Models
Authors:
Taewan Kim,
Joydeep Ghosh
Abstract:
Algorithms that fuse multiple input sources benefit from both complementary and shared information. Shared information may provide robustness against faulty or noisy inputs, which is indispensable for safety-critical applications like self-driving cars. We investigate learning fusion algorithms that are robust against noise added to a single source. We first demonstrate that robustness against single source noise is not guaranteed in a linear fusion model. Motivated by this discovery, two possible approaches are proposed to increase robustness: a carefully designed loss with corresponding training algorithms for deep fusion models, and a simple convolutional fusion layer that has a structural advantage in dealing with noise. Experimental results show that both training algorithms and our fusion layer make a deep fusion-based 3D object detector robust against noise applied to a single source, while preserving the original performance on clean data.
Submitted 16 October, 2019; v1 submitted 11 June, 2019;
originally announced June 2019.
-
Scaling Video Analytics on Constrained Edge Nodes
Authors:
Christopher Canel,
Thomas Kim,
Giulio Zhou,
Conglong Li,
Hyeontaek Lim,
David G. Andersen,
Michael Kaminsky,
Subramanya R. Dulloor
Abstract:
As video camera deployments continue to grow, the need to process large volumes of real-time data strains wide area network infrastructure. When per-camera bandwidth is limited, it is infeasible for applications such as traffic monitoring and pedestrian tracking to offload high-quality video streams to a datacenter. This paper presents FilterForward, a new edge-to-cloud system that enables datacenter-based applications to process content from thousands of cameras by installing lightweight edge filters that backhaul only relevant video frames. FilterForward introduces fast and expressive per-application microclassifiers that share computation to simultaneously detect dozens of events on computationally constrained edge nodes. Only matching events are transmitted to the cloud. Evaluation on two real-world camera feed datasets shows that FilterForward reduces bandwidth use by an order of magnitude while improving computational efficiency and event detection accuracy for challenging video content.
Submitted 24 May, 2019;
originally announced May 2019.
-
Fast AutoAugment
Authors:
Sungbin Lim,
Ildoo Kim,
Taesup Kim,
Chiheon Kim,
Sungwoong Kim
Abstract:
Data augmentation is an essential technique for improving the generalization ability of deep learning models. Recently, AutoAugment has been proposed as an algorithm to automatically search for augmentation policies from a dataset and has significantly enhanced performance on many image recognition tasks. However, its search method requires thousands of GPU hours even for a relatively small dataset. In this paper, we propose an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching. In comparison to AutoAugment, the proposed algorithm speeds up the search time by orders of magnitude while achieving comparable performance on image recognition tasks with various models and datasets, including CIFAR-10, CIFAR-100, SVHN, and ImageNet.
Submitted 25 May, 2019; v1 submitted 1 May, 2019;
originally announced May 2019.
-
Intra- and Inter-epoch Temporal Context Network (IITNet) Using Sub-epoch Features for Automatic Sleep Scoring on Raw Single-channel EEG
Authors:
Hogeon Seo,
Seunghyeok Back,
Seongju Lee,
Deokhwan Park,
Tae Kim,
Kyoobin Lee
Abstract:
A deep learning model, named IITNet, is proposed to learn intra- and inter-epoch temporal contexts from raw single-channel EEG for automatic sleep scoring. To classify the sleep stage from a half-minute segment of EEG, called an epoch, sleep experts investigate sleep-related events and consider the transition rules between the events they find. Similarly, IITNet extracts representative features at a sub-epoch level with a residual neural network and captures intra- and inter-epoch temporal contexts from the sequence of features via a bidirectional LSTM. Performance was investigated on three datasets as the sequence length (L) increased from one to ten. IITNet achieved performance comparable to other state-of-the-art results. The best accuracy, MF1, and Cohen's kappa ($\kappa$) were 83.9%, 77.6%, and 0.78 for SleepEDF (L=10); 86.5%, 80.7%, and 0.80 for MASS (L=9); and 86.7%, 79.8%, and 0.81 for SHHS (L=10), respectively. Even when using only four epochs, the performance was still comparable. Compared to using a single epoch, on average, accuracy and MF1 increased by 2.48 and 4.90 percentage points, and the F1 scores of N1, N2, and REM increased by 16.1, 1.50, and 6.42 percentage points, respectively. Beyond four epochs, the performance improvement was not significant. The results support the view that considering the latest two minutes of raw single-channel EEG can be a reasonable choice for efficient and reliable sleep scoring via deep neural networks. Furthermore, experiments with the baselines showed that introducing intra-epoch temporal context learning with a deep residual network contributes to the improvement in overall performance and has a positive synergistic effect with inter-epoch temporal context learning.
Submitted 10 June, 2020; v1 submitted 18 February, 2019;
originally announced February 2019.
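The three metrics reported in the IITNet abstract (accuracy, macro-averaged F1 "MF1", and Cohen's kappa) can all be computed from a confusion matrix using their standard definitions. The following is a minimal NumPy sketch; the function name and the small confusion matrix in the usage line are hypothetical and are not data from the paper.

```python
import numpy as np

def sleep_scoring_metrics(cm):
    """Accuracy, macro-F1 (MF1), and Cohen's kappa from a confusion matrix.

    cm[i, j] counts epochs whose true stage is i and predicted stage is j.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    accuracy = tp.sum() / n

    # Per-class F1 = 2*TP / (2*TP + FP + FN); MF1 is their unweighted mean.
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    mf1 = f1.mean()

    # Cohen's kappa corrects observed agreement p_o for chance agreement p_e:
    # kappa = (p_o - p_e) / (1 - p_e), with p_e from the row/column marginals.
    p_o = accuracy
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    return accuracy, mf1, kappa

# Hypothetical 3-stage confusion matrix (rows: true stage, columns: predicted).
acc, mf1, kappa = sleep_scoring_metrics([[50, 5, 0], [10, 80, 10], [0, 5, 40]])
```

MF1 weights every sleep stage equally, so it is more sensitive than accuracy to rare stages such as N1.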
-
Effective Network Compression Using Simulation-Guided Iterative Pruning
Authors:
Dae-Woong Jeong,
Jaehun Kim,
Youngseok Kim,
Tae-Ho Kim,
Myungsu Chae
Abstract:
Existing high-performance deep learning models require very intensive computation. For this reason, it is difficult to embed a deep learning model into a system with limited resources. In this paper, we propose a novel network compression method to address this limitation. The principle of this method is to make iterative pruning more effective and sophisticated by simulating the reduced network. A simple experiment was conducted to evaluate the method; the results showed that the proposed method achieved higher performance than existing methods at the same pruning level.
Submitted 11 February, 2019;
originally announced February 2019.
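For context, the iterative pruning that the network compression abstract above builds on can be sketched as repeatedly removing the smallest-magnitude remaining weights until a target sparsity is reached. The following is generic magnitude pruning in NumPy, assuming a single weight matrix and omitting the retraining between steps. The function names are hypothetical, and the sketch does not implement the paper's simulation-guided criterion.

```python
import numpy as np

def prune_step(weights, mask, fraction):
    """Zero out `fraction` of the still-active weights with the smallest magnitude."""
    active = np.flatnonzero(mask)
    k = int(round(fraction * active.size))
    if k == 0:
        return mask
    # Positions (in the flattened array) of the k smallest active weights.
    order = np.argsort(np.abs(weights.ravel()[active]))
    new_mask = mask.ravel().copy()
    new_mask[active[order[:k]]] = False
    return new_mask.reshape(mask.shape)

def iterative_prune(weights, target_sparsity, steps):
    """Reach target_sparsity in `steps` equal multiplicative steps.

    A real pipeline would fine-tune the network between steps; omitted here.
    """
    mask = np.ones_like(weights, dtype=bool)
    keep_per_step = (1 - target_sparsity) ** (1 / steps)
    for _ in range(steps):
        mask = prune_step(weights, mask, 1 - keep_per_step)
    return weights * mask, mask

w = np.arange(1.0, 101.0).reshape(10, 10)
pruned, mask = iterative_prune(w, target_sparsity=0.9, steps=3)
```

Pruning in several small steps, rather than all at once, gives the network a chance to recover between steps. That recovery is the part of the procedure the paper aims to make more effective.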
-
Stochastic Doubly Robust Gradient
Authors:
Kanghoon Lee,
Jihye Choi,
Moonsu Cha,
Jung-Kwon Lee,
Taeyoon Kim
Abstract:
When training a machine learning model with observational data, some values are often systematically missing. Learning from incomplete data in which the missingness depends on some covariates may lead to biased parameter estimates and can even harm the fairness of decision outcomes. This paper proposes a way to adjust for the causal effect of covariates on the missingness when training models using stochastic gradient descent (SGD). Inspired by the design of the doubly robust estimator and its theoretical property of double robustness, we introduce the stochastic doubly robust gradient (SDRG), consisting of two models: weight-corrected gradients for inverse propensity score weighting and per-covariate control variates for regression adjustment. We also identify the connection between double robustness and variance reduction in SGD by demonstrating the SDRG algorithm within a unifying framework for variance-reduced SGD. The performance of our approach is empirically tested by showing convergence in training image classifiers on several examples of missing data.
Submitted 21 December, 2018;
originally announced December 2018.
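The doubly robust estimator that inspires SDRG combines an outcome-regression prediction with an inverse-propensity-weighted correction on the observed residuals. The following is a minimal NumPy sketch of the classical doubly robust (AIPW) estimate of a population mean under missing outcomes. It assumes the propensity and outcome models have already been fitted, and the function and argument names are hypothetical. It is not the paper's gradient estimator.

```python
import numpy as np

def doubly_robust_mean(y, observed, propensity, y_hat):
    """AIPW estimate of E[Y] when some outcomes are missing.

    y          : outcomes; entries where observed == 0 are ignored (may be NaN)
    observed   : 1 if y is observed, else 0
    propensity : estimated P(observed = 1 | covariates), must be > 0
    y_hat      : outcome-model prediction of E[Y | covariates]
    """
    y = np.where(observed == 1, y, 0.0)  # missing outcomes never enter the formula
    # Regression prediction plus an inverse-propensity-weighted residual correction.
    correction = observed * (y - y_hat) / propensity
    return np.mean(y_hat + correction)

y = np.array([1.0, np.nan, 3.0, np.nan])
observed = np.array([1, 0, 1, 0])
propensity = np.full(4, 0.5)
estimate = doubly_robust_mean(y, observed, propensity, y_hat=np.zeros(4))
```

The estimate remains consistent if either the propensity model or the outcome model is correctly specified, which is the "double robustness" the abstract refers to. With `y_hat = 0` it reduces to plain inverse propensity weighting.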
-
Quantifying Generalization in Reinforcement Learning
Authors:
Karl Cobbe,
Oleg Klimov,
Chris Hesse,
Taehoon Kim,
John Schulman
Abstract:
In this paper, we investigate the problem of overfitting in deep reinforcement learning. Among the most common benchmarks in RL, it is customary to use the same environments for both training and testing. This practice offers relatively little insight into an agent's ability to generalize. We address this issue by using procedurally generated environments to construct distinct training and test sets. Most notably, we introduce a new environment called CoinRun, designed as a benchmark for generalization in RL. Using CoinRun, we find that agents overfit to surprisingly large training sets. We then show that deeper convolutional architectures improve generalization, as do methods traditionally found in supervised learning, including L2 regularization, dropout, data augmentation and batch normalization.
Submitted 14 July, 2019; v1 submitted 5 December, 2018;
originally announced December 2018.
-
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Authors:
Hyeonwoo Noh,
Taehoon Kim,
Jonghwan Mun,
Bohyung Han
Abstract:
We study how to leverage off-the-shelf visual and linguistic data to cope with out-of-vocabulary answers in the visual question answering task. Existing large-scale visual datasets with annotations such as image class labels, bounding boxes, and region descriptions are good sources for learning rich and diverse visual concepts. However, it is not straightforward how these visual concepts can be captured and transferred to visual question answering models, due to the missing link between question-dependent answering models and visual data without questions. We tackle this problem in two steps: 1) learning a task-conditional visual classifier, capable of solving diverse question-specific visual recognition tasks, based on unsupervised task discovery, and 2) transferring the task-conditional visual classifier to visual question answering models. Specifically, we employ linguistic knowledge sources such as a structured lexical database (e.g., WordNet) and visual descriptions for unsupervised task discovery, and transfer a learned task-conditional visual classifier as an answering unit in a visual question answering model. We empirically show that the proposed algorithm successfully generalizes to out-of-vocabulary answers using the knowledge transferred from the visual dataset.
Submitted 7 April, 2019; v1 submitted 3 October, 2018;
originally announced October 2018.
-
End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights
Authors:
Myungsu Chae,
Tae-Ho Kim,
Young Hoon Shin,
June-Woo Kim,
Soo-Young Lee
Abstract:
Multi-task learning is a method for improving generalizability across multiple tasks. To perform multiple classification tasks with one neural network model, the losses of the individual tasks must be combined. Previous studies have mostly trained models for multiple prediction tasks using a joint loss with static weights, setting the weights between tasks uniformly or empirically without sufficient consideration. In this study, we propose a method that calculates the joint loss using dynamic weights to improve the total performance of the tasks rather than their individual performance. We apply this method to design an end-to-end multimodal emotion and gender recognition model using audio and video data. This approach provides proper weights for the loss of each task by the end of training. In our experiments, emotion and gender recognition with the proposed method yielded a lower joint loss, computed as the negative log-likelihood, than using static weights. Moreover, our proposed model has better generalizability than other models. To the best of our knowledge, this research is the first to demonstrate the strength of using dynamic weights in the joint loss to maximize overall performance in emotion and gender recognition tasks.
Submitted 2 October, 2018; v1 submitted 3 September, 2018;
originally announced September 2018.
-
Bayesian Model-Agnostic Meta-Learning
Authors:
Taesup Kim,
Jaesik Yoon,
Ousmane Dia,
Sungwoong Kim,
Yoshua Bengio,
Sungjin Ahn
Abstract:
Learning to infer the Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning due to the model uncertainty inherent in the problem. In this paper, we propose a novel Bayesian model-agnostic meta-learning method. The proposed method combines scalable gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. During fast adaptation, the method is capable of learning complex uncertainty structure beyond a point estimate or a simple Gaussian approximation. In addition, a robust Bayesian meta-update mechanism with a new meta-loss prevents overfitting during meta-update. While remaining an efficient gradient-based meta-learner, the method is also model-agnostic and simple to implement. Experimental results show the accuracy and robustness of the proposed method in various tasks: sinusoidal regression, image classification, active learning, and reinforcement learning.
Submitted 18 November, 2018; v1 submitted 11 June, 2018;
originally announced June 2018.
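The nonparametric variational inference in Bayesian MAML is based on Stein variational gradient descent (SVGD), in which a set of particles (in BMAML, copies of the model parameters) is moved jointly to approximate a posterior. The following is a minimal NumPy sketch of a generic SVGD update on a toy 2-D Gaussian target. It assumes an RBF kernel with the median bandwidth heuristic, and the function names are hypothetical. It does not reproduce the paper's meta-update or meta-loss.

```python
import numpy as np

def rbf_kernel(x):
    """RBF kernel matrix and its gradient for particles x of shape (n, d).

    The bandwidth h is set with the median heuristic.
    """
    diff = x[:, None, :] - x[None, :, :]        # diff[j, i] = x_j - x_i, shape (n, n, d)
    sq_dist = (diff ** 2).sum(axis=-1)           # pairwise squared distances, (n, n)
    h = np.median(sq_dist) / np.log(x.shape[0] + 1)
    k = np.exp(-sq_dist / h)                     # k[j, i] = k(x_j, x_i)
    grad_k = -2.0 / h * diff * k[..., None]      # grad_k[j, i] = d k(x_j, x_i) / d x_j
    return k, grad_k

def svgd_step(x, grad_log_p, step_size):
    """One SVGD update of all particles.

    phi(x_i) = mean_j [ k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    The first term pulls particles toward high density; the second repels them
    from each other, so the particle set spreads over the target distribution.
    """
    k, grad_k = rbf_kernel(x)
    phi = (k.T @ grad_log_p(x) + grad_k.sum(axis=0)) / x.shape[0]
    return x + step_size * phi

# Usage: approximate a 2-D Gaussian N(mu, I), whose score is -(x - mu).
mu = np.array([3.0, -2.0])
rng = np.random.default_rng(0)
particles = rng.standard_normal((50, 2))
for _ in range(1000):
    particles = svgd_step(particles, lambda x: -(x - mu), step_size=0.1)
print(particles.mean(axis=0))
```

With a single particle the repulsive term vanishes and SVGD reduces to gradient ascent on log p (a point estimate). The interaction between particles is what lets the method represent uncertainty beyond a point estimate, as the abstract describes.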
-
Deep Fluids: A Generative Network for Parameterized Fluid Simulations
Authors:
Byungsoo Kim,
Vinicius C. Azevedo,
Nils Thuerey,
Theodore Kim,
Markus Gross,
Barbara Solenthaler
Abstract:
This paper presents a novel generative model to synthesize fluid simulations from a set of reduced parameters. A convolutional neural network is trained on a collection of discrete, parameterizable fluid simulation velocity fields. Due to the capability of deep learning architectures to learn representative features of the data, our generative model is able to accurately approximate the training data set, while providing plausible interpolated in-betweens. The proposed generative model is optimized for fluids by a novel loss function that guarantees divergence-free velocity fields at all times. In addition, we demonstrate that we can handle complex parameterizations in reduced spaces, and advance simulations in time by integrating in the latent space with a second network. Our method models a wide variety of fluid behaviors, thus enabling applications such as fast construction of simulations, interpolation of fluids with different parameters, time re-sampling, latent space simulations, and compression of fluid simulation data. Reconstructed velocity fields are generated up to 700x faster than re-simulating the data with the underlying CPU solver, while achieving compression rates of up to 1300x.
Submitted 1 February, 2019; v1 submitted 6 June, 2018;
originally announced June 2018.
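A common way to make a generated 2-D velocity field divergence-free by construction is to output a scalar stream function ψ and take its curl, u = ∂ψ/∂y and v = -∂ψ/∂x. Because finite differences along different axes commute, the discrete divergence then vanishes up to floating-point rounding. The following is a minimal NumPy sketch under these assumptions: uniform grid spacing `h`, arrays indexed `[y, x]`, and random noise standing in for a network's output. The function names are hypothetical, and the sketch illustrates the general idea rather than the paper's exact loss or architecture.

```python
import numpy as np

def velocity_from_stream_function(psi, h):
    """2-D velocity (u, v) = (dpsi/dy, -dpsi/dx) on a grid with spacing h.

    Arrays are indexed psi[y, x]: axis 0 is y, axis 1 is x.
    """
    u = np.gradient(psi, h, axis=0)
    v = -np.gradient(psi, h, axis=1)
    return u, v

def divergence(u, v, h):
    """Discrete divergence du/dx + dv/dy using the same finite differences."""
    return np.gradient(u, h, axis=1) + np.gradient(v, h, axis=0)

# Any scalar field can serve as a stream function; noise stands in for a
# generator's output here.
rng = np.random.default_rng(0)
h = 0.1
psi = rng.standard_normal((32, 32))
u, v = velocity_from_stream_function(psi, h)
print(np.abs(divergence(u, v, h)).max())  # zero up to floating-point rounding
```

The design choice is to enforce incompressibility structurally instead of penalizing divergence in the loss. The network is then free to output any ψ without producing a compressible velocity field.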
-
RetainVis: Visual Analytics with Interpretable and Interactive Recurrent Neural Networks on Electronic Medical Records
Authors:
Bum Chul Kwon,
Min-Je Choi,
Joanne Taery Kim,
Edward Choi,
Young Bin Kim,
Soonwook Kwon,
Jimeng Sun,
Jaegul Choo
Abstract:
We have recently seen many successful applications of recurrent neural networks (RNNs) on electronic medical records (EMRs), which contain histories of patients' diagnoses, medications, and various other events, in order to predict the current and future states of patients. Despite the strong performance of RNNs, it is often challenging for users to understand why the model makes a particular prediction. This black-box nature of RNNs can impede their wide adoption in clinical practice. Furthermore, we have no established methods to interactively leverage users' domain expertise and prior knowledge as inputs for steering the model. Therefore, our design study aims to provide a visual analytics solution to increase the interpretability and interactivity of RNNs via a joint effort of medical experts, artificial intelligence scientists, and visual analytics researchers. Following an iterative design process with the experts, we design, implement, and evaluate a visual analytics tool called RetainVis, which couples a newly improved, interpretable and interactive RNN-based model called RetainEX with visualizations for users' exploration of EMR data in the context of prediction tasks. Our study shows the effective use of RetainVis for gaining insights into how individual medical codes contribute to making risk predictions, using EMRs of patients with heart failure and cataract symptoms. Our study also demonstrates how we made substantial changes to the state-of-the-art RNN model called RETAIN in order to make use of temporal information and increase interactivity. This study will provide a useful guideline for researchers who aim to design an interpretable and interactive visual analytics tool for RNNs.
Submitted 23 October, 2018; v1 submitted 27 May, 2018;
originally announced May 2018.
-
A Deep Reinforcement Learning Chatbot (Short Version)
Authors:
Iulian V. Serban,
Chinnadhurai Sankar,
Mathieu Germain,
Saizheng Zhang,
Zhouhan Lin,
Sandeep Subramanian,
Taesup Kim,
Michael Pieper,
Sarath Chandar,
Nan Rosemary Ke,
Sai Rajeswar,
Alexandre de Brebisson,
Jose M. R. Sotelo,
Dendi Suhubdy,
Vincent Michalski,
Alexandre Nguyen,
Joelle Pineau,
Yoshua Bengio
Abstract:
We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.
Submitted 20 January, 2018;
originally announced January 2018.
-
Relaxed Oracles for Semi-Supervised Clustering
Authors:
Taewan Kim,
Joydeep Ghosh
Abstract:
Pairwise "same-cluster" queries are one of the most widely used forms of supervision in semi-supervised clustering. However, it is impractical to ask human oracles to answer every query correctly. In this paper, we study the influence of allowing "not-sure" answers from a weak oracle and propose an effective algorithm to handle such uncertainties in query responses. Two realistic weak oracle models are considered where ambiguity in answering depends on the distance between two points. We show that a small query complexity is adequate for effective clustering with high probability by providing better pairs to the weak oracle. Experimental results on synthetic and real data show the effectiveness of our approach in overcoming supervision uncertainties and yielding high quality clusters.
Submitted 20 November, 2017;
originally announced November 2017.
-
Semi-Supervised Active Clustering with Weak Oracles
Authors:
Taewan Kim,
Joydeep Ghosh
Abstract:
Semi-supervised active clustering (SSAC) uses the knowledge of a domain expert to cluster data points by interactively making pairwise "same-cluster" queries. However, it is impractical to ask human oracles to answer every pairwise query. In this paper, we study the influence of allowing "not-sure" answers from a weak oracle and propose algorithms to handle such uncertainties efficiently. Different types of model assumptions are analyzed to cover realistic scenarios of oracle abstention. In the first model, the random-weak oracle, the oracle abstains at random with a certain probability. We also propose two distance-weak oracle models, which simulate an oracle that gets confused depending on the distance between the two points in a pairwise query. For each weak oracle model, we show that a small query complexity is adequate for effective $k$-means clustering with high probability. Sufficient conditions for the guarantee include a $\gamma$-margin property of the data and the existence of a point close to each cluster center. Furthermore, we provide a sample complexity with a reduced effect of the cluster margin and only a logarithmic dependence on the data dimension. Our results allow significantly fewer same-cluster queries if the margin of the clusters is tight, i.e., $\gamma \approx 1$. Experimental results on synthetic data show the effective performance of our approach in overcoming uncertainties.
Submitted 10 September, 2017;
originally announced September 2017.
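The random-weak oracle model in the SSAC abstract above, where the oracle abstains with some probability, is straightforward to simulate when ground-truth labels are available. The following is a minimal Python sketch using `None` to represent a "not-sure" answer. The function names and the abstention probability are hypothetical. The re-asking helper is a naive strategy that assumes abstentions are independent across repeated queries, and it is not the paper's query-selection algorithm.

```python
import random
from typing import Callable, Optional, Sequence

Oracle = Callable[[int, int], Optional[bool]]

def random_weak_oracle(labels: Sequence[int], abstain_prob: float,
                       rng: random.Random) -> Oracle:
    """Same-cluster oracle that answers "not sure" (None) with probability abstain_prob."""
    def query(i: int, j: int) -> Optional[bool]:
        if rng.random() < abstain_prob:
            return None  # "not-sure" answer
        return labels[i] == labels[j]
    return query

def ask_until_answered(query: Oracle, i: int, j: int,
                       max_tries: int = 10) -> Optional[bool]:
    """Naively re-ask the same pair; returns None after max_tries abstentions."""
    for _ in range(max_tries):
        answer = query(i, j)
        if answer is not None:
            return answer
    return None

labels = [0, 0, 1]
oracle = random_weak_oracle(labels, abstain_prob=0.3, rng=random.Random(0))
print(ask_until_answered(oracle, 0, 1))
```

A distance-weak oracle can be simulated the same way by making `abstain_prob` a function of the distance between points `i` and `j` instead of a constant.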
-
A Deep Reinforcement Learning Chatbot
Authors:
Iulian V. Serban,
Chinnadhurai Sankar,
Mathieu Germain,
Saizheng Zhang,
Zhouhan Lin,
Sandeep Subramanian,
Taesup Kim,
Michael Pieper,
Sarath Chandar,
Nan Rosemary Ke,
Sai Rajeshwar,
Alexandre de Brebisson,
Jose M. R. Sotelo,
Dendi Suhubdy,
Vincent Michalski,
Alexandre Nguyen,
Joelle Pineau,
Yoshua Bengio
Abstract:
We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.
Submitted 5 November, 2017; v1 submitted 7 September, 2017;
originally announced September 2017.