Toward Model-Agnostic Detection of New Physics Using Data-Driven Signal Regions

Authors: Soheun Yi, John Alison, Mikael Kuusela

Abstract: In the search for new particles in high-energy physics, it is crucial to select the Signal Region (SR) in such a way that it is enriched with signal events if they are present. While most existing search methods set the region relying on prior domain knowledge, it may be unavailable for a completely novel particle that falls outside the current scope of understanding. We address this issue by proposing a method built upon a model-agnostic but often realistic assumption about the localized topology of the signal events, in which they are concentrated in a certain area of the feature space. Considering the signal component as a localized high-frequency feature, our approach employs the notion of a low-pass filter. We define the SR as an area which is most affected when the observed events are smeared with additive random noise. We overcome challenges in density estimation in the high-dimensional feature space by learning the density ratio of events that potentially include a signal to the complementary observation of events that closely resemble the target events but are free of any signals. By applying our method to simulated $\mathrm{HH} \rightarrow 4b$ events, we demonstrate that the method can efficiently identify a data-driven SR in a high-dimensional feature space in which a high portion of signal events concentrate.

Submitted 10 December, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

Comments: 5 pages, 2 figures

arXiv:2404.13321 [pdf]

Accelerated System-Reliability-based Disaster Resilience Analysis for Structural Systems

Authors: Taeyong Kim, Sang-ri Yi

Abstract: Resilience has emerged as a crucial concept for evaluating structural performance under disasters because of its ability to extend beyond traditional risk assessments, accounting for a system's ability to minimize disruptions and maintain functionality during recovery. To facilitate the holistic understanding of resilience performance in structural systems, a system-reliability-based disaster resilience analysis framework was developed. The framework describes resilience using three criteria: reliability, redundancy, and recoverability, and the system's internal resilience is evaluated by inspecting the characteristics of reliability and redundancy for different possible progressive failure modes. However, the practical application of this framework has been limited to complex structures with numerous sub-components, as it becomes intractable to evaluate the performances for all possible initial disruption scenarios. To bridge the gap between the theory and practical use, especially for evaluating reliability and redundancy, this study centers on the idea that the computational burden can be substantially alleviated by focusing on initial disruption scenarios that are practically significant. To achieve this research goal, we propose three methods to efficiently eliminate insignificant scenarios: the sequential search method, the n-ball sampling method, and the surrogate model-based adaptive sampling algorithm. Three numerical examples, including buildings and a bridge, are introduced to prove the applicability and efficiency of the proposed approaches. The findings of this study are expected to offer practical solutions to the challenges of assessing resilience performance in complex structural systems.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 25 pages, 18 figures

arXiv:2403.11429 [pdf, other]

Long-range Ising model for regional-scale seismic risk analysis

Authors: Sebin Oh, Sang-ri Yi, Ziqi Wang

Abstract: This study introduces the long-range Ising model from statistical mechanics to the Performance-Based Earthquake Engineering (PBEE) framework for regional seismic damage analysis. The application of the PBEE framework at a regional scale involves estimating the damage states of numerous structures, typically performed using fragility function-based stochastic simulations. However, these simulations often assume conditional independence or employ simplistic dependency models among the damage states of structures, leading to significant misrepresentation of regional risk. The Ising model addresses this issue by converting the available information on binary damage states (safe or failure) into a joint probability mass function, leveraging the principle of maximum entropy. The Ising model offers two main benefits: (1) it requires only the first- and second-order cross-moments, enabling seamless integration with the existing PBEE framework, and (2) it provides meaningful physical interpretations of the model parameters, facilitating the uncovering of insights not apparent from data. To demonstrate the proposed method, we applied the Ising model to 156 buildings in Antakya, Turkey, using post-hazard damage evaluation data, and to 182 buildings in Pacific Heights, San Francisco, using simulated data from the Regional Resilience Determination (R2D) tool. In both instances, the Ising model accurately reproduces the provided information and generates meaningful insights into regional damage. The study also investigates the change in Ising model parameters under varying earthquake magnitudes, along with the mean-field approximation, further facilitating the applicability of the proposed approach.

Submitted 23 May, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2402.04582 [pdf, other]

Dimensionality reduction can be used as a surrogate model for high-dimensional forward uncertainty quantification

Authors: Jungho Kim, Sang-ri Yi, Ziqi Wang

Abstract: We introduce a method to construct a stochastic surrogate model from the results of dimensionality reduction in forward uncertainty quantification. The hypothesis is that the high-dimensional input augmented by the output of a computational model admits a low-dimensional representation. This assumption can be met by numerous uncertainty quantification applications with physics-based computational models. The proposed approach differs from a sequential application of dimensionality reduction followed by surrogate modeling, as we "extract" a surrogate model from the results of dimensionality reduction in the input-output space. This feature becomes desirable when the input space is genuinely high-dimensional. The proposed method also diverges from the Probabilistic Learning on Manifold, as a reconstruction mapping from the feature space to the input-output space is circumvented. The final product of the proposed method is a stochastic simulator that propagates a deterministic input into a stochastic output, preserving the convenience of a sequential "dimensionality reduction + Gaussian process regression" approach while overcoming some of its limitations. The proposed method is demonstrated through two uncertainty quantification problems characterized by high-dimensional input uncertainties.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2306.02106 [pdf, other]

Impacts of Innovation School System in Korea: A Latent Space Item Response Model with Neyman-Scott Point Process

Authors: Seorim Yi, Minkyu Kim, Jaewoo Park, Minjeong Jeon, Ick Hoon Jin

Abstract: South Korea's educational system has faced criticism for its lack of focus on critical thinking and creativity, resulting in high levels of stress and anxiety among students. As part of the government's effort to improve the educational system, the innovation school system was introduced in 2009, which aims to develop students' creativity as well as their non-cognitive skills. To better understand the differences between innovation and regular school systems in South Korea, we propose a novel method that combines the latent space item response model (LSIRM) with the Neyman-Scott (NS) point process model. Our method accounts for the heterogeneity of items and students, captures relationships between respondents and items, and identifies item and student clusters that can provide a comprehensive understanding of students' behaviors/perceptions on non-cognitive outcomes. Our analysis reveals that students in the innovation school system show a higher sense of citizenship, while those in the regular school system tend to associate confidence in appearance with social ability. We compare our model with exploratory item factor analysis in terms of item clustering and find that our approach provides a more detailed and automated analysis. A comparison with exploratory item factor analysis highlights our method's advantages in terms of uncertainty quantification of the clustering process and more detailed and nuanced clustering results. Our method is made available to an existing R package, lsirm12pl.

Submitted 27 May, 2024; v1 submitted 3 June, 2023; originally announced June 2023.

arXiv:2207.06587 [pdf, other]

A Spatio-Temporal Dirichlet Process Mixture Model for Coronavirus Disease-19

Authors: Jaewoo Park, Seorim Yi, Won Chang, Jorge Mateu

Abstract: Understanding the spatio-temporal patterns of the coronavirus disease 2019 (COVID-19) is essential to construct public health interventions. Spatially referenced data can provide richer opportunities to understand the mechanism of the disease spread compared to the more often encountered aggregated count data. We propose a spatio-temporal Dirichlet process mixture model to analyze confirmed cases of COVID-19 in an urban environment. Our method can detect unobserved cluster centers of the epidemics, and estimate the space-time range of the clusters that are useful to construct a warning system. Furthermore, our model can measure the impact of different types of landmarks in the city, which provides an intuitive explanation of disease spreading sources from different time points. To efficiently capture the temporal dynamics of the disease patterns, we employ a sequential approach that uses the posterior distribution of the parameters for the previous time step as the prior information for the current time step. This approach enables us to incorporate time dependence into our model in a computationally efficient manner without complicating the model structure. We also develop a model assessment by comparing the data with theoretical densities, and outline the goodness-of-fit of our fitted model.

Submitted 13 July, 2022; originally announced July 2022.

Comments: 26 pages, 10 figures

arXiv:2206.12891 [pdf, other]

Hierarchical nuclear norm penalization for multi-view data

Authors: Sangyoon Yi, Raymond K. W. Wong, Irina Gaynanova

Abstract: The prevalence of data collected on the same set of samples from multiple sources (i.e., multi-view data) has prompted significant development of data integration methods based on low-rank matrix factorizations. These methods decompose signal matrices from each view into the sum of shared and individual structures, which are further used for dimension reduction, exploratory analyses, and quantifying associations across views. However, existing methods have limitations in modeling partially-shared structures due to either too restrictive models, or restrictive identifiability conditions. To address these challenges, we formulate a new model for partially-shared signals based on grouping the views into so-called hierarchical levels. The proposed hierarchy leads us to introduce a new penalty, hierarchical nuclear norm (HNN), for signal estimation. In contrast to existing methods, HNN penalization avoids scores and loadings factorization of the signals and leads to a convex optimization problem, which we solve using a dual forward-backward algorithm. We propose a simple refitting procedure to adjust the penalization bias and develop an adapted version of bi-cross-validation for selecting tuning parameters. Extensive simulation studies and analysis of the genotype-tissue expression data demonstrate the advantages of our method over existing alternatives.

Submitted 26 June, 2022; originally announced June 2022.

Comments: 39 pages, 10 figures, 3 tables

arXiv:2201.00459 [pdf, ps, other]

A sampling scheme for estimating the prevalence of a pandemic

Authors: Ze Liu, Siyu Yi, Jianghu Dong, Min-Qian Liu, Yongdao Zhou

Abstract: The spread of COVID-19 makes it essential to investigate its prevalence. In such investigation research, as far as we know, the widely-used sampling methods didn't use the information sufficiently about the numbers of the previously diagnosed cases, which provides a priori information about the true numbers of infections. This motivates us to develop a new, two-stage sampling method in this paper, which utilises the information about the distributions of both population and diagnosed cases, to investigate the prevalence more efficiently. The global likelihood sampling, a robust and efficient sampler to draw samples from any probability density function, is used in our sampling strategy, and thus, our new method can automatically adapt to the complicated distributions of population and cases. Moreover, the corresponding estimating method is simple, which facilitates the practical implementation. Some recommendations for practical implementation are given. Finally, several simulations and a practical example verified its efficiency.

Submitted 2 January, 2022; originally announced January 2022.

arXiv:2011.08753 [pdf, other]

Confounding Feature Acquisition for Causal Effect Estimation

Authors: Shirly Wang, Seung Eun Yi, Shalmali Joshi, Marzyeh Ghassemi

Abstract: Reliable treatment effect estimation from observational data depends on the availability of all confounding information. While much work has targeted treatment effect estimation from observational data, there is relatively little work in the setting of confounding variable missingness, where collecting more information on confounders is often costly or time-consuming. In this work, we frame this challenge as a problem of feature acquisition of confounding features for causal inference. Our goal is to prioritize acquiring values for a fixed and known subset of missing confounders in samples that lead to efficient average treatment effect estimation. We propose two acquisition strategies based on i) covariate balancing (CB), and ii) reducing statistical estimation error on observed factual outcome error (OE). We compare CB and OE on five common causal effect estimation methods, and demonstrate improved sample efficiency of OE over baseline methods under various settings. We also provide visualizations for further analysis on the difference between our proposed methods.

Submitted 17 November, 2020; originally announced November 2020.

arXiv:2011.04868 [pdf, other]

Neural Network Compression Via Sparse Optimization

Authors: Tianyi Chen, Bo Ji, Yixin Shi, Tianyu Ding, Biyi Fang, Sheng Yi, Xiao Tu

Abstract: The compression of deep neural networks (DNNs) to reduce inference cost becomes increasingly important to meet realistic deployment requirements of various applications. There have been a significant amount of work regarding network compression, while most of them are heuristic rule-based or typically not friendly to be incorporated into varying scenarios. On the other hand, sparse optimization yielding sparse solutions naturally fits the compression requirement, but due to the limited study of sparse optimization in stochastic learning, its extension and application onto model compression is rarely well explored. In this work, we propose a model compression framework based on the recent progress on sparse stochastic optimization. Compared to existing model compression techniques, our method is effective and requires fewer extra engineering efforts to incorporate with varying applications, and has been numerically demonstrated on benchmark compression tasks. Particularly, we achieve up to 7.2 and 2.9 times FLOPs reduction with the same level of evaluation accuracy on VGG16 for CIFAR10 and ResNet50 for ImageNet compared to the baseline heavy models, respectively.

Submitted 11 November, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

arXiv:2007.10740 [pdf, other]

Balanced Meta-Softmax for Long-Tailed Visual Recognition

Authors: Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu Zhao, Shuai Yi, Hongsheng Li

Abstract: Deep classifiers have achieved great success in visual recognition. However, real-world data is long-tailed by nature, leading to the mismatch between training and testing distributions. In this paper, we show that the Softmax function, though used in most classification tasks, gives a biased gradient estimation under the long-tailed setup. This paper presents Balanced Softmax, an elegant unbiased extension of Softmax, to accommodate the label distribution shift between training and testing. Theoretically, we derive the generalization bound for multiclass Softmax regression and show our loss minimizes the bound. In addition, we introduce Balanced Meta-Softmax, applying a complementary Meta Sampler to estimate the optimal class sample rate and further improve long-tailed learning. In our experiments, we demonstrate that Balanced Meta-Softmax outperforms state-of-the-art long-tailed classification solutions on both visual recognition and instance segmentation tasks.

Submitted 22 November, 2020; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: NeurIPS 2020 camera-ready; Code available at https://github.com/jiawei-ren/BalancedMetaSoftmax

arXiv:2007.05181 [pdf, other]

Sample-based Regularization: A Transfer Learning Strategy Toward Better Generalization

Authors: Yunho Jeon, Yongseok Choi, Jaesun Park, Subin Yi, Dongyeon Cho, Jiwon Kim

Abstract: Training a deep neural network with a small amount of data is a challenging problem as it is vulnerable to overfitting. However, one of the practical difficulties that we often face is to collect many samples. Transfer learning is a cost-effective solution to this problem. By using the source model trained with a large-scale dataset, the target model can alleviate the overfitting originated from the lack of training data. Resorting to the ability of generalization of the source model, several methods proposed to use the source knowledge during the whole training procedure. However, this is likely to restrict the potential of the target model and some transferred knowledge from the source can interfere with the training procedure. For improving the generalization performance of the target model with a few training samples, we proposed a regularization method called sample-based regularization (SBR), which does not rely on the source's knowledge during training. With SBR, we suggested a new training framework for transfer learning. Experimental results showed that our framework outperformed existing methods in various configurations.

Submitted 10 July, 2020; originally announced July 2020.

arXiv:2004.03639 [pdf, other]

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

Authors: Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Jing Tian, Yixin Shi, Sheng Yi, Xiao Tu, Zhihui Zhu

Abstract: Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the l1-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to the state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, the OBProx-SG not only converges to the global optimal solutions (in convex scenario) or the stationary points (in non-convex scenario), but also promotes the sparsity of the solutions substantially. Particularly, on a large number of convex problems, OBProx-SG outperforms the existing methods comprehensively in the aspect of sparsity exploration and objective values. Moreover, the experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving the solutions of much higher sparsity without sacrificing generalization accuracy.

Submitted 23 July, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: Accepted by ECML 2020

arXiv:1910.07632 [pdf, other]

Adaptive Transfer Learning of Multi-View Time Series Classification

Authors: Donglin Zhan, Shiyu Yi, Dongli Xu, Xiao Yu, Denglin Jiang, Siqi Yu, Haoting Zhang, Wenfang Shangguan, Weihua Zhang

Abstract: Time Series Classification (TSC) has been an important and challenging task in data mining, especially on multivariate time series and multi-view time series data sets. Meanwhile, transfer learning has been widely applied in computer vision and natural language processing applications to improve deep neural network's generalization capabilities. However, very few previous works applied transfer learning framework to time series mining problems. Particularly, the technique of measuring similarities between source domain and target domain based on dynamic representation such as density estimation with importance sampling has never been combined with transfer learning framework. In this paper, we first proposed a general adaptive transfer learning framework for multi-view time series data, which shows strong ability in storing inter-view importance value in the process of knowledge transfer. Next, we represented inter-view importance through some time series similarity measurements and approximated the posterior distribution in latent space for the importance sampling via density estimation techniques. We then computed the matrix norm of sampled importance value, which controls the degree of knowledge transfer in pre-training process. We further evaluated our work, applied it to many other time series classification tasks, and observed that our architecture maintained desirable generalization ability. Finally, we concluded that our framework could be adapted with deep learning techniques to receive significant model performance improvements.

Submitted 14 October, 2019; originally announced October 2019.

Comments: 12 pages, 5 figures

arXiv:1910.02519 [pdf, other]

FIS-GAN: GAN with Flow-based Importance Sampling

Authors: Shiyu Yi, Donglin Zhan, Wenqing Zhang, Denglin Jiang, Kang An, Hao Wang

Abstract: Generative Adversarial Networks (GAN) training process, in most cases, apply Uniform or Gaussian sampling methods in the latent space, which probably spends most of the computation on examples that can be properly handled and easy to generate. Theoretically, importance sampling speeds up stochastic optimization in supervised learning by prioritizing training examples. In this paper, we explore the possibility of adapting importance sampling into adversarial learning. We use importance sampling to replace Uniform and Gaussian sampling methods in the latent space and employ normalizing flow to approximate latent space posterior distribution by density estimation. Empirically, results on MNIST and Fashion-MNIST demonstrate that our method significantly accelerates GAN's optimization while retaining visual fidelity in generated samples.

Submitted 16 December, 2022; v1 submitted 6 October, 2019; originally announced October 2019.

arXiv:1909.04999 [pdf, other]

Domain-Agnostic Few-Shot Classification by Learning Disparate Modulators

Authors: Yongseok Choi, Junyoung Park, Subin Yi, Dong-Yeon Cho

Abstract: Although few-shot learning research has advanced rapidly with the help of meta-learning, its practical usefulness is still limited because most of them assumed that all meta-training and meta-testing examples came from a single domain. We propose a simple but effective way for few-shot classification in which a task distribution spans multiple domains including ones never seen during meta-training. The key idea is to build a pool of models to cover this wide task distribution and learn to select the best one for a particular task through cross-domain meta-learning. All models in the pool share a base network while each model has a separate modulator to refine the base network in its own way. This framework allows the pool to have representational diversity without losing beneficial domain-invariant features. We verify the effectiveness of the proposed algorithm through experiments on various datasets across diverse domains.

Submitted 17 September, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

Comments: Presented at NeurIPS 2019 Workshop on Meta-Learning (MetaLearn 2019)

arXiv:1906.01819 [pdf, other]

Discriminative Few-Shot Learning Based on Directional Statistics

Authors: Junyoung Park, Subin Yi, Yongseok Choi, Dong-Yeon Cho, Jiwon Kim

Abstract: Metric-based few-shot learning methods try to overcome the difficulty due to the lack of training examples by learning embedding to make comparison easy. We propose a novel algorithm to generate class representatives for few-shot classification tasks. As a probabilistic model for learned features of inputs, we consider a mixture of von Mises-Fisher distributions which is known to be more expressive than Gaussian in a high dimensional space. Then, from a discriminative classifier perspective, we get a better class representative considering inter-class correlation which has not been addressed by conventional few-shot learning algorithms. We apply our method to miniImageNet and tieredImageNet datasets, and show that the proposed approach outperforms other comparable methods in few-shot classification tasks.

Submitted 5 June, 2019; originally announced June 2019.

arXiv:1812.07699 [pdf, other]

A Comparison of LSTMs and Attention Mechanisms for Forecasting Financial Time Series

Authors: Thomas Hollis, Antoine Viscardi, Seung Eun Yi

Abstract: While LSTMs show increasingly promising results for forecasting Financial Time Series (FTS), this paper seeks to assess if attention mechanisms can further improve performance. The hypothesis is that attention can help prevent long-term dependencies experienced by LSTM models. To test this hypothesis, the main contribution of this paper is the implementation of an LSTM with attention. Both the benchmark LSTM and the LSTM with attention were compared and both achieved reasonable performances of up to 60% on five stocks from Kaggle's Two Sigma dataset. This comparative analysis demonstrates that an LSTM with attention can indeed outperform standalone LSTMs but further investigation is required as issues do arise with such model architectures.

Submitted 18 December, 2018; originally announced December 2018.

arXiv:1805.00215 [pdf, other]

Internal node bagging

Authors: Shun Yi

Abstract: We introduce a novel view to understand how dropout works as an inexplicit ensemble learning method, which doesn't point out how many and which nodes to learn a certain feature. We propose a new training method named internal node bagging, it explicitly forces a group of nodes to learn a certain feature in training time, and combine those nodes to be one node in inference time. It means we can use much more parameters to improve model's fitting ability in training time while keeping model small in inference time. We test our method on several benchmark datasets and find it performs significantly better than dropout on small models.

Submitted 20 September, 2018; v1 submitted 1 May, 2018; originally announced May 2018.

Showing 1–19 of 19 results for author: Yi, S