Search | arXiv e-print repository

DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization

Authors: Youngjun Song, Youngsik Hwang, Jonghun Lee, Heechang Lee, Dong-Young Lim

Abstract: Domain generalization (DG) aims to learn models that can generalize well to unseen domains by training only on a set of source domains. Sharpness-Aware Minimization (SAM) has been a popular approach for this, aiming to find flat minima in the total loss landscape. However, we show that minimizing the total loss sharpness does not guarantee sharpness across individual domains. In particular, SAM ca… ▽ More Domain generalization (DG) aims to learn models that can generalize well to unseen domains by training only on a set of source domains. Sharpness-Aware Minimization (SAM) has been a popular approach for this, aiming to find flat minima in the total loss landscape. However, we show that minimizing the total loss sharpness does not guarantee sharpness across individual domains. In particular, SAM can converge to fake flat minima, where the total loss may exhibit flat minima, but sharp minima are present in individual domains. Moreover, the current perturbation update in gradient ascent steps is ineffective in directly updating the sharpness of individual domains. Motivated by these findings, we introduce a novel DG algorithm, Decreased-overhead Gradual Sharpness-Aware Minimization (DGSAM), that applies gradual domain-wise perturbation to reduce sharpness consistently across domains while maintaining computational efficiency. Our experiments demonstrate that DGSAM outperforms state-of-the-art DG methods, achieving improved robustness to domain shifts and better performance across various benchmarks, while reducing computational overhead compared to SAM. △ Less

Submitted 30 March, 2025; originally announced March 2025.

arXiv:2409.18426 [pdf, other]

Dual Cone Gradient Descent for Training Physics-Informed Neural Networks

Authors: Youngsik Hwang, Dong-Young Lim

Abstract: Physics-informed neural networks (PINNs) have emerged as a prominent approach for solving partial differential equations (PDEs) by minimizing a combined loss function that incorporates both boundary loss and PDE residual loss. Despite their remarkable empirical performance in various scientific computing tasks, PINNs often fail to generate reasonable solutions, and such pathological behaviors rema… ▽ More Physics-informed neural networks (PINNs) have emerged as a prominent approach for solving partial differential equations (PDEs) by minimizing a combined loss function that incorporates both boundary loss and PDE residual loss. Despite their remarkable empirical performance in various scientific computing tasks, PINNs often fail to generate reasonable solutions, and such pathological behaviors remain difficult to explain and resolve. In this paper, we identify that PINNs can be adversely trained when gradients of each loss function exhibit a significant imbalance in their magnitudes and present a negative inner product value. To address these issues, we propose a novel optimization framework, Dual Cone Gradient Descent (DCGD), which adjusts the direction of the updated gradient to ensure it falls within a dual cone region. This region is defined as a set of vectors where the inner products with both the gradients of the PDE residual loss and the boundary loss are non-negative. Theoretically, we analyze the convergence properties of DCGD algorithms in a non-convex setting. On a variety of benchmark equations, we demonstrate that DCGD outperforms other optimization algorithms in terms of various evaluation metrics. In particular, DCGD achieves superior predictive accuracy and enhances the stability of training for failure modes of PINNs and complex PDEs, compared to existing optimally tuned models. Moreover, DCGD can be further improved by combining it with popular strategies for PINNs, including learning rate annealing and the Neural Tangent Kernel (NTK). △ Less

Submitted 14 January, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

Comments: The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

arXiv:2405.02905 [pdf, other]

doi 10.1002/sta4.70062

Mixture of partially linear experts

Authors: Yeongsan Hwang, Byungtae Seo, Sangkon Oh

Abstract: In the mixture of experts model, a common assumption is the linearity between a response variable and covariates. While this assumption has theoretical and computational benefits, it may lead to suboptimal estimates by overlooking potential nonlinear relationships among the variables. To address this limitation, we propose a partially linear structure that incorporates unspecified functions to cap… ▽ More In the mixture of experts model, a common assumption is the linearity between a response variable and covariates. While this assumption has theoretical and computational benefits, it may lead to suboptimal estimates by overlooking potential nonlinear relationships among the variables. To address this limitation, we propose a partially linear structure that incorporates unspecified functions to capture nonlinear relationships. We establish the identifiability of the proposed model under mild conditions and introduce a practical estimation algorithm. We present the performance of our approach through numerical studies, including simulations and real data analysis. △ Less

Submitted 5 May, 2024; originally announced May 2024.

Journal ref: Stat, Volume 14, Issue 2, e70062, 2025

arXiv:2401.10437 [pdf, other]

Optimal Sensor Allocation with Multiple Linear Dispersion Processes

Authors: Xinchao Liu, Dzung Phan, Youngdeok Hwang, Levente Klein, Xiao Liu, Kyongmin Yeo

Abstract: This paper considers the optimal sensor allocation for estimating the emission rates of multiple sources in a two-dimensional spatial domain. Locations of potential emission sources are known (e.g., factory stacks), and the number of sources is much greater than the number of sensors that can be deployed, giving rise to the optimal sensor allocation problem. In particular, we consider linear dispe… ▽ More This paper considers the optimal sensor allocation for estimating the emission rates of multiple sources in a two-dimensional spatial domain. Locations of potential emission sources are known (e.g., factory stacks), and the number of sources is much greater than the number of sensors that can be deployed, giving rise to the optimal sensor allocation problem. In particular, we consider linear dispersion forward models, and the optimal sensor allocation is formulated as a bilevel optimization problem. The outer problem determines the optimal sensor locations by minimizing the overall Mean Squared Error of the estimated emission rates over various wind conditions, while the inner problem solves an inverse problem that estimates the emission rates. Two algorithms, including the repeated Sample Average Approximation and the Stochastic Gradient Descent based bilevel approximation, are investigated in solving the sensor allocation problem. Convergence analysis is performed to obtain the performance guarantee, and numerical examples are presented to illustrate the proposed approach. △ Less

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2110.10604 [pdf, other]

Bayesian Model Calibration and Sensitivity Analysis for Oscillating Biological Experiments

Authors: Youngdeok Hwang, Hang J. Kim, Won Chang, Christian Hong, Steven N. MacEachern

Abstract: Understanding the oscillating behaviors that govern organisms' internal biological processes requires interdisciplinary efforts combining both biological and computer experiments, as the latter can complement the former by simulating perturbed conditions with higher resolution. Harmonizing the two types of experiment, however, poses significant statistical challenges due to identifiability issues,… ▽ More Understanding the oscillating behaviors that govern organisms' internal biological processes requires interdisciplinary efforts combining both biological and computer experiments, as the latter can complement the former by simulating perturbed conditions with higher resolution. Harmonizing the two types of experiment, however, poses significant statistical challenges due to identifiability issues, numerical instability, and ill behavior in high dimension. This article devises a new Bayesian calibration framework for oscillating biochemical models. The proposed Bayesian model is estimated relying on an advanced Markov chain Monte Carlo (MCMC) technique which can efficiently infer the parameter values that match the simulated and observed oscillatory processes. Also proposed is an approach to sensitivity analysis based on the intervention posterior. This approach measures the influence of individual parameters on the target process by using the obtained MCMC samples as a computational tool. The proposed framework is illustrated with circadian oscillations observed in a filamentous fungus, Neurospora crassa. △ Less

Submitted 13 December, 2024; v1 submitted 20 October, 2021; originally announced October 2021.

Comments: manuscript 33 pages, appendix 6 pages

MSC Class: 62P10 (Primary); 62-08 (Secondary)

arXiv:2004.11819 [pdf]

doi 10.1109/TGRS.2020.3010055

Domain Adaptive Transfer Attack (DATA)-based Segmentation Networks for Building Extraction from Aerial Images

Authors: Younghwan Na, Jun Hee Kim, Kyungsu Lee, Juhum Park, Jae Youn Hwang, Jihwan P. Choi

Abstract: Semantic segmentation models based on convolutional neural networks (CNNs) have gained much attention in relation to remote sensing and have achieved remarkable performance for the extraction of buildings from high-resolution aerial images. However, the issue of limited generalization for unseen images remains. When there is a domain gap between the training and test datasets, CNN-based segmentati… ▽ More Semantic segmentation models based on convolutional neural networks (CNNs) have gained much attention in relation to remote sensing and have achieved remarkable performance for the extraction of buildings from high-resolution aerial images. However, the issue of limited generalization for unseen images remains. When there is a domain gap between the training and test datasets, CNN-based segmentation models trained by a training dataset fail to segment buildings for the test dataset. In this paper, we propose segmentation networks based on a domain adaptive transfer attack (DATA) scheme for building extraction from aerial images. The proposed system combines the domain transfer and adversarial attack concepts. Based on the DATA scheme, the distribution of the input images can be shifted to that of the target images while turning images into adversarial examples against a target network. Defending adversarial examples adapted to the target domain can overcome the performance degradation due to the domain gap and increase the robustness of the segmentation model. Cross-dataset experiments and the ablation study are conducted for the three different datasets: the Inria aerial image labeling dataset, the Massachusetts building dataset, and the WHU East Asia dataset. Compared to the performance of the segmentation network without the DATA scheme, the proposed method shows improvements in the overall IoU. Moreover, it is verified that the proposed method outperforms even when compared to feature adaptation (FA) and output space adaptation (OSA). △ Less

Submitted 29 April, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

Comments: 11pages, 12 figures

arXiv:2001.01401 [pdf, other]

Mel-spectrogram augmentation for sequence to sequence voice conversion

Authors: Yeongtae Hwang, Hyemin Cho, Hongsun Yang, Dong-Ok Won, Insoo Oh, Seong-Whan Lee

Abstract: For training the sequence-to-sequence voice conversion model, we need to handle an issue of insufficient data about the number of speech pairs which consist of the same utterance. This study experimentally investigated the effects of Mel-spectrogram augmentation on training the sequence-to-sequence voice conversion (VC) model from scratch. For Mel-spectrogram augmentation, we adopted the policies… ▽ More For training the sequence-to-sequence voice conversion model, we need to handle an issue of insufficient data about the number of speech pairs which consist of the same utterance. This study experimentally investigated the effects of Mel-spectrogram augmentation on training the sequence-to-sequence voice conversion (VC) model from scratch. For Mel-spectrogram augmentation, we adopted the policies proposed in SpecAugment. In addition, we proposed new policies (i.e., frequency warping, loudness and time length control) for more data variations. Moreover, to find the appropriate hyperparameters of augmentation policies without training the VC model, we proposed hyperparameter search strategy and the new metric for reducing experimental cost, namely deformation per deteriorating ratio. We compared the effect of these Mel-spectrogram augmentation methods based on various sizes of training set and augmentation policies. In the experimental results, the time axis warping based policies (i.e., time length control and time warping.) showed better performance than other policies. These results indicate that the use of the Mel-spectrogram augmentation is more beneficial for training the VC model. △ Less

Submitted 15 June, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

Comments: 5pages, 1 figures, 8 tables

arXiv:1911.04602 [pdf, other]

A clustered Gaussian process model for computer experiments

Authors: Chih-Li Sung, Benjamin Haaland, Youngdeok Hwang, Siyuan Lu

Abstract: A Gaussian process has been one of the important approaches for emulating computer simulations. However, the stationarity assumption for a Gaussian process and the intractability for large-scale dataset limit its availability in practice. In this article, we propose a clustered Gaussian process model which segments the input data into multiple clusters, in each of which a Gaussian process model is… ▽ More A Gaussian process has been one of the important approaches for emulating computer simulations. However, the stationarity assumption for a Gaussian process and the intractability for large-scale dataset limit its availability in practice. In this article, we propose a clustered Gaussian process model which segments the input data into multiple clusters, in each of which a Gaussian process model is performed. The stochastic expectation-maximization is employed to efficiently fit the model. In our simulations as well as a real application to solar irradiance emulation, our proposed method had smaller mean square errors than its main competitors, with competitive computation time, and provides valuable insights from data by discovering the clusters. An R package for the proposed methodology is provided in an open repository. △ Less

Submitted 5 November, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

arXiv:1806.05131 [pdf, other]

Synthesizing simulation and field data of solar irradiance

Authors: Furong Sun, Robert B. Gramacy, Benjamin Haaland, Siyuan Lu, Youngdeok Hwang

Abstract: Predicting the intensity and amount of sunlight as a function of location and time is an essential component in identifying promising locations for economical solar farming. Although weather models and irradiance data are relatively abundant, these have yet, to our knowledge, been hybridized on a continental scale. Rather, much of the emphasis in the literature has been on short-term localized for… ▽ More Predicting the intensity and amount of sunlight as a function of location and time is an essential component in identifying promising locations for economical solar farming. Although weather models and irradiance data are relatively abundant, these have yet, to our knowledge, been hybridized on a continental scale. Rather, much of the emphasis in the literature has been on short-term localized forecasting. This is probably because the amount of data involved in a more global analysis is prohibitive with the canonical toolkit, via the Gaussian process (GP). Here we show how GP surrogate and discrepancy models can be combined to tractably and accurately predict solar irradiance on time-aggregated and daily scales with measurements at thousands of sites across the continental United States. Our results establish short term accuracy of bias-corrected weather-based simulation of irradiance, when realizations are available in real space-time (e.g., in future days), and provide accurate surrogates for smoothing in the more common situation where reliable weather data is not available (e.g., in future years). △ Less

Submitted 22 June, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

Comments: 25 pages,7 figures, 3 tables

arXiv:1511.08343 [pdf, other]

The Automatic Statistician: A Relational Perspective

Authors: Yunseong Hwang, Anh Tong, Jaesik Choi

Abstract: Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language description of time-series data by treating unknown time-series data nonparametrically using GP with a composite covariance kernel function. Unfortunately, learning a composite cova… ▽ More Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language description of time-series data by treating unknown time-series data nonparametrically using GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel with a single time-series data set often results in less informative kernel that may not give qualitative, distinctive descriptions of data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets; US stock data, US house price index data and currency exchange rate data. △ Less

Submitted 11 February, 2016; v1 submitted 26 November, 2015; originally announced November 2015.

Showing 1–10 of 10 results for author: Hwang, Y