
doi 10.1007/s42952-025-00325-3

Building nonstationary extreme value model using L-moments

Authors: Yire Shin, Yonggwan Shin, Jeong-Soo Park

Abstract: Maximum likelihood estimation for a time-dependent nonstationary (NS) extreme value model is often too sensitive to influential observations, such as large values toward the end of a sample. Alternative methods using L-moments have therefore been developed for NS models to address this problem while retaining the advantages of the stationary L-moment method. However, one such L-moment method performs worse than stationary estimation when the data exhibit a positive trend in variance. To address this problem, we propose a new algorithm for efficiently estimating the NS parameters. The proposed method combines L-moments and robust regression, using standardized residuals. A simulation study demonstrates that the proposed method overcomes this problem; the comparison is conducted using conventional and redefined return level estimates. An application to peak streamflow data at Trehafod in the UK illustrates the usefulness of the proposed method. Additionally, we extend the proposed method to an NS extreme value model in which physical covariates are employed as predictors. Furthermore, we consider a model selection criterion based on the cross-validated generalized L-moment distance as an alternative to likelihood-based criteria.

Submitted 1 June, 2025; originally announced June 2025.

Journal ref: Journal of the Korean Statistical Society, 2025
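The stationary L-moment method that the NS approach builds on can be sketched in a few lines. The block below is a minimal, stdlib-only illustration, not the authors' NS algorithm: it computes the first three sample L-moments from unbiased probability-weighted moments and then applies Hosking's closed-form approximation for the GEV shape. It assumes Hosking's sign convention, in which a positive shape $k$ means a bounded upper tail (the negative of the $\xi$ used in many other texts); `sample_l_moments`, `gev_lmom_fit`, and `gev_quantile` are hypothetical names.

```python
import math


def sample_l_moments(data):
    """First three sample L-moments (l1, l2, l3) via unbiased PWMs b0, b1, b2."""
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # b_r = (1/n) * sum_j [(j-1)...(j-r)] / [(n-1)...(n-r)] * x_(j), j = 1..n
    b0 = sum(x) / n
    b1 = sum((j - 1) / (n - 1) * x[j - 1] for j in range(1, n + 1)) / n
    b2 = sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x[j - 1]
             for j in range(1, n + 1)) / n
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0


def gev_lmom_fit(data):
    """GEV (location, scale, k) by L-moments, using Hosking's approximation for k."""
    l1, l2, l3 = sample_l_moments(data)
    t3 = l3 / l2  # sample L-skewness
    c = 2 / (3 + t3) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-8:  # Gumbel limit
        scale = l2 / math.log(2)
        return l1 - 0.5772156649 * scale, scale, 0.0
    g = math.gamma(1 + k)
    scale = l2 * k / ((1 - 2 ** (-k)) * g)
    loc = l1 - scale * (1 - g) / k
    return loc, scale, k


def gev_quantile(F, loc, scale, k):
    """GEV quantile x(F) in Hosking's parameterization (k != 0)."""
    return loc + scale / k * (1 - (-math.log(F)) ** k)
```

For example, fitting `[gev_quantile((i + 0.5) / 2000, 10.0, 2.0, 0.1) for i in range(2000)]` should return estimates close to `(10.0, 2.0, 0.1)`. The NS method described above goes further by letting the parameters depend on time or covariates.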

arXiv:2505.21417

Model averaging with mixed criteria for estimating high quantiles of extreme values: Application to heavy rainfall

Authors: Yonggwan Shin, Yire Shin, Jeong-Soo Park

Abstract: Accurately estimating high quantiles beyond the largest observed value is crucial for risk assessment and for devising effective adaptation strategies to prevent greater disasters. The generalized extreme value distribution is widely used for this purpose, with L-moment estimation (LME) and maximum likelihood estimation (MLE) being the primary estimation methods. However, estimating high quantiles from a small sample becomes challenging when the upper endpoint is unbounded, or equivalently, when larger uncertainties are involved in extrapolation. This study introduces an improved approach based on a model averaging (MA) technique. The proposed method combines MLE and LME to construct candidate submodels and to assign weights effectively. The properties of the proposed approach are evaluated through Monte Carlo simulations and an application to maximum daily rainfall data in Korea. Additionally, theoretical considerations are provided, including the asymptotic variance with random weights. A surrogate model of MA estimation is also developed and applied for further analysis.

Submitted 27 May, 2025; originally announced May 2025.
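To make "assign weights" concrete, the sketch below averages $T$-year return levels from several candidate GEV fits (for example, an MLE fit and an LME fit) using Akaike-type weights $w_i \propto \exp(-\Delta_i/2)$. This weighting rule is a common textbook choice assumed here for illustration; the abstract does not state the authors' weighting scheme, and all function names are hypothetical. Parameters follow Hosking's sign convention for the shape $k$.

```python
import math


def gev_return_level(loc, scale, k, T):
    """T-year return level: GEV quantile with non-exceedance probability 1 - 1/T."""
    y = -math.log(1 - 1 / T)
    if abs(k) < 1e-8:  # Gumbel limit
        return loc - scale * math.log(y)
    return loc + scale / k * (1 - y ** k)


def akaike_weights(criteria):
    """Weights proportional to exp(-(c_i - min c) / 2); a smaller criterion gets more weight."""
    best = min(criteria)
    raw = [math.exp(-(c - best) / 2) for c in criteria]
    total = sum(raw)
    return [r / total for r in raw]


def averaged_return_level(fits, criteria, T):
    """Weighted average of return levels from candidate fits given as (loc, scale, k)."""
    weights = akaike_weights(criteria)
    levels = [gev_return_level(*fit, T) for fit in fits]
    return sum(w * z for w, z in zip(weights, levels))
```

Averaging the quantile estimates, rather than picking a single best fit, is what lets a model-averaged estimator hedge between the two estimation methods when extrapolating far into the tail.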

arXiv:2502.07033

Compatible Imputation for Hierarchical Linear Models with Incomplete Data: Interaction Effects of Continuous and Categorical Covariates MAR

Authors: Dongho Shin, Yongyun Shin

Abstract: This article focuses on Bayesian estimation of a hierarchical linear model (HLM) from incomplete data assumed missing at random, where continuous covariates $C$ and discrete categorical covariates $D$ have interaction effects on a continuous response $R$. With small sample sizes, maximum likelihood estimation is suboptimal. Existing Gibbs samplers are based on a Bayesian joint distribution compatible with the HLM, but they impute missing values of $C$ and the underlying latent continuous variables $D^*$ of $D$ by a Metropolis algorithm whose normal proposal densities have constant variances, while the target conditional distributions of $C$ and $D$ have nonconstant variances. Therefore, these samplers are neither guaranteed to be compatible with the joint distribution nor guaranteed to always produce unbiased estimation of the HLM. We assume a Bayesian joint distribution of parameters and partially observed variables, including correlated categorical $D$, and introduce a compatible Gibbs sampler that draws parameters and missing values directly from the exact posterior distributions. We apply our sampler to incompletely observed longitudinal data from a small number of patient-physician encounters during office visits, and compare our estimators with those of existing methods by simulation.

Submitted 10 February, 2025; originally announced February 2025.

Comments: arXiv admin note: text overlap with arXiv:2405.21020
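The variance mismatch described above can be seen in a toy model. Assume, hypothetically and much more simply than the paper's HLM, that $y = \beta_0 + \beta_1 x + \beta_2 x d + e$ with $e \sim N(0, \sigma^2)$, a missing continuous covariate $x \sim N(\mu_x, \tau^2)$, and an observed binary $d$. Completing the square gives an exact normal full conditional for $x$ with precision $1/\tau^2 + (\beta_1 + \beta_2 d)^2/\sigma^2$, so its variance depends on $d$ through the interaction and no single constant-variance proposal matches it everywhere. The sketch draws $x$ directly from that exact conditional, the kind of step a compatible Gibbs sampler uses; the function names are hypothetical.

```python
import math
import random


def x_full_conditional(y, d, beta0, beta1, beta2, sigma2, mu_x, tau2):
    """Mean and variance of x | y, d in the toy interaction model."""
    slope = beta1 + beta2 * d  # effective slope of x changes with d
    precision = 1 / tau2 + slope ** 2 / sigma2
    mean = (mu_x / tau2 + slope * (y - beta0) / sigma2) / precision
    return mean, 1 / precision


def impute_x(rng, y, d, params):
    """One Gibbs imputation step: draw the missing x from its exact conditional."""
    mean, var = x_full_conditional(y, d, **params)
    return rng.gauss(mean, math.sqrt(var))


if __name__ == "__main__":
    params = dict(beta0=0.0, beta1=1.0, beta2=1.0, sigma2=1.0, mu_x=0.0, tau2=1.0)
    print(x_full_conditional(3.0, 0, **params))  # (1.5, 0.5)
    print(x_full_conditional(3.0, 1, **params))  # (1.2, 0.2)
    print(impute_x(random.Random(0), 3.0, 1, params))
```

With these illustrative parameters the conditional variance drops from 0.5 to 0.2 when $d$ switches from 0 to 1.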

arXiv:2408.08764

doi 10.1007/s00477-023-02642-7

Generalized logistic model for $r$ largest order statistics, with hydrological application

Authors: Yire Shin, Jeong-Soo Park

Abstract: The effective use of available information in extreme value analysis is critical because extreme values are scarce. Thus, using the $r$ largest order statistics (rLOS) instead of the block maxima alone is encouraged. Based on the four-parameter kappa model for the rLOS (rK4D), we introduce a new distribution for the rLOS as a special case of the rK4D, namely the generalized logistic model for rLOS (rGLO). This distribution can be useful when the generalized extreme value model for rLOS is no longer efficient in capturing the variability of extreme values. Moreover, the rGLO enriches the pool of candidate distributions from which the best model is chosen to yield accurate and robust quantile estimates. We derive the joint probability density function and the marginal and conditional distribution functions of the new model. Maximum likelihood estimation, the delta method, profile likelihood, order selection by the entropy difference test, cross-validated likelihood criteria, and model averaging are considered for inference. The usefulness and practical effectiveness of the rGLO are illustrated by a Monte Carlo simulation and an application to extreme streamflow data from Bevern Stream, UK.

Submitted 25 October, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

Comments: In this revision, some modifications and corrections relative to the published version are made to sentences and formulas, marked in blue.

Journal ref: Stoch Environ Res Risk Assess 38 (2024) 1567-1581
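Before any rLOS model (rGEVD, rK4D, or rGLO) can be fitted, each block, typically a year, has to be reduced to its $r$ largest observations. A minimal sketch of that data-preparation step follows, assuming the raw series arrives as (block, value) pairs; it does not fit the rGLO itself, and the function name is hypothetical. Blocks with fewer than $r$ observations are dropped, which is one of several possible conventions, and the declustering often needed in hydrological practice to keep the $r$ values approximately independent is omitted.

```python
import heapq
from collections import defaultdict


def r_largest_by_block(records, r):
    """Map each block to its r largest values, in descending order.

    records: iterable of (block_id, value) pairs, e.g. (year, daily flow).
    Blocks with fewer than r values are skipped.
    """
    blocks = defaultdict(list)
    for block_id, value in records:
        blocks[block_id].append(value)
    return {
        b: heapq.nlargest(r, values)
        for b, values in sorted(blocks.items())
        if len(values) >= r
    }
```

With `r = 1` this reduces to the familiar block-maxima series, which is the sense in which rLOS models use more of the available information.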

arXiv:2405.21020

Bayesian Estimation of Hierarchical Linear Models from Incomplete Data: Cluster-Level Interaction Effects and Small Sample Sizes

Authors: Dongho Shin, Yongyun Shin, Nao Hagiwara

Abstract: We consider Bayesian estimation of a hierarchical linear model (HLM) from partially observed data, assumed to be missing at random, with small sample sizes. A vector of continuous covariates $C$ includes cluster-level partially observed covariates with interaction effects. Because the sample is small, consisting of 37 patient-physician encounters repeatedly measured at four time points, maximum likelihood estimation is suboptimal. Existing Gibbs samplers impute missing values of $C$ by a Metropolis algorithm using proposal densities that have constant variances, while the target posterior distributions have nonconstant variances. Therefore, these samplers may not ensure compatibility with the HLM and, as a result, may not guarantee unbiased estimation of the HLM. We introduce a compatible Gibbs sampler that draws parameters and imputes missing values directly from the exact posterior distributions. We apply our Gibbs sampler to the longitudinal patient-physician encounter data and compare our estimators with those from existing methods by simulation.

Submitted 30 January, 2025; v1 submitted 31 May, 2024; originally announced May 2024.

arXiv:2312.10072

Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk

Authors: Colleen Chan, Kisung You, Sunny Chung, Mauro Giuffrè, Theo Saarinen, Niroop Rajashekar, Yuan Pu, Yeo Eun Shin, Loren Laine, Ambrose Wong, René Kizilcec, Jasjeet Sekhon, Dennis Shung

Abstract: Applications of large language models (LLMs) such as ChatGPT have the potential to enhance clinical decision support through conversational interfaces. However, challenges of human-algorithm interaction and clinician trust are poorly understood. GutGPT, an LLM for gastrointestinal (GI) bleeding risk prediction and management guidance, was deployed in clinical simulation scenarios alongside the electronic health record (EHR) with emergency medicine physicians, internal medicine physicians, and medical students to evaluate its effect on physician acceptance of and trust in AI clinical decision support systems (AI-CDSS). GutGPT provides risk predictions from a validated machine learning model and evidence-based answers by querying extracted clinical guidelines. Participants were randomized either to GutGPT plus an interactive dashboard or to the interactive dashboard plus a search engine. Surveys and educational assessments administered before and after the simulation measured technology acceptance and content mastery. Preliminary results showed mixed effects on acceptance after using GutGPT compared with the dashboard or search engine, but GutGPT appeared to improve content mastery based on simulation performance. Overall, this study demonstrates that LLMs like GutGPT could enhance effective AI-CDSS if implemented optimally and paired with interactive interfaces.

Submitted 6 December, 2023; originally announced December 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10, 2023, New Orleans, United States, 11 pages

arXiv:2309.01020

On the training and generalization of deep operator networks

Authors: Sanghyun Lee, Yeonjong Shin

Abstract: We present a novel training method for deep operator networks (DeepONets), one of the most popular neural network models for operators. DeepONets consist of two sub-networks, namely the branch and trunk networks. Typically, the two sub-networks are trained simultaneously, which amounts to solving a complex optimization problem in a high-dimensional space. In addition, the nonconvex and nonlinear nature of the problem makes training very challenging. To tackle this challenge, we propose a two-step training method that first trains the trunk network and then trains the branch network. The core mechanism, motivated by the divide-and-conquer paradigm, is the decomposition of the entire complex training task into two subtasks with reduced complexity. Within this method, a Gram-Schmidt orthonormalization process is introduced, which significantly improves stability and generalization ability. On the theoretical side, we establish a generalization error estimate in terms of the number of training data, the width of DeepONets, and the number of input and output sensors. Numerical examples, including Darcy flow in heterogeneous porous media, are presented to demonstrate the effectiveness of the two-step training method.

Submitted 2 September, 2023; originally announced September 2023.
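The role of the Gram-Schmidt step can be illustrated with plain matrices. Suppose, as an assumption about the setup, that the first step has produced trunk outputs $T$ (output sensors by basis functions) and a coefficient matrix $A$ (basis functions by training inputs) with $TA \approx U$. Orthonormalizing $T = QR$ leaves the fit unchanged, since $TA = Q(RA)$, so $RA$ can serve as the target the branch network is trained to reproduce in the second step; this reading of how the orthonormalization enters the two-step method is our assumption. The stdlib-only sketch below (hypothetical names) computes $Q$ and $R$ by modified Gram-Schmidt; it does not train any network.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def gram_schmidt(T):
    """Modified Gram-Schmidt: T (m x p, full column rank) = Q R with orthonormal columns in Q."""
    m, p = len(T), len(T[0])
    q_cols = []
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        v = [T[i][j] for i in range(m)]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))  # projection on q_i
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]  # remove that component
        norm = math.sqrt(sum(vk * vk for vk in v))
        if norm == 0.0:
            raise ValueError("trunk outputs are linearly dependent")
        R[j][j] = norm
        q_cols.append([vk / norm for vk in v])
    Q = [list(row) for row in zip(*q_cols)]  # columns back to rows
    return Q, R
```

Because the columns of $Q$ are orthonormal, the branch targets $RA$ are expressed in a well-conditioned basis, which is one plausible reason the orthonormalization helps stability.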

arXiv:2308.13564

SGMM: Stochastic Approximation to Generalized Method of Moments

Authors: Xiaohong Chen, Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin, Myunghyun Song

Abstract: We introduce a new class of algorithms, Stochastic Generalized Method of Moments (SGMM), for estimation and inference on (overidentified) moment restriction models. Our SGMM is a novel stochastic approximation alternative to the popular (offline) GMM of Hansen (1982), and offers a fast and scalable implementation with the ability to handle streaming datasets in real time. We establish almost sure convergence and the (functional) central limit theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we propose online versions of the Durbin-Wu-Hausman and Sargan-Hansen tests that can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo simulations show that, as the sample size increases, the SGMM matches the standard (offline) GMM in estimation accuracy while gaining in computational efficiency, indicating its practical value for both large-scale and online datasets. We demonstrate the efficacy of our approach through a proof of concept using two well-known empirical examples with large sample sizes.

Submitted 30 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 46 pages, 4 tables, 2 figures
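For intuition about what a stochastic approximation alternative to offline GMM looks like, consider the simplest just-identified case: a scalar coefficient $\beta$, one instrument $z$, and the moment condition $E[z(y - x\beta)] = 0$. A Robbins-Monro recursion updates $\beta$ with one observation at a time and reports the Polyak-Ruppert average of the iterates. This is a generic textbook sketch, not the SGMM or online 2SLS recursions of the paper (which handle overidentified models and optimal weighting); the step size $\gamma_t = \gamma_0 t^{-a}$, the simulated design, and all names are assumptions.

```python
import random


def sa_iv(stream, gamma0=0.2, a=0.6, beta0=0.0):
    """Robbins-Monro for E[z (y - x beta)] = 0; returns the Polyak-Ruppert average."""
    beta, beta_bar = beta0, 0.0
    for t, (y, x, z) in enumerate(stream, start=1):
        beta += gamma0 * t ** (-a) * z * (y - x * beta)  # one-observation update
        beta_bar += (beta - beta_bar) / t  # running average of the iterates
    return beta_bar


def simulate_iv(rng, n, beta=2.0):
    """Endogenous x: u and x share the shock v, so OLS is biased but z is a valid instrument."""
    for _ in range(n):
        z, v, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = z + v
        u = 0.8 * v + 0.6 * e
        yield beta * x + u, x, z


if __name__ == "__main__":
    print(sa_iv(simulate_iv(random.Random(0), 50000)))  # close to 2.0
```

On this simulated design ordinary least squares would converge to 2.4 rather than 2, because $\mathrm{Cov}(x, u)/\mathrm{Var}(x) = 0.8/2$; the instrument-based recursion avoids that bias without ever storing the data.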

arXiv:2209.14502

Fast Inference for Quantile Regression with Tens of Millions of Observations

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin

Abstract: Big data analytics has opened new avenues in economic research, but the challenge of analyzing datasets with tens of millions of observations is substantial. Conventional econometric methods based on extremum estimators require large amounts of computing resources and memory, which are often not readily available. In this paper, we focus on linear quantile regression applied to "ultra-large" datasets, such as U.S. decennial censuses. A fast inference framework is presented, utilizing stochastic subgradient descent (S-subGD) updates. The inference procedure handles cross-sectional data sequentially: (i) updating the parameter estimate with each incoming "new observation", (ii) aggregating it as a Polyak-Ruppert average, and (iii) computing a pivotal statistic for inference using only a solution path. The methodology draws from time-series regression to create an asymptotically pivotal statistic through random scaling. Our proposed test statistic is calculated in a fully online fashion, and critical values are calculated without resampling. We conduct extensive numerical studies to showcase the computational merits of our proposed inference. For inference problems as large as $(n, d) \sim (10^7, 10^3)$, where $n$ is the sample size and $d$ is the number of regressors, our method generates new insights and surpasses current inference methods in computation. Specifically, our method reveals trends in the gender gap in the U.S. college wage premium using millions of observations, while controlling for over $10^3$ covariates to mitigate confounding effects.

Submitted 31 October, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: 62 pages, 8 figures
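Steps (i) and (ii) are easiest to see for a single quantile with no regressors. For the check loss $\rho_\tau(u) = u(\tau - \mathbf{1}\{u < 0\})$, a subgradient step on one new observation $y$ moves the estimate $q$ up by $\gamma_t \tau$ when $y \ge q$ and down by $\gamma_t (1 - \tau)$ when $y < q$, and the Polyak-Ruppert average is kept as a running mean. The sketch assumes a step size $\gamma_t = \gamma_0 t^{-a}$ and uses a hypothetical function name; the linear-regression version replaces the scalar update with $\beta \leftarrow \beta + \gamma_t x (\tau - \mathbf{1}\{y < x^\top \beta\})$, and the random-scaling statistic of step (iii) is not computed here.

```python
import random


def sgd_quantile(stream, tau, gamma0=1.0, a=0.6, q0=0.0):
    """Stochastic subgradient descent on the check loss, with Polyak-Ruppert averaging."""
    q, q_bar = q0, 0.0
    for t, y in enumerate(stream, start=1):
        indicator = 1.0 if y < q else 0.0
        q += gamma0 * t ** (-a) * (tau - indicator)  # negative subgradient step
        q_bar += (q - q_bar) / t  # running average of the iterates
    return q_bar


if __name__ == "__main__":
    rng = random.Random(0)
    print(sgd_quantile((rng.random() for _ in range(50000)), tau=0.9))  # close to 0.9
```

Each update touches only the current observation and a few scalars, which is why this family of methods scales to datasets that do not fit in memory.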

arXiv:2202.06383

Surgical Scheduling via Optimization and Machine Learning with Long-Tailed Data

Authors: Yuan Shi, Saied Mahdian, Jose Blanchet, Peter Glynn, Andrew Y. Shin, David Scheinker

Abstract: Using data from cardiovascular surgery patients with long and highly variable post-surgical lengths of stay (LOS), we develop a modeling framework to reduce recovery unit congestion. We estimate the LOS and its probability distribution using machine learning models, schedule procedures on a rolling basis using a variety of optimization models, and estimate performance with simulation. The machine learning models achieved only modest LOS prediction accuracy, despite access to a very rich set of patient characteristics. Compared to the current paper-based system used in the hospital, most optimization models failed to reduce congestion without increasing wait times for surgery. A conservative stochastic optimization with sufficient sampling to capture the long tail of the LOS distribution outperformed the current manual process and other stochastic and robust optimization approaches. These results highlight the perils of using oversimplified distributional models of LOS for scheduling procedures and the importance of using optimization methods well-suited to dealing with long-tailed behavior.

Submitted 28 November, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

arXiv:2111.07513

A Comparative Study on Basic Elements of Deep Learning Models for Spatial-Temporal Traffic Forecasting

Authors: Yuyol Shin, Yoonjin Yoon

Abstract: Traffic forecasting plays a crucial role in intelligent transportation systems. The spatial-temporal complexities of transportation networks make the problem especially challenging. Recently proposed deep learning models share basic elements such as graph convolution, graph attention, recurrent units, and/or attention mechanisms. In this study, we designed an in-depth comparative study of four deep neural network models that use different basic elements. As base models, one RNN-based model and one attention-based model were chosen from previous literature. Then, the spatial feature extraction layers in the models were substituted with graph convolution and graph attention. To analyze the performance of each element in various environments, we conducted experiments on four real-world datasets: highway speed, highway flow, urban speed from a homogeneous road link network, and urban speed from a heterogeneous road link network. The results demonstrate that the RNN-based model and the attention-based model show a similar level of performance for short-term prediction, and the attention-based model outperforms the RNN-based model in longer-term predictions. The choice between graph convolution and graph attention makes a larger difference in the RNN-based models. Also, our modified version of GMAN shows performance comparable to the original with less memory consumption.

Submitted 22 March, 2022; v1 submitted 14 November, 2021; originally announced November 2021.

Comments: 14 pages, 4 figures, 3 tables. This paper was accepted for the AAAI-22 Workshop: AI for Transportation
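Graph convolution, one of the basic elements compared above, can be illustrated with the widely used propagation rule $H' = \hat{A} H W$, where $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$ is the symmetrically normalized adjacency matrix with self-loops and $D$ its degree matrix. The sketch below is one standard (Kipf-Welling style) variant assumed for illustration, without the nonlinearity or training; the graph-convolution layers in the compared models may differ in detail, and the function names are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix (lists of rows)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def graph_conv(adj, H, W):
    """One propagation step H' = A_norm H W (node features H, weight matrix W, no activation)."""
    a_norm = normalized_adjacency(adj)
    AH = [[sum(a_norm[i][k] * H[k][f] for k in range(len(H))) for f in range(len(H[0]))]
          for i in range(len(H))]  # aggregate neighbour features
    return [[sum(AH[i][f] * W[f][o] for f in range(len(W))) for o in range(len(W[0]))]
            for i in range(len(AH))]  # mix feature channels
```

For a traffic network, the nodes would be road links or sensors, and each row of `H` would hold a node's recent speed or flow features.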

arXiv:2106.03156

doi 10.1609/aaai.v36i7.20701

Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin

Abstract: We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent (SGD) algorithms. We leverage insights from time series regression in econometrics and construct asymptotically pivotal statistics via random scaling. Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem. Our proposed inference method has a couple of key advantages over the existing methods. First, the test statistic is computed in an online fashion with only SGD iterates and the critical values can be obtained without any resampling methods, thereby allowing for efficient implementation suitable for massive online data. Second, there is no need to estimate the asymptotic variance and our inference method is shown to be robust to changes in the tuning parameters for SGD algorithms in simulation experiments with synthetic data.

Submitted 6 October, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: 29 pages, 8 figures, 8 tables

MSC Class: Primary 62J10; 62M02; secondary 60K35

ACM Class: G.3

Journal ref: Proceedings of the 36th AAAI Conference on Artificial Intelligence, 36(7), 2022, pp. 7381-7389
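The random-scaling idea can be written compactly for a scalar parameter. With Polyak-Ruppert averages $\bar\theta_s$ for $s = 1, \ldots, n$, the scaling quantity $V_n = n^{-2} \sum_{s=1}^{n} s^2 (\bar\theta_s - \bar\theta_n)^2$ expands into three running sums, so it can be updated in constant time per observation without storing the path or estimating the asymptotic variance. The statistic $\sqrt{n}(\bar\theta_n - \theta_0)/\sqrt{V_n}$ is then compared with nonstandard critical values tabulated in the literature, which are not reproduced here. The sketch applies this to SGD for a mean under squared loss, an assumed toy problem; the step-size defaults and the class and method names are hypothetical, and `batch_scaling` exists only to check the online update.

```python
import math
import random


class RandomScalingSGD:
    """SGD for the mean of a stream, with Polyak-Ruppert averaging and online random scaling."""

    def __init__(self, gamma0=0.5, a=0.505, theta0=0.0):
        self.gamma0, self.a = gamma0, a
        self.theta, self.theta_bar, self.n = theta0, 0.0, 0
        # running sums of s^2 * bar_s^2, s^2 * bar_s, and s^2 over s = 1..n
        self.s_bb, self.s_b, self.s_1 = 0.0, 0.0, 0.0

    def update(self, y):
        self.n += 1
        s = self.n
        self.theta -= self.gamma0 * s ** (-self.a) * (self.theta - y)  # gradient of (theta - y)^2 / 2
        self.theta_bar += (self.theta - self.theta_bar) / s
        w = s * s
        self.s_bb += w * self.theta_bar ** 2
        self.s_b += w * self.theta_bar
        self.s_1 += w

    def scaling(self):
        """V_n = n^-2 * sum_s s^2 (bar_s - bar_n)^2, expanded into the running sums."""
        b = self.theta_bar
        return (self.s_bb - 2 * b * self.s_b + b * b * self.s_1) / self.n ** 2

    def t_stat(self, theta_null):
        return math.sqrt(self.n) * (self.theta_bar - theta_null) / math.sqrt(self.scaling())


def batch_scaling(bars):
    """Reference computation of V_n from a stored path of averages."""
    n, last = len(bars), bars[-1]
    return sum((s * (b - last)) ** 2 for s, b in enumerate(bars, start=1)) / n ** 2
```

For very long streams, centering the running sums around a provisional estimate would reduce floating-point cancellation in `scaling`.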

arXiv:2105.11025

Compressing Heavy-Tailed Weight Matrices for Non-Vacuous Generalization Bounds

Authors: John Y. Shin

Abstract: Heavy-tailed distributions have been studied in statistics, random matrix theory, physics, and econometrics as models of correlated systems, among other domains. Further, heavy-tailed eigenvalue distributions of the covariance matrices of neural network weight matrices have been shown to correlate empirically with test set accuracy in several works (e.g. arXiv:1901.08276), but a formal relationship between heavy-tail distributed parameters and generalization bounds had yet to be demonstrated. In this work, the compression framework of arXiv:1802.05296 is used to show that matrices with heavy-tail distributed elements can be compressed, resulting in networks with sparse weight matrices. Since the parameter count is reduced to the number of non-zero elements of the sparse matrices, the compression framework allows us to bound the generalization gap of the resulting compressed network with a non-vacuous generalization bound. Further, the action of these matrices on a vector is discussed, and how they may relate to compression and resilient classification is analyzed.

Submitted 23 May, 2021; originally announced May 2021.
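The intuition that heavy-tailed entries compress well can be checked numerically: keep only the largest-magnitude fraction of entries and measure how much of the squared Frobenius norm survives. The sketch below uses simple magnitude pruning, an assumed stand-in rather than the compression scheme of arXiv:1802.05296; the Pareto tail index 1.5, the 10% keep fraction, and the function names are illustrative choices.

```python
import random


def prune_top_fraction(entries, keep):
    """Zero all but the largest-magnitude fraction `keep` of the entries."""
    k = max(1, int(round(keep * len(entries))))
    top_idx = sorted(range(len(entries)), key=lambda i: abs(entries[i]), reverse=True)[:k]
    kept = set(top_idx)
    return [w if i in kept else 0.0 for i, w in enumerate(entries)]


def retained_norm_fraction(entries, pruned):
    """Share of the squared Frobenius norm kept after pruning."""
    return sum(w * w for w in pruned) / sum(w * w for w in entries)


def random_signs(rng, values):
    """Attach independent random signs to nonnegative values."""
    return [v if rng.random() < 0.5 else -v for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10000
    gaussian = [rng.gauss(0, 1) for _ in range(n)]
    heavy = random_signs(rng, [rng.paretovariate(1.5) for _ in range(n)])
    for name, w in [("gaussian", gaussian), ("pareto", heavy)]:
        print(name, retained_norm_fraction(w, prune_top_fraction(w, 0.1)))
```

For Gaussian entries, keeping the top 10% retains roughly 44% of the squared norm in expectation, whereas the heavy-tailed entries concentrate far more of it in the few large values, which is what makes a sparse approximation accurate.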

arXiv:2101.11568

doi 10.1016/j.jeconom.2022.11.006

Predictive Quantile Regression with Mixed Roots and Increasing Dimensions: The ALQR Approach

Authors: Rui Fan, Ji Hyung Lee, Youngki Shin

Abstract: In this paper we propose the adaptive lasso for predictive quantile regression (ALQR). Reflecting empirical findings, we allow predictors to have various degrees of persistence and to exhibit different signal strengths. The number of predictors is allowed to grow with the sample size. We study regularity conditions under which stationary, local unit root, and cointegrated predictors are present simultaneously. We then establish the convergence rates, model selection consistency, and asymptotic distributions of ALQR. We apply the proposed method to the out-of-sample quantile prediction problem of stock returns and find that it outperforms the existing alternatives. We also provide numerical evidence from additional Monte Carlo experiments supporting the theoretical results.

Submitted 3 December, 2022; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: 71 pages, 5 figures, 18 tables

Journal ref: Journal of Econometrics, Vol 237, No 2, Part C, Article 105372, 2023
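For reference, an adaptive lasso quantile regression estimator is usually written as a penalized check-loss problem. The display below gives that generic form for predicting $y_{t+1}$ from a $p$-dimensional predictor vector $x_t$ at quantile level $\tau$; the first-stage estimator $\tilde{\beta}$, the exponent $\gamma > 0$, and the tuning parameter $\lambda_n$ are generic placeholders, and the paper's exact weights and conditions may differ.

```latex
\hat{\beta}(\tau) = \arg\min_{\beta_0,\,\beta} \sum_{t=1}^{n-1} \rho_\tau\!\left(y_{t+1} - \beta_0 - x_t^{\top}\beta\right)
  + \lambda_n \sum_{j=1}^{p} \hat{w}_j \,\lvert \beta_j \rvert,
\qquad \hat{w}_j = \lvert \tilde{\beta}_j \rvert^{-\gamma},
\qquad \rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right).
```

Predictors with small first-stage coefficients receive large penalty weights and tend to be set exactly to zero, which is the mechanism behind the model selection consistency discussed above.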

arXiv:2010.07604

Sequential Likelihood-Free Inference with Neural Proposal

Authors: Dongjun Kim, Kyungwoo Song, YoonYeong Kim, Yongjin Shin, Wanmo Kang, Il-Chul Moon, Weonyoung Joo

Abstract: Bayesian inference without likelihood evaluation, or likelihood-free inference, has been a key research topic in simulation studies for obtaining quantitatively validated simulation models on real-world datasets. As the likelihood evaluation is inaccessible, previous work trains an amortized neural network to estimate the ground-truth posterior for the simulation of interest. Training the network and accumulating the dataset alternately in a sequential manner could save the total simulation budget by orders of magnitude. In the data accumulation phase, new simulation inputs are chosen within a portion of the total simulation budget and added to the collected dataset. This newly accumulated data degenerates because the set of simulation inputs is poorly mixed, and this degenerate data collection process ruins the posterior inference. This paper introduces a new sampling approach for the simulation input, called Neural Proposal (NP), that resolves the biased data collection by guaranteeing i.i.d. sampling. The experiments show the improved performance of our sampler, especially for simulations with multi-modal posteriors.

Submitted 4 November, 2022; v1 submitted 15 October, 2020; originally announced October 2020.

arXiv:2007.12031

doi 10.1016/j.wace.2022.100533

Modeling climate extremes using the four-parameter kappa distribution for $r$-largest order statistics

Authors: Yire Shin, Jeong-Soo Park

Abstract: Accurate estimation of the T-year return levels of climate extremes using statistical distributions is a critical step in the projection of future climate and in engineering design for disaster response. We show how the estimation of such quantities can be improved by fitting the four-parameter kappa distribution for $r$-largest order statistics (rK4D), which was developed in this study. The rK4D is an extension of the generalized extreme value distribution for $r$-largest order statistics (rGEVD), just as the four-parameter kappa distribution (K4D) is an extension of the generalized extreme value distribution (GEVD). This new distribution (rK4D) can be useful not only for fitting data when the three parameters of the GEVD are not sufficient to capture the variability of the extreme observations, but also for reducing the estimation uncertainty by making use of the $r$-largest extreme observations instead of only the block maxima. We derive the joint probability density function (PDF) of the rK4D and the marginal and conditional cumulative distribution functions and PDFs. To estimate the parameters, the maximum likelihood estimation and maximum penalized likelihood estimation methods were considered. The usefulness and practical effectiveness of the rK4D are illustrated by a Monte Carlo simulation and by an application to the Bangkok extreme rainfall data.
A few new distributions for $r$-largest order statistics are also derived as special cases of the rK4D, such as the $r$-largest logistic, the $r$-largest generalized logistic, and the $r$-largest generalized Gumbel distributions. These distributions for $r$-largest order statistics would be useful in modeling extreme values in many research areas, including hydrology and climatology.

Submitted 5 December, 2024; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: In this revision, some modifications and corrections relative to the published version are made to sentences and formulas, marked in blue.

Journal ref: Weather and Climate Extremes, 39, 100533 (2023)
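The nesting relationships above are visible in the quantile function of the four-parameter kappa distribution in Hosking's parameterization, $x(F) = \xi + \frac{\alpha}{k}\left[1 - \left(\frac{1 - F^h}{h}\right)^k\right]$. Setting $h = -1$ gives the generalized logistic distribution, $h = 1$ the generalized Pareto, and the limit $h \to 0$ the GEV distribution. The sketch covers only this marginal K4D quantile function, not the rK4D joint density; it assumes Hosking's sign convention for $k$ and uses a hypothetical function name.

```python
import math


def kappa_quantile(F, xi, alpha, k, h, eps=1e-12):
    """Quantile of the four-parameter kappa distribution (Hosking's parameterization)."""
    if not 0.0 < F < 1.0:
        raise ValueError("F must lie in (0, 1)")
    # inner term: (1 - F^h) / h, with its h -> 0 limit -log F (the GEV case)
    y = -math.log(F) if abs(h) < eps else (1.0 - F ** h) / h
    if abs(k) < eps:  # k -> 0 limit of the outer transformation
        return xi - alpha * math.log(y)
    return xi + alpha / k * (1.0 - y ** k)


if __name__ == "__main__":
    for h in (-1.0, 0.0, 1.0):  # generalized logistic, GEV, generalized Pareto
        print(h, kappa_quantile(0.99, 10.0, 2.0, 0.1, h))
```

Fitting the extra shape parameter $h$ is what lets the K4D (and, by extension, the rK4D) adapt when a three-parameter GEV is too rigid.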

arXiv:2007.09726

doi 10.1007/s00477-018-1629-7

Integration of max-stable processes and Bayesian model averaging to predict extreme climatic events in multi-model ensembles

Authors: Yonggwan Shin, Youngsaeng Lee, Juntae Choi, Jeong-Soo Park

Abstract: Projections of changes in extreme climate are sometimes made using multi-model ensemble methods such as Bayesian model averaging (BMA) embedded with the generalized extreme value (GEV) distribution. BMA is a popular method for combining the forecasts of individual simulation models by weighted averaging and for characterizing the uncertainty induced by simulating the model structure. This method is referred to as the GEV-embedded BMA. It is, however, based on a point-wise analysis of extreme events, which means it overlooks the spatial dependency between nearby grid cells. Instead of a point-wise model, a spatial extreme model such as the max-stable process (MSP) is often employed to improve precision by considering spatial dependency. We propose an approach that integrates the MSP into BMA, referred to herein as the MSP-BMA. The superiority of the proposed method over the GEV-embedded BMA is demonstrated using extreme rainfall intensity data on the Korean peninsula from Coupled Model Intercomparison Project Phase 5 (CMIP5) multi-models. The reanalysis data called APHRODITE (Asian Precipitation Highly-Resolved Observational Data Integration Towards Evaluation, v1101) and 17 CMIP5 models are examined for 10 grid boxes in Korea. In this example, the MSP-BMA achieves a variance reduction over the GEV-embedded BMA. The bias inflation of the MSP-BMA relative to the GEV-embedded BMA is also discussed.
A technical by-product advantage of the MSP-BMA is that tedious 'regridding' is not required before and after the analysis, whereas it is required for the GEV-embedded BMA.

Submitted 19 July, 2020; originally announced July 2020.

Journal ref: Stochastic Environmental Research and Risk Assessment 33, 2019, 47-57

arXiv:2007.08199 [pdf, other]

Learning from Noisy Labels with Deep Neural Networks: A Survey

Authors: Hwanjun Song, Minseok Kim, Dongmin Park, Yooju Shin, Jae-Gil Lee

Abstract: Deep learning has achieved remarkable success in numerous domains with the help of large amounts of data. However, the quality of data labels is a concern because high-quality labels are lacking in many real-world scenarios. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological differences, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we perform an in-depth analysis of noise rate estimation and summarize the typically used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies. All the contents will be available at https://github.com/songhwanjun/Awesome-Noisy-Labels.

Submitted 9 March, 2022; v1 submitted 16 July, 2020; originally announced July 2020.

Comments: Final version published in TNNLS Journal (2022 March)

arXiv:2007.07213 [pdf, other]

Plateau Phenomenon in Gradient Descent Training of ReLU networks: Explanation, Quantification and Avoidance

Authors: Mark Ainsworth, Yeonjong Shin

Abstract: The ability of neural networks to provide 'best in class' approximation across a wide range of applications is well documented. Nevertheless, the powerful expressivity of neural networks comes to naught if one is unable to effectively train (choose) the parameters defining the network. In general, neural networks are trained by gradient descent type optimization methods, or a stochastic variant thereof. In practice, such methods cause the loss function to decrease rapidly at the beginning of training but then, after a relatively small number of steps, to slow down significantly. The loss may even appear to stagnate over a large number of epochs, only to then suddenly start to decrease fast again for no apparent reason. This so-called plateau phenomenon manifests itself in many learning tasks. The present work aims to identify and quantify the root causes of the plateau phenomenon. No assumptions are made on the number of neurons relative to the number of training data, and our results hold for both the lazy and adaptive regimes. The main findings are: plateaux correspond to periods during which activation patterns remain constant, where activation pattern refers to the number of data points that activate a given neuron; quantification of convergence of the gradient flow dynamics; and characterization of stationary points in terms of solutions of local least squares regression lines over subsets of the training data.
Based on these conclusions, we propose a new iterative training method, the Active Neuron Least Squares (ANLS), characterised by the explicit adjustment of the activation pattern at each step, which is designed to enable a quick exit from a plateau. Illustrative numerical examples are included throughout.

Submitted 14 July, 2020; originally announced July 2020.

arXiv:2007.01458 [pdf, other]

Confidence-Aware Learning for Deep Neural Networks

Authors: Jooyoung Moon, Jihyo Kim, Younghak Shin, Sangheum Hwang

Abstract: Despite the power of deep neural networks for a wide range of tasks, the issue of overconfident predictions has limited their practical use in many safety-critical applications. Many recent works have been proposed to mitigate this issue, but most of them require either additional computational costs in the training and/or inference phases or customized architectures to output confidence estimates separately. In this paper, we propose a method of training deep neural networks with a novel loss function, named Correctness Ranking Loss, which explicitly regularizes class probabilities to be better confidence estimates in terms of ordinal ranking according to confidence. The proposed method is easy to implement and can be applied to existing architectures without any modification. Also, it has almost the same computational cost for training as conventional deep classifiers and outputs reliable predictions with a single inference. Extensive experimental results on classification benchmark datasets indicate that the proposed method helps networks to produce well-ranked confidence estimates. We also demonstrate that it is effective for tasks closely related to confidence estimation, namely out-of-distribution detection and active learning.

Submitted 12 August, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: ICML 2020. The first two authors contributed equally

arXiv:2006.10555 [pdf, other]

doi 10.1016/j.jeconom.2020.08.008

Sparse HP Filter: Finding Kinks in the COVID-19 Contact Rate

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin

Abstract: In this paper, we estimate the time-varying COVID-19 contact rate of a Susceptible-Infected-Recovered (SIR) model. Our measurement of the contact rate is constructed using data on actively infected, recovered and deceased cases. We propose a new trend filtering method that is a variant of the Hodrick-Prescott (HP) filter, constrained by the number of possible kinks. We term it the $\textit{sparse HP filter}$ and apply it to daily data from five countries: Canada, China, South Korea, the UK and the US. Our new method yields kinks that are well aligned with actual events in each country. We find that the sparse HP filter yields fewer kinks than the $\ell_1$ trend filter, while both methods fit the data equally well. Theoretically, we establish risk consistency of both the sparse HP and $\ell_1$ trend filters. Ultimately, we propose to use time-varying $\textit{contact growth rates}$ to document and monitor outbreaks of COVID-19.
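For reference, the unconstrained HP filter that the sparse HP filter modifies can be sketched in a few lines of Python with NumPy: the trend x minimizes sum_t (y_t - x_t)^2 + lambda * sum_t (x_{t+1} - 2 x_t + x_{t-1})^2, whose first-order condition is the linear system (I + lambda D'D) x = y, where D is the second-difference matrix. This sketch does not impose the kink constraint that defines the sparse HP filter, and the function name and the dense solver are illustrative choices suited only to short series.

```python
import numpy as np


def hp_trend(y, lam):
    """Hodrick-Prescott trend of a 1-D series y with smoothing parameter lam."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Rows of D are e_t - 2 e_{t+1} + e_{t+2}, so D @ x gives the second differences of x.
    D = np.diff(np.eye(n), n=2, axis=0)
    # Solve the first-order condition (I + lam * D'D) x = y.
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
```

A straight line passes through unchanged because its second differences are zero. Because the penalty is quadratic, the HP trend bends smoothly everywhere, whereas the $\ell_1$ trend filter favours piecewise-linear trends whose kinks are the time points with nonzero second differences.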

Submitted 29 July, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: 42 pages, 15 figures, 1 table

Journal ref: Journal of Econometrics, 220(1), 2021, pp. 158-180

arXiv:2003.03299 [pdf, other]

doi 10.1017/S0266466621000402

Complete Subset Averaging for Quantile Regressions

Authors: Ji Hyung Lee, Youngki Shin

Abstract: We propose a novel conditional quantile prediction method based on complete subset averaging (CSA) for quantile regressions. All models under consideration are potentially misspecified, and the dimension of regressors goes to infinity as the sample size increases. Since we average over the complete subsets, the number of models is much larger than in the usual model averaging methods, which adopt sophisticated weighting schemes. We propose to use equal weights but to select the proper size of the complete subset based on leave-one-out cross-validation. Building upon the theory of Lu and Su (2015), we investigate the large sample properties of CSA and show its asymptotic optimality in the sense of Li (1987). We check the finite sample performance via Monte Carlo simulations and empirical applications.

Submitted 12 July, 2021; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: 46 pages, 3 figures, 9 tables

arXiv:1911.10979 [pdf, other]

doi 10.1109/TNNLS.2020.3045000

Simple yet Effective Way for Improving the Performance of GAN

Authors: Yong-Goo Shin, Yoon-Jae Yeo, Sung-Jea Ko

Abstract: In adversarial learning, the discriminator often fails to guide the generator successfully since it distinguishes between real and generated images using silly or non-robust features. To alleviate this problem, this brief presents a simple but effective way to improve the performance of generative adversarial networks (GANs) without imposing training overhead or modifying the network architectures of existing methods. The proposed method employs a novel cascading rejection (CR) module for the discriminator, which extracts multiple non-overlapping features in an iterative manner using the vector rejection operation. Since the extracted diverse features prevent the discriminator from concentrating on non-meaningful features, the discriminator can guide the generator effectively to produce images that are more similar to the real images. In addition, since the proposed CR module requires only a few simple vector operations, it can be readily applied to existing frameworks with marginal training overhead. Quantitative evaluations on various datasets including CIFAR-10, CelebA, CelebA-HQ, LSUN, and tiny-ImageNet confirm that the proposed method significantly improves the performance of GAN and conditional GAN in terms of the Fréchet inception distance (FID), which reflects the diversity and visual appearance of the generated images.
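The vector rejection operation named above is standard: the rejection of a from b is a - (a·b / b·b) b, the component of a orthogonal to b. The minimal NumPy sketch below shows that operation plus a hypothetical cascade in which each stage scores the current residual feature with its own weight vector and passes on only the part orthogonal to it, so later stages see information the earlier ones did not use. The cascade structure, `cascade_scores`, and the weight vectors are assumptions for illustration, not the CR module as published.

```python
import numpy as np


def reject(a, b):
    """Vector rejection of a from b: the component of a orthogonal to b."""
    return a - (a @ b) / (b @ b) * b


def cascade_scores(feature, weights):
    """Hypothetical cascade: score the residual with each weight vector in turn,
    then keep only the part of the residual orthogonal to that weight vector."""
    residual = np.asarray(feature, dtype=float)
    scores = []
    for w in weights:
        w = np.asarray(w, dtype=float)
        scores.append(float(residual @ w))
        residual = reject(residual, w)
    return scores, residual
```

Each stage's score depends only on directions not already consumed by earlier stages, which is one way to read "non-overlapping features".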

Submitted 19 January, 2021; v1 submitted 19 November, 2019; originally announced November 2019.

Comments: Accepted to IEEE transactions on neural networks and learning systems

arXiv:1910.05874 [pdf, other]

Effects of Depth, Width, and Initialization: A Convergence Analysis of Layer-wise Training for Deep Linear Neural Networks

Authors: Yeonjong Shin

Abstract: Deep neural networks have been used in various machine learning applications and have achieved tremendous empirical success. However, training deep neural networks is a challenging task. Many alternatives to end-to-end back-propagation have been proposed. Layer-wise training is one of them: it trains a single layer at a time rather than all layers simultaneously. In this paper, we study layer-wise training using block coordinate gradient descent (BCGD) for deep linear networks. We establish a general convergence analysis of BCGD and find the optimal learning rate, which results in the fastest decrease in the loss. More importantly, the optimal learning rate can be applied directly in practice, as it does not require any prior knowledge. Thus, no learning rate tuning is needed at all. We also identify the effects of depth, width, and initialization on the training process. We show that when an orthogonal-like initialization is employed, the width of the intermediate layers plays no role in gradient-based training, as long as the width is greater than or equal to both the input and output dimensions. We show that under some conditions, the deeper the network is, the faster the guaranteed convergence. This implies that in an extreme case, the global optimum is achieved after updating each weight matrix only once. Furthermore, we find that the use of deep networks can drastically accelerate convergence compared with a depth-1 network, even when the computational cost is taken into account.
Numerical examples are provided to justify our theoretical findings and demonstrate the performance of layer-wise training by BCGD.
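A minimal NumPy sketch of layer-wise training by block coordinate gradient descent for a deep linear network, assuming the squared loss (1/2m)||W_L ... W_1 X - Y||_F^2 over m samples stored as columns of X. Each sweep updates W_1, W_2, ..., W_L in turn, using the latest values of the other layers; the gradient for block i is A^T R B^T / m, where A = W_L ... W_{i+1}, B = W_{i-1} ... W_1 X, and R is the current residual. The fixed learning rate `lr` is an assumption standing in for the optimal learning rate derived in the paper, which the abstract does not state.

```python
import numpy as np


def linear_net_loss(weights, X, Y):
    """Squared loss (1/2m) * ||W_L ... W_1 X - Y||_F^2 for samples stored as columns of X."""
    out = X
    for W in weights:
        out = W @ out
    return 0.5 * np.sum((out - Y) ** 2) / X.shape[1]


def bcgd_sweep(weights, X, Y, lr):
    """One layer-wise sweep: update each weight matrix in turn, holding the others fixed."""
    m = X.shape[1]
    weights = list(weights)  # copy the list so the caller's list is not modified
    for i in range(len(weights)):
        below = X  # B = W_{i-1} ... W_1 X, the input to layer i
        for W in weights[:i]:
            below = W @ below
        above = np.eye(weights[-1].shape[0])  # A = W_L ... W_{i+1}
        for W in reversed(weights[i + 1:]):
            above = above @ W
        residual = above @ weights[i] @ below - Y
        grad = above.T @ residual @ below.T / m
        weights[i] = weights[i] - lr * grad
    return weights
```

Because each block's loss is a convex quadratic in that block, a small enough step decreases the loss at every update, which is what makes the per-layer analysis tractable.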

Submitted 7 September, 2020; v1 submitted 13 October, 2019; originally announced October 2019.

arXiv:1909.07105 [pdf]

doi 10.1109/TITS.2020.3031331

Incorporating dynamicity of transportation network with multi-weight traffic graph convolutional network for traffic forecasting

Authors: Yuyol Shin, Yoonjin Yoon

Abstract: The traffic forecasting problem remains a challenging task in intelligent transportation systems due to its spatio-temporal complexity. Although temporal dependency has been well studied and discussed, spatial dependency is relatively less explored due to its large variations, especially in the urban environment. In this study, a novel graph convolutional network model, the Multi-Weight Traffic Graph Convolutional (MW-TGC) network, is proposed and applied to two urban networks with contrasting geometric constraints. The model conducts graph convolution operations on speed data with multi-weighted adjacency matrices to combine features including speed limit, distance, and angle. A spatially isolated dimension reduction operation is conducted on the combined features to learn the dependencies among the features and reduce the size of the output to a computationally feasible level. The output of the multi-weight graph convolution is fed into a sequence-to-sequence model with Long Short-Term Memory units to learn temporal dependencies. When applied to two urban sites, urban-core and urban-mix, the MW-TGC network not only outperformed the comparative models at both sites but also reduced variance in the heterogeneous urban-mix network. We conclude that the MW-TGC network can provide robust traffic forecasting performance across variations in spatial complexity, which can be a strong advantage in urban traffic forecasting.

Submitted 26 May, 2021; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: 11 pages, 7 figures, Accepted to IEEE Transactions on Intelligent Transportation Systems (2020)

MSC Class: 68T99

Journal ref: IEEE Trans. Intell. Transp. Syst., 0 (2020) 1-11

arXiv:1907.09696 [pdf, other]

doi 10.1615/.2020034126

Trainability of ReLU networks and Data-dependent Initialization

Authors: Yeonjong Shin, George Em Karniadakis

Abstract: In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it only outputs a constant for any input. Two death states of neurons are introduced: tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as trainability. We show that a network being trainable is a necessary condition for successful training, and that the trainability serves as an upper bound on successful training rates. In order to quantify the trainability, we study the probability distribution of the number of active neurons at initialization. In many applications, over-specified or over-parameterized neural networks are successfully employed and shown to be trained effectively. With the notion of trainability, we show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Furthermore, we propose a data-dependent initialization method in the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.

Submitted 31 March, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

arXiv:1903.06733 [pdf, other]

doi 10.4208/cicp.OA-2020-0165

Dying ReLU and Initialization: Theory and Numerical Examples

Authors: Lu Lu, Yeonjong Shin, Yanhui Su, George Em Karniadakis

Abstract: The dying ReLU refers to the problem in which ReLU neurons become inactive and only output 0 for any input. There are many empirical and heuristic explanations of why ReLU neurons die. However, little is known about it theoretically. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinity. Several methods have been proposed to alleviate the dying ReLU. Perhaps one of the simplest treatments is to modify the initialization procedure. One common way of initializing weights and biases uses symmetric probability distributions, which suffers from the dying ReLU. We thus propose a new initialization procedure, namely, a randomized asymmetric initialization. We prove that the new initialization can effectively prevent the dying ReLU. All parameters required for the new initialization are theoretically designed. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.
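Because a neuron is dead when it outputs 0 for every input, a practical check on a given dataset is to count, for each neuron in a ReLU layer, the number of samples on which its pre-activation is positive. The following minimal NumPy sketch does that for a single layer with weights W and biases b; a count of zero flags a neuron that is dead on that data, which is evidence but not proof that it is dead for all inputs. The function names are hypothetical.

```python
import numpy as np


def activation_counts(W, b, X):
    """Number of samples (columns of X) on which each ReLU neuron of the layer is active."""
    pre = W @ X + b[:, None]  # pre-activations, shape (neurons, samples)
    return np.sum(pre > 0, axis=1)


def dead_neurons(W, b, X):
    """Indices of neurons whose ReLU output is 0 on every sample in X."""
    return np.flatnonzero(activation_counts(W, b, X) == 0)
```

Running this check layer by layer after initialization gives a rough empirical picture of how many neurons a given initialization scheme leaves inactive.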

Submitted 21 October, 2020; v1 submitted 15 March, 2019; originally announced March 2019.

arXiv:1901.07375 [pdf]

Extension of Convolutional Neural Network with General Image Processing Kernels

Authors: Jay Hoon Jung, Yousun Shin, YoungMin Kwon

Abstract: We applied predefined kernels, also known as filters or masks, developed for image processing to convolutional neural networks. Instead of letting the networks find their own kernels, we used 41 different general-purpose kernels for blurring, edge detection, sharpening, discrete cosine transformation, etc. in the first layer of the convolutional neural network. This architecture, named the general filter convolutional neural network (GFNN), can reduce training time by 30% with better accuracy compared with a regular convolutional neural network (CNN). GFNN can also be trained to achieve 90% accuracy with only 500 samples. Furthermore, even though these kernels are not specialized for the MNIST dataset, we achieved 99.56% accuracy without ensembles or any other special algorithms.
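The idea of a fixed first layer can be sketched in NumPy: the image is convolved with hand-designed kernels whose values are never trained, and the responses are stacked as feature maps for the trainable layers that follow. The three kernels below (box blur, Sobel edge detector, sharpen) are common textbook examples chosen for illustration; the paper's actual set of 41 kernels is not listed in the abstract.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation (what CNN layers compute): no padding, stride 1."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Illustrative hand-designed kernels (not the paper's list of 41).
FIXED_KERNELS = {
    "box_blur": np.full((3, 3), 1.0 / 9.0),
    "sobel_x": np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
}


def fixed_first_layer(image):
    """Stack the fixed-kernel responses as feature maps, shape (n_kernels, H - 2, W - 2)."""
    return np.stack([conv2d_valid(image, k) for k in FIXED_KERNELS.values()])
```

Only the layers after `fixed_first_layer` would carry trainable parameters, which is where the savings in training time would come from.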

Submitted 16 January, 2019; originally announced January 2019.

Comments: 4 pages, 6 figures

Journal ref: TENCON 2018

arXiv:1811.08083 [pdf, other]

doi 10.1093/ectj/utaa033

Complete Subset Averaging with Many Instruments

Authors: Seojeong Lee, Youngki Shin

Abstract: We propose a two-stage least squares (2SLS) estimator whose first stage is the equal-weighted average over a complete subset with $k$ instruments among $K$ available, which we call the complete subset averaging (CSA) 2SLS. The approximate mean squared error (MSE) is derived as a function of the subset size $k$ by the Nagar (1959) expansion. The subset size is chosen by minimizing the sample counterpart of the approximate MSE. We show that this method achieves the asymptotic optimality among the class of estimators with different subset sizes. To deal with averaging over a growing set of irrelevant instruments, we generalize the approximate MSE to find that the optimal $k$ is larger than otherwise. An extensive simulation experiment shows that the CSA-2SLS estimator outperforms the alternative estimators when instruments are correlated. As an empirical illustration, we estimate the logistic demand function in Berry, Levinsohn, and Pakes (1995) and find the CSA-2SLS estimate is better supported by economic theory than the alternative estimates.
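A minimal NumPy sketch of the CSA first stage described above: regress the endogenous regressor on every subset of k of the K instruments, average the fitted values with equal weights, and use the average in the usual 2SLS formula beta = (x_hat'x)^{-1} x_hat'y. It assumes a single endogenous regressor and no included exogenous regressors or intercept, and it does not implement the data-driven choice of k by minimizing the approximate MSE. With k = K it reduces to ordinary 2SLS.

```python
import itertools

import numpy as np


def csa_first_stage(x, Z, k):
    """Equal-weighted average of OLS fitted values of x over all size-k subsets of Z's columns."""
    n, K = Z.shape
    subsets = list(itertools.combinations(range(K), k))
    fitted = np.zeros(n)
    for s in subsets:
        Zs = Z[:, list(s)]
        coef, *_ = np.linalg.lstsq(Zs, x, rcond=None)
        fitted += Zs @ coef
    return fitted / len(subsets)


def csa_2sls(y, x, Z, k):
    """2SLS slope for one endogenous regressor x using the CSA first-stage fitted values."""
    x_hat = csa_first_stage(x, Z, k)
    return float(x_hat @ y) / float(x_hat @ x)
```

The number of subsets is K choose k, so this brute-force loop is only practical for modest K.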

Submitted 26 August, 2020; v1 submitted 20 November, 2018; originally announced November 2018.

Comments: 56 pages, 3 figures, 10 tables

Journal ref: Econometrics Journal, 24(2), 2021, pp. 290-314

arXiv:1809.00758 [pdf]

End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights

Authors: Myungsu Chae, Tae-Ho Kim, Young Hoon Shin, June-Woo Kim, Soo-Young Lee

Abstract: Multi-task learning is a method for improving generalizability across multiple tasks. To perform multiple classification tasks with one neural network model, the losses of the individual tasks must be combined. Previous studies have mostly focused on multiple prediction tasks using a joint loss with static weights for training models, setting the weights between tasks uniformly or empirically without sufficient consideration. In this study, we propose a method to calculate the joint loss using dynamic weights to improve the total performance, instead of the individual performance, of the tasks. We apply this method to design an end-to-end multimodal emotion and gender recognition model using audio and video data. This approach provides proper weights for the loss of each task when the training process ends. In our experiments, emotion and gender recognition with the proposed method yielded a lower joint loss, computed as the negative log-likelihood, than using static weights for the joint loss. Moreover, our proposed model has better generalizability than other models. To the best of our knowledge, this research is the first to demonstrate the strength of using dynamic weights for the joint loss to maximize overall performance in emotion and gender recognition tasks.

Submitted 2 October, 2018; v1 submitted 3 September, 2018; originally announced September 2018.

Comments: IROS 2018 Workshop on Crossmodal Learning for Intelligent Robotics

MSC Class: 68T05

arXiv:1802.00912 [pdf, other]

doi 10.1016/j.media.2021.101997

Active, Continual Fine Tuning of Convolutional Neural Networks for Reducing Annotation Efforts

Authors: Zongwei Zhou, Jae Y. Shin, Suryakanth R. Gurudu, Michael B. Gotway, Jianming Liang

Abstract: The splendid success of convolutional neural networks (CNNs) in computer vision is largely attributable to the availability of massive annotated datasets, such as ImageNet and Places. However, in medical imaging, it is challenging to create such large annotated datasets, as annotating medical images is not only tedious, laborious, and time consuming, but it also demands costly, specialty-oriented skills, which are not easily accessible. To dramatically reduce annotation cost, this paper presents a novel method to naturally integrate active learning and transfer learning (fine-tuning) into a single framework, which starts directly with a pre-trained CNN to seek "worthy" samples for annotation and gradually enhances the (fine-tuned) CNN via continual fine-tuning. We have evaluated our method using three distinct medical imaging applications, demonstrating that it can reduce annotation efforts by at least half compared with random selection.
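The abstract does not spell out how "worthy" samples are scored, so the following is only a generic stand-in, not the authors' criterion: a NumPy sketch of uncertainty sampling that ranks unlabeled images by the entropy of the current CNN's predicted class probabilities and sends the most uncertain ones for annotation. The function names and the entropy criterion are assumptions.

```python
import numpy as np


def prediction_entropy(probs):
    """Shannon entropy (nats) of each row of a (samples, classes) probability matrix."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)


def select_for_annotation(probs, budget):
    """Indices of the `budget` unlabeled samples with the most uncertain predictions."""
    scores = prediction_entropy(probs)
    return np.argsort(-scores, kind="stable")[:budget]
```

In an active, continual loop, the selected samples would be annotated, added to the training set, and used to fine-tune the CNN before the next round of scoring.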

Submitted 10 April, 2021; v1 submitted 3 February, 2018; originally announced February 2018.

arXiv:1603.03141 [pdf]

Calibrar: an R package for fitting complex ecological models

Authors: Ricardo Oliveros-Ramos, Yunne-Jai Shin

Abstract: The fitting or parameter estimation of complex ecological models is a challenging optimisation task, with a notable lack of tools for fitting complex, long-runtime or stochastic models. calibrar is an R package dedicated to fitting complex models to data. It is a generic tool that can be used for any type of model, especially those with non-differentiable objective functions and long runtimes, including Individual Based Models. calibrar supports multiple phases and constrained optimisation, and includes 18 optimisation algorithms, including derivative-based and heuristic ones. It supports any type of parallelisation, the restart of interrupted optimisations for long-runtime models, and the combination of different optimisation methods during the multiple phases of the calibration. User-level expertise in R is necessary to handle calibration experiments with calibrar, but there is no need to modify the model's code, which can be programmed in any language. It implements maximum likelihood estimation methods and automated construction of the objective function from simulated model outputs. For more experienced users, calibrar allows the implementation of user-defined objective functions. The package source code is fully accessible and can be installed directly from CRAN.

Submitted 27 April, 2024; v1 submitted 9 March, 2016; originally announced March 2016.

Comments: 15 pages

arXiv:1603.00235 [pdf, other]

doi 10.1080/01621459.2017.1319840

Oracle Estimation of a Change Point in High Dimensional Quantile Regression

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin

Abstract: In this paper, we consider a high-dimensional quantile regression model where the sparsity structure may differ between two sub-populations. We develop $\ell_1$-penalized estimators of both regression coefficients and the threshold parameter. Our penalized estimators not only select covariates but also discriminate between a model with homogeneous sparsity and a model with a change point. As a result, it is not necessary to know or pretest whether the change point is present, or where it occurs. Our estimator of the change point achieves an oracle property in the sense that its asymptotic distribution is the same as if the unknown active sets of regression coefficients were known. Importantly, we establish this oracle property without a perfect covariate selection, thereby avoiding the need for the minimum level condition on the signals of active covariates. Dealing with high-dimensional quantile regression with an unknown change point calls for a new proof technique since the quantile loss function is non-smooth and furthermore the corresponding objective function is non-convex with respect to the change point. The technique developed in this paper is applicable to a general M-estimation framework with a change point, which may be of independent interest. The proposed methods are then illustrated via Monte Carlo experiments and an application to tipping in the dynamics of racial segregation.
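To make the change-point idea concrete, here is a deliberately simplified NumPy sketch: a median (tau = 0.5) model whose level shifts when a threshold variable q exceeds an unknown gamma, estimated by grid search over gamma with the quantile check loss rho_tau(u) = u (tau - 1{u < 0}). It has no covariates, no high-dimensional regression coefficients, and no $\ell_1$ penalty, so it illustrates only the threshold-search step, not the proposed estimator; the function names are hypothetical.

```python
import numpy as np


def check_loss(u, tau):
    """Total quantile check loss: sum of u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return float(np.sum(u * (tau - (u < 0).astype(float))))


def fit_median_threshold(y, q, grid):
    """Grid-search the threshold gamma for a two-regime median model; returns (gamma, loss)."""
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    best = None
    for gamma in grid:
        upper = q > gamma
        if upper.all() or not upper.any():
            continue  # both regimes need at least one observation
        # A sample median minimizes the tau = 0.5 check loss within each regime.
        fitted = np.where(upper, np.median(y[upper]), np.median(y[~upper]))
        loss = check_loss(y - fitted, 0.5)
        if best is None or loss < best[1]:
            best = (float(gamma), loss)
    return best
```

The objective is piecewise constant in gamma, which is why a grid (or a search over observed values of q) is used rather than a gradient method.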

Submitted 16 December, 2016; v1 submitted 1 March, 2016; originally announced March 2016.

Comments: 128 pages, 12 figures. A part of this paper was circulated under the title "Structural Change in Sparsity" arXiv:1411.3062

Journal ref: JASA 113 (2018) 1184-1194

arXiv:1509.06123

doi 10.1016/j.pocean.2017.01.002

A sequential approach to calibrate ecosystem models with multiple time series data

Authors: Ricardo Oliveros-Ramos, Philippe Verley, Yunne-Jai Shin

Abstract: The ecosystem approach to fisheries requires a thorough understanding of fishing impacts on ecosystem status and processes, as well as predictive tools such as ecosystem models that provide useful information for management. The credibility of such models is essential when they are used as decision-making tools, and model fitting to observed data is one major criterion for assessing such credibility. However, more attention has been given to exploring model behavior than to rigorously confronting models with observations, as the calibration of ecosystem models is challenging in many ways. First, ecosystem models can only be simulated numerically and are generally too complex for mathematical analysis and explicit parameter estimation; second, the complex dynamics represented in ecosystem models allow species-specific parameters to impact other species' parameters through ecological interactions; third, critical data about non-commercial species are often poor; lastly, technical aspects can impede calibration, given the potentially high computational cost involved and the scarce documentation published on fitting complex ecosystem models to data. This work highlights some issues related to confronting complex ecosystem models with data and proposes a methodology for the sequential, multi-phase calibration of ecosystem models. We propose criteria to classify the parameters of a model: model dependency and time variability of the parameters. These criteria, together with the availability of approximate initial estimates, are used as decision rules to determine which parameters need to be estimated and their order of precedence in the sequential calibration process. The end-to-end ecosystem model ROMS-PISCES-OSMOSE applied to the Northern Humboldt Current Ecosystem is used as an illustrative case study.
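The general idea of a sequential, multi-phase calibration can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure or their ROMS-PISCES-OSMOSE setup: the phase convention (a parameter activated in an earlier phase stays free in later phases, while not-yet-active parameters stay fixed), SciPy's Nelder-Mead optimizer, and the toy logistic-growth model with its parameter values are all hypothetical choices. The function name `calibrate_in_phases` is likewise hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def calibrate_in_phases(objective, initial, phases):
    """Sequential (multi-phase) calibration of a parameter vector.

    objective: maps the full parameter vector to a scalar misfit.
    initial:   starting values for every parameter.
    phases:    phases[j] is the phase (1, 2, ...) in which parameter j is first
               estimated. Assumed convention: once activated, a parameter stays
               free in later phases; not-yet-active parameters are held fixed.
    """
    theta = np.asarray(initial, dtype=float).copy()
    phases = np.asarray(phases)
    for phase in range(1, phases.max() + 1):
        active = phases <= phase

        def partial(free):
            full = theta.copy()
            full[active] = free
            return objective(full)

        res = minimize(partial, theta[active], method="Nelder-Mead",
                       options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 20000})
        theta[active] = res.x  # carry estimates forward to the next phase
    return theta


# Hypothetical toy "model": logistic growth with parameters (r, K).
t = np.arange(0.0, 21.0)
N0 = 10.0


def simulate(params):
    r, K = params
    return K / (1.0 + (K - N0) / N0 * np.exp(-r * t))


observed = simulate([0.5, 100.0])  # noiseless synthetic "observations"


def misfit(params):
    return np.sum((simulate(params) - observed) ** 2)


# K is estimated alone in phase 1; r joins in phase 2.
theta_hat = calibrate_in_phases(misfit, initial=[0.3, 50.0], phases=[2, 1])
```

In practice the phase numbers would come from decision rules such as those described in the abstract; here they are simply hard-coded.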

Submitted 21 September, 2015; originally announced September 2015.

Comments: 33 pages, 4 tables, 13 figures, 2 appendices

arXiv:1411.3062

Structural Change in Sparsity

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin

Abstract: In the high-dimensional sparse modeling literature, it has been crucially assumed that the sparsity structure of the model is homogeneous over the entire population. That is, the identities of important regressors are invariant across the population and across the individuals in the collected sample. In practice, however, the sparsity structure may not always be invariant in the population, due to heterogeneity across different sub-populations. We consider a general, possibly non-smooth M-estimation framework, allowing a possible structural change regarding the identities of important regressors in the population. Our penalized M-estimator not only selects covariates but also discriminates between a model with homogeneous sparsity and a model with a structural change in sparsity. As a result, it is not necessary to know or pretest whether the structural change is present, or where it occurs. We derive asymptotic bounds on the estimation loss of the penalized M-estimators, and achieve the oracle properties. We also show that when there is a structural change, the estimator of the threshold parameter is super-consistent. If the signal is relatively strong, the rates of convergence can be further improved and asymptotic distributional properties of the estimators, including the threshold estimator, can be established using an adaptive penalization. The proposed methods are then applied to quantile regression and logistic regression models and are illustrated via Monte Carlo experiments.

Submitted 19 November, 2014; v1 submitted 11 November, 2014; originally announced November 2014.

Comments: 65 pages
