Showing 1–50 of 61 results for author: Cho, H

  1. arXiv:2504.15617  [pdf]

    stat.AP stat.OT

    Spatiotemporal Assessment of Aircraft Noise Exposure Using Mobile Phone-Derived Population Estimates and High-Resolution Noise Measurements

    Authors: Soohwan Oh, Hyunsoo Cho, Jungwoo Cho

    Abstract: Aircraft noise exposure has traditionally been assessed using static residential population data and long-term average noise metrics, often overlooking the dynamic nature of human mobility and temporal variations in operational conditions. This study proposes a data-driven framework that integrates high-resolution noise measurements from airport monitoring terminals with mobile phone-derived de fa… ▽ More

    Submitted 22 April, 2025; originally announced April 2025.

  2. arXiv:2412.19517  [pdf, other]

    cs.LG cs.AI math.NA q-bio.PE stat.ML

    Estimation of System Parameters Including Repeated Cross-Sectional Data through Emulator-Informed Deep Generative Model

    Authors: Hyunwoo Cho, Sung Woong Cho, Hyeontae Jo, Hyung Ju Hwang

    Abstract: Differential equations (DEs) are crucial for modeling the evolution of natural or engineered systems. Traditionally, the parameters in DEs are adjusted to fit data from system observations. However, in fields such as politics, economics, and biology, available data are often independently collected at distinct time points from different subjects (i.e., repeated cross-sectional (RCS) data). Convent… ▽ More

    Submitted 27 December, 2024; originally announced December 2024.

    MSC Class: 62F30; 65Z05; 68T09 ACM Class: G.1.7; I.2.m; J.2

  3. arXiv:2411.04281  [pdf, other]

    cs.LG cs.AI stat.ML

    Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking

    Authors: Xingran Chen, Zhenke Wu, Xu Shi, Hyunghoon Cho, Bhramar Mukherjee

    Abstract: We conduct a scoping review of existing approaches for synthetic EHR data generation, and benchmark major methods with proposed open-source software to offer recommendations for practitioners. We search three academic databases for our scoping review. Methods are benchmarked on open-source EHR datasets, MIMIC-III/IV. Seven existing methods covering major categories and two baseline methods are imp… ▽ More

    Submitted 6 November, 2024; originally announced November 2024.

  4. arXiv:2410.22488  [pdf, other]

    stat.ML cs.AI cs.CR cs.LG

    Privacy-Preserving Dynamic Assortment Selection

    Authors: Young Hyun Cho, Will Wei Sun

    Abstract: With the growing demand for personalized assortment recommendations, concerns over data privacy have intensified, highlighting the urgent need for effective privacy-preserving strategies. This paper presents a novel framework for privacy-preserving dynamic assortment selection using the multinomial logit (MNL) bandits model. Our approach employs a perturbed upper confidence bound method, integrati… ▽ More

    Submitted 29 October, 2024; originally announced October 2024.

  5. arXiv:2410.17468  [pdf, other]

    cs.CR stat.AP

    Formal Privacy Guarantees with Invariant Statistics

    Authors: Young Hyun Cho, Jordan Awan

    Abstract: Motivated by the 2020 US Census products, this paper extends differential privacy (DP) to address the joint release of DP outputs and nonprivate statistics, referred to as invariant. Our framework, Semi-DP, redefines adjacency by focusing on datasets that conform to the given invariant, ensuring indistinguishability between adjacent datasets within invariant-conforming datasets. We further develop… ▽ More

    Submitted 22 October, 2024; originally announced October 2024.

  6. arXiv:2410.02918  [pdf, other]

    stat.ME

    Moving sum procedure for multiple change point detection in large factor models

    Authors: Matteo Barigozzi, Haeran Cho, Lorenzo Trapani

    Abstract: The paper proposes a moving sum methodology for detecting multiple change points in high-dimensional time series under a factor model, where changes are attributed to those in loadings as well as emergence or disappearance of factors. We establish the asymptotic null distribution of the proposed test for family-wise error control, and show the consistency of the procedure for multiple change point… ▽ More

    Submitted 3 October, 2024; originally announced October 2024.

  7. arXiv:2407.09390  [pdf, other]

    stat.ME

    Tail-robust factor modelling of vector and tensor time series in high dimensions

    Authors: Matteo Barigozzi, Haeran Cho, Hyeyoung Maeng

    Abstract: We study the problem of factor modelling vector- and tensor-valued time series in the presence of heavy tails in the data, which produce anomalous observations with non-negligible probability. For this, we propose to combine a two-step procedure for tensor data decomposition with data truncation, which is easy to implement and does not require an iterative search for a numerical solution. Departin… ▽ More

    Submitted 26 February, 2025; v1 submitted 12 July, 2024; originally announced July 2024.

  8. arXiv:2405.05459  [pdf, other]

    stat.ME math.ST

    Estimation and Inference for Change Points in Functional Regression Time Series

    Authors: Shivam Kumar, Haotian Xu, Haeran Cho, Daren Wang

    Abstract: In this paper, we study the estimation and inference of change points under a functional linear regression model with changes in the slope function. We present a novel Functional Regression Binary Segmentation (FRBS) algorithm which is computationally efficient as well as achieving consistency in multiple change point detection. This algorithm utilizes the predictive power of piece-wise constant f… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  9. arXiv:2404.17734  [pdf, other]

    stat.ME stat.AP

    Manipulating a Continuous Instrumental Variable in an Observational Study of Premature Babies: Algorithm, Partial Identification Bounds, and Inference under Randomization and Biased Randomization Assumptions

    Authors: Zhe Chen, Min Haeng Cho, Bo Zhang

    Abstract: Regionalization of intensive care for premature babies refers to a triage system of mothers with high-risk pregnancies to hospitals of varied capabilities based on risks faced by infants. Due to the limited capacity of high-level hospitals, which are equipped with advanced expertise to provide critical care, understanding the effect of delivering premature babies at such hospitals on infant mortal… ▽ More

    Submitted 27 September, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  10. Interval-censored linear quantile regression

    Authors: Taehwa Choi, Seohyeon Park, Hunyong Cho, Sangbum Choi

    Abstract: Censored quantile regression has emerged as a prominent alternative to classical Cox's proportional hazards model or accelerated failure time model in both theoretical and applied statistics. While quantile regression has been extensively studied for right-censored survival data, methodologies for analyzing interval-censored data remain limited in the survival analysis literature. This paper intro… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: under revision

  11. arXiv:2402.06915  [pdf, other]

    stat.ME math.ST

    Detection and inference of changes in high-dimensional linear regression with non-sparse structures

    Authors: Haeran Cho, Tobias Kley, Housen Li

    Abstract: For data segmentation in high-dimensional linear regression settings, the regression parameters are often assumed to be sparse segment-wise, which enables many existing methods to estimate the parameters locally via $\ell_1$-regularised maximum likelihood-type estimation and then contrast them for change point detection. Contrary to this common practice, we show that the exact sparsity of neither… ▽ More

    Submitted 13 May, 2025; v1 submitted 10 February, 2024; originally announced February 2024.

  12. arXiv:2310.18593  [pdf, other]

    stat.ML cs.CY cs.LG

    Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

    Authors: Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun

    Abstract: Fair Principal Component Analysis (PCA) is a problem setting where we aim to perform PCA while making the resulting representation fair in that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems: theoretically, there has been no statistical foundation of fair PCA in terms of learnability; practica… ▽ More

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: 42 pages, 5 figures, 4 tables. Accepted to the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  13. arXiv:2305.07581  [pdf, other]

    stat.ME

    Nonparametric data segmentation in multivariate time series via joint characteristic functions

    Authors: Euan T. McGonigle, Haeran Cho

    Abstract: Modern time series data often exhibit complex dependence and structural changes which are not easily characterised by shifts in the mean or model parameters. We propose a nonparametric data segmentation methodology for multivariate time series termed NP-MOJO. By considering joint characteristic functions between the time series and its lagged values, NP-MOJO is able to detect change points in the… ▽ More

    Submitted 6 March, 2025; v1 submitted 12 May, 2023; originally announced May 2023.

  14. arXiv:2301.11675  [pdf, other]

    stat.CO

    fnets: An R Package for Network Estimation and Forecasting via Factor-Adjusted VAR Modelling

    Authors: Dom Owens, Haeran Cho, Matteo Barigozzi

    Abstract: The package fnets for the R language implements the suite of methodologies proposed by Barigozzi et al. (2022) for the network estimation and forecasting of high-dimensional time series under a factor-adjusted vector autoregressive model, which permits strong spatial and temporal correlations in the data. Additionally, we provide tools for visualising the networks underlying the time series data a… ▽ More

    Submitted 4 July, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    MSC Class: 62-04

  15. arXiv:2210.05995  [pdf, other]

    math.OC stat.ML

    SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

    Authors: Hanseul Cho, Chulhee Yun

    Abstract: Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for solving finite-sum minimax optimization problems. Most practical implementations of SGDA randomly reshuffle components and sequentially use them (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monot… ▽ More

    Submitted 20 February, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: ICLR 2023 camera-ready version; 46 pages

  16. arXiv:2209.08892  [pdf, other]

    stat.ME

    High-dimensional data segmentation in regression settings permitting temporal dependence and non-Gaussianity

    Authors: Haeran Cho, Dom Owens

    Abstract: We propose a data segmentation methodology for the high-dimensional linear regression problem where regression parameters are allowed to undergo multiple changes. The proposed methodology, MOSEG, proceeds in two stages: first, the data are scanned for multiple change points using a moving window-based procedure, which is followed by a location refinement stage. MOSEG enjoys computational efficienc… ▽ More

    Submitted 31 October, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

  17. arXiv:2208.08150  [pdf, other]

    stat.AP

    Capturing usage patterns in bike sharing system via multilayer network fused Lasso

    Authors: Yunjin Choi, Haeran Cho, Hyelim Son

    Abstract: Data collected from a bike-sharing system exhibit complex temporal and spatial features. We analyze shared-bike usage data collected in three large cities at the level of individual stations, accounting for station-specific behavior and covariate effects. For this, we adopt a penalized regression approach with a multilayer network fused Lasso penalty. These fusion penalties are imposed on networks… ▽ More

    Submitted 25 August, 2024; v1 submitted 17 August, 2022; originally announced August 2022.

  18. arXiv:2208.04900  [pdf, other]

    stat.ME

    Moving sum procedure for change point detection under piecewise linearity

    Authors: Joonpyo Kim, Hee-Seok Oh, Haeran Cho

    Abstract: We propose a computationally and statistically efficient procedure for segmenting univariate data under piecewise linearity. The proposed moving sum (MOSUM) methodology detects multiple change points where the underlying signal undergoes discontinuous jumps and/or slope changes. Theoretically, it controls the family-wise error rate at a given significance level asymptotically and achieves consiste… ▽ More

    Submitted 24 August, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

  19. arXiv:2207.13423  [pdf, other]

    cs.CV cs.AI stat.ML

    Rethinking Efficacy of Softmax for Lightweight Non-Local Neural Networks

    Authors: Yooshin Cho, Youngsoo Kim, Hanbyel Cho, Jaesung Ahn, Hyeong Gwon Hong, Junmo Kim

    Abstract: Non-local (NL) block is a popular module that demonstrates the capability to model global contexts. However, NL block generally has heavy computation and memory costs, so it is impractical to apply the block to high-resolution feature maps. In this paper, to investigate the efficacy of NL block, we empirically analyze if the magnitude and direction of input feature vectors properly affect the atte… ▽ More

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: ICIP 2022

  20. Robust multiscale estimation of time-average variance for time series segmentation

    Authors: Euan T. McGonigle, Haeran Cho

    Abstract: There exist several methods developed for the canonical change point problem of detecting multiple mean shifts, which search for changes over sections of the data at multiple scales. In such methods, estimation of the noise level is often required in order to distinguish genuine changes from random fluctuations due to the noise. When serial dependence is present, using a single estimator of the no… ▽ More

    Submitted 10 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Journal ref: Computational Statistics & Data Analysis 179 (2023), 107648

  21. arXiv:2204.02724  [pdf, other]

    stat.ME

    High-dimensional time series segmentation via factor-adjusted vector autoregressive modelling

    Authors: Haeran Cho, Hyeyoung Maeng, Idris A. Eckley, Paul Fearnhead

    Abstract: Vector autoregressive (VAR) models are popularly adopted for modelling high-dimensional time series, and their piecewise extensions allow for structural changes in the data. In VAR modelling, the number of parameters grows quadratically with the dimensionality, which necessitates the sparsity assumption in high dimensions. However, it is debatable whether such an assumption is adequate for handling… ▽ More

    Submitted 20 January, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

  22. FNETS: Factor-adjusted network estimation and forecasting for high-dimensional time series

    Authors: Matteo Barigozzi, Haeran Cho, Dom Owens

    Abstract: We propose FNETS, a methodology for network estimation and forecasting of high-dimensional time series exhibiting strong serial- and cross-sectional correlations. We operate under a factor-adjusted vector autoregressive (VAR) model which, after accounting for pervasive co-movements of the variables by {\it common} factors, models the remaining {\it idiosyncratic} dynamic dependence between the var… ▽ More

    Submitted 4 March, 2025; v1 submitted 16 January, 2022; originally announced January 2022.

  23. arXiv:2108.10629  [pdf, other]

    cs.CV cs.AI stat.ML

    Improving Generalization of Batch Whitening by Convolutional Unit Optimization

    Authors: Yooshin Cho, Hanbyel Cho, Youngsoo Kim, Junmo Kim

    Abstract: Batch Whitening is a technique that accelerates and stabilizes training by transforming input features to have a zero mean (Centering) and a unit variance (Scaling), and by removing linear correlation between channels (Decorrelation). In commonly used structures, which are empirically optimized with Batch Normalization, the normalization layer appears between convolution and activation function. F… ▽ More

    Submitted 2 November, 2021; v1 submitted 24 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  24. arXiv:2106.12844  [pdf, other]

    stat.ME

    Bootstrap confidence intervals for multiple change points based on moving sum procedures

    Authors: Haeran Cho, Claudia Kirch

    Abstract: The problem of quantifying uncertainty about the locations of multiple change points by means of confidence intervals is addressed. The asymptotic distribution of the change point estimators obtained as the local maximisers of moving sum statistics is derived, where the limit distributions differ depending on whether the corresponding size of changes is local, i.e. tends to zero as the sample size… ▽ More

    Submitted 16 June, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

  25. arXiv:2103.01097  [pdf, other]

    stat.ME stat.AP

    Tangent functional canonical correlation analysis for densities and shapes, with applications to multimodal imaging data

    Authors: Min Ho Cho, Sebastian Kurtek, Karthik Bharath

    Abstract: It is quite common for functional data arising from imaging data to assume values in infinite-dimensional manifolds. Uncovering associations between two or more such nonlinear functional data extracted from the same object across medical imaging modalities can assist development of personalized treatment strategies. We propose a method for canonical correlation analysis between paired probability… ▽ More

    Submitted 24 September, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

  26. arXiv:2102.01194  [pdf, ps, other]

    stat.ML cs.CY cs.LG

    A Statistician Teaches Deep Learning

    Authors: G. Jogesh Babu, David Banks, Hyunsoon Cho, David Han, Hailin Sang, Shouyi Wang

    Abstract: Deep learning (DL) has gained much attention and become increasingly popular in modern data science. Computer scientists led the way in developing deep learning techniques, so the ideas and perspectives can seem alien to statisticians. Nonetheless, it is important that statisticians become involved -- many of our students need this expertise for their careers. In this paper, developed as part of a… ▽ More

    Submitted 3 February, 2021; v1 submitted 28 January, 2021; originally announced February 2021.

    Comments: 19 pages, accepted by Journal of Statistical Theory and Practice

  27. arXiv:2101.01441  [pdf, other]

    cs.LG stat.ML

    Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

    Authors: Hyeongmin Cho, Sangkyun Lee

    Abstract: Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning pr… ▽ More

    Submitted 5 January, 2021; originally announced January 2021.

  28. arXiv:2012.12814  [pdf, other]

    stat.ME

    Data segmentation algorithms: Univariate mean change and beyond

    Authors: Haeran Cho, Claudia Kirch

    Abstract: Data segmentation a.k.a. multiple change point analysis has received considerable attention due to its importance in time series analysis and signal processing, with applications in a variety of fields including natural and social sciences, medicine, engineering and finance. In the first part of this survey, we review the existing literature on the canonical data segmentation problem which aims… ▽ More

    Submitted 8 July, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

  29. arXiv:2012.03294  [pdf, ps, other]

    stat.ME

    Multi-stage optimal dynamic treatment regimes for survival outcomes with dependent censoring

    Authors: Hunyong Cho, Shannon T. Holloway, David J. Couper, Michael R. Kosorok

    Abstract: We propose a reinforcement learning method for estimating an optimal dynamic treatment regime for survival outcomes with dependent censoring. The estimator allows the failure time to be conditionally independent of censoring and dependent on the treatment decision times, supports a flexible number of treatment arms and treatment stages, and can maximize either the mean survival time or the surviva… ▽ More

    Submitted 12 May, 2022; v1 submitted 6 December, 2020; originally announced December 2020.

  30. arXiv:2011.13884  [pdf, other]

    stat.ME

    Multiple change point detection under serial dependence: Wild contrast maximisation and gappy Schwarz algorithm

    Authors: Haeran Cho, Piotr Fryzlewicz

    Abstract: We propose a methodology for detecting multiple change points in the mean of an otherwise stationary, autocorrelated, linear time series. It combines solution path generation based on the wild contrast maximisation principle, and an information criterion-based model selection strategy termed gappy Schwarz algorithm. The former is well-suited to separating shifts in the mean from fluctuations due t… ▽ More

    Submitted 12 April, 2023; v1 submitted 27 November, 2020; originally announced November 2020.

  31. arXiv:2010.01051  [pdf, other]

    cs.LG stat.ML

    Neural Bootstrapper

    Authors: Minsuk Shin, Hyungjoo Cho, Hyun-seok Min, Sungbin Lim

    Abstract: Bootstrapping has been a primary tool for ensemble and uncertainty quantification in machine learning and statistics. However, due to its nature of multiple training and resampling, bootstrapping deep neural networks is computationally burdensome; hence it has difficulties in practical application to the uncertainty estimation and related tasks. To overcome this computational bottleneck, we propos… ▽ More

    Submitted 13 December, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: 19 pages, 13 figures. Accepted for NeurIPS 2021. Corresponding Author: Sungbin Lim

  32. arXiv:2006.14273  [pdf, ps, other]

    stat.ME

    Discussion of 'Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection'

    Authors: Haeran Cho, Claudia Kirch

    Abstract: We discuss the theoretical guarantee provided by the WBS2.SDLL proposed in Fryzlewicz (2020) and explore an alternative, MOSUM-based candidate generation method for the SDLL.

    Submitted 2 July, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

  33. arXiv:2006.04920  [pdf, other]

    cs.LG stat.ML

    Survival regression with accelerated failure time model in XGBoost

    Authors: Avinash Barnwal, Hyunsu Cho, Toby Dylan Hocking

    Abstract: Survival regression is used to estimate the relation between time-to-event and feature variables, and is important in application domains such as medicine, marketing, risk management and sales management. Nonlinear tree based machine learning algorithms as implemented in libraries such as XGBoost, scikit-learn, LightGBM, and CatBoost are often more accurate in practice than linear models. However,… ▽ More

    Submitted 21 August, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

  34. arXiv:2005.13438  [pdf]

    q-bio.BM cs.LG stat.ML

    InteractionNet: Modeling and Explaining of Noncovalent Protein-Ligand Interactions with Noncovalent Graph Neural Network and Layer-Wise Relevance Propagation

    Authors: Hyeoncheol Cho, Eok Kyun Lee, Insung S. Choi

    Abstract: Expanding the scope of graph-based, deep-learning models to noncovalent protein-ligand interactions has earned increasing attention in structure-based drug design. Modeling the protein-ligand interactions with graph neural networks (GNNs) has experienced difficulties in the conversion of protein-ligand complex structures into the graph representation and left questions regarding whether the traine… ▽ More

    Submitted 12 May, 2020; originally announced May 2020.

  35. arXiv:2001.01401  [pdf, other]

    cs.LG cs.SD stat.ML

    Mel-spectrogram augmentation for sequence to sequence voice conversion

    Authors: Yeongtae Hwang, Hyemin Cho, Hongsun Yang, Dong-Ok Won, Insoo Oh, Seong-Whan Lee

    Abstract: For training the sequence-to-sequence voice conversion model, we need to handle an issue of insufficient data about the number of speech pairs which consist of the same utterance. This study experimentally investigated the effects of Mel-spectrogram augmentation on training the sequence-to-sequence voice conversion (VC) model from scratch. For Mel-spectrogram augmentation, we adopted the policies… ▽ More

    Submitted 15 June, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

    Comments: 5 pages, 1 figure, 8 tables

  36. arXiv:1912.09983  [pdf, other]

    stat.ME

    Interval censored recursive forests

    Authors: Hunyong Cho, Nicholas P. Jewell, Michael R. Kosorok

    Abstract: We propose the interval censored recursive forests (ICRF) which is an iterative tree ensemble method for interval censored survival data. This nonparametric regression estimator makes the best use of censored information by iteratively updating the survival estimate, and can be viewed as a self-consistent estimator with convergence monitored using out-of-bag samples. Splitting rules optimized for… ▽ More

    Submitted 20 May, 2021; v1 submitted 20 December, 2019; originally announced December 2019.

  37. arXiv:1910.12486  [pdf, other]

    stat.ME math.ST

    Two-stage data segmentation permitting multiscale change points, heavy tails and dependence

    Authors: Haeran Cho, Claudia Kirch

    Abstract: The segmentation of a time series into piecewise stationary segments, a.k.a. multiple change point analysis, is an important problem both in time series analysis and signal processing. In the presence of multiscale change points with both large jumps over short intervals and small changes over long stationary intervals, multiscale methods achieve good adaptivity in their localisation but at the sa… ▽ More

    Submitted 3 July, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

    MSC Class: 62M10; 62G05

  38. arXiv:1907.10834  [pdf, other]

    cs.LG eess.IV math.NA stat.ML

    Framelet Pooling Aided Deep Learning Network : The Method to Process High Dimensional Medical Data

    Authors: Chang Min Hyun, Kang Cheol Kim, Hyun Cheol Cho, Jae Kyu Choi, Jin Keun Seo

    Abstract: Machine learning-based analysis of medical images often faces several hurdles, such as the lack of training data, the curse of dimensionality problem, and the generalization issues. One of the main difficulties is that there exists computational cost problem in dealing with input data of large size matrices which represent medical images. The purpose of this paper is to introduce a framelet-poolin… ▽ More

    Submitted 25 July, 2019; originally announced July 2019.

  39. arXiv:1906.05956  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    Scalable Neural Architecture Search for 3D Medical Image Segmentation

    Authors: Sungwoong Kim, Ildoo Kim, Sungbin Lim, Woonhyuk Baek, Chiheon Kim, Hyungjoo Cho, Boogeon Yoon, Taesup Kim

    Abstract: In this paper, a neural architecture search (NAS) framework is proposed for 3D medical image segmentation, to automatically optimize a neural architecture from a large design space. Our NAS framework searches the structure of each layer including neural connectivities and operation types in both of the encoder and decoder. Since optimizing over a large discrete architecture space is difficult due… ▽ More

    Submitted 13 June, 2019; originally announced June 2019.

    Comments: 9 pages, 3 figures

  40. arXiv:1905.09680  [pdf, other]

    cs.LG cs.DC stat.ML

    DEEP-BO for Hyperparameter Optimization of Deep Networks

    Authors: Hyunghun Cho, Yongjin Kim, Eunjung Lee, Daeyoung Choi, Yongjae Lee, Wonjong Rhee

    Abstract: The performance of deep neural networks (DNN) is very sensitive to the particular choice of hyper-parameters. To make it worse, the shape of the learning curve can be significantly affected when a technique like batchnorm is used. As a result, hyperparameter optimization of deep networks can be much more challenging than traditional machine learning models. In this work, we start from well known B… ▽ More

    Submitted 23 May, 2019; originally announced May 2019.

    Comments: 26 pages, NeurIPS19 under review

  41. arXiv:1904.10522  [pdf, other]

    cs.LG cs.IR stat.ML

    Block-distributed Gradient Boosted Trees

    Authors: Theodore Vasiloudis, Hyunsu Cho, Henrik Boström

    Abstract: The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms used in production, for tasks that include Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets available today, many distributed GBT methods have been proposed. However, they all assume a row-distributed dataset, addressing scalability only with respect to the nu… ▽ More

    Submitted 28 May, 2019; v1 submitted 23 April, 2019; originally announced April 2019.

    Comments: SIGIR 2019

  42. arXiv:1901.07593  [pdf, other]

    stat.ML cs.CV cs.LG

    Aggregated Pairwise Classification of Statistical Shapes

    Authors: Min Ho Cho, Sebastian Kurtek, Steven N. MacEachern

    Abstract: The classification of shapes is of great interest in diverse areas ranging from medical imaging to computer vision and beyond. While many statistical frameworks have been developed for the classification problem, most are strongly tied to early formulations of the problem - with an object to be classified described as a vector in a relatively low-dimensional Euclidean space. Statistical shape data… ▽ More

    Submitted 22 January, 2019; originally announced January 2019.

  43. Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation

    Authors: Hyeoncheol Cho, Insung S. Choi

    Abstract: We present a three-dimensional graph convolutional network (3DGCN), which predicts molecular properties and biochemical activities, based on 3D molecular graph. In the 3DGCN, graph convolution is unified with learning operations on the vector to handle the spatial information from molecular topology. The 3DGCN model exhibits significantly higher performance on various tasks compared with other dee… ▽ More

    Submitted 16 April, 2019; v1 submitted 24 November, 2018; originally announced November 2018.

    Comments: 39 pages, 14 figures, 5 tables

    Journal ref: ChemMedChem, 2019

  44. Consistent estimation of high-dimensional factor models when the factor number is over-estimated

    Authors: Matteo Barigozzi, Haeran Cho

    Abstract: A high-dimensional $r$-factor model for an $n$-dimensional vector time series is characterised by the presence of a large eigengap (increasing with $n$) between the $r$-th and the $(r+1)$-th largest eigenvalues of the covariance matrix. Consequently, Principal Component (PC) analysis is the most popular estimation method for factor models and its consistency, when $r$ is correctly estimated, is we… ▽ More

    Submitted 6 July, 2020; v1 submitted 1 November, 2018; originally announced November 2018.

  45. arXiv:1809.03721  [pdf, other]

    cs.LG cs.CV cs.NE eess.SP stat.ML

    Deep Asymmetric Networks with a Set of Node-wise Variant Activation Functions

    Authors: Jinhyeok Jang, Hyunjoong Cho, Jaehong Kim, Jaeyeon Lee, Seungjoon Yang

    Abstract: This work presents deep asymmetric networks with a set of node-wise variant activation functions. The nodes' sensitivities are affected by activation function selections such that the nodes with smaller indices become increasingly more sensitive. As a result, features learned by the nodes are sorted by the node indices in the order of their importance. Asymmetric networks not only learn input feat…

    Submitted 17 May, 2019; v1 submitted 11 September, 2018; originally announced September 2018.

  46. arXiv:1806.00437  [pdf, other]

    cs.LG stat.ML

    Large-Margin Classification in Hyperbolic Space

    Authors: Hyunghoon Cho, Benjamin DeMeo, Jian Peng, Bonnie Berger

    Abstract: Representing data in hyperbolic space can effectively capture latent hierarchical relationships. With the goal of enabling accurate classification of points in hyperbolic space while respecting their hyperbolic geometry, we introduce hyperbolic SVM, a hyperbolic formulation of support vector machine classifiers, and elucidate through new theoretical work its connection to the Euclidean counterpart…

    Submitted 1 June, 2018; originally announced June 2018.

  47. arXiv:1804.05882  [pdf, other]

    stat.AP

    Confidence intervals for the area under the receiver operating characteristic curve in the presence of ignorable missing data

    Authors: Hunyong Cho, Gregory J. Matthews, Ofer Harel

    Abstract: Receiver operating characteristic (ROC) curves are widely used as a measure of accuracy of diagnostic tests and can be summarized using the area under the ROC curve (AUC). Often, it is useful to construct confidence intervals for the AUC; however, since there are a number of different proposed methods to measure variance of the AUC, there are thus many different resulting methods for constructin…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: 32 pages

  48. arXiv:1803.06249  [pdf, ps, other]

    cs.DL physics.soc-ph stat.AP

    Link prediction for interdisciplinary collaboration via co-authorship network

    Authors: Haeran Cho, Yi Yu

    Abstract: We analyse the Publication and Research (PURE) data set of the University of Bristol collected between 2008 and 2013. Using the existing co-authorship network and academic information thereof, we propose a new link prediction methodology, with the specific aim of identifying potential interdisciplinary collaboration in a university-wide collaboration network.

    Submitted 16 March, 2018; originally announced March 2018.

  49. arXiv:1706.01155  [pdf, other]

    stat.ME

    High-dimensional GARCH process segmentation with an application to Value-at-Risk

    Authors: Haeran Cho, Karolos Korkas

    Abstract: Models for financial risk often assume that underlying asset returns are stationary. However, there is strong evidence that multivariate financial time series entail changes not only in their within-series dependence structure, but also in the cross-sectional dependence among them. In particular, the stressed Value-at-Risk of a portfolio, a popularly adopted measure of market risk, cannot be gauge…

    Submitted 2 March, 2021; v1 submitted 4 June, 2017; originally announced June 2017.

  50. Simultaneous multiple change-point and factor analysis for high-dimensional time series

    Authors: Matteo Barigozzi, Haeran Cho, Piotr Fryzlewicz

    Abstract: We propose the first comprehensive treatment of high-dimensional time series factor models with multiple change-points in their second-order structure. We operate under the most flexible definition of piecewise stationarity, and estimate the number and locations of change-points consistently as well as identifying whether they originate in the common or idiosyncratic components. Through the use of…

    Submitted 29 May, 2018; v1 submitted 20 December, 2016; originally announced December 2016.

    Comments: 64 pages, to appear in the Journal of Econometrics