Showing 1–49 of 49 results for author: Kang, H

Searching in archive stat.
  1. arXiv:2507.00480  [pdf, ps, other]

    cs.LG stat.ML

    Posterior Inference in Latent Space for Scalable Constrained Black-box Optimization

    Authors: Kiyoung Om, Kyuil Sim, Taeyoung Yun, Hyeongyu Kang, Jinkyoo Park

    Abstract: Optimizing high-dimensional black-box functions under black-box constraints is a pervasive task in a wide range of scientific and engineering problems. These problems are typically harder than unconstrained problems due to hard-to-find feasible regions. While Bayesian optimization (BO) methods have been developed to solve such problems, they often struggle with the curse of dimensionality. Recentl…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 25 pages, 11 figures, 5 tables. Equal contribution by Kiyoung Om, Kyuil Sim, and Taeyoung Yun

  2. arXiv:2505.07067  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning curves theory for hierarchically compositional data with power-law distributed features

    Authors: Francesco Cagnetta, Hyunmo Kang, Matthieu Wyart

    Abstract: Recent theories suggest that Neural Scaling Laws arise whenever the task is linearly decomposed into power-law distributed units. Alternatively, scaling laws also emerge when data exhibit a hierarchically compositional structure, as is thought to occur in language and images. To unify these views, we consider classification and next-token prediction tasks based on probabilistic context-free gramma…

    Submitted 11 May, 2025; originally announced May 2025.

  3. arXiv:2504.19419  [pdf, other]

    cs.LG stat.ML

    Graph-based Semi-supervised and Unsupervised Methods for Local Clustering

    Authors: Zhaiming Shen, Sung Ha Kang

    Abstract: Local clustering aims to identify specific substructures within a large graph without requiring full knowledge of the entire graph. These substructures are typically small compared to the overall graph, enabling the problem to be approached by finding a sparse solution to a linear system associated with the graph Laplacian. In this work, we first propose a method for identifying specific local clu…

    Submitted 27 April, 2025; originally announced April 2025.

  4. arXiv:2411.01100  [pdf, other]

    stat.AP stat.ME

    Transfer Learning Between U.S. Presidential Elections: How Should We Learn From A 2020 Ad Campaign To Inform 2024 Ad Campaigns?

    Authors: Xinran Miao, Jiwei Zhao, Hyunseung Kang

    Abstract: For the 2024 U.S. presidential election, would negative, digital ads against Donald Trump impact voter turnout in Pennsylvania (PA), a key "tipping point" state? The gold standard to address this question, a randomized experiment where voters get randomized to different ads, yields unbiased estimates of the ad effect, but is very expensive. Instead, we propose a less-than-ideal, but significantly…

    Submitted 12 March, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

  5. arXiv:2410.04359  [pdf, other]

    stat.ME

    Efficient estimation of semiparametric spatial point processes with V-fold random thinning

    Authors: Xindi Lin, Hyunseung Kang

    Abstract: We study a broad class of models called semiparametric spatial point processes where the intensity function contains both a parametric component and a nonparametric component. We propose a novel estimator of the parametric component based on random thinning, a common sampling technique in point processes. The proposed estimator of the parametric component is shown to be consistent and asymptotical…

    Submitted 17 April, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

  6. arXiv:2408.07372  [pdf, other]

    stat.ML cs.LG stat.CO

    An Adaptive Importance Sampling for Locally Stable Point Processes

    Authors: Hee-Geon Kang, Sunggon Kim

    Abstract: The problem of finding the expected value of a statistic of a locally stable point process in a bounded region is addressed. We propose an adaptive importance sampling for solving the problem. In our proposal, we restrict the importance point process to the family of homogeneous Poisson point processes, which enables us to generate quickly independent samples of the importance point process. The o…

    Submitted 1 March, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

  7. arXiv:2407.19558  [pdf, other]

    stat.ME math.ST stat.AP

    Identification and Inference with Invalid Instruments

    Authors: Hyunseung Kang, Zijian Guo, Zhonghua Liu, Dylan Small

    Abstract: Instrumental variables (IVs) are widely used to study the causal effect of an exposure on an outcome in the presence of unmeasured confounding. IVs require an instrument, a variable that is (A1) associated with the exposure, (A2) has no direct effect on the outcome except through the exposure, and (A3) is not related to unmeasured confounders. Unfortunately, finding variables that satisfy conditio…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 23 pages, 1 figure

  8. arXiv:2403.14152  [pdf, ps, other]

    stat.ME math.ST

    Generalized Rosenbaum Bounds Sensitivity Analysis for Matched Observational Studies with Treatment Doses: Sufficiency, Consistency, and Efficiency

    Authors: Siyu Heng, Hyunseung Kang

    Abstract: In matched observational studies with binary treatments, the Rosenbaum bounds framework is arguably the most widely used sensitivity analysis framework for assessing sensitivity to unobserved covariates. Unlike the binary treatment case, although widely needed in practice, sensitivity analysis for matched observational studies with treatment doses (i.e., non-binary treatments such as ordinal treat…

    Submitted 23 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  9. arXiv:2402.00307  [pdf, ps, other]

    stat.ME

    A More Robust Approach to Multivariable Mendelian Randomization

    Authors: Yinxiang Wu, Hyunseung Kang, Ting Ye

    Abstract: Multivariable Mendelian randomization (MVMR) uses genetic variants as instrumental variables to infer the direct effects of multiple exposures on an outcome. However, unlike univariable Mendelian randomization, MVMR often faces greater challenges with many weak instruments, which can lead to bias not necessarily toward zero and inflation of type I errors. In this work, we introduce a new asymptoti…

    Submitted 12 June, 2025; v1 submitted 31 January, 2024; originally announced February 2024.

  10. arXiv:2401.00634  [pdf, other]

    stat.ME stat.AP

    A scalable two-stage Bayesian approach accounting for exposure measurement error in environmental epidemiology

    Authors: Changwoo J. Lee, Elaine Symanski, Amal Rammah, Dong Hun Kang, Philip K. Hopke, Eun Sug Park

    Abstract: Accounting for exposure measurement errors has been recognized as a crucial problem in environmental epidemiology for over two decades. Bayesian hierarchical models offer a coherent probabilistic framework for evaluating associations between environmental exposures and health effects, which take into account exposure measurement errors introduced by uncertainty in the estimated exposure as well as…

    Submitted 13 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: 34 pages, 8 figures

  11. arXiv:2310.11479  [pdf, other]

    cs.LG stat.ML

    On the Temperature of Bayesian Graph Neural Networks for Conformal Prediction

    Authors: Seohyeon Cha, Honggu Kang, Joonhyuk Kang

    Abstract: Accurate uncertainty quantification in graph neural networks (GNNs) is essential, especially in high-stakes domains where GNNs are frequently employed. Conformal prediction (CP) offers a promising framework for quantifying uncertainty by providing $\textit{valid}$ prediction sets for any black-box model. CP ensures formal probabilistic guarantees that a prediction set contains a true label with a…

    Submitted 3 December, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

  12. arXiv:2309.04047  [pdf, other]

    stat.ME

    Fully Latent Principal Stratification With Measurement Models

    Authors: Sooyong Lee, Adam C Sales, Hyeon-Ah Kang, Tiffany A. Whittaker

    Abstract: There is wide agreement on the importance of implementation data from randomized effectiveness studies in behavioral science; however, there are few methods available to incorporate these data into causal models, especially when they are multivariate or longitudinal, and interest is in low-dimensional summaries. We introduce a framework for studying how treatment effects vary between subjects who…

    Submitted 15 May, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: In Submission

  13. arXiv:2211.02020  [pdf, other]

    stat.AP

    Bayesian Causal Forests & the 2022 ACIC Data Challenge: Scalability and Sensitivity

    Authors: Ajinkya H. Kokandakar, Hyunseung Kang, Sameer K. Deshpande

    Abstract: We demonstrate how Hahn et al.'s Bayesian Causal Forests model (BCF) can be used to estimate conditional average treatment effects for the longitudinal dataset in the 2022 American Causal Inference Conference Data Challenge. Unfortunately, existing implementations of BCF do not scale to the size of the challenge data. Therefore, we developed flexBCF -- a more scalable and flexible implementation o…

    Submitted 11 May, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Journal ref: Observational Studies 9(3), 29-41 (2023). https://www.muse.jhu.edu/article/895651

  14. arXiv:2208.06533  [pdf, ps, other]

    stat.ME

    Propensity Score Modeling: Key Challenges When Moving Beyond the No-Interference Assumption

    Authors: Hyunseung Kang, Chan Park, Ralph Trane

    Abstract: The paper presents some models for the propensity score. Considerable attention is given to a recently popular, but relatively under-explored setting in causal inference where the no-interference assumption does not hold. We lay out some key challenges in propensity score modeling under interference and present a few promising models based on existing works on mixed effects models.

    Submitted 12 August, 2022; originally announced August 2022.

  15. arXiv:2205.11573  [pdf, other]

    stat.ME

    Semiparametric Efficient Dimension Reduction in multivariate regression with an Inner Envelope

    Authors: Linquan Ma, Hyunseung Kang, Lan Liu

    Abstract: Recently, Su and Cook proposed a dimension reduction technique called the inner envelope which can be substantially more efficient than the original envelope or existing dimension reduction techniques for multivariate regression. However, their technique relied on a linear model with normally distributed error, which may be violated in practice. In this work, we propose a semiparametric variant of…

    Submitted 23 May, 2022; originally announced May 2022.

  16. arXiv:2112.02452  [pdf, other]

    stat.AP stat.ME

    A Robust, Differentially Private Randomized Experiment for Evaluating Online Educational Programs With Sensitive Student Data

    Authors: Manjusha Kancharla, Hyunseung Kang

    Abstract: Randomized control trials (RCTs) have been the gold standard to evaluate the effectiveness of a program, policy, or treatment on an outcome of interest. However, many RCTs assume that study participants are willing to share their (potentially sensitive) data, specifically their response to treatment. This assumption, while trivial at first, is becoming difficult to satisfy in the modern era, espec…

    Submitted 4 December, 2021; originally announced December 2021.

    Comments: 33 pages, 2 figures, 2 tables

  17. arXiv:2111.09932  [pdf, other]

    stat.AP

    Minimum Resource Threshold Policy Under Partial Interference

    Authors: Chan Park, Guanhua Chen, Menggang Yu, Hyunseung Kang

    Abstract: When developing policies for prevention of infectious diseases, policymakers often set specific, outcome-oriented targets to achieve. For example, when developing a vaccine allocation policy, policymakers may want to distribute them so that at least a certain fraction of individuals in a census block are disease-free and spillover effects due to interference within blocks are accounted for. The pa…

    Submitted 23 October, 2023; v1 submitted 18 November, 2021; originally announced November 2021.

  18. arXiv:2110.07740  [pdf, other]

    stat.ME

    A More Efficient, Doubly Robust, Nonparametric Estimator of Treatment Effects in Multilevel Studies

    Authors: Chan Park, Hyunseung Kang

    Abstract: When studying treatment effects in multilevel studies, investigators commonly use (semi-)parametric estimators, which make strong parametric assumptions about the outcome, the treatment, and/or the correlation structure between study units in a cluster. We propose a novel estimator of treatment effects that does not make such assumptions. Specifically, the new estimator is shown to be doubly robus…

    Submitted 10 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

  19. arXiv:2101.09394  [pdf, other]

    econ.EM stat.ML

    Yield Spread Selection in Predicting Recession Probabilities: A Machine Learning Approach

    Authors: Jaehyuk Choi, Desheng Ge, Kyu Ho Kang, Sungbin Sohn

    Abstract: The literature on using yield curves to forecast recessions customarily uses 10-year–three-month Treasury yield spread without verification on the pair selection. This study investigates whether the predictive ability of spread can be improved by letting a machine learning algorithm identify the best maturity pair and coefficients. Our comprehensive analysis shows that, despite the likelihood gai…

    Submitted 5 January, 2022; v1 submitted 22 January, 2021; originally announced January 2021.

    Journal ref: Journal of Forecasting, 42(7): 1772-1785, 2023

  20. Assumption-Lean Analysis of Cluster Randomized Trials in Infectious Diseases for Intent-to-Treat Effects and Network Effects

    Authors: Chan Park, Hyunseung Kang

    Abstract: Cluster randomized trials (CRTs) are a popular design to study the effect of interventions in infectious disease settings. However, standard analysis of CRTs primarily relies on strong parametric methods, usually mixed-effect models to account for the clustering structure, and focuses on the overall intent-to-treat (ITT) effect to evaluate effectiveness. The paper presents two assumption-lean meth…

    Submitted 22 September, 2021; v1 submitted 27 December, 2020; originally announced December 2020.

  21. arXiv:2006.01393  [pdf, other]

    stat.ME stat.AP

    Two Robust Tools for Inference about Causal Effects with Invalid Instruments

    Authors: Hyunseung Kang, Youjin Lee, T. Tony Cai, Dylan S. Small

    Abstract: Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome. Existing confidence intervals for causal effects based on instrumental variables assume that all of the putative instrumental variables are valid; a valid instrumental variable is a variable that affects the outcome only by affecting the treatment and is not related to unmeasured confounders. H…

    Submitted 2 June, 2020; originally announced June 2020.

  22. arXiv:2004.08950  [pdf, other]

    stat.ME

    Efficient Semiparametric Estimation of Network Treatment Effects Under Partial Interference

    Authors: Chan Park, Hyunseung Kang

    Abstract: Recently, many estimators for network treatment effects have been proposed. But, their optimality properties in terms of semiparametric efficiency have yet to be resolved. We present a simple, yet flexible asymptotic framework to derive the efficient influence function and the semiparametric efficiency lower bound for a family of network causal effects under partial interference. An important coro…

    Submitted 24 November, 2021; v1 submitted 19 April, 2020; originally announced April 2020.

  23. arXiv:2003.06723  [pdf, other]

    stat.ME

    Inferring Treatment Effects After Testing Instrument Strength in Linear Models

    Authors: Nan Bi, Hyunseung Kang, Jonathan Taylor

    Abstract: A common practice in IV studies is to check for instrument strength, i.e. its association to the treatment, with an F-test from regression. If the F-statistic is above some threshold, usually 10, the instrument is deemed to satisfy one of the three core IV assumptions and used to test for the treatment effect. However, in many cases, the inference on the treatment effect does not take into account…

    Submitted 14 March, 2020; originally announced March 2020.

    Comments: 24 pages, 3 figures

  24. arXiv:2002.08457  [pdf, other]

    stat.AP

    ivmodel: An R Package for Inference and Sensitivity Analysis of Instrumental Variables Models with One Endogenous Variable

    Authors: Hyunseung Kang, Yang Jiang, Qingyuan Zhao, Dylan S. Small

    Abstract: We present a comprehensive R software ivmodel for analyzing instrumental variables with one endogenous variable. The package implements a general class of estimators called k-class estimators and two confidence intervals that are fully robust to weak instruments. The package also provides power formulas for various test statistics in instrumental variables. Finally, the package contains methods f…

    Submitted 7 July, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: 24 pages, 2 figures, 3 tables

  25. arXiv:1911.09802  [pdf, other]

    stat.ME

    Debiased Inverse-Variance Weighted Estimator in Two-Sample Summary-Data Mendelian Randomization

    Authors: Ting Ye, Jun Shao, Hyunseung Kang

    Abstract: Mendelian randomization (MR) has become a popular approach to study the effect of a modifiable exposure on an outcome by using genetic variants as instrumental variables. A challenge in MR is that each genetic variant explains a relatively small proportion of variance in the exposure and there are many such variants, a setting known as many weak instruments. To this end, we provide a theoretical c…

    Submitted 10 October, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

  26. arXiv:1911.03985  [pdf, other]

    stat.ME stat.AP

    Inference After Selecting Plausibly Valid Instruments with Application to Mendelian Randomization

    Authors: Nan Bi, Hyunseung Kang, Jonathan Taylor

    Abstract: Mendelian randomization (MR) is a popular method in genetic epidemiology to estimate the effect of an exposure on an outcome by using genetic instruments. These instruments are often selected from a combination of prior knowledge from genome wide association studies (GWAS) and data-driven instrument selection procedures or tests. Unfortunately, when testing for the exposure effect, the instrument…

    Submitted 10 November, 2019; originally announced November 2019.

  27. arXiv:1909.06950  [pdf, ps, other]

    stat.ME

    Weak-Instrument Robust Tests in Two-Sample Summary-Data Mendelian Randomization

    Authors: Sheng Wang, Hyunseung Kang

    Abstract: Mendelian randomization (MR) has been a popular method in genetic epidemiology to estimate the effect of an exposure on an outcome using genetic variants as instrumental variables (IV), with two-sample summary-data MR being the most popular. Unfortunately, instruments in MR studies are often weakly associated with the exposure, which can bias effect estimates and inflate Type I errors. In this wor…

    Submitted 7 June, 2021; v1 submitted 15 September, 2019; originally announced September 2019.

  28. arXiv:1909.03200  [pdf]

    cs.LG stat.ML

    Mature GAIL: Imitation Learning for Low-level and High-dimensional Input using Global Encoder and Cost Transformation

    Authors: Wonsup Shin, Hyolim Kang, Sunghoon Hong

    Abstract: Recently, GAIL framework and various variants have shown remarkable possibilities for solving practical MDP problems. However, detailed researches of low-level, and high-dimensional state input in this framework, such as image sequences, has not been conducted. Furthermore, the cost function learned in the traditional GAIL framework only lies on a negative range, acting as a non-penalized reward…

    Submitted 7 September, 2019; originally announced September 2019.

    Comments: 7 pages

  29. arXiv:1908.04427  [pdf, other]

    stat.ME

    A Groupwise Approach for Inferring Heterogeneous Treatment Effects in Causal Inference

    Authors: Chan Park, Hyunseung Kang

    Abstract: Recently, there has been great interest in estimating the conditional average treatment effect using flexible machine learning methods. However, in practice, investigators often have working hypotheses about effect heterogeneity across pre-defined subgroups of study units, which we call the groupwise approach. The paper compares two modern ways to estimate groupwise treatment effects, a nonparamet…

    Submitted 11 September, 2023; v1 submitted 12 August, 2019; originally announced August 2019.

  30. arXiv:1908.03652  [pdf, other]

    stat.ME stat.AP stat.ML

    Detecting Heterogeneous Treatment Effect with Instrumental Variables

    Authors: Michael Johnson, Jiongyi Cao, Hyunseung Kang

    Abstract: There is an increasing interest in estimating heterogeneity in causal effects in randomized and observational studies. However, little research has been conducted to understand heterogeneity in an instrumental variables study. In this work, we present a method to estimate heterogeneous causal effects using an instrumental variable approach. The method has two parts. The first part uses subject-mat…

    Submitted 19 January, 2021; v1 submitted 9 August, 2019; originally announced August 2019.

  31. arXiv:1907.06770  [pdf, other]

    stat.ME

    Increasing Power for Observational Studies of Aberrant Response: An Adaptive Approach

    Authors: Siyu Heng, Hyunseung Kang, Dylan S. Small, Colin B. Fogarty

    Abstract: In many observational studies, the interest is in the effect of treatment on bad, aberrant outcomes rather than the average outcome. For such settings, the traditional approach is to define a dichotomous outcome indicating aberration from a continuous score and use the Mantel-Haenszel test with matched data. For example, studies of determinants of poor child growth use the World Health Organizatio…

    Submitted 14 October, 2020; v1 submitted 15 July, 2019; originally announced July 2019.

    Comments: 83 pages, 1 figure, 8 tables

  32. arXiv:1905.12204  [pdf, other]

    cs.LG cs.AI cs.MA cs.RO stat.ML

    Learning NP-Hard Multi-Agent Assignment Planning using GNN: Inference on a Random Graph and Provable Auction-Fitted Q-learning

    Authors: Hyunwook Kang, Taehwan Kwon, Jinkyoo Park, James R. Morrison

    Abstract: This paper explores the possibility of near-optimally solving multi-agent, multi-task NP-hard planning problems with time-dependent rewards using a learning-based algorithm. In particular, we consider a class of robot/machine scheduling problems called the multi-robot reward collection problem (MRRC). Such MRRC problems well model ride-sharing, pickup-and-delivery, and a variety of related problem…

    Submitted 13 August, 2023; v1 submitted 29 May, 2019; originally announced May 2019.

    Journal ref: Neural Information Processing Systems (NeurIPS) 2022

  33. arXiv:1809.10203  [pdf, other]

    cs.CV cs.LG stat.ML

    Multi-Scale Fully Convolutional Network for Cardiac Left Ventricle Segmentation

    Authors: Han Kang, Defeng Chen

    Abstract: The morphological structure of left ventricle segmented from cardiac magnetic resonance images can be used to calculate key clinical parameters, and it is of great significance to the accurate and efficient diagnosis of cardiovascular diseases. Compared with traditional methods, the segmentation algorithms based on fully convolutional neural network greatly improve the accuracy of semantic segment…

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: 7 pages, 9 figures

  34. arXiv:1808.06418  [pdf, other]

    stat.ME

    Spillover Effects in Cluster Randomized Trials with Noncompliance

    Authors: Hyunseung Kang, Luke Keele

    Abstract: Cluster randomized trials (CRTs) are popular in public health and in the social sciences to evaluate a new treatment or policy where the new policy is randomly allocated to clusters of units rather than individual units. CRTs often feature both noncompliance, when individuals within a cluster are not exposed to the intervention, and individuals within a cluster may influence each other through tre…

    Submitted 14 August, 2019; v1 submitted 20 August, 2018; originally announced August 2018.

  35. arXiv:1805.03744  [pdf, other]

    stat.ME

    Estimation Methods for Cluster Randomized Trials with Noncompliance: A Study of A Biometric Smartcard Payment System in India

    Authors: Hyunseung Kang, Luke Keele

    Abstract: Many policy evaluations occur in settings where treatment is randomized at the cluster level, and there is treatment noncompliance within each cluster. For example, villages might be assigned to treatment and control, but residents in each village may choose to comply or not with their assigned treatment status. When noncompliance is present, the instrumental variables framework can be used to ide…

    Submitted 14 August, 2019; v1 submitted 9 May, 2018; originally announced May 2018.

  36. Accurate and Efficient Estimation of Small P-values with the Cross-Entropy Method: Applications in Genomic Data Analysis

    Authors: Yang Shi, Mengqiao Wang, Weiping Shi, Ji-Hyun Lee, Huining Kang, Hui Jiang

    Abstract: $\textbf{Motivation:}$ Small $p…

    Submitted 25 August, 2023; v1 submitted 8 March, 2018; originally announced March 2018.

    Comments: 34 pages, 1 figure, 4 tables

    Journal ref: Bioinformatics, 2019, 35(14):2441-2448

  37. arXiv:1801.03783  [pdf, other]

    physics.soc-ph stat.AP

    Quantifying Gerrymandering in North Carolina

    Authors: Gregory Herschlag, Han Sung Kang, Justin Luo, Christy Vaughn Graves, Sachet Bangia, Robert Ravier, Jonathan C. Mattingly

    Abstract: Using an ensemble of redistricting plans, we evaluate whether a given political districting faithfully represents the geo-political landscape. Redistricting plans are sampled by a Monte Carlo algorithm from a probability distribution that adheres to realistic and non-partisan criteria. Using the sampled redistricting plans and historical voting data, we produce an ensemble of elections that reveal…

    Submitted 10 January, 2018; originally announced January 2018.

    Comments: This is a revised and expanded version of arxiv:1704.03360, entitled "Redistricting: Drawing the Line."

  38. arXiv:1710.01619  [pdf, other]

    stat.ME

    Manifold Data Analysis with Applications to High-Frequency 3D Imaging

    Authors: Hyun Bin Kang, Matthew Reimherr, Mark Shriver, Peter Claes

    Abstract: Many scientific areas are faced with the challenge of extracting information from large, complex, and highly structured data sets. A great deal of modern statistical work focuses on developing tools for handling such data. This paper presents a new subfield of functional data analysis, FDA, which we call Manifold Data Analysis, or MDA. MDA is concerned with the statistical analysis of samples wher…

    Submitted 4 October, 2017; originally announced October 2017.

  39. arXiv:1707.06318  [pdf, other]

    stat.ME

    Markov Network for Modeling Local Item Dependence in Cognitively Diagnostic Classification Models

    Authors: Hyeon-Ah Kang, Jingchen Liu, Zhiliang Ying

    Abstract: The study presents an exploratory graphical modeling approach for evaluating local item dependency within cognitively diagnostic classification models (DCMs). Current approaches to modeling local dependence require known item structure and have limited utility when such information is not available. In this study, we propose an exploratory approach to modeling local dependence so that items' own i…

    Submitted 26 May, 2023; v1 submitted 19 July, 2017; originally announced July 2017.

  40. arXiv:1704.03360  [pdf, other]

    stat.AP

    Redistricting: Drawing the Line

    Authors: Sachet Bangia, Christy Vaughn Graves, Gregory Herschlag, Han Sung Kang, Justin Luo, Jonathan C. Mattingly, Robert Ravier

    Abstract: We develop methods to evaluate whether a political districting accurately represents the will of the people. To explore and showcase our ideas, we concentrate on the congressional districts for the U.S. House of representatives and use the state of North Carolina and its redistrictings since the 2010 census. Using a Monte Carlo algorithm, we randomly generate over 24,000 redistrictings that are no…

    Submitted 8 May, 2017; v1 submitted 9 April, 2017; originally announced April 2017.

    Comments: Corrected typos from previous version; added new plots showing stability; corrected error in EG plots and analysis

    MSC Class: 91F10 ACM Class: G.3; K.4.1

  41. arXiv:1609.04464  [pdf, ps, other]

    stat.ME

    Peer Encouragement Designs in Causal Inference with Partial Interference and Identification of Local Average Network Effects

    Authors: Hyunseung Kang, Guido Imbens

    Abstract: In non-network settings, encouragement designs have been widely used to analyze causal effects of a treatment, policy, or intervention on an outcome of interest when randomizing the treatment was considered impractical or when compliance to treatment cannot be perfectly enforced. Unfortunately, such questions related to treatment compliance have received less attention in network settings and the…

    Submitted 14 September, 2016; originally announced September 2016.

  42. Efficiently estimating small p-values in permutation tests using importance sampling and cross-entropy method

    Authors: Yang Shi, Huining Kang, Ji-Hyun Lee, Hui Jiang

    Abstract: Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge in the application of permutation tests in genomic studies is that an enormous number of permutations are often needed to obtain reliable estimates of very sma…

    Submitted 25 August, 2023; v1 submitted 29 July, 2016; originally announced August 2016.

    Comments: 31 pages, 6 tables

    Journal ref: Statistical Applications in Genetics and Molecular Biology, 2023, 22(1):20210067

  43. arXiv:1606.04146  [pdf, other]

    stat.ME stat.AP

    Inference for Instrumental Variables: A Randomization Inference Approach

    Authors: Hyunseung Kang, Laura Peck, Luke Keele

    Abstract: The method of instrumental variables (IV) provides a framework to study causal effects in both randomized experiments with noncompliance and in observational studies where natural circumstances produce as-if random nudges to accept treatment. Traditionally, inference for IV relied on asymptotic approximations of the distribution of the Wald estimator or two-stage least squares, often with structur…

    Submitted 6 February, 2018; v1 submitted 13 June, 2016; originally announced June 2016.

  44. arXiv:1603.09326  [pdf, other]

    stat.ME econ.EM stat.ML

    Estimating Treatment Effects using Multiple Surrogates: The Role of the Surrogate Score and the Surrogate Index

    Authors: Susan Athey, Raj Chetty, Guido Imbens, Hyunseung Kang

    Abstract: Estimating the long-term effects of treatments is of interest in many fields. A common challenge in estimating such treatment effects is that long-term outcomes are unobserved in the time frame needed to make policy decisions. One approach to overcome this missing data problem is to analyze treatment effects on an intermediate outcome, often called a statistical surrogate, if it satisfies the con…

    Submitted 21 August, 2024; v1 submitted 30 March, 2016; originally announced March 2016.

  45. arXiv:1603.05224  [pdf, other]

    math.ST stat.ME

    Confidence Intervals for Causal Effects with Invalid Instruments using Two-Stage Hard Thresholding with Voting

    Authors: Zijian Guo, Hyunseung Kang, T. Tony Cai, Dylan S. Small

    Abstract: A major challenge in instrumental variables (IV) analysis is to find instruments that are valid, or have no direct effect on the outcome and are ignorable. Typically one is unsure whether all of the putative IVs are in fact valid. We propose a general inference procedure in the presence of invalid IVs, called Two-Stage Hard Thresholding (TSHT) with voting. TSHT uses two hard thresholding steps to…

    Submitted 8 August, 2017; v1 submitted 16 March, 2016; originally announced March 2016.

    Comments: The title is revised to highlight the two important parts of the proposed method, Two-Stage Hard Thresholding and Voting

  46. arXiv:1504.03718  [pdf, ps, other]

    stat.ME

    A simple and robust confidence interval for causal effects with possibly invalid instruments

    Authors: Hyunseung Kang, T. Tony Cai, Dylan S. Small

    Abstract: Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome. Existing confidence intervals for causal effects based on instrumental variables assume that all of the putative instrumental variables are valid; a valid instrumental variable is a variable that affects the outcome only by affecting the treatment and is not related to unmeasured confounders. H…

    Submitted 12 July, 2016; v1 submitted 14 April, 2015; originally announced April 2015.

  47. arXiv:1411.7342  [pdf, other]

    stat.AP stat.ME

    Full Matching Approach to Instrumental Variables Estimation with Application to the Effect of Malaria on Stunting

    Authors: Hyunseung Kang, Benno Kreuels, Jürgen May, Dylan S. Small

    Abstract: Most previous studies of the causal relationship between malaria and stunting have been studies where potential confounders are controlled via regression-based methods, but these studies may have been biased by unobserved confounders. Instrumental variables (IV) regression offers a way to control for unmeasured confounders where, in our case, the sickle cell trait can be used as an instrument. How…

    Submitted 10 November, 2015; v1 submitted 26 November, 2014; originally announced November 2014.

  48. arXiv:1401.5755  [pdf, other]

    stat.ME

    Instrumental Variables Estimation with Some Invalid Instruments and its Application to Mendelian Randomization

    Authors: Hyunseung Kang, Anru Zhang, T. Tony Cai, Dylan S. Small

    Abstract: Instrumental variables have been widely used for estimating the causal effect between exposure and outcome. Conventional estimation methods require complete knowledge about all the instruments' validity; a valid instrument must not have a direct effect on the outcome and not be related to unmeasured confounders. Often, this is impractical as highlighted by Mendelian randomization studies where gen…

    Submitted 21 September, 2014; v1 submitted 22 January, 2014; originally announced January 2014.

    Comments: 99 pages, 29 figures, 14 tables

  49. arXiv:1306.4615  [pdf, other]

    stat.AP

    K-Adaptive Partitioning for Survival Data, with an Application to Cancer Staging

    Authors: Soo-Heang Eo, Hyo Jeong Kang, Seung-Mo Hong, HyungJun Cho

    Abstract: In medical research, it is often necessary to obtain subgroups with heterogeneous survivals, which have been predicted from a prognostic factor. For this purpose, a binary split has often been used once or recursively; however, binary partitioning may not provide an optimal set of well-separated subgroups. We propose a multi-way partitioning algorithm, which divides the data into K heterogeneous subg…

    Submitted 1 November, 2014; v1 submitted 19 June, 2013; originally announced June 2013.

    Comments: 26 pages