Showing 1–21 of 21 results for author: Lee, K H

Searching in archive stat.
  1. arXiv:2502.12334  [pdf, other]

    stat.ME

    Inference for Log-Gaussian Cox Point Processes using Bayesian Deep Learning: Application to Human Oral Microbiome Image Data

    Authors: Shuwan Wang, Christopher K. Wikle, Athanasios C. Micheas, Jessica L. Mark Welch, Jacqueline R. Starr, Kyu Ha Lee

    Abstract: It is common in nature to see aggregation of objects in space. Exploring the mechanism associated with the locations of such clustered observations can be essential to understanding the phenomenon, such as the source of spatial heterogeneity, or comparison to other event generating processes in the same domain. Log-Gaussian Cox processes (LGCPs) represent an important class of models for quantifyi…

    Submitted 18 March, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  2. arXiv:2502.10513  [pdf, other]

    stat.ME

    A Bayesian Multivariate Spatial Point Pattern Model: Application to Oral Microbiome FISH Image Data

    Authors: Kyu Ha Lee, Brent A. Coull, Suman Majumder, Patrick J. La Riviere, Jessica L. Mark Welch, Jacqueline R. Starr

    Abstract: Advances in cellular imaging technologies, especially those based on fluorescence in situ hybridization (FISH) now allow detailed visualization of the spatial organization of human or bacterial cells. Quantifying this spatial organization is crucial for understanding the function of multicellular tissues or biofilms, with implications for human health and disease. To address the need for better me…

    Submitted 14 February, 2025; originally announced February 2025.

  3. arXiv:2411.17910  [pdf, other]

    stat.ME stat.AP

    Bayesian Variable Selection for High-Dimensional Mediation Analysis: Application to Metabolomics Data in Epidemiological Studies

    Authors: Youngho Bae, Chanmin Kim, Fenglei Wang, Qi Sun, Kyu Ha Lee

    Abstract: In epidemiological research, causal models incorporating potential mediators along a pathway are crucial for understanding how exposures influence health outcomes. This work is motivated by integrated epidemiological and blood biomarker studies, investigating the relationship between long-term adherence to a Mediterranean diet and cardiometabolic health, with plasma metabolomes as potential mediat…

    Submitted 26 November, 2024; originally announced November 2024.

  4. arXiv:2401.04832  [pdf, other]

    stat.ME

    Group lasso priors for Bayesian accelerated failure time models with left-truncated and interval-censored data

    Authors: Harrison T. Reeder, Sebastien Haneuse, Kyu Ha Lee

    Abstract: An important task in health research is to characterize time-to-event outcomes such as disease onset or mortality in terms of a potentially high-dimensional set of risk factors. For example, prospective cohort studies of Alzheimer's disease typically enroll older adults for observation over several decades to assess the long-term impact of genetic and other factors on cognitive decline and mortali…

    Submitted 11 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  5. arXiv:2310.10614  [pdf, ps, other]

    stat.CO

    Understanding an Acquisition Function Family for Bayesian Optimization

    Authors: Jiajie Kong, Tony Pourmohamad, Herbert K. H. Lee

    Abstract: Bayesian optimization (BO) developed as an approach for the efficient optimization of expensive black-box functions without gradient information. A typical BO paper introduces a new approach and compares it to some alternatives on simulated and possibly real examples to show its efficacy. Yet on a different example, this new algorithm might not be as effective as the alternatives. This paper looks…

    Submitted 16 October, 2023; originally announced October 2023.

  6. Characterizing quantile-varying covariate effects under the accelerated failure time model

    Authors: Harrison T. Reeder, Kyu Ha Lee, Sebastien Haneuse

    Abstract: An important task in survival analysis is choosing a structure for the relationship between covariates of interest and the time-to-event outcome. For example, the accelerated failure time (AFT) model structures each covariate effect as a constant multiplicative shift in the outcome distribution across all survival quantiles. Though parsimonious, this structure cannot detect or capture effects that…

    Submitted 8 January, 2023; originally announced January 2023.

    Comments: This is the pre-peer-reviewed, "submitted" version of the manuscript published in final form in Biostatistics by Oxford University Press at the citation/DOI below. This upload will be updated with the final peer-reviewed "accepted" version of the manuscript following a 24-month embargo period.

    Journal ref: Biostatistics (Oxford, England), kxac052 (2023)

  7. arXiv:2202.04198  [pdf, other]

    stat.AP

    Multivariate cluster point process to quantify and explore multi-entity configurations: Application to biofilm image data

    Authors: Suman Majumder, Brent A. Coull, Jessica L. Mark Welch, Patrick J. La Riviere, Floyd E. Dewhirst, Jacqueline R. Starr, Kyu Ha Lee

    Abstract: Clusters of similar or dissimilar objects are encountered in many fields. Frequently used approaches treat the central object of each cluster as latent. Yet, often objects of one or more types cluster around objects of another type. Such arrangements are common in biomedical images of cells, in which nearby cell types likely interact. Quantifying spatial relationships may elucidate biological mech…

    Submitted 8 October, 2024; v1 submitted 8 February, 2022; originally announced February 2022.

    MSC Class: 62

  8. arXiv:2105.08776  [pdf, other]

    stat.ME stat.AP

    Measuring performance for end-of-life care

    Authors: Sebastien Haneuse, Deborah Schrag, Francesca Dominici, Sharon-Lise Normand, Kyu Ha Lee

    Abstract: Although not without controversy, readmission is entrenched as a hospital quality metric, with statistical analyses generally based on fitting a logistic-Normal generalized linear mixed model. Such analyses, however, ignore death as a competing risk, although doing so for clinical conditions with high mortality can have profound effects; a hospital's seemingly good performance for readmission may b…

    Submitted 18 May, 2021; originally announced May 2021.

  9. arXiv:2008.02204  [pdf, ps, other]

    stat.ME stat.CO

    Bayesian Survival Analysis Using Gamma Processes with Adaptive Time Partition

    Authors: Yi Li, Sumi Seo, Kyu Ha Lee

    Abstract: In Bayesian semi-parametric analyses of time-to-event data, non-parametric process priors are adopted for the baseline hazard function or the cumulative baseline hazard function for a given finite partition of the time axis. However, it would be controversial to suggest a general guideline to construct an optimal time partition. While a great deal of research has been done to relax the assumption…

    Submitted 5 August, 2020; originally announced August 2020.

  10. arXiv:2006.05213  [pdf, other]

    cs.LG cs.CL stat.ML

    Graph-Aware Transformer: Is Attention All Graphs Need?

    Authors: Sanghyun Yoo, Young-Seok Kim, Kang Hyun Lee, Kuhwan Jeong, Junhwi Choi, Hoshik Lee, Young Sang Choi

    Abstract: Graphs are the natural data structure to represent relational and structural information in many domains. To cover the broad range of graph-data applications including graph classification as well as graph generation, it is desirable to have a general and flexible model consisting of an encoder and a decoder that can handle graph data. Although the representative encoder-decoder model, Transformer…

    Submitted 9 June, 2020; originally announced June 2020.

  11. arXiv:2003.07611  [pdf, other]

    cs.LG stat.ML

    A comprehensive study on the prediction reliability of graph neural networks for virtual screening

    Authors: Soojung Yang, Kyung Hoon Lee, Seongok Ryu

    Abstract: Prediction models based on deep neural networks are increasingly gaining attention for fast and accurate virtual screening systems. For decision makings in virtual screening, researchers find it useful to interpret an output of classification system as probability, since such interpretation allows them to filter out more desirable compounds. However, probabilistic interpretation cannot be correct…

    Submitted 17 March, 2020; originally announced March 2020.

  12. arXiv:1801.03567  [pdf, other]

    stat.CO

    SemiCompRisks: An R Package for Independent and Cluster-Correlated Analyses of Semi-Competing Risks Data

    Authors: Danilo Alvares, Sebastien Haneuse, Catherine Lee, Kyu Ha Lee

    Abstract: Semi-competing risks refer to the setting where primary scientific interest lies in estimation and inference with respect to a non-terminal event, the occurrence of which is subject to a terminal event. In this paper, we present the R package SemiCompRisks that provides functions to perform the analysis of independent/clustered semi-competing risks data under the illness-death multi-state model. T…

    Submitted 5 April, 2018; v1 submitted 10 January, 2018; originally announced January 2018.

    Comments: 35 pages, 3 figures

  13. arXiv:1712.07325  [pdf, other]

    stat.ME cs.SI stat.AP stat.CO stat.ML

    Model-Based Clustering of Time-Evolving Networks through Temporal Exponential-Family Random Graph Models

    Authors: Kevin H. Lee, Lingzhou Xue, David R. Hunter

    Abstract: Dynamic networks are a general language for describing time-evolving complex systems, and discrete time network models provide an emerging statistical technique for various applications. It is a fundamental research question to detect the community structure in time-evolving networks. However, due to significant computational challenges and difficulties in modeling communities of time-evolving net…

    Submitted 20 December, 2017; originally announced December 2017.

    Comments: 30 pages, 4 figures

  14. arXiv:1711.00157  [pdf, ps, other]

    stat.AP

    Bayesian Variable Selection for Multivariate Zero-Inflated Models: Application to Microbiome Count Data

    Authors: Kyu Ha Lee, Brent A. Coull, Anna-Barbara Moscicki, Bruce J. Paster, Jacqueline R. Starr

    Abstract: Microorganisms play critical roles in human health and disease. It is well known that microbes live in diverse communities in which they interact synergistically or antagonistically. Thus for estimating microbial associations with clinical covariates, multivariate statistical models are preferred. Multivariate models allow one to estimate and exploit complex interdependencies among multiple taxa,…

    Submitted 20 May, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

  15. Hierarchical models for semi-competing risks data with application to quality of end-of-life care for pancreatic cancer

    Authors: Kyu Ha Lee, Francesca Dominici, Deborah Schrag, Sebastien Haneuse

    Abstract: Readmission following discharge from an initial hospitalization is a key marker of quality of health care in the United States. For the most part, readmission has been used to study quality of care for patients with acute health conditions, such as pneumonia and heart failure, with analyses typically based on a logistic-Normal generalized linear mixed model. Applying this model to the study readmi…

    Submitted 5 August, 2015; v1 submitted 2 February, 2015; originally announced February 2015.

    Journal ref: Journal of the American Statistical Association 2016, Volume 111, Issue 515, pages 1075-1095

  16. arXiv:1403.4890  [pdf, other]

    stat.ME stat.CO

    Modeling an Augmented Lagrangian for Blackbox Constrained Optimization

    Authors: Robert B. Gramacy, Genetha A. Gray, Sebastien Le Digabel, Herbert K. H. Lee, Pritam Ranjan, Garth Wells, Stefan M. Wild

    Abstract: Constrained blackbox optimization is a difficult problem, with most approaches coming from the mathematical programming literature. The statistical literature is sparse, especially in addressing problems with nontrivial constraints. This situation is unfortunate because statistical methods have many attractive properties: global scope, handling noisy objectives, sensitivity analysis, and so forth.…

    Submitted 3 March, 2015; v1 submitted 19 March, 2014; originally announced March 2014.

    Comments: 22 pages, 2 additional supplementary, 5 figures

  17. arXiv:1007.4580  [pdf, other]

    stat.CO

    Cases for the nugget in modeling computer experiments

    Authors: Robert B. Gramacy, Herbert K. H. Lee

    Abstract: Most surrogate models for computer experiments are interpolators, and the most common interpolator is a Gaussian process (GP) that deliberately omits a small-scale (measurement) error term called the nugget. The explanation is that computer experiments are, by definition, "deterministic", and so there is no measurement error. We think this is too narrow a focus for a computer experiment and a stat…

    Submitted 21 November, 2010; v1 submitted 26 July, 2010; originally announced July 2010.

    Comments: 17 pages, 4 figures, 3 tables; revised

  18. arXiv:1004.4027  [pdf, ps, other]

    stat.ME stat.AP

    Optimization Under Unknown Constraints

    Authors: Robert B. Gramacy, Herbert K. H. Lee

    Abstract: Optimization of complex functions, such as the output of computer simulators, is a difficult task that has received much attention in the literature. A less studied problem is that of optimization under unknown constraints, i.e., when the simulator must be invoked both to determine the typical real-valued response and to determine if a constraint has been violated, either for physical or policy re…

    Submitted 5 July, 2010; v1 submitted 22 April, 2010; originally announced April 2010.

    Comments: 19 pages, 8 figures, Valencia discussion paper

  19. arXiv:0805.4359  [pdf, ps, other]

    stat.AP stat.ME

    Adaptive design and analysis of supercomputer experiments

    Authors: Robert B. Gramacy, Herbert K. H. Lee

    Abstract: Computer experiments are often performed to allow modeling of a response surface of a physical experiment that can be too costly or difficult to run except using a simulator. Running the experiment over a dense grid can be prohibitively expensive, yet running over a sparse design chosen in advance can result in obtaining insufficient information in parts of the space, particularly when the surfa…

    Submitted 25 May, 2009; v1 submitted 28 May, 2008; originally announced May 2008.

    Comments: 42 pages, 8 figures, 2 tables, to appear in Technometrics

  20. arXiv:0804.4685  [pdf, ps, other]

    stat.ME stat.ML

    Gaussian Processes and Limiting Linear Models

    Authors: Robert B. Gramacy, Herbert K. H. Lee

    Abstract: Gaussian processes retain the linear model either as a special case, or in the limit. We show how this relationship can be exploited when the data are at least partially linear. However from the perspective of the Bayesian posterior, the Gaussian processes which encode the linear model either have probability of nearly zero or are otherwise unattainable without the explicit construction of a pri…

    Submitted 13 July, 2008; v1 submitted 29 April, 2008; originally announced April 2008.

    Comments: 31 pages, 10 figures, 4 tables, accepted by CSDA, earlier version in JSM06 proceedings

  21. arXiv:0710.4536  [pdf, ps, other]

    stat.ME stat.AP stat.CO

    Bayesian treed Gaussian process models with an application to computer modeling

    Authors: Robert B. Gramacy, Herbert K. H. Lee

    Abstract: Motivated by a computer experiment for the design of a rocket booster, this paper explores nonstationary modeling methodologies that couple stationary Gaussian processes with treed partitioning. Partitioning is a simple but effective method for dealing with nonstationarity. The methodological developments and statistical computing details which make this approach efficient are described in detai…

    Submitted 17 March, 2009; v1 submitted 24 October, 2007; originally announced October 2007.

    Comments: 32 pages, 9 figures, to appear in the Journal of the American Statistical Association