-
Convergence Rates of Constrained Expected Improvement
Authors:
Haowei Wang,
Jingyi Wang,
Zhongxiang Dai,
Nai-Yuan Chiang,
Szu Hui Ng,
Cosmin G. Petra
Abstract:
Constrained Bayesian optimization (CBO) methods have seen significant success in black-box optimization with constraints, and one of the most commonly used CBO methods is the constrained expected improvement (CEI) algorithm. CEI is a natural extension of expected improvement (EI) when constraints are incorporated. However, the theoretical convergence rate of CEI has not been established. In this work, we study the convergence rate of CEI by analyzing its simple regret upper bound. First, we show that when the objective function $f$ and constraint function $c$ are assumed to each lie in a reproducing kernel Hilbert space (RKHS), CEI achieves the convergence rates of $\mathcal{O}\left(t^{-\frac{1}{2}}\log^{\frac{d+1}{2}}(t)\right)$ and $\mathcal{O}\left(t^{-\frac{\nu}{2\nu+d}}\log^{\frac{\nu}{2\nu+d}}(t)\right)$ for the commonly used squared exponential and Matérn kernels, respectively, where $d$ is the input dimension and $\nu$ is the smoothness parameter of the Matérn kernel. Second, we show that when $f$ and $c$ are assumed to be sampled from Gaussian processes (GPs), CEI achieves the same convergence rates with high probability. Numerical experiments are performed to validate the theoretical analysis.
Submitted 16 May, 2025;
originally announced May 2025.
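For readers unfamiliar with CEI, the acquisition function can be sketched as follows. This is a minimal illustration assuming minimization, a single constraint $c(x) \le 0$ modeled by a GP independent of the one for $f$, and the common product form CEI(x) = EI(x) · P(c(x) ≤ 0); the function names are hypothetical and the posterior means and standard deviations are taken as given rather than computed from data.

```python
import math


def _norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form EI for minimization, given the GP posterior mean/std of f at x
    and the best feasible objective value observed so far."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    # First term rewards a low predicted mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return (best - mu) * _norm_cdf(z) + sigma * _norm_pdf(z)


def probability_of_feasibility(mu_c: float, sigma_c: float) -> float:
    """P(c(x) <= 0) under an (assumed independent) GP posterior for c."""
    if sigma_c <= 0.0:
        return 1.0 if mu_c <= 0.0 else 0.0
    return _norm_cdf(-mu_c / sigma_c)


def constrained_ei(mu_f: float, sigma_f: float, mu_c: float, sigma_c: float,
                   best_feasible: float) -> float:
    """CEI(x) = EI(x) * P(c(x) <= 0) (common product formulation, assumed here)."""
    return (expected_improvement(mu_f, sigma_f, best_feasible)
            * probability_of_feasibility(mu_c, sigma_c))


# A point predicted to be very likely infeasible gets almost no CEI,
# however promising its objective prediction is.
print(constrained_ei(0.0, 1.0, 0.0, 1.0, 0.0))
print(constrained_ei(0.0, 1.0, 50.0, 1.0, 0.0))
```

The product form keeps EI's exploration/exploitation trade-off for the objective while down-weighting points the constraint model considers unlikely to be feasible.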
-
Clinical ModernBERT: An efficient and long context encoder for biomedical text
Authors:
Simon A. Lee,
Anthony Wu,
Jeffrey N. Chiang
Abstract:
We introduce Clinical ModernBERT, a transformer-based encoder pretrained on large-scale biomedical literature, clinical notes, and medical ontologies, incorporating PubMed abstracts, MIMIC-IV clinical data, and medical codes with their textual descriptions. Building on ModernBERT, the current state-of-the-art natural language text encoder, which features architectural upgrades such as rotary positional embeddings (RoPE), Flash Attention, and an extended context length of up to 8,192 tokens, our model adapts these innovations specifically to the biomedical and clinical domains. Clinical ModernBERT excels at producing semantically rich representations tailored for long-context tasks. We validate this both by analyzing its pretrained weights and through empirical evaluation on a comprehensive suite of clinical NLP benchmarks.
Submitted 4 April, 2025;
originally announced April 2025.
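Rotary positional embeddings (RoPE), one of the ModernBERT features mentioned above, encode position by rotating pairs of embedding dimensions through position-dependent angles, so that query/key dot products depend only on relative position. The sketch below follows the original RoFormer-style formulation (adjacent dimension pairs, base 10000); ModernBERT's exact RoPE settings are not stated here, so the base and the pairing scheme are assumptions, and `apply_rope` is a hypothetical helper.

```python
import math
from typing import List


def apply_rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by angle position * base**(-2i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)  # lower frequencies for later pairs
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s       # 2-D rotation of the pair
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


q = [0.3, -1.2, 0.7, 2.0]
k = [1.1, 0.4, -0.5, 0.9]
# Same relative offset (2) at different absolute positions gives the same score.
print(dot(apply_rope(q, 5), apply_rope(k, 3)), dot(apply_rope(q, 12), apply_rope(k, 10)))
```

Because each rotation preserves vector norms and the score depends only on the position offset, RoPE does not tie the model to a fixed table of absolute positions, which is one reason it pairs well with long context windows.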
-
On the convergence rate of noisy Bayesian Optimization with Expected Improvement
Authors:
Jingyi Wang,
Haowei Wang,
Nai-Yuan Chiang,
Cosmin G. Petra
Abstract:
Expected improvement (EI) is one of the most widely used acquisition functions in Bayesian optimization (BO). Despite its proven success in applications for decades, important open questions remain regarding the theoretical convergence behaviors and rates of EI. In this paper, we contribute to the convergence theory of EI in three novel and critical areas. First, we consider objective functions that fit under the Gaussian process (GP) prior assumption, whereas existing works mostly focus on functions in the reproducing kernel Hilbert space (RKHS). Second, we establish for the first time the asymptotic error bound and its corresponding rate for GP-EI with noisy observations under the GP prior assumption. Third, by investigating the exploration and exploitation properties of the non-convex EI function, we establish improved error bounds of GP-EI for both the noise-free and noisy cases.
Submitted 12 February, 2025; v1 submitted 15 January, 2025;
originally announced January 2025.
-
On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise
Authors:
Jingyi Wang,
Haowei Wang,
Cosmin G. Petra,
Nai-Yuan Chiang
Abstract:
Bayesian optimization (BO) with Gaussian process (GP) surrogate models is a powerful black-box optimization method. Acquisition functions are a critical part of a BO algorithm, as they determine how new samples are selected. Some of the most widely used acquisition functions include upper confidence bound (UCB) and Thompson sampling (TS). The convergence analysis of BO algorithms has focused on the cumulative regret under both the Bayesian and frequentist settings for the objective. In this paper, we establish new pointwise bounds on the prediction error of GP under the frequentist setting with Gaussian noise. Consequently, we prove improved convergence rates in the cumulative regret bounds of both GP-UCB and GP-TS. Of note, the new prediction error bound under Gaussian noise can be applied to general BO algorithms and convergence analysis, e.g., the asymptotic convergence of expected improvement (EI) with noise.
Submitted 25 December, 2024;
originally announced December 2024.
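The GP prediction error bounds above concern the posterior mean and standard deviation that GP-UCB combines into its acquisition value, roughly $\mu_{t-1}(x) + \beta_t^{1/2}\sigma_{t-1}(x)$. The sketch below computes both quantities for a zero-mean GP with a squared exponential kernel and Gaussian observation noise on scalar inputs, then forms the UCB score. It is illustrative only: a fixed `beta` stands in for the iteration-dependent $\beta_t$ used in the theory, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: float, b: float, lengthscale: float = 1.0) -> float:
    """Squared exponential kernel on scalar inputs, unit signal variance."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_sub(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L z = b for lower-triangular L."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return z


def backward_sub_t(L: List[List[float]], z: Sequence[float]) -> List[float]:
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X: Sequence[float], y: Sequence[float], x_new: float,
                 noise_var: float = 1e-2, lengthscale: float = 1.0) -> Tuple[float, float]:
    """Posterior mean and std of a zero-mean GP at x_new, given
    observations y = f(X) + Gaussian noise with variance noise_var."""
    n = len(X)
    K = [[rbf(X[i], X[j], lengthscale) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub_t(L, forward_sub(L, y))  # alpha = (K + noise I)^{-1} y
    k_star = [rbf(xi, x_new, lengthscale) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)  # ||v||^2 = k_*^T (K + noise I)^{-1} k_*
    var = max(rbf(x_new, x_new, lengthscale) - sum(t * t for t in v), 0.0)
    return mean, math.sqrt(var)


def gp_ucb(X: Sequence[float], y: Sequence[float], x_new: float,
           beta: float = 4.0, **gp_kwargs) -> float:
    """UCB score: posterior mean plus sqrt(beta) posterior standard deviations."""
    mean, std = gp_posterior(X, y, x_new, **gp_kwargs)
    return mean + math.sqrt(beta) * std


X, y = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
candidates = [0.25 * i for i in range(13)]
print(max(candidates, key=lambda x: gp_ucb(X, y, x)))  # next point GP-UCB would query
```

Far from the data the posterior standard deviation returns to the prior value, so the UCB term favors unexplored regions until observations shrink the uncertainty there.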
-
FEET: A Framework for Evaluating Embedding Techniques
Authors:
Simon A. Lee,
John Lee,
Jeffrey N. Chiang
Abstract:
In this study, we introduce FEET, a standardized protocol designed to guide the development and benchmarking of foundation models. While numerous benchmark datasets exist for evaluating these models, we propose a structured evaluation protocol across three distinct scenarios to gain a comprehensive understanding of their practical performance. We define three primary use cases: frozen embeddings, few-shot embeddings, and fully fine-tuned embeddings. Each scenario is detailed and illustrated through two case studies: one in sentiment analysis and another in the medical domain, demonstrating how these evaluations provide a thorough assessment of foundation models' effectiveness in research applications. We recommend this protocol as a standard for future research aimed at advancing representation learning models.
Submitted 2 November, 2024;
originally announced November 2024.
-
TFT-multi: simultaneous forecasting of vital sign trajectories in the ICU
Authors:
Rosemary Y. He,
Jeffrey N. Chiang
Abstract:
Trajectory forecasting in healthcare data has been an important area of research in precision care and clinical integration for computational methods. In recent years, generative AI models have demonstrated promising results in capturing short- and long-range dependencies in time series data. While these models have also been applied in healthcare, most of them only predict one value at a time, which is unrealistic in a clinical setting where multiple measures are taken at once. In this work, we extend the temporal fusion transformer (TFT) framework, a multi-horizon time series prediction tool, and propose TFT-multi, an end-to-end framework that can predict multiple vital sign trajectories simultaneously. We apply TFT-multi to forecast 5 vital signs recorded in the intensive care unit: blood pressure, pulse, SpO2, temperature, and respiratory rate. We hypothesize that by jointly predicting these measures, which are often correlated with one another, we can make more accurate predictions, especially for variables with substantial missingness. We validate our model on the public MIMIC dataset and an independent institutional dataset, and demonstrate that this approach outperforms state-of-the-art univariate prediction tools, including the original TFT and Prophet, as well as vector regression modeling for multivariate prediction. Furthermore, we perform a case study by applying our pipeline to forecast blood pressure changes in response to actual and hypothetical pressor administration.
Submitted 6 December, 2024; v1 submitted 23 September, 2024;
originally announced September 2024.
-
Enhancing Antibiotic Stewardship using a Natural Language Approach for Better Feature Representation
Authors:
Simon A. Lee,
Trevor Brokowski,
Jeffrey N. Chiang
Abstract:
The rapid emergence of antibiotic-resistant bacteria is recognized as a global healthcare crisis, undermining the efficacy of life-saving antibiotics. This crisis is driven by the improper use and overuse of antibiotics, which escalate bacterial resistance. In response, this study explores the use of clinical decision support systems, enhanced through the integration of electronic health records (EHRs), to improve antibiotic stewardship. However, EHR systems present numerous data-level challenges, complicating the effective synthesis and utilization of data. In this work, we transform EHR data into a serialized textual representation and employ pretrained foundation models to demonstrate how this enhanced feature representation can aid in antibiotic susceptibility predictions. Our results suggest that this text representation, combined with foundation models, provides a valuable tool to increase interpretability and support antibiotic stewardship efforts.
Submitted 30 May, 2024;
originally announced May 2024.
-
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
Authors:
Yu Yang,
Siddhartha Mishra,
Jeffrey N Chiang,
Baharan Mirzasoleiman
Abstract:
Despite the effectiveness of data selection for large language models (LLMs) during pretraining and instruction fine-tuning phases, improving data efficiency in supervised fine-tuning (SFT) for specialized domains poses significant challenges due to the complexity of fine-tuning data. To bridge this gap, we introduce an effective and scalable data selection method for SFT, SmallToLarge (S2L), which leverages training trajectories from small models to guide data selection for larger models. We demonstrate through extensive experiments that S2L significantly improves data efficiency in SFT for mathematical problem-solving, reducing the training data to just 11% of the original MathInstruct dataset (Yue et al., 2023) to match full-dataset performance while outperforming state-of-the-art data selection algorithms by an average of 4.7% across 6 in-domain and out-of-domain evaluation datasets. Remarkably, selecting only 50K examples for SFT, S2L achieves 32.7% accuracy on the most challenging MATH (Hendrycks et al., 2021) benchmark, improving Phi-2 (Li et al., 2023b) by 16.6%. In clinical text summarization on the MIMIC-III dataset (Johnson et al., 2016), S2L again outperforms training on the full dataset while using only 50% of the data. Notably, S2L can perform data selection using a reference model 40x smaller than the target model, proportionally reducing the cost of data selection.
Submitted 5 December, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
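The idea of guiding selection with a small model's training trajectories can be illustrated as below. This is a simplified sketch, not the authors' implementation: it assumes each example is summarized by the sequence of losses a small proxy model recorded on it at successive checkpoints, that these trajectories are grouped with plain k-means, and that the budget is spread round-robin across clusters (smallest first) so that rare behaviors are not drowned out. The input format and function names are hypothetical.

```python
import random
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two loss trajectories."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_by_trajectories(loss_traj: List[List[float]], k: int, budget: int,
                           seed: int = 0) -> List[int]:
    """Return up to `budget` example indices spread evenly across trajectory clusters.

    loss_traj[i] holds the losses a small proxy model recorded on example i
    at successive training checkpoints (hypothetical input format).
    """
    labels = kmeans(loss_traj, k, seed=seed)
    rng = random.Random(seed)
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    for members in clusters:
        rng.shuffle(members)
    clusters.sort(key=len)  # visit small clusters first so they are not crowded out
    selected: List[int] = []
    while len(selected) < budget and any(clusters):
        for members in clusters:  # round-robin: one example per cluster per pass
            if members and len(selected) < budget:
                selected.append(members.pop())
    return selected


# Two groups of examples whose losses evolve very differently during training.
traj = ([[1.0 + 0.01 * i, 0.5, 0.1] for i in range(10)]
        + [[3.0 + 0.01 * i, 3.0, 3.0] for i in range(10)])
print(sorted(select_by_trajectories(traj, k=2, budget=6)))
```

Only the cheap proxy model's losses are needed to decide what the large model trains on, which is where the cost savings described in the abstract come from.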
-
Emergency Department Decision Support using Clinical Pseudo-notes
Authors:
Simon A. Lee,
Sujay Jain,
Alex Chen,
Kyoka Ono,
Jennifer Fang,
Akos Rudas,
Jeffrey N. Chiang
Abstract:
In this work, we introduce the Multiple Embedding Model for EHR (MEME), an approach that serializes multimodal EHR tabular data into text using pseudo-notes, mimicking clinical text generation. This conversion not only better preserves the representation of categorical data and captures context, but also enables the effective use of pretrained foundation models for rich feature representation. To address potential issues with context length, our framework encodes embeddings for each EHR modality separately. We demonstrate the effectiveness of MEME by applying it to several decision support tasks within the Emergency Department across multiple hospital systems. Our findings indicate that MEME outperforms traditional machine learning, EHR-specific foundation models, and general LLMs, highlighting its potential as a general and extensible EHR representation strategy.
Submitted 29 April, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
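Serializing tabular EHR data into per-modality pseudo-notes can be sketched as follows. The template, field names, and modality names are hypothetical, since the abstract does not give MEME's actual templates; the sketch only shows the general pattern of rendering each record as text, skipping missing values, and keeping one note per modality so that each can later be embedded separately by a pretrained text encoder (the embedding step is not shown).

```python
from typing import Dict, Mapping


def to_pseudo_note(modality: str, record: Mapping[str, object]) -> str:
    """Render one tabular EHR record as a short text snippet (hypothetical template)."""
    parts = []
    for field, value in record.items():
        if value is None or value == "":
            continue  # omit missing values rather than writing placeholders
        parts.append(f"{field.replace('_', ' ')} is {value}")
    if not parts:
        return f"No {modality} recorded."
    return f"{modality.capitalize()}: " + "; ".join(parts) + "."


def serialize_visit(visit: Mapping[str, Mapping[str, object]]) -> Dict[str, str]:
    """One pseudo-note per modality, so each can be encoded separately."""
    return {modality: to_pseudo_note(modality, rec) for modality, rec in visit.items()}


visit = {
    "triage": {"heart_rate": 112, "temperature": 38.4, "pain_score": None},
    "medications": {},
}
for modality, note in serialize_visit(visit).items():
    print(modality, "->", note)
# triage -> Triage: heart rate is 112; temperature is 38.4.
# medications -> No medications recorded.
```

Keeping modalities in separate notes bounds the length of each text passed to the encoder, which is the context-length concern the abstract raises.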
-
Are Macula or Optic Nerve Head Structures better at Diagnosing Glaucoma? An Answer using AI and Wide-Field Optical Coherence Tomography
Authors:
Charis Y. N. Chiang,
Fabian Braeu,
Thanadet Chuangsuwanich,
Royston K. Y. Tan,
Jacqueline Chua,
Leopold Schmetterer,
Alexandre Thiery,
Martin Buist,
Michaël J. A. Girard
Abstract:
Purpose: (1) To develop a deep learning algorithm to automatically segment structures of the optic nerve head (ONH) and macula in 3D wide-field optical coherence tomography (OCT) scans; (2) To assess whether 3D macula or ONH structures (or the combination of both) provide the best diagnostic power for glaucoma. Methods: A cross-sectional comparative study was performed that included wide-field swept-source OCT scans from 319 glaucoma subjects and 298 non-glaucoma subjects. All scans were compensated to improve deep-tissue visibility. We developed a deep learning algorithm to automatically label all major ONH tissue structures, using 270 manually annotated B-scans for training. The performance of our algorithm was assessed using the Dice coefficient (DC). A glaucoma classification algorithm (3D CNN) was then designed using a combination of 500 OCT volumes and their corresponding automatically segmented masks. This algorithm was trained and tested on 3 datasets: OCT scans cropped to contain only the macular tissues, OCT scans cropped to contain only the ONH tissues, and the full wide-field OCT scans. The classification performance for each dataset was reported using the AUC. Results: Our segmentation algorithm was able to segment ONH and macular tissues with a DC of 0.94 $\pm$ 0.003. The classification algorithm was best able to diagnose glaucoma using wide-field 3D-OCT volumes, with an AUC of 0.99 $\pm$ 0.01, followed by ONH volumes with an AUC of 0.93 $\pm$ 0.06, and finally macular volumes with an AUC of 0.91 $\pm$ 0.11. Conclusions: This study showed that using wide-field OCT, as compared with typical OCT images containing just the ONH or macula, may allow for significantly improved glaucoma diagnosis. This may encourage the mainstream adoption of 3D wide-field OCT scans. For clinical AI studies that use traditional machines, we would recommend the use of ONH scans as opposed to macula scans.
Submitted 12 October, 2022;
originally announced October 2022.
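The Dice coefficient used above to score segmentation compares a predicted mask $P$ with a ground-truth mask $T$ as $\mathrm{DC} = 2|P \cap T| / (|P| + |T|)$, ranging from 0 (no overlap) to 1 (perfect agreement). The sketch below computes it per tissue label on flattened label maps. How the study averaged over tissues is not stated in the abstract, so the simple mean over labels and the convention of returning 1.0 when a label is absent from both masks are assumptions; the function names are hypothetical.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """DC = 2|P ∩ T| / (|P| + |T|) over the voxels assigned to `label`."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    p = [v == label for v in pred]
    t = [v == label for v in truth]
    inter = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    return 1.0 if total == 0 else 2.0 * inter / total  # both empty: treat as perfect


def mean_dice(pred: Sequence[int], truth: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted mean of per-label Dice scores (averaging scheme assumed)."""
    return sum(dice_coefficient(pred, truth, lab) for lab in labels) / len(labels)


pred = [0, 1, 1, 2]   # flattened predicted tissue labels
truth = [0, 1, 2, 2]  # flattened manual annotation
print(dice_coefficient(pred, truth, 1), mean_dice(pred, truth, [1, 2]))
```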
-
A Hybrid Direct-Iterative Method for Solving KKT Linear Systems
Authors:
Shaked Regev,
Nai-Yuan Chiang,
Eric Darve,
Cosmin G. Petra,
Michael A. Saunders,
Kasia Świrydowicz,
Slaven Peleš
Abstract:
We propose a solution strategy for linear systems arising in interior method optimization that is suitable for implementation on hardware accelerators such as graphics processing units (GPUs). The current gold standard for solving these systems is the $LDL^T$ factorization. However, $LDL^T$ requires pivoting during factorization, which substantially increases communication cost and degrades performance on GPUs. Our novel approach solves a large indefinite system by solving multiple smaller positive definite systems, using an iterative solver for the Schur complement and an inner direct solve (via Cholesky factorization) within each iteration. Cholesky is stable without pivoting, thereby reducing communication and allowing reuse of the symbolic factorization. We demonstrate the practicality of our approach and show that on large systems it can efficiently utilize GPUs and outperform $LDL^T$ factorization of the full system.
Submitted 7 October, 2021;
originally announced October 2021.
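The structure of the hybrid approach can be illustrated on the KKT system $\begin{bmatrix} H & J^T \\ J & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}$. Eliminating $x = H^{-1}(r_1 - J^T y)$ gives the Schur complement system $J H^{-1} J^T y = J H^{-1} r_1 - r_2$, which can be solved by conjugate gradient, with every product by $J H^{-1} J^T$ applied through a single Cholesky factorization of $H$. This is a simplified sketch assuming $H$ is already symmetric positive definite and $J$ has full row rank; it omits how the paper handles general interior-method KKT matrices and uses dense Python lists in place of sparse GPU factorizations. All names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with A = L L^T; no pivoting needed for SPD A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L: Matrix, b: Vector) -> Vector:
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def mat_t_vec(A: Matrix, v: Vector) -> Vector:
    """Compute A^T v."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def solve_kkt(H: Matrix, J: Matrix, r1: Vector, r2: Vector,
              tol: float = 1e-10, max_iter: int = 100) -> Tuple[Vector, Vector]:
    """Solve [[H, J^T], [J, 0]] [x; y] = [r1; r2], assuming H SPD and J full row rank."""
    L = cholesky(H)  # factor once; reused inside every CG iteration

    def schur(v: Vector) -> Vector:  # S v = J H^{-1} J^T v, never formed explicitly
        return matvec(J, chol_solve(L, mat_t_vec(J, v)))

    rhs = [a - b for a, b in zip(matvec(J, chol_solve(L, r1)), r2)]
    # Conjugate gradient on S y = rhs (S is SPD under the stated assumptions).
    y = [0.0] * len(r2)
    r = rhs[:]
    p = r[:]
    rs = sum(t * t for t in r)
    stop = tol * math.sqrt(rs)
    for _ in range(max_iter):
        if math.sqrt(rs) <= stop:
            break
        Sp = schur(p)
        alpha = rs / sum(a * b for a, b in zip(p, Sp))
        y = [yi + alpha * pi for yi, pi in zip(y, p)]
        r = [ri - alpha * si for ri, si in zip(r, Sp)]
        rs_new = sum(t * t for t in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    x = chol_solve(L, [a - b for a, b in zip(r1, mat_t_vec(J, y))])  # back-substitute for x
    return x, y


H = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
J = [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]
x, y = solve_kkt(H, J, [1.0, 2.0, 3.0], [1.0, 0.0])
print(x, y)
```

Because only the positive definite block $H$ is factored, the Cholesky factorization needs no pivoting, and its symbolic structure can be reused across solves, which is the property the abstract highlights for GPUs.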