
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems

Authors: Brendan Leigh Ross, Noël Vouitsis, Atiyeh Ashari Ghomi, Rasa Hosseinzadeh, Ji Xin, Zhaoyan Liu, Yi Sui, Shiyi Hou, Kin Kwan Leung, Gabriel Loaiza-Ganem, Jesse C. Cresswell

Abstract: Although large language models (LLMs) are becoming increasingly capable of solving challenging real-world tasks, accurately quantifying their uncertainty remains a critical open problem, which limits their applicability in high-stakes domains. This challenge is further compounded by the closed-source, black-box nature of many state-of-the-art LLMs. Moreover, LLM-based systems can be highly sensitive to the prompts that bind them together, which often require significant manual tuning (i.e., prompt engineering). In this work, we address these challenges by viewing LLM-based systems through a Bayesian lens. We interpret prompts as textual parameters in a statistical model, allowing us to use a small training dataset to perform Bayesian inference over these prompts. This novel perspective enables principled uncertainty quantification over both the model's textual parameters and its downstream predictions, while also incorporating prior beliefs about these parameters expressed in free-form text. To perform Bayesian inference, a difficult problem even for well-studied data modalities, we introduce Metropolis-Hastings through LLM Proposals (MHLP), a novel Markov chain Monte Carlo (MCMC) algorithm that combines prompt optimization techniques with standard MCMC methods. MHLP is a turnkey modification to existing LLM pipelines, including those that rely exclusively on closed-source models. Empirically, we demonstrate that our method yields improvements in both predictive accuracy and uncertainty quantification (UQ) on a range of LLM benchmarks and UQ tasks. More broadly, our work demonstrates a viable path for incorporating methods from the rich Bayesian literature into the era of LLMs, paving the way for more reliable and calibrated LLM-based systems.

Submitted 11 June, 2025; originally announced June 2025.

arXiv:2407.12588

Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks

Authors: Antoni Kowalczuk, Jan Dubiński, Atiyeh Ashari Ghomi, Yi Sui, George Stein, Jiapeng Wu, Jesse C. Cresswell, Franziska Boenisch, Adam Dziedzic

Abstract: Large-scale vision models have become integral in many applications due to their unprecedented performance and versatility across downstream tasks. However, the robustness of these foundation models has primarily been explored for a single task, namely image classification. The vulnerability of other common vision tasks, such as semantic segmentation and depth estimation, remains largely unknown. We present a comprehensive empirical evaluation of the adversarial robustness of self-supervised vision encoders across multiple downstream tasks. Our attacks operate in the encoder embedding space and at the downstream task output level. In both cases, current state-of-the-art adversarial fine-tuning techniques tested only for classification significantly degrade clean and robust performance on other tasks. Since the purpose of a foundation model is to cater to multiple applications at once, our findings reveal the need to enhance encoder robustness more broadly. Our code is available at github.com/layer6ai-labs/ssl-robustness.

Submitted 18 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: Accepted at the ICML 2024 Workshop on Foundation Models in the Wild

arXiv:2306.08656

Augment then Smooth: Reconciling Differential Privacy with Certified Robustness

Authors: Jiapeng Wu, Atiyeh Ashari Ghomi, David Glukhov, Jesse C. Cresswell, Franziska Boenisch, Nicolas Papernot

Abstract: Machine learning models are susceptible to a variety of attacks that can erode trust, including attacks against the privacy of training data, and adversarial examples that jeopardize model accuracy. Differential privacy and certified robustness are effective frameworks for combating these two threats respectively, as they each provide future-proof guarantees. However, we show that standard differentially private model training is insufficient for providing strong certified robustness guarantees. Indeed, combining differential privacy and certified robustness in a single system is non-trivial, leading previous works to introduce complex training schemes that lack flexibility. In this work, we present DP-CERT, a simple and effective method that achieves both privacy and robustness guarantees simultaneously by integrating randomized smoothing into standard differentially private model training. Compared to the leading prior work, DP-CERT gives up to a 2.5% increase in certified accuracy for the same differential privacy guarantee on CIFAR10. Through in-depth per-sample metric analysis, we find that larger certifiable radii correlate with smaller local Lipschitz constants, and show that DP-CERT effectively reduces Lipschitz constants compared to other differentially private training methods. The code is available at github.com/layer6ai-labs/dp-cert.

Submitted 20 December, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 29 pages, 19 figures. Accepted at TMLR in 2024. Link: https://openreview.net/forum?id=YN0IcnXqsr

arXiv:2206.07737

Disparate Impact in Differential Privacy from Gradient Misalignment

Authors: Maria S. Esipova, Atiyeh Ashari Ghomi, Yaqiao Luo, Jesse C. Cresswell

Abstract: As machine learning becomes more widespread throughout society, aspects including data privacy and fairness must be carefully considered, and are crucial for deployment in highly regulated industries. Unfortunately, the application of privacy enhancing technologies can worsen unfair tendencies in models. In particular, one of the most widely used techniques for private model training, differentially private stochastic gradient descent (DPSGD), frequently intensifies disparate impact on groups within data. In this work we study the fine-grained causes of unfairness in DPSGD and identify gradient misalignment due to inequitable gradient clipping as the most significant source. This observation leads us to a new method for reducing unfairness by preventing gradient misalignment in DPSGD.

Submitted 23 February, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: ICLR 2023 notable top 25%, https://openreview.net/forum?id=qLOaeRvteqbx. Our code is available at https://github.com/layer6ai-labs/fair-dp

arXiv:1801.02263

Seasonal Goods and Spoiled Milk: Pricing for a Limited Shelf-Life

Authors: Atiyeh Ashari Ghomi, Allan Borodin, Omer Lev

Abstract: We examine the case of items with a limited shelf-life where storing an item (before consumption) may carry a cost to a buyer (or distributor). For example, eggs, milk, or Groupon coupons have a fixed expiry date, and seasonal goods can suffer a decrease in value. We show how this setting contrasts with recent results by Berbeglia et al. (arXiv:1509.07330(v5)) for items with infinite shelf-life. We prove tight bounds on the seller's profits showing how they relate to the items' shelf-life. We show, counterintuitively, that in our limited shelf-life setting, increasing storage costs can sometimes lead to less profit for the seller, which cannot happen when items have unlimited shelf-life. We also provide an algorithm that calculates optimal prices. Finally, we examine empirically the relationship between profits and buyer utility as the storage cost and shelf-life duration change, and observe properties, some of which are unique to the limited shelf-life setting.

Submitted 6 May, 2018; v1 submitted 7 January, 2018; originally announced January 2018.

Showing 1–5 of 5 results for author: Ghomi, A A