AdaptiveGuard: Towards Adaptive Runtime Safety for LLM-Powered Software

Yang, Rui; Fu, Michael; Tantithamthavorn, Chakkrit; Arora, Chetan; Gulmammadova, Gunel; Chua, Joey

arXiv:2509.16861 (cs)

[Submitted on 21 Sep 2025]

Abstract: Guardrails are critical for the safe deployment of Large Language Model (LLM)-powered software. Unlike traditional rule-based systems, whose limited, predefined input-output spaces inherently constrain unsafe behavior, LLMs enable open-ended, intelligent interactions, which opens the door to jailbreak attacks through user inputs. Guardrails serve as a protective layer, filtering unsafe prompts before they reach the LLM. However, prior research shows that jailbreak attacks can still succeed over 70% of the time, even against advanced models like GPT-4o. While guardrails such as LlamaGuard report up to 95% accuracy, our preliminary analysis shows that their performance can drop sharply, to as low as 12%, when confronted with unseen attacks. This highlights a growing software engineering challenge: how can we build a post-deployment guardrail that adapts dynamically to emerging threats? To address this, we propose AdaptiveGuard, an adaptive guardrail that detects novel jailbreak attacks as out-of-distribution (OOD) inputs and learns to defend against them through a continual learning framework. In our empirical evaluation, AdaptiveGuard achieves 96% OOD detection accuracy, adapts to new attacks in just two update steps, and retains an F1-score of over 85% on in-distribution data after adaptation, outperforming other baselines. These results demonstrate that AdaptiveGuard is a guardrail capable of evolving in response to emerging jailbreak strategies after deployment. We release AdaptiveGuard and the studied datasets at this https URL to support further research.
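
The abstract does not say how OOD inputs are scored or how updates are scheduled. As a rough illustration only, the sketch below shows one common way such a detect-then-adapt loop could be wired up: an energy score computed from a classifier's logits, a threshold calibrated on in-distribution data, and a queue of flagged prompts held for a later update step. The names `energy_score`, `calibrate_threshold`, `is_ood`, and `UpdateQueue`, the energy-based scoring itself, and the 0.95 quantile are all assumptions made for this example, not details taken from the paper.

```python
import math


def energy_score(logits, temperature=1.0):
    """Return -T * logsumexp(logits / T); flatter, low-magnitude logits give higher values."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp


def calibrate_threshold(in_dist_logits, quantile=0.95):
    """Pick a threshold at the given quantile of in-distribution energy scores (assumed rule)."""
    scores = sorted(energy_score(row) for row in in_dist_logits)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


def is_ood(logits, threshold):
    """Flag an input as OOD when its energy is strictly above the calibrated threshold."""
    return energy_score(logits) > threshold


class UpdateQueue:
    """Hypothetical holder for flagged prompts awaiting a continual-learning update."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.pending = []

    def observe(self, prompt, logits):
        flagged = is_ood(logits, self.threshold)
        if flagged:
            self.pending.append(prompt)  # to be labeled and used in a later update step
        return flagged
```

In this sketch, a prompt whose logits are nearly flat (for example `[0.1, 0.0]`) lands well above a threshold calibrated on confident in-distribution logits and is queued, while a confidently classified prompt passes through unflagged.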

Comments:	Accepted to the ASE 2025 International Conference on Automated Software Engineering, Industry Showcase Track
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as:	arXiv:2509.16861 [cs.CR]
	(or arXiv:2509.16861v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2509.16861

Submission history

From: Michael Fu
[v1] Sun, 21 Sep 2025 01:22:42 UTC (6,052 KB)
