RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage

Zhong, Peter Yong; Chen, Siyuan; Wang, Ruiqi; McCall, McKenna; Titzer, Ben L.; Miller, Heather; Gibbons, Phillip B.

arXiv:2502.08966 (cs)

[Submitted on 13 Feb 2025 (v1), last revised 14 Feb 2025 (this version, v2)]

Abstract: Tool-Based Agent Systems (TBAS) allow Language Models (LMs) to use external tools for tasks beyond their standalone capabilities, such as searching websites, booking flights, or making financial transactions. However, these tools greatly increase the risks of prompt injection attacks, where malicious content hijacks the LM agent to leak confidential data or trigger harmful actions. Existing defenses (OpenAI GPTs) require user confirmation before every tool call, placing onerous burdens on users. We introduce Robust TBAS (RTBAS), which automatically detects and executes tool calls that preserve integrity and confidentiality, requiring user confirmation only when these safeguards cannot be ensured. RTBAS adapts Information Flow Control to the unique challenges presented by TBAS. We present two novel dependency screeners, using LM-as-a-judge and attention-based saliency, to overcome these challenges. Experimental results on the AgentDojo Prompt Injection benchmark show RTBAS prevents all targeted attacks with only a 2% loss of task utility when under attack, and further tests confirm its ability to obtain near-oracle performance on detecting both subtle and direct privacy leaks.

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.08966 [cs.CR]
	(or arXiv:2502.08966v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2502.08966

Submission history

From: Peter Yong Zhong
[v1] Thu, 13 Feb 2025 05:06:22 UTC (830 KB)
[v2] Fri, 14 Feb 2025 04:16:40 UTC (830 KB)
