Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks

Pasquini, Dario; Kornaropoulos, Evgenios M.; Ateniese, Giuseppe

arXiv:2410.20911 (cs)

[Submitted on 28 Oct 2024 (v1), last revised 18 Nov 2024 (this version, v2)]

Abstract: Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a defensive framework that exploits LLMs' susceptibility to adversarial inputs to undermine malicious operations. Upon detecting an automated cyberattack, Mantis plants carefully crafted inputs into system responses, leading the attacker's LLM to disrupt their own operations (passive defense) or even compromise the attacker's machine (active defense). By deploying purposefully vulnerable decoy services to attract the attacker and using dynamic prompt injections for the attacker's LLM, Mantis can autonomously hack back the attacker. In our experiments, Mantis consistently achieved over 95% effectiveness against automated LLM-driven attacks. To foster further research and collaboration, Mantis is available as an open-source tool: this https URL

Comments:	v0.2 (evaluated on more agents)
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.20911 [cs.CR]
	(or arXiv:2410.20911v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2410.20911

Submission history

From: Dario Pasquini
[v1] Mon, 28 Oct 2024 10:43:34 UTC (11,405 KB)
[v2] Mon, 18 Nov 2024 09:15:46 UTC (11,550 KB)
