Showing 1–1 of 1 results for author: Pascual-Ortiz, D

  1. arXiv:2501.07927

    cs.LG cs.AI cs.CL cs.CR

    Gandalf the Red: Adaptive Security for LLMs

    Authors: Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Yun-Han Wu, et al. (1 additional author not shown)

    Abstract: Current evaluations of defenses against prompt attacks in large language model (LLM) applications often overlook two critical factors: the dynamic nature of adversarial behavior and the usability penalties imposed on legitimate users by restrictive defenses. We propose D-SEC (Dynamic Security Utility Threat Model), which explicitly separates attackers from legitimate users, models multi-step inter…

    Submitted 2 February, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

    Comments: Niklas Pfister, Václav Volhejn and Manuel Knott contributed equally