Search | arXiv e-print repository

Gandalf the Red: Adaptive Security for LLMs

Authors: Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Yun-Han Wu , et al. (1 additional authors not shown)

Abstract: Current evaluations of defenses against prompt attacks in large language model (LLM) applications often overlook two critical factors: the dynamic nature of adversarial behavior and the usability penalties imposed on legitimate users by restrictive defenses. We propose D-SEC (Dynamic Security Utility Threat Model), which explicitly separates attackers from legitimate users, models multi-step inter… ▽ More Current evaluations of defenses against prompt attacks in large language model (LLM) applications often overlook two critical factors: the dynamic nature of adversarial behavior and the usability penalties imposed on legitimate users by restrictive defenses. We propose D-SEC (Dynamic Security Utility Threat Model), which explicitly separates attackers from legitimate users, models multi-step interactions, and expresses the security-utility in an optimizable form. We further address the shortcomings in existing evaluations by introducing Gandalf, a crowd-sourced, gamified red-teaming platform designed to generate realistic, adaptive attack. Using Gandalf, we collect and release a dataset of 279k prompt attacks. Complemented by benign user data, our analysis reveals the interplay between security and utility, showing that defenses integrated in the LLM (e.g., system prompts) can degrade usability even without blocking requests. We demonstrate that restricted application domains, defense-in-depth, and adaptive defenses are effective strategies for building secure and useful LLM applications. △ Less

Submitted 2 February, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

Comments: Niklas Pfister, Václav Volhejn and Manuel Knott contributed equally

arXiv:2408.06398 [pdf, other]

Synthetic Photography Detection: A Visual Guidance for Identifying Synthetic Images Created by AI

Authors: Melanie Mathys, Marco Willi, Raphael Meier

Abstract: Artificial Intelligence (AI) tools have become incredibly powerful in generating synthetic images. Of particular concern are generated images that resemble photographs as they aspire to represent real world events. Synthetic photographs may be used maliciously by a broad range of threat actors, from scammers to nation-state actors, to deceive, defraud, and mislead people. Mitigating this threat us… ▽ More Artificial Intelligence (AI) tools have become incredibly powerful in generating synthetic images. Of particular concern are generated images that resemble photographs as they aspire to represent real world events. Synthetic photographs may be used maliciously by a broad range of threat actors, from scammers to nation-state actors, to deceive, defraud, and mislead people. Mitigating this threat usually involves answering a basic analytic question: Is the photograph real or synthetic? To address this, we have examined the capabilities of recent generative diffusion models and have focused on their flaws: visible artifacts in generated images which reveal their synthetic origin to the trained eye. We categorize these artifacts, provide examples, discuss the challenges in detecting them, suggest practical applications of our work, and outline future research directions. △ Less

Submitted 12 August, 2024; originally announced August 2024.

Comments: 27 pages, 25 figures

ACM Class: K.4.0; I.2.0; I.4.0

arXiv:2403.12207 [pdf, other]

Synthetic Image Generation in Cyber Influence Operations: An Emergent Threat?

Authors: Melanie Mathys, Marco Willi, Michael Graber, Raphael Meier

Abstract: The evolution of artificial intelligence (AI) has catalyzed a transformation in digital content generation, with profound implications for cyber influence operations. This report delves into the potential and limitations of generative deep learning models, such as diffusion models, in fabricating convincing synthetic images. We critically assess the accessibility, practicality, and output quality… ▽ More The evolution of artificial intelligence (AI) has catalyzed a transformation in digital content generation, with profound implications for cyber influence operations. This report delves into the potential and limitations of generative deep learning models, such as diffusion models, in fabricating convincing synthetic images. We critically assess the accessibility, practicality, and output quality of these tools and their implications in threat scenarios of deception, influence, and subversion. Notably, the report generates content for several hypothetical cyber influence operations to demonstrate the current capabilities and limitations of these AI-driven methods for threat actors. While generative models excel at producing illustrations and non-realistic imagery, creating convincing photo-realistic content remains a significant challenge, limited by computational resources and the necessity for human-guided refinement. Our exploration underscores the delicate balance between technological advancement and its potential for misuse, prompting recommendations for ongoing research, defense mechanisms, multi-disciplinary collaboration, and policy development. These recommendations aim to leverage AI's potential for positive impact while safeguarding against its risks to the integrity of information, especially in the context of cyber influence. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: 44 pages, 56 figures

ACM Class: K.4.0; I.2.0; I.4.0

arXiv:2206.06761 [pdf, other]

Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO

Authors: Javier Rando, Nasib Naimi, Thomas Baumann, Max Mathys

Abstract: This work conducts the first analysis on the robustness against adversarial attacks on self-supervised Vision Transformers trained using DINO. First, we evaluate whether features learned through self-supervision are more robust to adversarial attacks than those emerging from supervised learning. Then, we present properties arising for attacks in the latent space. Finally, we evaluate whether three… ▽ More This work conducts the first analysis on the robustness against adversarial attacks on self-supervised Vision Transformers trained using DINO. First, we evaluate whether features learned through self-supervision are more robust to adversarial attacks than those emerging from supervised learning. Then, we present properties arising for attacks in the latent space. Finally, we evaluate whether three well-known defense strategies can increase adversarial robustness in downstream tasks by only fine-tuning the classification head to provide robustness even in view of limited compute resources. These defense strategies are: Adversarial Training, Ensemble Adversarial Training and Ensemble of Specialized Networks. △ Less

Submitted 8 September, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: ICML 2022 Workshop paper accepted at AdvML Frontiers

arXiv:2108.05236 [pdf, other]

A Limitlessly Scalable Transaction System

Authors: Max Mathys, Roland Schmid, Jakub Sliwinski, Roger Wattenhofer

Abstract: We present Accept, a simple, asynchronous transaction system that achieves perfect horizontal scaling. Usual blockchain-based transaction systems come with a fundamental throughput limitation as they require that all (potentially unrelated) transactions must be totally ordered. Such solutions thus require serious compromises or are outright unsuitable for large-scale applications, such as global… ▽ More We present Accept, a simple, asynchronous transaction system that achieves perfect horizontal scaling. Usual blockchain-based transaction systems come with a fundamental throughput limitation as they require that all (potentially unrelated) transactions must be totally ordered. Such solutions thus require serious compromises or are outright unsuitable for large-scale applications, such as global retail payments. Accept provides efficient horizontal scaling without any limitation. To that end, Accept satisfies a relaxed form of consensus and does not establish an ordering of unrelated transactions. Furthermore, Accept achieves instant finality and does not depend on a source of randomness. △ Less

Submitted 11 August, 2021; originally announced August 2021.

Comments: 11 pages, 3 figures

Showing 1–5 of 5 results for author: Mathys, M