Showing 1–15 of 15 results for author: Zanella-Béguelin, S

  1. arXiv:2507.05445

    cs.CR

    A Systematization of Security Vulnerabilities in Computer Use Agents

    Authors: Daniel Jones, Giorgio Severi, Martin Pouliot, Gary Lopez, Joris de Gruyter, Santiago Zanella-Beguelin, Justin Song, Blake Bullwinkel, Pamela Cortez, Amanda Minnich

    Abstract: Computer Use Agents (CUAs), autonomous systems that interact with software interfaces via browsers or virtual machines, are rapidly being deployed in consumer and enterprise environments. These agents introduce novel attack surfaces and trust boundaries that are not captured by traditional threat models. Despite their growing capabilities, the security boundaries of CUAs remain poorly understood.…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2507.02956

    cs.CR cs.AI

    A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks

    Authors: Blake Bullwinkel, Mark Russinovich, Ahmed Salem, Santiago Zanella-Beguelin, Daniel Jones, Giorgio Severi, Eugenia Kim, Keegan Hines, Amanda Minnich, Yonatan Zunger, Ram Shankar Siva Kumar

    Abstract: Recent research has demonstrated that state-of-the-art LLMs and defenses remain susceptible to multi-turn jailbreak attacks. These attacks require only closed-box model access and are often easy to perform manually, posing a significant threat to the safe and secure deployment of LLM-based systems. We study the effectiveness of the Crescendo multi-turn jailbreak at the level of intermediate model…

    Submitted 29 June, 2025; originally announced July 2025.

  3. arXiv:2505.23643

    cs.CR cs.AI

    Securing AI Agents with Information-Flow Control

    Authors: Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

    Abstract: As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of prop…

    Submitted 29 May, 2025; originally announced May 2025.

  4. arXiv:2502.14921

    cs.CL cs.CR cs.LG

    The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text

    Authors: Matthieu Meeus, Lukas Wutschitz, Santiago Zanella-Béguelin, Shruti Tople, Reza Shokri

    Abstract: How much information about training samples can be leaked through synthetic data generated by Large Language Models (LLMs)? Overlooking the subtleties of information flow in synthetic data generation pipelines can lead to a false sense of privacy. In this paper, we assume an adversary has access to some synthetic data generated by a LLM. We design membership inference attacks (MIAs) that target th…

    Submitted 6 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: 42nd International Conference on Machine Learning (ICML 2025)

  5. arXiv:2410.03055

    cs.LG cs.AI

    Permissive Information-Flow Analysis for Large Language Models

    Authors: Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David Krueger, Andrew Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, Santiago Zanella-Béguelin

    Abstract: Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including coercing the model to spread confidential data to untrusted components. One promising approach is to tackle this problem at the syst…

    Submitted 22 May, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

  6. arXiv:2406.07954

    cs.CR cs.AI

    Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

    Authors: Edoardo Debenedetti, Javier Rando, Daniel Paleka, Silaghi Fineas Florin, Dragos Albastroiu, Niv Cohen, Yuval Lemberg, Reshmi Ghosh, Rui Wen, Ahmed Salem, Giovanni Cherubin, Santiago Zanella-Beguelin, Robin Schmid, Victor Klemm, Takahiro Miki, Chenhao Li, Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, Lea Schönherr

    Abstract: Large language model systems face important security risks from maliciously crafted messages that aim to overwrite the system's original instructions or leak private data. To study this problem, we organized a capture-the-flag competition at IEEE SaTML 2024, where the flag is a secret string in the LLM system prompt. The competition was organized in two phases. In the first phase, teams developed…

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2402.14397

    cs.CR cs.LG

    Closed-Form Bounds for DP-SGD against Record-level Inference

    Authors: Giovanni Cherubin, Boris Köpf, Andrew Paverd, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

    Abstract: Machine learning models trained with differentially-private (DP) algorithms such as DP-SGD enjoy resilience against a wide range of privacy attacks. Although it is possible to derive bounds for some attacks based solely on an $(\varepsilon,\delta)$-DP guarantee, meaningful bounds require a small enough privacy budget (i.e., injecting a large amount of noise), which results in a large loss in utility. T…

    Submitted 22 February, 2024; originally announced February 2024.

  8. arXiv:2311.15792

    cs.LG cs.CR

    Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective

    Authors: Lukas Wutschitz, Boris Köpf, Andrew Paverd, Saravan Rajmohan, Ahmed Salem, Shruti Tople, Santiago Zanella-Béguelin, Menglin Xia, Victor Rühle

    Abstract: Modern machine learning systems use models trained on ever-growing corpora. Typically, metadata such as ownership, access control, or licensing information is ignored during training. Instead, to mitigate privacy risks, we rely on generic techniques such as dataset sanitization and differentially private model training, with inherent privacy/utility trade-offs that hurt model performance. Moreover…

    Submitted 27 November, 2023; originally announced November 2023.

  9. arXiv:2302.01190

    stat.ML cs.CR cs.LG

    On the Efficacy of Differentially Private Few-shot Image Classification

    Authors: Marlon Tobaben, Aliaksandra Shysheya, John Bronskill, Andrew Paverd, Shruti Tople, Santiago Zanella-Beguelin, Richard E Turner, Antti Honkela

    Abstract: There has been significant recent progress in training differentially private (DP) models which achieve accuracy that approaches the best non-private models. These DP models are typically pretrained on large public datasets and then fine-tuned on private downstream datasets that are relatively large and similar in distribution to the pretraining data. However, in many applications including person…

    Submitted 19 December, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 49 pages, 24 figures; published in TMLR 12/2023 https://openreview.net/forum?id=hFsr59Imzm

    Journal ref: Transactions on Machine Learning Research, ISSN 2835-8856, 2023

  10. arXiv:2302.00539

    cs.LG

    Analyzing Leakage of Personally Identifiable Information in Language Models

    Authors: Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

    Abstract: Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage. Scr…

    Submitted 23 April, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: IEEE Symposium on Security and Privacy (S&P) 2023

  11. arXiv:2212.10986

    cs.LG cs.CR cs.GT

    SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

    Authors: Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, Santiago Zanella-Béguelin

    Abstract: Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) to study security properties in cryptography, some authors describe privacy i…

    Submitted 20 April, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: 20 pages, to appear in 2023 IEEE Symposium on Security and Privacy

  12. arXiv:2206.05199

    cs.LG cs.CR

    Bayesian Estimation of Differential Privacy

    Authors: Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Ahmed Salem, Victor Rühle, Andrew Paverd, Mohammad Naseri, Boris Köpf, Daniel Jones

    Abstract: Algorithms such as Differentially Private SGD enable training machine learning models with formal privacy guarantees. However, there is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice. An emerging strand of work empirically estimates the protection afforded by differentially private training as a confidence interval for the p…

    Submitted 15 June, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: 17 pages, 8 figures. Joint main authors: Santiago Zanella-Béguelin, Lukas Wutschitz, and Shruti Tople

  13. arXiv:1912.07942

    cs.LG cs.CL cs.CR stat.ML

    Analyzing Information Leakage of Updates to Natural Language Models

    Authors: Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Victor Rühle, Andrew Paverd, Olga Ohrimenko, Boris Köpf, Marc Brockschmidt

    Abstract: To continuously improve quality and reflect changes in data, machine learning applications have to regularly retrain and update their core models. We show that a differential analysis of language model snapshots before and after an update can reveal a surprising amount of detailed information about changes in the training data. We propose two new metrics---\emph{differential score} and \emph{diffe…

    Submitted 5 August, 2021; v1 submitted 17 December, 2019; originally announced December 2019.

  14. arXiv:1703.00055

    cs.PL cs.CR

    A Monadic Framework for Relational Verification: Applied to Information Security, Program Equivalence, and Optimizations

    Authors: Niklas Grimm, Kenji Maillard, Cédric Fournet, Catalin Hritcu, Matteo Maffei, Jonathan Protzenko, Tahina Ramananandro, Aseem Rastogi, Nikhil Swamy, Santiago Zanella-Béguelin

    Abstract: Relational properties describe multiple runs of one or more programs. They characterize many useful notions of security, program refinement, and equivalence for programs with diverse computational effects, and they have received much attention in the recent literature. Rather than developing separate tools for special classes of effects and relational properties, we advocate using a general purpos…

    Submitted 12 October, 2019; v1 submitted 28 February, 2017; originally announced March 2017.

    Comments: CPP'18 extended version with the missing ERC acknowledgement

  15. arXiv:1703.00053

    cs.PL cs.CR

    Verified Low-Level Programming Embedded in F*

    Authors: Jonathan Protzenko, Jean-Karim Zinzindohoué, Aseem Rastogi, Tahina Ramananandro, Peng Wang, Santiago Zanella-Béguelin, Antoine Delignat-Lavaud, Catalin Hritcu, Karthikeyan Bhargavan, Cédric Fournet, Nikhil Swamy

    Abstract: We present Low*, a language for low-level programming and verification, and its application to high-assurance optimized cryptographic libraries. Low* is a shallow embedding of a small, sequential, well-behaved subset of C in F*, a dependently-typed variant of ML aimed at program verification. Departing from ML, Low* does not involve any garbage collection or implicit heap allocation; instead, it h…

    Submitted 11 December, 2018; v1 submitted 28 February, 2017; originally announced March 2017.

    Comments: extended version of ICFP final camera ready version; only Acknowledgements differ from 30 Aug 2017 version