Skip to main content

Showing 1–13 of 13 results for author: Wutschitz, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2506.04245  [pdf, ps, other

    cs.AI cs.CL cs.LG

    Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

    Authors: Guangchen Lan, Huseyin A. Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher G. Brinton, Robert Sim

    Abstract: As the era of autonomous agents making decisions on behalf of users unfolds, ensuring contextual integrity (CI) -- what is the appropriate information to share while carrying out a certain task -- becomes a central question to the field. We posit that CI demands a form of reasoning where the agent needs to reason about the context in which it is operating. To test this, we first prompt LLMs to rea… ▽ More

    Submitted 29 May, 2025; originally announced June 2025.

    ACM Class: I.2.6; I.2.7

  2. arXiv:2505.23643  [pdf, ps, other

    cs.CR cs.AI

    Securing AI Agents with Information-Flow Control

    Authors: Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

    Abstract: As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of prop… ▽ More

    Submitted 29 May, 2025; originally announced May 2025.

  3. arXiv:2502.14921  [pdf, ps, other

    cs.CL cs.CR cs.LG

    The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text

    Authors: Matthieu Meeus, Lukas Wutschitz, Santiago Zanella-Béguelin, Shruti Tople, Reza Shokri

    Abstract: How much information about training samples can be leaked through synthetic data generated by Large Language Models (LLMs)? Overlooking the subtleties of information flow in synthetic data generation pipelines can lead to a false sense of privacy. In this paper, we assume an adversary has access to some synthetic data generated by a LLM. We design membership inference attacks (MIAs) that target th… ▽ More

    Submitted 6 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: 42nd International Conference on Machine Learning (ICML 2025)

  4. arXiv:2410.03055  [pdf, other

    cs.LG cs.AI

    Permissive Information-Flow Analysis for Large Language Models

    Authors: Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David Krueger, Andrew Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, Santiago Zanella-Béguelin

    Abstract: Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including coercing the model to spread confidential data to untrusted components. One promising approach is to tackle this problem at the syst… ▽ More

    Submitted 22 May, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

  5. arXiv:2402.14397  [pdf, other

    cs.CR cs.LG

    Closed-Form Bounds for DP-SGD against Record-level Inference

    Authors: Giovanni Cherubin, Boris Köpf, Andrew Paverd, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

    Abstract: Machine learning models trained with differentially-private (DP) algorithms such as DP-SGD enjoy resilience against a wide range of privacy attacks. Although it is possible to derive bounds for some attacks based solely on an $(\varepsilon,δ)$-DP guarantee, meaningful bounds require a small enough privacy budget (i.e., injecting a large amount of noise), which results in a large loss in utility. T… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

  6. arXiv:2311.15792  [pdf, other

    cs.LG cs.CR

    Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective

    Authors: Lukas Wutschitz, Boris Köpf, Andrew Paverd, Saravan Rajmohan, Ahmed Salem, Shruti Tople, Santiago Zanella-Béguelin, Menglin Xia, Victor Rühle

    Abstract: Modern machine learning systems use models trained on ever-growing corpora. Typically, metadata such as ownership, access control, or licensing information is ignored during training. Instead, to mitigate privacy risks, we rely on generic techniques such as dataset sanitization and differentially private model training, with inherent privacy/utility trade-offs that hurt model performance. Moreover… ▽ More

    Submitted 27 November, 2023; originally announced November 2023.

  7. arXiv:2302.00539  [pdf, other

    cs.LG

    Analyzing Leakage of Personally Identifiable Information in Language Models

    Authors: Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

    Abstract: Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage. Scr… ▽ More

    Submitted 23 April, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: IEEE Symposium on Security and Privacy (S&P) 2023

  8. arXiv:2206.05199  [pdf, other

    cs.LG cs.CR

    Bayesian Estimation of Differential Privacy

    Authors: Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Ahmed Salem, Victor Rühle, Andrew Paverd, Mohammad Naseri, Boris Köpf, Daniel Jones

    Abstract: Algorithms such as Differentially Private SGD enable training machine learning models with formal privacy guarantees. However, there is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice. An emerging strand of work empirically estimates the protection afforded by differentially private training as a confidence interval for the p… ▽ More

    Submitted 15 June, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: 17 pages, 8 figures. Joint main authors: Santiago Zanella-Béguelin, Lukas Wutschitz, and Shruti Tople

  9. arXiv:2206.01838  [pdf, other

    cs.LG cs.CR

    Differentially Private Model Compression

    Authors: Fatemehsadat Mireshghallah, Arturs Backurs, Huseyin A Inan, Lukas Wutschitz, Janardhan Kulkarni

    Abstract: Recent papers have shown that large pre-trained language models (LLMs) such as BERT, GPT-2 can be fine-tuned on private data to achieve performance comparable to non-private models for many downstream Natural Language Processing (NLP) tasks while simultaneously guaranteeing differential privacy. The inference cost of these models -- which consist of hundreds of millions of parameters -- however, c… ▽ More

    Submitted 3 June, 2022; originally announced June 2022.

  10. arXiv:2110.06500  [pdf, other

    cs.LG cs.CL cs.CR stat.ML

    Differentially Private Fine-tuning of Language Models

    Authors: Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

    Abstract: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially… ▽ More

    Submitted 14 July, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: ICLR 2022. Code available at https://github.com/huseyinatahaninan/Differentially-Private-Fine-tuning-of-Language-Models

  11. arXiv:2106.02848  [pdf, ps, other

    cs.DS cs.CR cs.LG

    Numerical Composition of Differential Privacy

    Authors: Sivakanth Gopi, Yin Tat Lee, Lukas Wutschitz

    Abstract: We give a fast algorithm to optimally compose privacy guarantees of differentially private (DP) algorithms to arbitrary accuracy. Our method is based on the notion of privacy loss random variables to quantify the privacy loss of DP algorithms. The running time and memory needed for our algorithm to approximate the privacy curve of a DP algorithm composed with itself $k$ times is… ▽ More

    Submitted 26 October, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 Spotlight

  12. arXiv:2101.05405  [pdf, other

    cs.CR cs.CL cs.LG

    Training Data Leakage Analysis in Language Models

    Authors: Huseyin A. Inan, Osman Ramadan, Lukas Wutschitz, Daniel Jones, Victor Rühle, James Withers, Robert Sim

    Abstract: Recent advances in neural network based language models lead to successful deployments of such models, improving user experience in various applications. It has been demonstrated that strong performance of language models comes along with the ability to memorize rare training samples, which poses serious privacy threats in case the model is trained on confidential user content. In this work, we in… ▽ More

    Submitted 22 February, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

  13. arXiv:1912.07942  [pdf, other

    cs.LG cs.CL cs.CR stat.ML

    Analyzing Information Leakage of Updates to Natural Language Models

    Authors: Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Victor Rühle, Andrew Paverd, Olga Ohrimenko, Boris Köpf, Marc Brockschmidt

    Abstract: To continuously improve quality and reflect changes in data, machine learning applications have to regularly retrain and update their core models. We show that a differential analysis of language model snapshots before and after an update can reveal a surprising amount of detailed information about changes in the training data. We propose two new metrics---\emph{differential score} and \emph{diffe… ▽ More

    Submitted 5 August, 2021; v1 submitted 17 December, 2019; originally announced December 2019.