Showing 1–4 of 4 results for author: Glukhov, D

  1. arXiv:2504.05147

    cs.CR cs.LG

    Prεεmpt: Sanitizing Sensitive Prompts for LLMs

    Authors: Amrita Roy Chowdhury, David Glukhov, Divyam Anshumaan, Prasad Chalasani, Nicolas Papernot, Somesh Jha, Mihir Bellare

    Abstract: The rise of large language models (LLMs) has introduced new privacy challenges, particularly during inference where sensitive information in prompts may be exposed to proprietary LLM APIs. In this paper, we address the problem of formally protecting the sensitive information contained in a prompt while maintaining response quality. To this end, first, we introduce a cryptographically inspired noti…

    Submitted 7 April, 2025; originally announced April 2025.

  2. arXiv:2407.02551

    cs.CR cs.AI cs.CY

    Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses

    Authors: David Glukhov, Ziwen Han, Ilia Shumailov, Vardan Papyan, Nicolas Papernot

    Abstract: Vulnerability of frontier language models to misuse and jailbreaks has prompted the development of safety measures like filters and alignment training in an effort to ensure safety through robustness to adversarially crafted prompts. We assert that robustness is fundamentally insufficient for ensuring safety goals, and current defenses and evaluation methods fail to account for risks of dual-inten…

    Submitted 30 October, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2307.10719

    cs.AI cs.CL cs.CR cs.LG

    LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?

    Authors: David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan

    Abstract: Large language models (LLMs) have exhibited impressive capabilities in comprehending complex instructions. However, their blind adherence to provided instructions has led to concerns regarding risks of malicious use. Existing defence mechanisms, such as model fine-tuning or output censorship using LLMs, have proven to be fallible, as LLMs can still generate problematic responses. Commonly employed…

    Submitted 20 July, 2023; originally announced July 2023.

  4. arXiv:2306.08656

    cs.LG cs.CR

    Augment then Smooth: Reconciling Differential Privacy with Certified Robustness

    Authors: Jiapeng Wu, Atiyeh Ashari Ghomi, David Glukhov, Jesse C. Cresswell, Franziska Boenisch, Nicolas Papernot

    Abstract: Machine learning models are susceptible to a variety of attacks that can erode trust, including attacks against the privacy of training data, and adversarial examples that jeopardize model accuracy. Differential privacy and certified robustness are effective frameworks for combating these two threats respectively, as they each provide future-proof guarantees. However, we show that standard differe…

    Submitted 20 December, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: 29 pages, 19 figures. Accepted at TMLR in 2024. Link: https://openreview.net/forum?id=YN0IcnXqsr