Showing 1–15 of 15 results for author: Creţu, A

  1. arXiv:2506.10117  [pdf, ps, other]

    cs.CV cs.ET

    A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild

    Authors: Klim Kireev, Ana-Maria Creţu, Raphael Meier, Sarah Adel Bargal, Elissa Redmiles, Carmela Troncoso

    Abstract: Platforms and the law regulate digital content depicting minors (defined as individuals under 18 years of age) differently from other types of content. Given the sheer amount of content that needs to be assessed, machine learning-based automation tools are commonly used to detect content depicting minors. To our knowledge, no dataset or benchmark currently exists for detecting these identification…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 14 pages, 6 figures

  2. arXiv:2506.08837  [pdf, ps, other]

    cs.LG cs.CR

    Design Patterns for Securing LLM Agents against Prompt Injections

    Authors: Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn

    Abstract: As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's reliance on natural language inputs -- an especially dangerous threat when agents are granted tool access or handle s…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  3. QueryCheetah: Fast Automated Discovery of Attribute Inference Attacks Against Query-Based Systems

    Authors: Bozhidar Stevanoski, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: Query-based systems (QBSs) are one of the key approaches for sharing data. QBSs allow analysts to request aggregate information from a private protected dataset. Attacks are a crucial part of ensuring QBSs are truly privacy-preserving. The development and testing of attacks is however very labor-intensive and unable to cope with the increasing complexity of systems. Automated approaches have been…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: This is an extended version of the ACM CCS paper which includes appendices

  4. arXiv:2406.18671  [pdf, other]

    cs.CR cs.LG

    A Zero Auxiliary Knowledge Membership Inference Attack on Aggregate Location Data

    Authors: Vincent Guan, Florent Guépin, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: Location data is frequently collected from populations and shared in aggregate form to guide policy and decision making. However, the prevalence of aggregated data also raises the privacy concern of membership inference attacks (MIAs). MIAs infer whether an individual's data contributed to the aggregate release. Although effective MIAs have been developed for aggregate location data, these require…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: To be published in PETS 2024

  5. Re-pseudonymization Strategies for Smart Meter Data Are Not Robust to Deep Learning Profiling Attacks

    Authors: Ana-Maria Cretu, Miruna Rusu, Yves-Alexandre de Montjoye

    Abstract: Smart meters, devices measuring the electricity and gas consumption of a household, are currently being deployed at a fast rate throughout the world. The data they collect are extremely useful, including in the fight against climate change. However, these data and the information that can be inferred from them are highly sensitive. Re-pseudonymization, i.e., the frequent replacement of random iden…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Extended version, including the Appendix, of a paper with the same title which will appear in the Proceedings of the Fourteenth ACM Conference on Data and Application Security and Privacy (CODASPY '24). The first two authors contributed equally

  6. arXiv:2307.01701  [pdf, other]

    cs.CR cs.AI

    Synthetic is all you need: removing the auxiliary data assumption for membership inference attacks against synthetic data

    Authors: Florent Guépin, Matthieu Meeus, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: Synthetic data is emerging as one of the most promising solutions to share individual-level data while safeguarding privacy. While membership inference attacks (MIAs), based on shadow modeling, have become the standard to evaluate the privacy of synthetic data, they currently assume the attacker to have access to an auxiliary dataset sampled from a similar distribution as the training dataset. Thi…

    Submitted 21 September, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

    Journal ref: ESORICS 2023 workshop Data Privacy Management (DPM) 2023

  7. Deep perceptual hashing algorithms with hidden dual purpose: when client-side scanning does facial recognition

    Authors: Shubham Jain, Ana-Maria Cretu, Antoine Cully, Yves-Alexandre de Montjoye

    Abstract: End-to-end encryption (E2EE) provides strong technical protections to individuals from interferences. Governments and law enforcement agencies around the world have however raised concerns that E2EE also allows illegal content to be shared undetected. Client-side scanning (CSS), using perceptual hashing (PH) to detect known illegal content before it is shared, is seen as a promising solution to pr…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Published at IEEE S&P 2023

    Journal ref: 2023 IEEE Symposium on Security and Privacy (SP), 234-252

  8. Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing

    Authors: Matthieu Meeus, Florent Guépin, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: Synthetic data is seen as the most promising solution to share individual-level data while preserving privacy. Shadow modeling-based Membership Inference Attacks (MIAs) have become the standard approach to evaluate the privacy risk of synthetic data. While very effective, they require a large number of datasets to be created and models trained to evaluate the risk posed by a single record. The pri…

    Submitted 21 September, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

    Journal ref: Computer Security ESORICS 2023

  9. arXiv:2306.05093  [pdf, other]

    cs.CR cs.LG

    Investigating the Effect of Misalignment on Membership Privacy in the White-box Setting

    Authors: Ana-Maria Cretu, Daniel Jones, Yves-Alexandre de Montjoye, Shruti Tople

    Abstract: Machine learning models have been shown to leak sensitive information about their training datasets. Models are increasingly deployed on devices, raising concerns that white-box access to the model parameters increases the attack surface compared to black-box access which only provides query access. Directly extending the shadow modelling technique from the black-box to the white-box setting has b…

    Submitted 12 March, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: To appear in the Proceedings on Privacy Enhancing Technologies (PoPETs 2024)

  10. QuerySnout: Automating the Discovery of Attribute Inference Attacks against Query-Based Systems

    Authors: Ana-Maria Cretu, Florimond Houssiau, Antoine Cully, Yves-Alexandre de Montjoye

    Abstract: Although query-based systems (QBS) have become one of the main solutions to share data anonymously, building QBSes that robustly protect the privacy of individuals contributing to the dataset is a hard problem. Theoretical solutions relying on differential privacy guarantees are difficult to implement correctly with reasonable accuracy, while ad-hoc solutions might contain unknown vulnerabilities.…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: Published at the ACM CCS 2022 conference. This is an extended version that includes the Appendix

  11. Correlation inference attacks against machine learning models

    Authors: Ana-Maria Creţu, Florent Guépin, Yves-Alexandre de Montjoye

    Abstract: Despite machine learning models being widely used today, the relationship between a model and its training dataset is not well understood. We explore correlation inference attacks, whether and when a model leaks information about the correlations between the input variables of its training dataset. We first propose a model-less attack, where an adversary exploits the spherical parametrization of c…

    Submitted 18 July, 2024; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Published in Science Advances. This version contains both the main paper and supplementary material. There are minor editorial differences between this version and the published version. The first two authors contributed equally

    Journal ref: Science Advances, Volume 10, Issue 28, 2024

  12. arXiv:2106.09820  [pdf, other]

    cs.CR

    Adversarial Detection Avoidance Attacks: Evaluating the robustness of perceptual hashing-based client-side scanning

    Authors: Shubham Jain, Ana-Maria Cretu, Yves-Alexandre de Montjoye

    Abstract: End-to-end encryption (E2EE) by messaging platforms enables people to securely and privately communicate with one another. Its widespread adoption however raised concerns that illegal content might now be shared undetected. Following the global pushback against key escrow systems, client-side scanning based on perceptual hashing has been recently proposed by tech companies, governments and research…

    Submitted 2 August, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: This is a revised version of the paper published at USENIX Security 2022. We now use a semi-automated procedure to remove duplicates from the ImageNet dataset

    Journal ref: 31st USENIX Security Symposium (USENIX Security 22), 2022

  13. arXiv:1910.06042  [pdf, ps, other]

    physics.hist-ph

    Diagnosing Disagreements: The Authentication of the Positron 1931-1934

    Authors: Ana-Maria Cretu

    Abstract: This paper bridges a historiographical gap in accounts of the prediction and discovery of the positron by combining three ingredients. First, the prediction and discovery of the positron are situated in the broader context of a period of 'crystallisation' of a research tradition. Second, the prediction and discovery of the positron are discussed in the context of the 'authentication' of the partic…

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: Forthcoming in Studies in History and Philosophy of Modern Physics

  14. arXiv:1908.08025  [pdf, other]

    cs.CL

    WikiCREM: A Large Unsupervised Corpus for Coreference Resolution

    Authors: Vid Kocijan, Oana-Maria Camburu, Ana-Maria Cretu, Yordan Yordanov, Phil Blunsom, Thomas Lukasiewicz

    Abstract: Pronoun resolution is a major area of natural language understanding. However, large-scale training sets are still scarce, since manually labelling data is costly. In this work, we introduce WikiCREM (Wikipedia CoREferences Masked) a large-scale, yet accurate dataset of pronoun disambiguation instances. We use a language-model-based approach for pronoun resolution in combination with our WikiCREM…

    Submitted 13 October, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: Accepted to the EMNLP 2019 conference

    Journal ref: IJCNLP-EMNLP 2019

  15. A Surprisingly Robust Trick for Winograd Schema Challenge

    Authors: Vid Kocijan, Ana-Maria Cretu, Oana-Maria Camburu, Yordan Yordanov, Thomas Lukasiewicz

    Abstract: The Winograd Schema Challenge (WSC) dataset WSC273 and its inference counterpart WNLI are popular benchmarks for natural language understanding and commonsense reasoning. In this paper, we show that the performance of three language models on WSC273 strongly improves when fine-tuned on a similar pronoun disambiguation problem dataset (denoted WSCR). We additionally generate a large unsupervised WS…

    Submitted 4 August, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: Appeared as part of the ACL 2019 conference