Showing 1–14 of 14 results for author: Ghalebikesabi, S

Searching in archive cs.
  1. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3278 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  2. arXiv:2505.18773  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models

    Authors: Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, Matthew Jagielski, George Kaissis, Katherine Lee, Milad Nasr, Sahra Ghalebikesabi, Niloofar Mireshghallah, Meenatchi Sundaram Mutu Selva Annamalai, Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye, Franziska Boenisch, Adam Dziedzic, A. Feder Cooper

    Abstract: State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, wea…

    Submitted 24 May, 2025; originally announced May 2025.

  3. arXiv:2409.13903  [pdf, other]

    cs.AI

    CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data

    Authors: Zhao Cheng, Diane Wan, Matthew Abueg, Sahra Ghalebikesabi, Ren Yi, Eugene Bagdasarian, Borja Balle, Stefan Mellem, Shawn O'Banion

    Abstract: Advances in generative AI point towards a new era of personalized applications that perform diverse tasks on behalf of users. While general AI assistants have yet to fully emerge, their potential to share personal data raises significant privacy challenges. This paper introduces CI-Bench, a comprehensive synthetic benchmark for evaluating the ability of AI assistants to protect personal informatio…

    Submitted 20 September, 2024; originally announced September 2024.

  4. arXiv:2408.02373  [pdf, other]

    cs.AI

    Operationalizing Contextual Integrity in Privacy-Conscious Assistants

    Authors: Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle

    Abstract: Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-shar…

    Submitted 13 September, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  5. arXiv:2405.05175  [pdf, other]

    cs.CR cs.CL cs.LG

    AirGapAgent: Protecting Privacy-Conscious Conversational Agents

    Authors: Eugene Bagdasarian, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel Ramage

    Abstract: The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into re…

    Submitted 18 September, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: at CCS'24

  6. arXiv:2402.00072  [pdf, ps, other]

    cs.LG stat.ME stat.ML

    Explainable AI for survival analysis: a median-SHAP approach

    Authors: Lucile Ter-Minassian, Sahra Ghalebikesabi, Karla Diaz-Ordaz, Chris Holmes

    Abstract: With the adoption of machine learning into routine clinical practice comes the need for Explainable AI methods tailored to medical applications. Shapley values have sparked wide interest for locally explaining models. Here, we demonstrate their interpretation strongly depends on both the summary statistic and the estimator for it, which in turn define what we identify as an 'anchor point'. We show…

    Submitted 30 January, 2024; originally announced February 2024.

    Comments: Accepted to the Interpretable Machine Learning for Healthcare (IMLH) workshop of the ICML 2022 Conference

  7. arXiv:2307.05194  [pdf, other]

    stat.ML cs.AI cs.CR cs.LG math.ST

    Differentially Private Statistical Inference through $β$-Divergence One Posterior Sampling

    Authors: Jack Jewson, Sahra Ghalebikesabi, Chris Holmes

    Abstract: Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian post…

    Submitted 27 October, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

  8. arXiv:2305.15027  [pdf, other]

    stat.ML cs.AI cs.LG math.ST stat.ME

    A Rigorous Link between Deep Ensembles and (Variational) Bayesian Methods

    Authors: Veit David Wild, Sahra Ghalebikesabi, Dino Sejdinovic, Jeremias Knoblauch

    Abstract: We establish the first mathematically rigorous link between Bayesian, variational Bayesian, and ensemble methods. A key step towards this is to reformulate the non-convex optimisation problem typically encountered in deep learning as a convex optimisation in the space of probability measures. On a technical level, our contribution amounts to studying generalised variational inference through the l…

    Submitted 22 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  9. arXiv:2304.13737  [pdf, other]

    q-bio.QM cs.LG

    AIRIVA: A Deep Generative Model of Adaptive Immune Repertoires

    Authors: Melanie F. Pradier, Niranjani Prasad, Paidamoyo Chapfuwa, Sahra Ghalebikesabi, Max Ilse, Steven Woodhouse, Rebecca Elyanow, Javier Zazo, Javier Gonzalez, Julia Greissl, Edward Meeds

    Abstract: Recent advances in immunomics have shown that T-cell receptor (TCR) signatures can accurately predict active or recent infection by leveraging the high specificity of TCR binding to disease antigens. However, the extreme diversity of the adaptive immune repertoire presents challenges in reliably identifying disease-specific TCRs. Population genetics and sequencing depth can also have strong system…

    Submitted 26 April, 2023; originally announced April 2023.

  10. arXiv:2302.13861  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Differentially Private Diffusion Models Generate Useful Synthetic Images

    Authors: Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle

    Abstract: The ability to generate privacy-preserving synthetic versions of sensitive image datasets could unlock numerous ML applications currently constrained by data availability. Due to their astonishing image generation quality, diffusion models are a prime candidate for generating high-quality synthetic data. However, recent studies have found that, by default, the outputs of some diffusion models do n…

    Submitted 27 February, 2023; originally announced February 2023.

  11. arXiv:2206.06462  [pdf, other]

    stat.ML cs.LG stat.ME

    Quasi-Bayesian Nonparametric Density Estimation via Autoregressive Predictive Updates

    Authors: Sahra Ghalebikesabi, Chris Holmes, Edwin Fong, Brieuc Lehmann

    Abstract: Bayesian methods are a popular choice for statistical inference in small-data regimes due to the regularization effect induced by the prior. In the context of density estimation, the standard nonparametric Bayesian approach is to target the posterior predictive of the Dirichlet process mixture model. In general, direct estimation of the posterior predictive is intractable and so methods typically…

    Submitted 18 February, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

  12. arXiv:2108.10934  [pdf, other]

    stat.ML cs.CR cs.LG

    Mitigating Statistical Bias within Differentially Private Synthetic Data

    Authors: Sahra Ghalebikesabi, Harrison Wilde, Jack Jewson, Arnaud Doucet, Sebastian Vollmer, Chris Holmes

    Abstract: Increasing interest in privacy-preserving machine learning has led to new and evolved approaches for generating private synthetic data from undisclosed real data. However, mechanisms of privacy preservation can significantly reduce the utility of synthetic data, which in turn impacts downstream tasks such as learning predictive models or inference. We propose several re-weighting strategies using…

    Submitted 19 May, 2022; v1 submitted 24 August, 2021; originally announced August 2021.

  13. arXiv:2106.14648  [pdf, other]

    cs.LG stat.CO stat.ME stat.ML

    On Locality of Local Explanation Models

    Authors: Sahra Ghalebikesabi, Lucile Ter-Minassian, Karla Diaz-Ordaz, Chris Holmes

    Abstract: Shapley values provide model agnostic feature attributions for model outcome at a particular instance by simulating feature absence under a global population distribution. The use of a global population can lead to potentially misleading results when local model behaviour is of interest. Hence we consider the formulation of neighbourhood reference distributions that improve the local interpretabil…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Submitted to NeurIPS 2021

  14. arXiv:2103.03532  [pdf, other]

    stat.ML cs.LG

    Deep Generative Pattern-Set Mixture Models for Nonignorable Missingness

    Authors: Sahra Ghalebikesabi, Rob Cornish, Luke J. Kelly, Chris Holmes

    Abstract: We propose a variational autoencoder architecture to model both ignorable and nonignorable missing data using pattern-set mixtures as proposed by Little (1993). Our model explicitly learns to cluster the missing data into missingness pattern sets based on the observed data and missingness masks. Underpinning our approach is the assumption that the data distribution under missingness is probabilist…

    Submitted 5 March, 2021; originally announced March 2021.

    Comments: International Conference on Artificial Intelligence and Statistics (AISTATS)

    Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS) 2021