Showing 1–4 of 4 results for author: Kosak-Hine, E

  1. arXiv:2504.09712  [pdf, ps, other]

    cs.CR cs.AI cs.CV

    The Structural Safety Generalization Problem

    Authors: Julius Broomfield, Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Tia Nasir, Jason Zhang, Reihaneh Iranmanesh, Sara Pieri, Reihaneh Rabbany, Kellin Pelrine

    Abstract: LLM jailbreaks are a widespread safety challenge. Given this problem has not yet been tractable, we suggest targeting a key failure mechanism: the failure of safety to generalize across semantically equivalent inputs. We further focus the target by requiring desirable tractability properties of attacks to study: explainability, transferability between models, and transferability between goals. We…

    Submitted 30 May, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

  2. arXiv:2501.10387  [pdf, other]

    cs.CY

    Online Influence Campaigns: Strategies and Vulnerabilities

    Authors: Andreea Musulan, Veronica Xia, Ethan Kosak-Hine, Tom Gibbs, Vidya Sujaya, Reihaneh Rabbany, Jean-François Godbout, Kellin Pelrine

    Abstract: In order to combat the creation and spread of harmful content online, this paper defines and contextualizes the concept of inauthentic, societal-scale manipulation by malicious actors. We review the literature on societally harmful content and how it proliferates to analyze the manipulation strategies used by such actors and the vulnerabilities they target. We also provide an overview of three cas…

    Submitted 18 December, 2024; originally announced January 2025.

    ACM Class: A.1

  3. arXiv:2410.13915  [pdf, other]

    cs.SI cs.AI cs.CY

    A Simulation System Towards Solving Societal-Scale Manipulation

    Authors: Maximilian Puelma Touzel, Sneheel Sarangi, Austin Welch, Gayatri Krishnakumar, Dan Zhao, Zachary Yang, Hao Yu, Ethan Kosak-Hine, Tom Gibbs, Andreea Musulan, Camille Thibault, Busra Tugce Gurbuz, Reihaneh Rabbany, Jean-François Godbout, Kellin Pelrine

    Abstract: The rise of AI-driven manipulation poses significant risks to societal trust and democratic processes. Yet, studying these effects in real-world settings at scale is ethically and logistically impractical, highlighting a need for simulation tools that can model these dynamics in controlled settings to enable experimentation with possible defenses. We present a simulation environment designed to ad…

    Submitted 16 October, 2024; originally announced October 2024.

  4. arXiv:2409.00137  [pdf, other]

    cs.CR cs.AI cs.CL

    Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks

    Authors: Tom Gibbs, Ethan Kosak-Hine, George Ingebretsen, Jason Zhang, Julius Broomfield, Sara Pieri, Reihaneh Iranmanesh, Reihaneh Rabbany, Kellin Pelrine

    Abstract: Large language models (LLMs) are improving at an exceptional rate. However, these models are still susceptible to jailbreak attacks, which are becoming increasingly dangerous as models become increasingly powerful. In this work, we introduce a dataset of jailbreaks where each example can be input in both a single or a multi-turn format. We show that while equivalent in content, they are not equiva…

    Submitted 29 August, 2024; originally announced September 2024.