Showing 1–2 of 2 results for author: Pernisi, F

  1. arXiv:2408.04522  [pdf, other]

    cs.CL

    Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models

    Authors: Fabio Pernisi, Dirk Hovy, Paul Röttger

    Abstract: As diverse linguistic communities and users adopt large language models (LLMs), assessing their safety across languages becomes critical. Despite ongoing efforts to make LLMs safe, they can still be made to behave unsafely with jailbreaking, a technique in which models are prompted to act outside their operational guidelines. Research on LLM safety and jailbreaking, however, has so far mostly focu…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted at ACL 2024 (Student Research Workshop)

  2. arXiv:2404.05399  [pdf, other]

    cs.CL cs.AI

    SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

    Authors: Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

    Abstract: The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and practitioners have met these concerns by creating an abundance of datasets for evaluating and improving LLM safety. However, much of this work has happened in parallel, and with very different goals in mind, ranging from the mitigation of near-term risks around bias and toxic…

    Submitted 10 January, 2025; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at AAAI 2025 (Special Track on AI Alignment)