Showing 1–1 of 1 results for author: Langberg, R

  1. arXiv:2309.01446  [pdf, other]

    cs.CL cs.CV cs.NE

    Open Sesame! Universal Black Box Jailbreaking of Large Language Models

    Authors: Raz Lapid, Ron Langberg, Moshe Sipper

    Abstract: Large language models (LLMs), designed to provide helpful and safe responses, often rely on alignment techniques to align with user intent and social guidelines. Unfortunately, this alignment can be exploited by malicious actors seeking to manipulate an LLM's outputs for unintended purposes. In this paper we introduce a novel approach that employs a genetic algorithm (GA) to manipulate LLMs when m…

    Submitted 5 August, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: Accepted at SeT-LLM @ ICLR 2024

    Journal ref: ICLR 2024 Workshop on Secure and Trustworthy Large Language Models