Showing 1–2 of 2 results for author: Wahréus, J

  1. arXiv:2503.21598  [pdf, other]

    cs.CR cs.AI cs.LG

    Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing

    Authors: Johan Wahréus, Ahmed Hussain, Panos Papadimitratos

    Abstract: Large Language Models (LLMs) have transformed task automation and content generation across various domains while incorporating safety filters to prevent misuse. We introduce a novel jailbreaking framework that employs distributed prompt processing combined with iterative refinements to bypass these safety measures, particularly in generating malicious code. Our architecture consists of four key m…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 22 pages; 26 figures

  2. arXiv:2501.01335  [pdf, other]

    cs.CR cs.AI cs.LG

    CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models

    Authors: Johan Wahréus, Ahmed Mohamed Hussain, Panos Papadimitratos

    Abstract: Numerous studies have investigated methods for jailbreaking Large Language Models (LLMs) to generate harmful content. Typically, these methods are evaluated using datasets of malicious prompts designed to bypass security policies established by LLM providers. However, the generally broad scope and open-ended nature of existing datasets can complicate the assessment of jailbreaking effectiveness, p…

    Submitted 2 January, 2025; originally announced January 2025.