Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks

Wang, Zi; Anshumaan, Divyam; Hooda, Ashish; Chen, Yudong; Jha, Somesh


arXiv:2410.04234 (cs)

[Submitted on 5 Oct 2024 (v1), last revised 16 Feb 2025 (this version, v2)]



Abstract: Optimization methods are widely employed in deep learning to identify and mitigate undesired model responses. While gradient-based techniques have proven effective for image models, their application to language models is hindered by the discrete nature of the input space. This study introduces a novel optimization approach, termed the functional homotopy method, which leverages the functional duality between model training and input generation. By constructing a series of easy-to-hard optimization problems, we iteratively solve these problems using principles derived from established homotopy methods. We apply this approach to jailbreak attack synthesis for large language models (LLMs), achieving a 20%-30% improvement in success rate over existing methods in circumventing established safe open-source models such as Llama-2 and Llama-3.
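
The easy-to-hard idea in the abstract can be illustrated with a generic continuation (homotopy) loop. The sketch below is a toy under stated assumptions, not the paper's algorithm: the objective `loss`, the scalar difficulty parameter `p`, the integer search space, and the helper names `local_search` and `homotopy_search` are all hypothetical and chosen only so the behavior is easy to check by hand. It shows a discrete local search that stalls on the hard problem when run directly, but reaches the global minimum when each stage is warm-started from the solution of the previous, easier stage.

```python
from typing import Sequence


def loss(x: int, p: float) -> float:
    """Toy objective (assumption): a convex bowl centered at 30 plus a
    ripple of strength p that penalizes every x not divisible by 5."""
    ripple = 0.0 if x % 5 == 0 else 1.0
    return (x - 30) ** 2 / 100 + p * ripple


def local_search(x: int, p: float, lo: int = 0, hi: int = 40) -> int:
    """Greedy +/-1 descent on the integer grid [lo, hi].
    Stops as soon as no neighbor has strictly lower loss."""
    while True:
        neighbors = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = min(neighbors, key=lambda n: loss(n, p))
        if loss(best, p) >= loss(x, p):
            return x
        x = best


def homotopy_search(x0: int, levels: Sequence[float]) -> int:
    """Solve a sequence of problems from easy (small p) to hard (p = 1),
    warm-starting each stage from the previous stage's solution."""
    x = x0
    for p in levels:
        x = local_search(x, p)
    return x


# Direct search on the hard problem stalls at the starting point.
direct = local_search(0, 1.0)
# Continuation: solve the smooth problem first, then raise the difficulty.
staged = homotopy_search(0, [0.0, 0.25, 0.5, 0.75, 1.0])
print(direct, loss(direct, 1.0))
print(staged, loss(staged, 1.0))
```

In this toy, the direct search never leaves its starting point because both neighbors pay the ripple penalty, while the staged search first descends the smooth bowl and then stays put as the ripple is switched on. How the paper actually constructs its sequence of problems is not described in the abstract.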

Comments:	Published at ICLR 2025
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2410.04234 [cs.LG]
	(or arXiv:2410.04234v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.04234

Submission history

From: Zi Wang
[v1] Sat, 5 Oct 2024 17:22:39 UTC (817 KB)
[v2] Sun, 16 Feb 2025 02:16:15 UTC (1,154 KB)
