Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes

Kumar, Divyanshu; Kumar, Anurakt; Agarwal, Sahil; Harshangi, Prashanth

Computer Science > Cryptography and Security

arXiv:2404.04392 (cs)

[Submitted on 5 Apr 2024 (v1), last revised 9 Sep 2024 (this version, v3)]

Authors: Divyanshu Kumar, Anurakt Kumar, Sahil Agarwal, Prashanth Harshangi

Abstract: Large Language Models (LLMs) have gained widespread adoption across various domains, including chatbots and auto-task completion agents. However, these models are susceptible to safety vulnerabilities such as jailbreaking, prompt injection, and privacy leakage attacks. These vulnerabilities can lead to the generation of malicious content, unauthorized actions, or the disclosure of confidential information. While foundational LLMs undergo alignment training and incorporate safety measures, they are often subsequently fine-tuned, or quantized for deployment in resource-constrained environments. This study investigates the impact of these modifications on LLM safety, a critical consideration for building reliable and secure AI systems. We evaluate foundational models including Mistral, Llama series, Qwen, and MosaicML, along with their fine-tuned variants. Our comprehensive analysis reveals that fine-tuning generally increases the success rates of jailbreak attacks, while quantization has variable effects on attack success rates. Importantly, we find that properly implemented guardrails significantly enhance resistance to jailbreak attempts. These findings contribute to our understanding of LLM vulnerabilities and provide insights for developing more robust safety strategies in the deployment of language models.

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.04392 [cs.CR]
	(or arXiv:2404.04392v3 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2404.04392

Submission history

From: Divyanshu Kumar
[v1] Fri, 5 Apr 2024 20:31:45 UTC (1,440 KB)
[v2] Mon, 29 Jul 2024 07:24:49 UTC (21,433 KB)
[v3] Mon, 9 Sep 2024 06:25:33 UTC (10,017 KB)
