Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code

Bappy, Md. Azizul Hakim; Mustafa, Hossen A; Saha, Prottoy; Salehat, Rajinus

arXiv:2504.16584 (cs)

[Submitted on 23 Apr 2025]

Title: Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code

Authors: Md. Azizul Hakim Bappy (Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh), Hossen A Mustafa (Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh), Prottoy Saha (Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh), Rajinus Salehat (Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh)

Abstract: Large Language Models (LLMs) have demonstrated significant capabilities in understanding and analyzing code for security vulnerabilities, such as Common Weakness Enumerations (CWEs). However, their reliance on cloud infrastructure and substantial computational requirements pose challenges for analyzing sensitive or proprietary codebases due to privacy concerns and inference costs. This work explores the potential of Small Language Models (SLMs) as a viable alternative for accurate, on-premise vulnerability detection. We investigated whether a 350-million parameter pre-trained code model (codegen-mono) could be effectively fine-tuned to detect the MITRE Top 25 CWEs specifically within Python code. To facilitate this, we developed a targeted dataset of 500 examples using a semi-supervised approach involving LLM-driven synthetic data generation coupled with meticulous human review. Initial tests confirmed that the base codegen-mono model completely failed to identify CWEs in our samples. However, after applying instruction-following fine-tuning, the specialized SLM achieved remarkable performance on our test set, yielding approximately 99% accuracy, 98.08% precision, 100% recall, and a 99.04% F1-score. These results strongly suggest that fine-tuned SLMs can serve as highly accurate and efficient tools for CWE detection, offering a practical and privacy-preserving solution for integrating advanced security analysis directly into development workflows.
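For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1 are conventionally computed. It assumes a binary framing (1 = vulnerable, 0 = not vulnerable), which the abstract does not spell out. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper's data or code.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: label 1 means "vulnerable", label 0 means "not vulnerable".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only (not the paper's test set).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

In this toy case there are no false negatives, so recall is 1.0, while the single false positive pulls precision below 1.0. That is the same qualitative pattern as the reported results (100% recall with precision slightly under 100%).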

Comments:	11 pages, 2 figures, 3 tables. Dataset available at this https URL. Model available at this https URL. Keywords: Small Language Models (SLMs), Vulnerability Detection, CWE, Fine-tuning, Python Security, Privacy-Preserving Code Analysis
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2504.16584 [cs.CR]
	(or arXiv:2504.16584v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2504.16584

Submission history

From: Md. Azizul Hakim Bappy
[v1] Wed, 23 Apr 2025 10:05:27 UTC (478 KB)
