Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows

Lin, Jie; Mohaisen, David


arXiv:2502.00064 (cs)

[Submitted on 30 Jan 2025]

Abstract: This study examines how the tokenized length of Java code affects the accuracy and explicitness of ten major LLMs in vulnerability detection. Using chi-square tests against known ground truth, we found that models behave inconsistently: some, such as GPT-4, Mistral, and Mixtral, were robust to input length, while others showed a significant association between tokenized length and performance. We recommend that future LLM development focus on minimizing the influence of input length on vulnerability detection. Additionally, preprocessing techniques that reduce token count while preserving code structure could improve LLM accuracy and explicitness in these tasks.
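The abstract does not describe how the chi-square tests were set up. A minimal sketch of one plausible setup follows: a Pearson test of independence on a contingency table whose rows are tokenized-length bins and whose columns are correct versus incorrect verdicts against ground truth. The same table could instead be built for explicit versus non-explicit answers. The bin edges, the `(token_count, is_correct)` sample format, and the helper names `build_table`, `chi_square_independence`, and `chi2_sf` are hypothetical illustrations, not the paper's code or data.

```python
import math

# Hypothetical sketch: Pearson chi-square test of independence between
# tokenized-length bins and detection correctness. Not the paper's code.


def chi2_sf(stat, df):
    """P(X >= stat) for X ~ chi-square(df), computed from the series expansion
    of the regularized lower incomplete gamma function P(df/2, stat/2)."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= x / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def build_table(samples, edges):
    """Rows: length bins split at the (hypothetical) token-count edges.
    Columns: [correct, incorrect] against ground truth."""
    table = [[0, 0] for _ in range(len(edges) + 1)]
    for tokens, is_correct in samples:
        row = sum(tokens >= e for e in edges)  # number of edges passed = bin index
        table[row][0 if is_correct else 1] += 1
    return table


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, p-value), no continuity correction."""
    # Drop empty rows and columns so every expected count is positive.
    table = [r for r in table if sum(r) > 0]
    keep = [j for j in range(len(table[0])) if sum(r[j] for r in table) > 0]
    table = [[r[j] for j in keep] for r in table]
    rows, cols = len(table), len(table[0])
    row_tot = [sum(r) for r in table]
    col_tot = [sum(r[j] for r in table) for j in range(cols)]
    n = sum(row_tot)
    stat = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_tot[i] * col_tot[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    df = (rows - 1) * (cols - 1)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Toy counts for illustration only; these are not results from the paper.
    stat, df, p = chi_square_independence([[20, 30], [30, 20]])
    print(f"chi2={stat:.2f}, df={df}, p={p:.4f}")  # prints chi2=4.00, df=1, p=0.0455
```

A small p-value would suggest that correctness depends on the length bin for that model. No continuity correction is applied here. SciPy's `scipy.stats.chi2_contingency` applies Yates' correction by default when the table has one degree of freedom, so its output can differ for 2×2 tables.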

Comments:	5 pages, 2 tables. Appeared in ICMLA 2024
Subjects:	Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2502.00064 [cs.CR]
	(or arXiv:2502.00064v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2502.00064

Submission history

From: David Mohaisen
[v1] Thu, 30 Jan 2025 20:44:46 UTC (79 KB)
