Search | arXiv e-print repository

Using LLMs for Security Advisory Investigations: How Far Are We?

Authors: Bayu Fedra Abdullah, Yusuf Sulistyo Nugroho, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, Kenichi Matsumoto

Abstract: Large Language Models (LLMs) are increasingly used in software security, but their trustworthiness in generating accurate vulnerability advisories remains uncertain. This study investigates the ability of ChatGPT to (1) generate plausible security advisories from CVE-IDs, (2) differentiate real from fake CVE-IDs, and (3) extract CVE-IDs from advisory descriptions. Using a curated dataset of 100 real and 100 fake CVE-IDs, we manually analyzed the credibility and consistency of the model's outputs. The results show that ChatGPT generated plausible security advisories for 96% of the real CVE-IDs and 97% of the fake CVE-IDs given as input, demonstrating a limitation in differentiating between real and fake IDs. Furthermore, when these generated advisories were reintroduced to ChatGPT to identify their original CVE-ID, the model produced a fake CVE-ID in 6% of cases from real advisories. These findings highlight both the strengths and limitations of ChatGPT in cybersecurity applications. While the model demonstrates potential for automating advisory generation, its inability to reliably authenticate CVE-IDs or maintain consistency upon re-evaluation underscores the risks associated with its deployment in critical security tasks. Our study emphasizes the importance of using LLMs with caution in cybersecurity workflows and suggests the need for further improvements in their design to increase reliability and applicability in security advisory generation.

Submitted 16 June, 2025; originally announced June 2025.

Comments: 6 pages, 6 figures, 8 tables, conference paper

arXiv:2403.02610

ChatGPT4PCG 2 Competition: Prompt Engineering for Science Birds Level Generation

Authors: Pittawat Taveekitworachai, Febri Abdullah, Mury F. Dewantoro, Yi Xia, Pratch Suntichaikul, Ruck Thawonmas, Julian Togelius, Jochen Renz

Abstract: This paper presents the second ChatGPT4PCG competition at the 2024 IEEE Conference on Games. In this edition of the competition, we follow the first edition, but make several improvements and changes. We introduce a new evaluation metric along with allowing a more flexible format for participants' submissions and making several improvements to the evaluation pipeline. Continuing from the first edition, we aim to foster and explore the realm of prompt engineering (PE) for procedural content generation (PCG). While the first competition saw success, it was hindered by various limitations; we aim to mitigate these limitations in this edition. We introduce diversity as a new metric to discourage submissions aimed at producing repetitive structures. Furthermore, we allow submission of a Python program instead of a prompt text file for greater flexibility in implementing advanced PE approaches, which may require control flow, including conditions and iterations. We also make several improvements to the evaluation pipeline with a better classifier for similarity evaluation and better-performing function signatures. We thoroughly evaluate the effectiveness of the new metric and the improved classifier. Additionally, we perform an ablation study to select a function signature to instruct ChatGPT for level generation. Finally, we provide implementation examples of various PE techniques in Python and evaluate their preliminary performance. We hope this competition serves as a resource and platform for learning about PE and PCG in general.

Submitted 4 March, 2024; originally announced March 2024.

ACM Class: I.2.7; I.2.8

arXiv:2401.08273

Large Language Models are Null-Shot Learners

Authors: Pittawat Taveekitworachai, Febri Abdullah, Ruck Thawonmas

Abstract: This paper presents null-shot prompting. Null-shot prompting exploits hallucination in large language models (LLMs) by instructing LLMs to utilize information from the "Examples" section that never exists within the provided context to perform a task. While reducing hallucination is crucial and non-negligible for daily and critical uses of LLMs, we propose that in the current landscape in which these LLMs still hallucinate, it is in fact possible to exploit hallucination to improve task performance compared to standard zero-shot prompting. Experiments with eight LLMs show improvements in performance across the majority of eight datasets, including reading comprehension, arithmetic reasoning, and closed-book question answering. The observed inconsistency in increased relative performance across the LLMs also potentially indicates a different degree of inherent hallucination in each model. These differences show that it is possible to utilize null-shot prompting as a way to detect degrees of hallucination in LLMs using existing benchmarking datasets. We also perform ablation studies, including experimenting with a modified version of null-shot prompting that incorporates ideas from zero-shot chain-of-thought prompting, which shows different trends in the results.

Submitted 15 November, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 28 pages; v2: added Gemini Pro results, error analysis, and a discussion on confabulation; v3: see its extended version, an EMNLP 2024 paper, at https://aclanthology.org/2024.emnlp-main.740/

arXiv:2303.15662

ChatGPT4PCG Competition: Character-like Level Generation for Science Birds

Authors: Pittawat Taveekitworachai, Febri Abdullah, Mury F. Dewantoro, Ruck Thawonmas, Julian Togelius, Jochen Renz

Abstract: This paper presents the first ChatGPT4PCG Competition at the 2023 IEEE Conference on Games. The objective of this competition is for participants to create effective prompts for ChatGPT--enabling it to generate Science Birds levels with high stability and character-like qualities--fully using their creativity as well as prompt engineering skills. ChatGPT is a conversational agent developed by OpenAI. Science Birds is selected as the competition platform because designing an Angry Birds-like level is not a trivial task due to the in-game gravity; the quality of the levels is determined by their stability. To lower the entry barrier to the competition, we limit the task to the generation of capitalized English alphabetical characters. We also allow only a single prompt to be used for generating all the characters. Here, the quality of the generated levels is determined by their stability and similarity to the given characters. A sample prompt is provided to participants for their reference. An experiment is conducted to determine the effectiveness of several modified versions of this sample prompt on level stability and similarity by testing them on several characters. To the best of our knowledge, ChatGPT4PCG is the first competition of its kind, and we hope it inspires enthusiasm for prompt engineering in procedural content generation.

Submitted 20 March, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: This paper accepted for presentation at IEEE CoG 2023 is made available for participants of ChatGPT4PCG Competition (https://chatgpt4pcg.github.io/) and readers interested in relevant areas. In this PDF version, the affiliation symbol of Julian Togelius has been revised

ACM Class: I.2.7; I.2.8

arXiv:2102.08035

Design a Technology Based on the Fusion of Genetic Algorithm, Neural network and Fuzzy logic

Authors: Raid R. Al-Nima, Fawaz S. Abdullah, Ali N. Hamoodi

Abstract: This paper describes the design and development of a prototype technique for artificial intelligence based on the fusion of genetic algorithm, neural network and fuzzy logic. It starts by establishing a relationship between the neural network and fuzzy logic. Then, it combines the genetic algorithm with them. Information fusions are at the confidence level, where matching scores can be reported and discussed. The technique is called the Genetic Neuro-Fuzzy (GNF). It can be used for high accuracy real-time environments.

Submitted 16 February, 2021; originally announced February 2021.

Comments: 11 pages, 5 figures

arXiv:2102.07195

doi 10.1109/ACCESS.2020.3004049

HEVC Watermarking Techniques for Authentication and Copyright Applications: Challenges and Opportunities

Authors: Ali A. Elrowayati, Mohamed A. Alrshah, M. F. L. Abdullah, Rohaya Latip

Abstract: Recently, High-Efficiency Video Coding (HEVC/H.265) has been chosen to replace previous video coding standards, such as H.263 and H.264. Despite the efficiency of HEVC, it still lacks reliable and practical functionalities to support authentication and copyright applications. In order to provide this support, several watermarking techniques have been proposed by many researchers during the last few years. However, those techniques still suffer from many issues that need to be considered in future designs. In this paper, a Systematic Literature Review (SLR) is introduced to identify HEVC challenges and potential research directions for interested researchers and developers. The time scope of this SLR covers all research articles published during the last six years, from January 2014 to the end of April 2020. Forty-two articles met the selection criteria out of 343 articles published in this area during that time scope. A new classification has been drawn, followed by an identification of the challenges of implementing HEVC watermarking techniques based on the analysis and discussion of the chosen articles. Finally, recommendations for HEVC watermarking techniques are listed to help researchers improve the existing techniques or design new efficient ones.

Submitted 14 February, 2021; originally announced February 2021.

Comments: Review article, 20 pages

arXiv:1610.05174

Spatio-temporal Co-Occurrence Characterizations for Human Action Classification

Authors: Aznul Qalid Md Sabri, Jacques Boonaert, Erma Rahayu Mohd Faizal Abdullah, Ali Mohammed Mansoor

Abstract: The human action classification task is a widely researched topic and is still an open problem. Many state-of-the-art approaches involve the usage of bag-of-video-words with spatio-temporal local features to construct characterizations for human actions. In order to improve beyond this standard approach, we investigate the usage of co-occurrences between local features. We propose the usage of co-occurrence information to characterize human actions. A trade-off factor is used to define an optimal trade-off between vocabulary size and classification rate. Next, a spatio-temporal co-occurrence technique is applied to extract co-occurrence information between labeled local features. Novel characterizations for human actions are then constructed. These include a vector quantized correlogram-elements vector, a highly discriminative PCA (Principal Components Analysis) co-occurrence vector and a Haralick texture vector. Multi-channel kernel SVM (support vector machine) is utilized for classification. For evaluation, the well-known KTH as well as the challenging UCF-Sports action datasets are used. We obtained state-of-the-art classification performance. We also demonstrated that we are able to fully utilize co-occurrence information, and improve the standard bag-of-video-words approach.

Submitted 1 August, 2016; originally announced October 2016.

Showing 1–7 of 7 results for author: Abdullah, F