-
A Preliminary Study on Using Large Language Models in Software Pentesting
Authors:
Kumar Shashwat,
Francis Hahn,
Xinming Ou,
Dmitry Goldgof,
Lawrence Hall,
Jay Ligatti,
S. Raj Rajgopalan,
Armin Ziaie Tabari
Abstract:
Large language models (LLM) are perceived to offer promising potentials for automating security tasks, such as those found in security operation centers (SOCs). As a first step towards evaluating this perceived potential, we investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code. We hypothesize that an L…
▽ More
Large language models (LLM) are perceived to offer promising potentials for automating security tasks, such as those found in security operation centers (SOCs). As a first step towards evaluating this perceived potential, we investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code. We hypothesize that an LLM-based AI agent can be improved over time for a specific security task as human operators interact with it. Such improvement can be made, as a first step, by engineering prompts fed to the LLM based on the responses produced, to include relevant contexts and structures so that the model provides more accurate results. Such engineering efforts become sustainable if the prompts that are engineered to produce better results on current tasks, also produce better results on future unknown tasks. To examine this hypothesis, we utilize the OWASP Benchmark Project 1.2 which contains 2,740 hand-crafted source code test cases containing various types of vulnerabilities. We divide the test cases into training and testing data, where we engineer the prompts based on the training data (only), and evaluate the final system on the testing data. We compare the AI agent's performance on the testing data against the performance of the agent without the prompt engineering. We also compare the AI agent's results against those from SonarQube, a widely used static code analyzer for security testing. We built and tested multiple versions of the AI agent using different off-the-shelf LLMs -- Google's Gemini-pro, as well as OpenAI's GPT-3.5-Turbo and GPT-4-Turbo (with both chat completion and assistant APIs). The results show that using LLMs is a viable approach to build an AI agent for software pentesting that can improve through repeated use and prompt engineering.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
What are Attackers after on IoT Devices? An approach based on a multi-phased multi-faceted IoT honeypot ecosystem and data clustering
Authors:
Armin Ziaie Tabari,
Xinming Ou,
Anoop Singhal
Abstract:
The growing number of Internet of Things (IoT) devices makes it imperative to be aware of the real-world threats they face in terms of cybersecurity. While honeypots have been historically used as decoy devices to help researchers/organizations gain a better understanding of the dynamic of threats on a network and their impact, IoT devices pose a unique challenge for this purpose due to the variet…
▽ More
The growing number of Internet of Things (IoT) devices makes it imperative to be aware of the real-world threats they face in terms of cybersecurity. While honeypots have been historically used as decoy devices to help researchers/organizations gain a better understanding of the dynamic of threats on a network and their impact, IoT devices pose a unique challenge for this purpose due to the variety of devices and their physical connections. In this work, by observing real-world attackers' behavior in a low-interaction honeypot ecosystem, we (1) presented a new approach to creating a multi-phased, multi-faceted honeypot ecosystem, which gradually increases the sophistication of honeypots' interactions with adversaries, (2) designed and developed a low-interaction honeypot for cameras that allowed researchers to gain a deeper understanding of what attackers are targeting, and (3) devised an innovative data analytics method to identify the goals of adversaries. Our honeypots have been active for over three years. We were able to collect increasingly sophisticated attack data in each phase. Furthermore, our data analytics points to the fact that the vast majority of attack activities captured in the honeypots share significant similarity, and can be clustered and grouped to better understand the goals, patterns, and trends of IoT attacks in the wild.
△ Less
Submitted 20 December, 2021;
originally announced December 2021.
-
A First Step Towards Understanding Real-world Attacks on IoT Devices
Authors:
Armin Ziaie Tabari,
Xinming Ou
Abstract:
With the rapid growth of Internet of Things (IoT) devices, it is imperative to proactively understand the real-world cybersecurity threats posed to them. This paper describes our initial efforts towards building a honeypot ecosystem as a means to gathering and analyzing real attack data against IoT devices. A primary condition for a honeypot to yield useful insights is to let attackers believe the…
▽ More
With the rapid growth of Internet of Things (IoT) devices, it is imperative to proactively understand the real-world cybersecurity threats posed to them. This paper describes our initial efforts towards building a honeypot ecosystem as a means to gathering and analyzing real attack data against IoT devices. A primary condition for a honeypot to yield useful insights is to let attackers believe they are real systems used by humans and organizations. IoT devices pose unique challenges in this respect, due to the large variety of device types and the physical-connectedness nature. We thus create a multiphased approach in building a honeypot ecosystem, where researchers can gradually increase a low-interaction honeypot's sophistication in emulating an IoT device by observing real-world attackers' behaviors. We deployed honeypots both on-premise and in the cloud, with associated analysis and vetting infrastructures to ensure these honeypots cannot be easily identified as such and appear to be real systems. In doing so we were able to attract increasingly sophisticated attack data. We present the design of this honeypot ecosystem and our observation on the attack data so far. Our data shows that real-world attackers are explicitly going after IoT devices, and some captured activities seem to involve direct human interaction (as opposed to scripted automatic activities). We also build a low interaction honeypot for IoT cameras, called Honeycamera, that present to attackers seemingly real videos. This is our first step towards building a more comprehensive honeypot ecosystem that will allow researchers to gain concrete understanding of what attackers are going after on IoT devices, so as to more proactively protect them.
△ Less
Submitted 2 March, 2020;
originally announced March 2020.