Search | arXiv e-print repository

arXiv:2504.09465 [pdf, other]

Evolutionary Defense: Advancing Moving Target Strategies with Bio-Inspired Reinforcement Learning to Secure Misconfigured Software Applications

Authors: Niloofar Heidarikohol, Shuvalaxmi Dass, Akbar Siami Namin

Abstract: Improper configurations in software systems often create vulnerabilities, leaving them open to exploitation. Static architectures exacerbate this issue by allowing misconfigurations to persist, providing adversaries with opportunities to exploit them during attacks. To address this challenge, a dynamic proactive defense strategy known as Moving Target Defense (MTD) can be applied. MTD continually… ▽ More Improper configurations in software systems often create vulnerabilities, leaving them open to exploitation. Static architectures exacerbate this issue by allowing misconfigurations to persist, providing adversaries with opportunities to exploit them during attacks. To address this challenge, a dynamic proactive defense strategy known as Moving Target Defense (MTD) can be applied. MTD continually changes the attack surface of the system, thwarting potential threats. In the previous research, we developed a proof of concept for a single-player MTD game model called RL-MTD, which utilizes Reinforcement Learning (RL) to generate dynamic secure configurations. While the model exhibited satisfactory performance in generating secure configurations, it grappled with an unoptimized and sparse search space, leading to performance issues. To tackle this obstacle, this paper addresses the search space optimization problem by leveraging two bio-inspired search algorithms: Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). Additionally, we extend our base RL-MTD model by integrating these algorithms, resulting in the creation of PSO-RL andGA-RL. We compare the performance of three models: base RL-MTD, GA-RL, and PSO-RL, across four misconfigured SUTs in terms of generating the most secure configuration. Results show that the optimal search space derived from both GA-RL and PSO-RL significantly enhances the performance of the base RL-MTD model compared to the version without optimized search space. While both GA-RL and PSO-RL demonstrate effective search capabilities, PSO-RL slightly outperforms GA-RL for most SUTs. Overall, both algorithms excel in seeking an optimal search space which in turn improves the performance of the model in generating optimal secure configuration. △ Less

Submitted 13 April, 2025; originally announced April 2025.

arXiv:2411.18731 [pdf, other]

The Performance of the LSTM-based Code Generated by Large Language Models (LLMs) in Forecasting Time Series Data

Authors: Saroj Gopali, Sima Siami-Namini, Faranak Abri, Akbar Siami Namin

Abstract: As an intriguing case is the goodness of the machine and deep learning models generated by these LLMs in conducting automated scientific data analysis, where a data analyst may not have enough expertise in manually coding and optimizing complex deep learning models and codes and thus may opt to leverage LLMs to generate the required models. This paper investigates and compares the performance of t… ▽ More As an intriguing case is the goodness of the machine and deep learning models generated by these LLMs in conducting automated scientific data analysis, where a data analyst may not have enough expertise in manually coding and optimizing complex deep learning models and codes and thus may opt to leverage LLMs to generate the required models. This paper investigates and compares the performance of the mainstream LLMs, such as ChatGPT, PaLM, LLama, and Falcon, in generating deep learning models for analyzing time series data, an important and popular data type with its prevalent applications in many application domains including financial and stock market. This research conducts a set of controlled experiments where the prompts for generating deep learning-based models are controlled with respect to sensitivity levels of four criteria including 1) Clarify and Specificity, 2) Objective and Intent, 3) Contextual Information, and 4) Format and Style. While the results are relatively mix, we observe some distinct patterns. We notice that using LLMs, we are able to generate deep learning-based models with executable codes for each dataset seperatly whose performance are comparable with the manually crafted and optimized LSTM models for predicting the whole time series dataset. We also noticed that ChatGPT outperforms the other LLMs in generating more accurate models. Furthermore, we observed that the goodness of the generated models vary with respect to the ``temperature'' parameter used in configuring LLMS. The results can be beneficial for data analysts and practitioners who would like to leverage generative AIs to produce good prediction models with acceptable goodness. △ Less

Submitted 27 November, 2024; originally announced November 2024.

arXiv:2405.19578 [pdf, other]

The Accuracy of Domain Specific and Descriptive Analysis Generated by Large Language Models

Authors: Denish Omondi Otieno, Faranak Abri, Sima Siami-Namini, Akbar Siami Namin

Abstract: Large language models (LLMs) have attracted considerable attention as they are capable of showcasing impressive capabilities generating comparable high-quality responses to human inputs. LLMs, can not only compose textual scripts such as emails and essays but also executable programming code. Contrary, the automated reasoning capability of these LLMs in performing statistically-driven descriptive… ▽ More Large language models (LLMs) have attracted considerable attention as they are capable of showcasing impressive capabilities generating comparable high-quality responses to human inputs. LLMs, can not only compose textual scripts such as emails and essays but also executable programming code. Contrary, the automated reasoning capability of these LLMs in performing statistically-driven descriptive analysis, particularly on user-specific data and as personal assistants to users with limited background knowledge in an application domain who would like to carry out basic, as well as advanced statistical and domain-specific analysis is not yet fully explored. More importantly, the performance of these LLMs has not been compared and discussed in detail when domain-specific data analysis tasks are needed. This study, consequently, explores whether LLMs can be used as generative AI-based personal assistants to users with minimal background knowledge in an application domain infer key data insights. To demonstrate the performance of the LLMs, the study reports a case study through which descriptive statistical analysis, as well as Natural Language Processing (NLP) based investigations, are performed on a number of phishing emails with the objective of comparing the accuracy of the results generated by LLMs to the ones produced by analysts. The experimental results show that LangChain and the Generative Pre-trained Transformer (GPT-4) excel in numerical reasoning tasks i.e., temporal statistical analysis, achieve competitive correlation with human judgments on feature engineering tasks while struggle to some extent on domain specific knowledge reasoning, where domain-specific knowledge is required. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: 12 pages

arXiv:2404.09802 [pdf, other]

The Performance of Sequential Deep Learning Models in Detecting Phishing Websites Using Contextual Features of URLs

Authors: Saroj Gopali, Akbar S. Namin, Faranak Abri, Keith S. Jones

Abstract: Cyber attacks continue to pose significant threats to individuals and organizations, stealing sensitive data such as personally identifiable information, financial information, and login credentials. Hence, detecting malicious websites before they cause any harm is critical to preventing fraud and monetary loss. To address the increasing number of phishing attacks, protective mechanisms must be hi… ▽ More Cyber attacks continue to pose significant threats to individuals and organizations, stealing sensitive data such as personally identifiable information, financial information, and login credentials. Hence, detecting malicious websites before they cause any harm is critical to preventing fraud and monetary loss. To address the increasing number of phishing attacks, protective mechanisms must be highly responsive, adaptive, and scalable. Fortunately, advances in the field of machine learning, coupled with access to vast amounts of data, have led to the adoption of various deep learning models for timely detection of these cyber crimes. This study focuses on the detection of phishing websites using deep learning models such as Multi-Head Attention, Temporal Convolutional Network (TCN), BI-LSTM, and LSTM where URLs of the phishing websites are treated as a sequence. The results demonstrate that Multi-Head Attention and BI-LSTM model outperform some other deep learning-based algorithms such as TCN and LSTM in producing better precision, recall, and F1-scores. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2311.14876 [pdf, other]

Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles

Authors: Sonali Singh, Faranak Abri, Akbar Siami Namin

Abstract: With the recent advent of Large Language Models (LLMs), such as ChatGPT from OpenAI, BARD from Google, Llama2 from Meta, and Claude from Anthropic AI, gain widespread use, ensuring their security and robustness is critical. The widespread use of these language models heavily relies on their reliability and proper usage of this fascinating technology. It is crucial to thoroughly test these models t… ▽ More With the recent advent of Large Language Models (LLMs), such as ChatGPT from OpenAI, BARD from Google, Llama2 from Meta, and Claude from Anthropic AI, gain widespread use, ensuring their security and robustness is critical. The widespread use of these language models heavily relies on their reliability and proper usage of this fascinating technology. It is crucial to thoroughly test these models to not only ensure its quality but also possible misuses of such models by potential adversaries for illegal activities such as hacking. This paper presents a novel study focusing on exploitation of such large language models against deceptive interactions. More specifically, the paper leverages widespread and borrows well-known techniques in deception theory to investigate whether these models are susceptible to deceitful interactions. This research aims not only to highlight these risks but also to pave the way for robust countermeasures that enhance the security and integrity of language models in the face of sophisticated social engineering tactics. Through systematic experiments and analysis, we assess their performance in these critical security domains. Our results demonstrate a significant finding in that these large language models are susceptible to deception and social engineering attacks. △ Less

Submitted 24 November, 2023; originally announced November 2023.

Comments: 10 pages, 16 tables, 5 figures, IEEE BigData 2023 (Workshops)

arXiv:2306.17338 [pdf, ps, other]

A Survey on Blockchain-Based Federated Learning and Data Privacy

Authors: Bipin Chhetri, Saroj Gopali, Rukayat Olapojoye, Samin Dehbash, Akbar Siami Namin

Abstract: Federated learning is a decentralized machine learning paradigm that allows multiple clients to collaborate by leveraging local computational power and the models transmission. This method reduces the costs and privacy concerns associated with centralized machine learning methods while ensuring data privacy by distributing training data across heterogeneous devices. On the other hand, federated le… ▽ More Federated learning is a decentralized machine learning paradigm that allows multiple clients to collaborate by leveraging local computational power and the models transmission. This method reduces the costs and privacy concerns associated with centralized machine learning methods while ensuring data privacy by distributing training data across heterogeneous devices. On the other hand, federated learning has the drawback of data leakage due to the lack of privacy-preserving mechanisms employed during storage, transfer, and sharing, thus posing significant risks to data owners and suppliers. Blockchain technology has emerged as a promising technology for offering secure data-sharing platforms in federated learning, especially in Industrial Internet of Things (IIoT) settings. This survey aims to compare the performance and security of various data privacy mechanisms adopted in blockchain-based federated learning architectures. We conduct a systematic review of existing literature on secure data-sharing platforms for federated learning provided by blockchain technology, providing an in-depth overview of blockchain-based federated learning, its essential components, and discussing its principles, and potential applications. The primary contribution of this survey paper is to identify critical research questions and propose potential directions for future research in blockchain-based federated learning. △ Less

Submitted 29 June, 2023; originally announced June 2023.

Comments: Accepted in COMPSAC 2023IEEE 47th Annual Computers, Software, and Applications Conference Total 10 pages 9 pages and 1 page citation

arXiv:2112.09296 [pdf, other]

A Survey on the Applications of Blockchains in Security of IoT Systems

Authors: Zulfiqar Ali Khan, Akbar Siami Namin

Abstract: The Internet of Things (IoT) has already changed our daily lives by integrating smart devices together towards delivering high quality services to its clients. These devices when integrated together form a network through which massive amount of data can be produced, transferred, and shared. A critical concern is the security and integrity of such a complex platform to ensure the sustainability an… ▽ More The Internet of Things (IoT) has already changed our daily lives by integrating smart devices together towards delivering high quality services to its clients. These devices when integrated together form a network through which massive amount of data can be produced, transferred, and shared. A critical concern is the security and integrity of such a complex platform to ensure the sustainability and reliability of these IoT-based systems. Blockchain is an emerging technology that has demonstrated its unique features and capabilities for different problems and application domains including IoT-based systems. This survey paper reviews the adaptation of Blockchain in the context of IoT to represent how this technology is capable of addressing the integration and security problems of devices connected to IoT systems. The innovation of this survey is that we present a survey based upon the integration approaches and security issues of IoT data and discuss the role of Blockchain in connection with these issues. △ Less

Submitted 16 December, 2021; originally announced December 2021.

Comments: 15 pages, IEEE Bigdata 2021

arXiv:2112.09293 [pdf, other]

A Comparative Study of Detecting Anomalies in Time Series Data Using LSTM and TCN Models

Authors: Saroj Gopali, Faranak Abri, Sima Siami-Namini, Akbar Siami Namin

Abstract: There exist several data-driven approaches that enable us model time series data including traditional regression-based modeling approaches (i.e., ARIMA). Recently, deep learning techniques have been introduced and explored in the context of time series analysis and prediction. A major research question to ask is the performance of these many variations of deep learning techniques in predicting ti… ▽ More There exist several data-driven approaches that enable us model time series data including traditional regression-based modeling approaches (i.e., ARIMA). Recently, deep learning techniques have been introduced and explored in the context of time series analysis and prediction. A major research question to ask is the performance of these many variations of deep learning techniques in predicting time series data. This paper compares two prominent deep learning modeling techniques. The Recurrent Neural Network (RNN)-based Long Short-Term Memory (LSTM) and the convolutional Neural Network (CNN)-based Temporal Convolutional Networks (TCN) are compared and their performance and training time are reported. According to our experimental results, both modeling techniques perform comparably having TCN-based models outperform LSTM slightly. Moreover, the CNN-based TCN model builds a stable model faster than the RNN-based LSTM models. △ Less

Submitted 16 December, 2021; originally announced December 2021.

Comments: 15 pages, 3 figures, IEEE BigData 2021

arXiv:2106.02012 [pdf, other]

Attack Prediction using Hidden Markov Model

Authors: Shuvalaxmi Dass, Prerit Datta, Akbar Siami Namin

Abstract: It is important to predict any adversarial attacks and their types to enable effective defense systems. Often it is hard to label such activities as malicious ones without adequate analytical reasoning. We propose the use of Hidden Markov Model (HMM) to predict the family of related attacks. Our proposed model is based on the observations often agglomerated in the form of log files and from the ta… ▽ More It is important to predict any adversarial attacks and their types to enable effective defense systems. Often it is hard to label such activities as malicious ones without adequate analytical reasoning. We propose the use of Hidden Markov Model (HMM) to predict the family of related attacks. Our proposed model is based on the observations often agglomerated in the form of log files and from the target or the victim's perspective. We have built an HMM-based prediction model and implemented our proposed approach using Viterbi algorithm, which generates a sequence of states corresponding to stages of a particular attack. As a proof of concept and also to demonstrate the performance of the model, we have conducted a case study on predicting a family of attacks called Action Spoofing. △ Less

Submitted 3 June, 2021; originally announced June 2021.

Comments: 20 pages, 4 figures, COMPSAC'21

arXiv:2106.01998 [pdf, other]

Toward Explainable Users: Using NLP to Enable AI to Understand Users' Perceptions of Cyber Attacks

Authors: Faranak Abri, Luis Felipe Gutierrez, Chaitra T. Kulkarni, Akbar Siami Namin, Keith S. Jones

Abstract: To understand how end-users conceptualize consequences of cyber security attacks, we performed a card sorting study, a well-known technique in Cognitive Sciences, where participants were free to group the given consequences of chosen cyber attacks into as many categories as they wished using rationales they see fit. The results of the open card sorting study showed a large amount of inter-particip… ▽ More To understand how end-users conceptualize consequences of cyber security attacks, we performed a card sorting study, a well-known technique in Cognitive Sciences, where participants were free to group the given consequences of chosen cyber attacks into as many categories as they wished using rationales they see fit. The results of the open card sorting study showed a large amount of inter-participant variation making the research team wonder how the consequences of security attacks were comprehended by the participants. As an exploration of whether it is possible to explain user's mental model and behavior through Artificial Intelligence (AI) techniques, the research team compared the card sorting data with the outputs of a number of Natural Language Processing (NLP) techniques with the goal of understanding how participants perceived and interpreted the consequences of cyber attacks written in natural languages. The results of the NLP-based exploration methods revealed an interesting observation implying that participants had mostly employed checking individual keywords in each sentence to group cyber attack consequences together and less considered the semantics behind the description of consequences of cyber attacks. The results reported in this paper are seemingly useful and important for cyber attacks comprehension from user's perspectives. To the best of our knowledge, this paper is the first introducing the use of AI techniques in explaining and modeling users' behavior and their perceptions about a context. The novel idea introduced here is about explaining users using AI. △ Less

Submitted 3 June, 2021; originally announced June 2021.

Comments: 20 pages, 3 figures, COMPSAC'21

arXiv:2012.14488 [pdf, other]

Phishing Detection through Email Embeddings

Authors: Luis Felipe Gutiérrez, Faranak Abri, Miriam Armstrong, Akbar Siami Namin, Keith S. Jones

Abstract: The problem of detecting phishing emails through machine learning techniques has been discussed extensively in the literature. Conventional and state-of-the-art machine learning algorithms have demonstrated the possibility of building classifiers with high accuracy. The existing research studies treat phishing and genuine emails through general indicators and thus it is not exactly clear what phis… ▽ More The problem of detecting phishing emails through machine learning techniques has been discussed extensively in the literature. Conventional and state-of-the-art machine learning algorithms have demonstrated the possibility of building classifiers with high accuracy. The existing research studies treat phishing and genuine emails through general indicators and thus it is not exactly clear what phishing features are contributing to variations of the classifiers. In this paper, we crafted a set of phishing and legitimate emails with similar indicators in order to investigate whether these cues are captured or disregarded by email embeddings, i.e., vectorizations. We then fed machine learning classifiers with the carefully crafted emails to find out about the performance of email embeddings developed. Our results show that using these indicators, email embeddings techniques is effective for classifying emails as phishing or legitimate. △ Less

Submitted 28 December, 2020; originally announced December 2020.

arXiv:2012.14481 [pdf, other]

A Survey on Vulnerabilities of Ethereum Smart Contracts

Authors: Zulfiqar Ali Khan, Akbar Siami Namin

Abstract: Smart contract (SC) is an extension of BlockChain technology. Ethereum BlockChain was the first to incorporate SC and thus started a new era of crypto-currencies and electronic transactions. Solidity helps to program the SCs. Still, soon after Solidity's emergence in 2014, Solidity-based SCs suffered many attacks that deprived the SC account holders of their precious funds. The main reason for the… ▽ More Smart contract (SC) is an extension of BlockChain technology. Ethereum BlockChain was the first to incorporate SC and thus started a new era of crypto-currencies and electronic transactions. Solidity helps to program the SCs. Still, soon after Solidity's emergence in 2014, Solidity-based SCs suffered many attacks that deprived the SC account holders of their precious funds. The main reason for these attacks was the presence of vulnerabilities in SC. This paper discusses SC vulnerabilities and classifies them according to the domain knowledge of the faulty operations. This classification is a source of reminding developers and software engineers that for SC's safety, each SC requires proper testing with effective tools to catch those classes' vulnerabilities. △ Less

Submitted 28 December, 2020; originally announced December 2020.

arXiv:2012.02643 [pdf, other]

Predicting Emotions Perceived from Sounds

Authors: Faranak Abri, Luis Felipe Gutiérrez, Akbar Siami Namin, David R. W. Sears, Keith S. Jones

Abstract: Sonification is the science of communication of data and events to users through sounds. Auditory icons, earcons, and speech are the common auditory display schemes utilized in sonification, or more specifically in the use of audio to convey information. Once the captured data are perceived, their meanings, and more importantly, intentions can be interpreted more easily and thus can be employed as… ▽ More Sonification is the science of communication of data and events to users through sounds. Auditory icons, earcons, and speech are the common auditory display schemes utilized in sonification, or more specifically in the use of audio to convey information. Once the captured data are perceived, their meanings, and more importantly, intentions can be interpreted more easily and thus can be employed as a complement to visualization techniques. Through auditory perception it is possible to convey information related to temporal, spatial, or some other context-oriented information. An important research question is whether the emotions perceived from these auditory icons or earcons are predictable in order to build an automated sonification platform. This paper conducts an experiment through which several mainstream and conventional machine learning algorithms are developed to study the prediction of emotions perceived from sounds. To do so, the key features of sounds are captured and then are modeled using machine learning algorithms using feature reduction techniques. We observe that it is possible to predict perceived emotions with high accuracy. In particular, the regression based on Random Forest demonstrated its superiority compared to other machine learning algorithms. △ Less

Submitted 4 December, 2020; originally announced December 2020.

Comments: 10 pages

arXiv:2012.00648 [pdf, other]

Cyber-Attack Consequence Prediction

Authors: Prerit Datta, Natalie Lodinger, Akbar Siami Namin, Keith S. Jones

Abstract: Cyber-physical systems posit a complex number of security challenges due to interconnection of heterogeneous devices having limited processing, communication, and power capabilities. Additionally, the conglomeration of both physical and cyber-space further makes it difficult to devise a single security plan spanning both these spaces. Cyber-security researchers are often overloaded with a variety… ▽ More Cyber-physical systems posit a complex number of security challenges due to interconnection of heterogeneous devices having limited processing, communication, and power capabilities. Additionally, the conglomeration of both physical and cyber-space further makes it difficult to devise a single security plan spanning both these spaces. Cyber-security researchers are often overloaded with a variety of cyber-alerts on a daily basis many of which turn out to be false positives. In this paper, we use machine learning and natural language processing techniques to predict the consequences of cyberattacks. The idea is to enable security researchers to have tools at their disposal that makes it easier to communicate the attack consequences with various stakeholders who may have little to no cybersecurity expertise. Additionally, with the proposed approach researchers' cognitive load can be reduced by automatically predicting the consequences of attacks in case new attacks are discovered. We compare the performance through various machine learning models employing word vectors obtained using both tf-idf and Doc2Vec models. In our experiments, an accuracy of 60% was obtained using tf-idf features and 57% using Doc2Vec method for models based on LinearSVC model. △ Less

Submitted 2 December, 2020; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: 9 pages. The pre-print of a paper to appear in the proceedings of the 3rd Workshop on Big Data Engineering and Analytics in Cyber-Physical Systems (BigEACPS'20), IEEE BigData Conference 2020

arXiv:2010.04260 [pdf, other]

Fake Reviews Detection through Analysis of Linguistic Features

Authors: Faranak Abri, Luis Felipe Gutierrez, Akbar Siami Namin, Keith S. Jones, David R. W. Sears

Abstract: Online reviews play an integral part for success or failure of businesses. Prior to purchasing services or goods, customers first review the online comments submitted by previous customers. However, it is possible to superficially boost or hinder some businesses through posting counterfeit and fake reviews. This paper explores a natural language processing approach to identify fake reviews. We pre… ▽ More Online reviews play an integral part for success or failure of businesses. Prior to purchasing services or goods, customers first review the online comments submitted by previous customers. However, it is possible to superficially boost or hinder some businesses through posting counterfeit and fake reviews. This paper explores a natural language processing approach to identify fake reviews. We present a detailed analysis of linguistic features for distinguishing fake and trustworthy online reviews. We study 15 linguistic features and measure their significance and importance towards the classification schemes employed in this study. Our results indicate that fake reviews tend to include more redundant terms and pauses, and generally contain longer sentences. The application of several machine learning classification algorithms revealed that we were able to discriminate fake from real reviews with high accuracy using these linguistic features. △ Less

Submitted 8 October, 2020; originally announced October 2020.

Comments: The pre-print of a paper to appear in the proceedings of the IEEE International Conference on Machine Learning Applications (ICMLA 2020), 11 pages, 3 figures, 5 tables

arXiv:2010.04257 [pdf, other]

Fast Fourier Transformation for Optimizing Convolutional Neural Networks in Object Recognition

Authors: Varsha Nair, Moitrayee Chatterjee, Neda Tavakoli, Akbar Siami Namin, Craig Snoeyink

Abstract: This paper proposes to use Fast Fourier Transformation-based U-Net (a refined fully convolutional networks) and perform image convolution in neural networks. Leveraging the Fast Fourier Transformation, it reduces the image convolution costs involved in the Convolutional Neural Networks (CNNs) and thus reduces the overall computational costs. The proposed model identifies the object information fro… ▽ More This paper proposes to use Fast Fourier Transformation-based U-Net (a refined fully convolutional networks) and perform image convolution in neural networks. Leveraging the Fast Fourier Transformation, it reduces the image convolution costs involved in the Convolutional Neural Networks (CNNs) and thus reduces the overall computational costs. The proposed model identifies the object information from the images. We apply the Fast Fourier transform algorithm on an image data set to obtain more accessible information about the image data, before segmenting them through the U-Net architecture. More specifically, we implement the FFT-based convolutional neural network to improve the training time of the network. The proposed approach was applied to publicly available Broad Bioimage Benchmark Collection (BBBC) dataset. Our model demonstrated improvement in training time during convolution from $600-700$ ms/step to $400-500$ ms/step. We evaluated the accuracy of our model using Intersection over Union (IoU) metric showing significant improvements. △ Less

Submitted 8 October, 2020; originally announced October 2020.

Comments: Pre-print of a paper to appear in the proceedings of the IEEE International Conference on Machine Learning Applications (ICMLA 2020), 10 pages, 9 figures, 1 table

arXiv:2006.08606 [pdf, other]

Vulnerability Coverage as an Adequacy Testing Criterion

Authors: Shuvalaxmi Dass, Akbar Siami Namin

Abstract: Mainstream software applications and tools are the configurable platforms with an enormous number of parameters along with their values. Certain settings and possible interactions between these parameters may harden (or soften) the security and robustness of these applications against some known vulnerabilities. However, the large number of vulnerabilities reported and associated with these tools… ▽ More Mainstream software applications and tools are the configurable platforms with an enormous number of parameters along with their values. Certain settings and possible interactions between these parameters may harden (or soften) the security and robustness of these applications against some known vulnerabilities. However, the large number of vulnerabilities reported and associated with these tools make the exhaustive testing of these tools infeasible against these vulnerabilities infeasible. As an instance of general software testing problem, the research question to address is whether the system under test is robust and secure against these vulnerabilities. This paper introduces the idea of ``vulnerability coverage,'' a concept to adequately test a given application for a certain classes of vulnerabilities, as reported by the National Vulnerability Database (NVD). The deriving idea is to utilize the Common Vulnerability Scoring System (CVSS) as a means to measure the fitness of test inputs generated by evolutionary algorithms and then through pattern matching identify vulnerabilities that match the generated vulnerability vectors and then test the system under test for those identified vulnerabilities. We report the performance of two evolutionary algorithms (i.e., Genetic Algorithms and Particle Swarm Optimization) in generating the vulnerability pattern vectors. △ Less