-
Evolutionary Defense: Advancing Moving Target Strategies with Bio-Inspired Reinforcement Learning to Secure Misconfigured Software Applications
Authors:
Niloofar Heidarikohol,
Shuvalaxmi Dass,
Akbar Siami Namin
Abstract:
Improper configurations in software systems often create vulnerabilities, leaving them open to exploitation. Static architectures exacerbate this issue by allowing misconfigurations to persist, providing adversaries with opportunities to exploit them during attacks. To address this challenge, a dynamic proactive defense strategy known as Moving Target Defense (MTD) can be applied. MTD continually…
▽ More
Improper configurations in software systems often create vulnerabilities, leaving them open to exploitation. Static architectures exacerbate this issue by allowing misconfigurations to persist, providing adversaries with opportunities to exploit them during attacks. To address this challenge, a dynamic proactive defense strategy known as Moving Target Defense (MTD) can be applied. MTD continually changes the attack surface of the system, thwarting potential threats. In the previous research, we developed a proof of concept for a single-player MTD game model called RL-MTD, which utilizes Reinforcement Learning (RL) to generate dynamic secure configurations. While the model exhibited satisfactory performance in generating secure configurations, it grappled with an unoptimized and sparse search space, leading to performance issues. To tackle this obstacle, this paper addresses the search space optimization problem by leveraging two bio-inspired search algorithms: Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). Additionally, we extend our base RL-MTD model by integrating these algorithms, resulting in the creation of PSO-RL andGA-RL. We compare the performance of three models: base RL-MTD, GA-RL, and PSO-RL, across four misconfigured SUTs in terms of generating the most secure configuration. Results show that the optimal search space derived from both GA-RL and PSO-RL significantly enhances the performance of the base RL-MTD model compared to the version without optimized search space. While both GA-RL and PSO-RL demonstrate effective search capabilities, PSO-RL slightly outperforms GA-RL for most SUTs. Overall, both algorithms excel in seeking an optimal search space which in turn improves the performance of the model in generating optimal secure configuration.
△ Less
Submitted 13 April, 2025;
originally announced April 2025.
-
The Performance of the LSTM-based Code Generated by Large Language Models (LLMs) in Forecasting Time Series Data
Authors:
Saroj Gopali,
Sima Siami-Namini,
Faranak Abri,
Akbar Siami Namin
Abstract:
As an intriguing case is the goodness of the machine and deep learning models generated by these LLMs in conducting automated scientific data analysis, where a data analyst may not have enough expertise in manually coding and optimizing complex deep learning models and codes and thus may opt to leverage LLMs to generate the required models. This paper investigates and compares the performance of t…
▽ More
As an intriguing case is the goodness of the machine and deep learning models generated by these LLMs in conducting automated scientific data analysis, where a data analyst may not have enough expertise in manually coding and optimizing complex deep learning models and codes and thus may opt to leverage LLMs to generate the required models. This paper investigates and compares the performance of the mainstream LLMs, such as ChatGPT, PaLM, LLama, and Falcon, in generating deep learning models for analyzing time series data, an important and popular data type with its prevalent applications in many application domains including financial and stock market. This research conducts a set of controlled experiments where the prompts for generating deep learning-based models are controlled with respect to sensitivity levels of four criteria including 1) Clarify and Specificity, 2) Objective and Intent, 3) Contextual Information, and 4) Format and Style. While the results are relatively mix, we observe some distinct patterns. We notice that using LLMs, we are able to generate deep learning-based models with executable codes for each dataset seperatly whose performance are comparable with the manually crafted and optimized LSTM models for predicting the whole time series dataset. We also noticed that ChatGPT outperforms the other LLMs in generating more accurate models. Furthermore, we observed that the goodness of the generated models vary with respect to the ``temperature'' parameter used in configuring LLMS. The results can be beneficial for data analysts and practitioners who would like to leverage generative AIs to produce good prediction models with acceptable goodness.
△ Less
Submitted 27 November, 2024;
originally announced November 2024.
-
The Accuracy of Domain Specific and Descriptive Analysis Generated by Large Language Models
Authors:
Denish Omondi Otieno,
Faranak Abri,
Sima Siami-Namini,
Akbar Siami Namin
Abstract:
Large language models (LLMs) have attracted considerable attention as they are capable of showcasing impressive capabilities generating comparable high-quality responses to human inputs. LLMs, can not only compose textual scripts such as emails and essays but also executable programming code. Contrary, the automated reasoning capability of these LLMs in performing statistically-driven descriptive…
▽ More
Large language models (LLMs) have attracted considerable attention as they are capable of showcasing impressive capabilities generating comparable high-quality responses to human inputs. LLMs, can not only compose textual scripts such as emails and essays but also executable programming code. Contrary, the automated reasoning capability of these LLMs in performing statistically-driven descriptive analysis, particularly on user-specific data and as personal assistants to users with limited background knowledge in an application domain who would like to carry out basic, as well as advanced statistical and domain-specific analysis is not yet fully explored. More importantly, the performance of these LLMs has not been compared and discussed in detail when domain-specific data analysis tasks are needed. This study, consequently, explores whether LLMs can be used as generative AI-based personal assistants to users with minimal background knowledge in an application domain infer key data insights. To demonstrate the performance of the LLMs, the study reports a case study through which descriptive statistical analysis, as well as Natural Language Processing (NLP) based investigations, are performed on a number of phishing emails with the objective of comparing the accuracy of the results generated by LLMs to the ones produced by analysts. The experimental results show that LangChain and the Generative Pre-trained Transformer (GPT-4) excel in numerical reasoning tasks i.e., temporal statistical analysis, achieve competitive correlation with human judgments on feature engineering tasks while struggle to some extent on domain specific knowledge reasoning, where domain-specific knowledge is required.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
The Performance of Sequential Deep Learning Models in Detecting Phishing Websites Using Contextual Features of URLs
Authors:
Saroj Gopali,
Akbar S. Namin,
Faranak Abri,
Keith S. Jones
Abstract:
Cyber attacks continue to pose significant threats to individuals and organizations, stealing sensitive data such as personally identifiable information, financial information, and login credentials. Hence, detecting malicious websites before they cause any harm is critical to preventing fraud and monetary loss. To address the increasing number of phishing attacks, protective mechanisms must be hi…
▽ More
Cyber attacks continue to pose significant threats to individuals and organizations, stealing sensitive data such as personally identifiable information, financial information, and login credentials. Hence, detecting malicious websites before they cause any harm is critical to preventing fraud and monetary loss. To address the increasing number of phishing attacks, protective mechanisms must be highly responsive, adaptive, and scalable. Fortunately, advances in the field of machine learning, coupled with access to vast amounts of data, have led to the adoption of various deep learning models for timely detection of these cyber crimes. This study focuses on the detection of phishing websites using deep learning models such as Multi-Head Attention, Temporal Convolutional Network (TCN), BI-LSTM, and LSTM where URLs of the phishing websites are treated as a sequence. The results demonstrate that Multi-Head Attention and BI-LSTM model outperform some other deep learning-based algorithms such as TCN and LSTM in producing better precision, recall, and F1-scores.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles
Authors:
Sonali Singh,
Faranak Abri,
Akbar Siami Namin
Abstract:
With the recent advent of Large Language Models (LLMs), such as ChatGPT from OpenAI, BARD from Google, Llama2 from Meta, and Claude from Anthropic AI, gain widespread use, ensuring their security and robustness is critical. The widespread use of these language models heavily relies on their reliability and proper usage of this fascinating technology. It is crucial to thoroughly test these models t…
▽ More
With the recent advent of Large Language Models (LLMs), such as ChatGPT from OpenAI, BARD from Google, Llama2 from Meta, and Claude from Anthropic AI, gain widespread use, ensuring their security and robustness is critical. The widespread use of these language models heavily relies on their reliability and proper usage of this fascinating technology. It is crucial to thoroughly test these models to not only ensure its quality but also possible misuses of such models by potential adversaries for illegal activities such as hacking. This paper presents a novel study focusing on exploitation of such large language models against deceptive interactions. More specifically, the paper leverages widespread and borrows well-known techniques in deception theory to investigate whether these models are susceptible to deceitful interactions.
This research aims not only to highlight these risks but also to pave the way for robust countermeasures that enhance the security and integrity of language models in the face of sophisticated social engineering tactics. Through systematic experiments and analysis, we assess their performance in these critical security domains. Our results demonstrate a significant finding in that these large language models are susceptible to deception and social engineering attacks.
△ Less
Submitted 24 November, 2023;
originally announced November 2023.
-
A Survey on Blockchain-Based Federated Learning and Data Privacy
Authors:
Bipin Chhetri,
Saroj Gopali,
Rukayat Olapojoye,
Samin Dehbash,
Akbar Siami Namin
Abstract:
Federated learning is a decentralized machine learning paradigm that allows multiple clients to collaborate by leveraging local computational power and the models transmission. This method reduces the costs and privacy concerns associated with centralized machine learning methods while ensuring data privacy by distributing training data across heterogeneous devices. On the other hand, federated le…
▽ More
Federated learning is a decentralized machine learning paradigm that allows multiple clients to collaborate by leveraging local computational power and the models transmission. This method reduces the costs and privacy concerns associated with centralized machine learning methods while ensuring data privacy by distributing training data across heterogeneous devices. On the other hand, federated learning has the drawback of data leakage due to the lack of privacy-preserving mechanisms employed during storage, transfer, and sharing, thus posing significant risks to data owners and suppliers. Blockchain technology has emerged as a promising technology for offering secure data-sharing platforms in federated learning, especially in Industrial Internet of Things (IIoT) settings. This survey aims to compare the performance and security of various data privacy mechanisms adopted in blockchain-based federated learning architectures. We conduct a systematic review of existing literature on secure data-sharing platforms for federated learning provided by blockchain technology, providing an in-depth overview of blockchain-based federated learning, its essential components, and discussing its principles, and potential applications. The primary contribution of this survey paper is to identify critical research questions and propose potential directions for future research in blockchain-based federated learning.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
A Survey on the Applications of Blockchains in Security of IoT Systems
Authors:
Zulfiqar Ali Khan,
Akbar Siami Namin
Abstract:
The Internet of Things (IoT) has already changed our daily lives by integrating smart devices together towards delivering high quality services to its clients. These devices when integrated together form a network through which massive amount of data can be produced, transferred, and shared. A critical concern is the security and integrity of such a complex platform to ensure the sustainability an…
▽ More
The Internet of Things (IoT) has already changed our daily lives by integrating smart devices together towards delivering high quality services to its clients. These devices when integrated together form a network through which massive amount of data can be produced, transferred, and shared. A critical concern is the security and integrity of such a complex platform to ensure the sustainability and reliability of these IoT-based systems. Blockchain is an emerging technology that has demonstrated its unique features and capabilities for different problems and application domains including IoT-based systems. This survey paper reviews the adaptation of Blockchain in the context of IoT to represent how this technology is capable of addressing the integration and security problems of devices connected to IoT systems. The innovation of this survey is that we present a survey based upon the integration approaches and security issues of IoT data and discuss the role of Blockchain in connection with these issues.
△ Less
Submitted 16 December, 2021;
originally announced December 2021.
-
A Comparative Study of Detecting Anomalies in Time Series Data Using LSTM and TCN Models
Authors:
Saroj Gopali,
Faranak Abri,
Sima Siami-Namini,
Akbar Siami Namin
Abstract:
There exist several data-driven approaches that enable us model time series data including traditional regression-based modeling approaches (i.e., ARIMA). Recently, deep learning techniques have been introduced and explored in the context of time series analysis and prediction. A major research question to ask is the performance of these many variations of deep learning techniques in predicting ti…
▽ More
There exist several data-driven approaches that enable us model time series data including traditional regression-based modeling approaches (i.e., ARIMA). Recently, deep learning techniques have been introduced and explored in the context of time series analysis and prediction. A major research question to ask is the performance of these many variations of deep learning techniques in predicting time series data. This paper compares two prominent deep learning modeling techniques. The Recurrent Neural Network (RNN)-based Long Short-Term Memory (LSTM) and the convolutional Neural Network (CNN)-based Temporal Convolutional Networks (TCN) are compared and their performance and training time are reported. According to our experimental results, both modeling techniques perform comparably having TCN-based models outperform LSTM slightly. Moreover, the CNN-based TCN model builds a stable model faster than the RNN-based LSTM models.
△ Less
Submitted 16 December, 2021;
originally announced December 2021.
-
Attack Prediction using Hidden Markov Model
Authors:
Shuvalaxmi Dass,
Prerit Datta,
Akbar Siami Namin
Abstract:
It is important to predict any adversarial attacks and their types to enable effective defense systems. Often it is hard to label such activities as malicious ones without adequate analytical reasoning. We propose the use of Hidden Markov Model (HMM) to predict the family of related attacks. Our proposed model is based on the observations often agglomerated in the form of log files and from the ta…
▽ More
It is important to predict any adversarial attacks and their types to enable effective defense systems. Often it is hard to label such activities as malicious ones without adequate analytical reasoning. We propose the use of Hidden Markov Model (HMM) to predict the family of related attacks. Our proposed model is based on the observations often agglomerated in the form of log files and from the target or the victim's perspective. We have built an HMM-based prediction model and implemented our proposed approach using Viterbi algorithm, which generates a sequence of states corresponding to stages of a particular attack. As a proof of concept and also to demonstrate the performance of the model, we have conducted a case study on predicting a family of attacks called Action Spoofing.
△ Less
Submitted 3 June, 2021;
originally announced June 2021.
-
Toward Explainable Users: Using NLP to Enable AI to Understand Users' Perceptions of Cyber Attacks
Authors:
Faranak Abri,
Luis Felipe Gutierrez,
Chaitra T. Kulkarni,
Akbar Siami Namin,
Keith S. Jones
Abstract:
To understand how end-users conceptualize consequences of cyber security attacks, we performed a card sorting study, a well-known technique in Cognitive Sciences, where participants were free to group the given consequences of chosen cyber attacks into as many categories as they wished using rationales they see fit. The results of the open card sorting study showed a large amount of inter-particip…
▽ More
To understand how end-users conceptualize consequences of cyber security attacks, we performed a card sorting study, a well-known technique in Cognitive Sciences, where participants were free to group the given consequences of chosen cyber attacks into as many categories as they wished using rationales they see fit. The results of the open card sorting study showed a large amount of inter-participant variation making the research team wonder how the consequences of security attacks were comprehended by the participants. As an exploration of whether it is possible to explain user's mental model and behavior through Artificial Intelligence (AI) techniques, the research team compared the card sorting data with the outputs of a number of Natural Language Processing (NLP) techniques with the goal of understanding how participants perceived and interpreted the consequences of cyber attacks written in natural languages. The results of the NLP-based exploration methods revealed an interesting observation implying that participants had mostly employed checking individual keywords in each sentence to group cyber attack consequences together and less considered the semantics behind the description of consequences of cyber attacks. The results reported in this paper are seemingly useful and important for cyber attacks comprehension from user's perspectives. To the best of our knowledge, this paper is the first introducing the use of AI techniques in explaining and modeling users' behavior and their perceptions about a context. The novel idea introduced here is about explaining users using AI.
△ Less
Submitted 3 June, 2021;
originally announced June 2021.
-
Phishing Detection through Email Embeddings
Authors:
Luis Felipe GutiƩrrez,
Faranak Abri,
Miriam Armstrong,
Akbar Siami Namin,
Keith S. Jones
Abstract:
The problem of detecting phishing emails through machine learning techniques has been discussed extensively in the literature. Conventional and state-of-the-art machine learning algorithms have demonstrated the possibility of building classifiers with high accuracy. The existing research studies treat phishing and genuine emails through general indicators and thus it is not exactly clear what phis…
▽ More
The problem of detecting phishing emails through machine learning techniques has been discussed extensively in the literature. Conventional and state-of-the-art machine learning algorithms have demonstrated the possibility of building classifiers with high accuracy. The existing research studies treat phishing and genuine emails through general indicators and thus it is not exactly clear what phishing features are contributing to variations of the classifiers. In this paper, we crafted a set of phishing and legitimate emails with similar indicators in order to investigate whether these cues are captured or disregarded by email embeddings, i.e., vectorizations. We then fed machine learning classifiers with the carefully crafted emails to find out about the performance of email embeddings developed. Our results show that using these indicators, email embeddings techniques is effective for classifying emails as phishing or legitimate.
△ Less
Submitted 28 December, 2020;
originally announced December 2020.
-
A Survey on Vulnerabilities of Ethereum Smart Contracts
Authors:
Zulfiqar Ali Khan,
Akbar Siami Namin
Abstract:
Smart contract (SC) is an extension of BlockChain technology. Ethereum BlockChain was the first to incorporate SC and thus started a new era of crypto-currencies and electronic transactions. Solidity helps to program the SCs. Still, soon after Solidity's emergence in 2014, Solidity-based SCs suffered many attacks that deprived the SC account holders of their precious funds. The main reason for the…
▽ More
Smart contract (SC) is an extension of BlockChain technology. Ethereum BlockChain was the first to incorporate SC and thus started a new era of crypto-currencies and electronic transactions. Solidity helps to program the SCs. Still, soon after Solidity's emergence in 2014, Solidity-based SCs suffered many attacks that deprived the SC account holders of their precious funds. The main reason for these attacks was the presence of vulnerabilities in SC. This paper discusses SC vulnerabilities and classifies them according to the domain knowledge of the faulty operations. This classification is a source of reminding developers and software engineers that for SC's safety, each SC requires proper testing with effective tools to catch those classes' vulnerabilities.
△ Less
Submitted 28 December, 2020;
originally announced December 2020.
-
Predicting Emotions Perceived from Sounds
Authors:
Faranak Abri,
Luis Felipe GutiƩrrez,
Akbar Siami Namin,
David R. W. Sears,
Keith S. Jones
Abstract:
Sonification is the science of communication of data and events to users through sounds. Auditory icons, earcons, and speech are the common auditory display schemes utilized in sonification, or more specifically in the use of audio to convey information. Once the captured data are perceived, their meanings, and more importantly, intentions can be interpreted more easily and thus can be employed as…
▽ More
Sonification is the science of communication of data and events to users through sounds. Auditory icons, earcons, and speech are the common auditory display schemes utilized in sonification, or more specifically in the use of audio to convey information. Once the captured data are perceived, their meanings, and more importantly, intentions can be interpreted more easily and thus can be employed as a complement to visualization techniques. Through auditory perception it is possible to convey information related to temporal, spatial, or some other context-oriented information. An important research question is whether the emotions perceived from these auditory icons or earcons are predictable in order to build an automated sonification platform. This paper conducts an experiment through which several mainstream and conventional machine learning algorithms are developed to study the prediction of emotions perceived from sounds. To do so, the key features of sounds are captured and then are modeled using machine learning algorithms using feature reduction techniques. We observe that it is possible to predict perceived emotions with high accuracy. In particular, the regression based on Random Forest demonstrated its superiority compared to other machine learning algorithms.
△ Less
Submitted 4 December, 2020;
originally announced December 2020.
-
Cyber-Attack Consequence Prediction
Authors:
Prerit Datta,
Natalie Lodinger,
Akbar Siami Namin,
Keith S. Jones
Abstract:
Cyber-physical systems posit a complex number of security challenges due to interconnection of heterogeneous devices having limited processing, communication, and power capabilities. Additionally, the conglomeration of both physical and cyber-space further makes it difficult to devise a single security plan spanning both these spaces. Cyber-security researchers are often overloaded with a variety…
▽ More
Cyber-physical systems posit a complex number of security challenges due to interconnection of heterogeneous devices having limited processing, communication, and power capabilities. Additionally, the conglomeration of both physical and cyber-space further makes it difficult to devise a single security plan spanning both these spaces. Cyber-security researchers are often overloaded with a variety of cyber-alerts on a daily basis many of which turn out to be false positives. In this paper, we use machine learning and natural language processing techniques to predict the consequences of cyberattacks. The idea is to enable security researchers to have tools at their disposal that makes it easier to communicate the attack consequences with various stakeholders who may have little to no cybersecurity expertise. Additionally, with the proposed approach researchers' cognitive load can be reduced by automatically predicting the consequences of attacks in case new attacks are discovered. We compare the performance through various machine learning models employing word vectors obtained using both tf-idf and Doc2Vec models. In our experiments, an accuracy of 60% was obtained using tf-idf features and 57% using Doc2Vec method for models based on LinearSVC model.
△ Less
Submitted 2 December, 2020; v1 submitted 1 December, 2020;
originally announced December 2020.
-
Fake Reviews Detection through Analysis of Linguistic Features
Authors:
Faranak Abri,
Luis Felipe Gutierrez,
Akbar Siami Namin,
Keith S. Jones,
David R. W. Sears
Abstract:
Online reviews play an integral part for success or failure of businesses. Prior to purchasing services or goods, customers first review the online comments submitted by previous customers. However, it is possible to superficially boost or hinder some businesses through posting counterfeit and fake reviews. This paper explores a natural language processing approach to identify fake reviews. We pre…
▽ More
Online reviews play an integral part for success or failure of businesses. Prior to purchasing services or goods, customers first review the online comments submitted by previous customers. However, it is possible to superficially boost or hinder some businesses through posting counterfeit and fake reviews. This paper explores a natural language processing approach to identify fake reviews. We present a detailed analysis of linguistic features for distinguishing fake and trustworthy online reviews. We study 15 linguistic features and measure their significance and importance towards the classification schemes employed in this study. Our results indicate that fake reviews tend to include more redundant terms and pauses, and generally contain longer sentences. The application of several machine learning classification algorithms revealed that we were able to discriminate fake from real reviews with high accuracy using these linguistic features.
△ Less
Submitted 8 October, 2020;
originally announced October 2020.
-
Fast Fourier Transformation for Optimizing Convolutional Neural Networks in Object Recognition
Authors:
Varsha Nair,
Moitrayee Chatterjee,
Neda Tavakoli,
Akbar Siami Namin,
Craig Snoeyink
Abstract:
This paper proposes to use Fast Fourier Transformation-based U-Net (a refined fully convolutional networks) and perform image convolution in neural networks. Leveraging the Fast Fourier Transformation, it reduces the image convolution costs involved in the Convolutional Neural Networks (CNNs) and thus reduces the overall computational costs. The proposed model identifies the object information fro…
▽ More
This paper proposes to use Fast Fourier Transformation-based U-Net (a refined fully convolutional networks) and perform image convolution in neural networks. Leveraging the Fast Fourier Transformation, it reduces the image convolution costs involved in the Convolutional Neural Networks (CNNs) and thus reduces the overall computational costs. The proposed model identifies the object information from the images. We apply the Fast Fourier transform algorithm on an image data set to obtain more accessible information about the image data, before segmenting them through the U-Net architecture. More specifically, we implement the FFT-based convolutional neural network to improve the training time of the network. The proposed approach was applied to publicly available Broad Bioimage Benchmark Collection (BBBC) dataset. Our model demonstrated improvement in training time during convolution from $600-700$ ms/step to $400-500$ ms/step. We evaluated the accuracy of our model using Intersection over Union (IoU) metric showing significant improvements.
△ Less
Submitted 8 October, 2020;
originally announced October 2020.
-
Vulnerability Coverage as an Adequacy Testing Criterion
Authors:
Shuvalaxmi Dass,
Akbar Siami Namin
Abstract:
Mainstream software applications and tools are the configurable platforms with an enormous number of parameters along with their values. Certain settings and possible interactions between these parameters may harden (or soften) the security and robustness of these applications against some known vulnerabilities. However, the large number of vulnerabilities reported and associated with these tools…
▽ More
Mainstream software applications and tools are the configurable platforms with an enormous number of parameters along with their values. Certain settings and possible interactions between these parameters may harden (or soften) the security and robustness of these applications against some known vulnerabilities. However, the large number of vulnerabilities reported and associated with these tools make the exhaustive testing of these tools infeasible against these vulnerabilities infeasible. As an instance of general software testing problem, the research question to address is whether the system under test is robust and secure against these vulnerabilities. This paper introduces the idea of ``vulnerability coverage,'' a concept to adequately test a given application for a certain classes of vulnerabilities, as reported by the National Vulnerability Database (NVD). The deriving idea is to utilize the Common Vulnerability Scoring System (CVSS) as a means to measure the fitness of test inputs generated by evolutionary algorithms and then through pattern matching identify vulnerabilities that match the generated vulnerability vectors and then test the system under test for those identified vulnerabilities. We report the performance of two evolutionary algorithms (i.e., Genetic Algorithms and Particle Swarm Optimization) in generating the vulnerability pattern vectors.
△ Less
Submitted 14 June, 2020;
originally announced June 2020.
-
Detection of Coincidentally Correct Test Cases through Random Forests
Authors:
Shuvalaxmi Dass,
Xiaozhen Xue,
Akbar Siami Namin
Abstract:
The performance of coverage-based fault localization greatly depends on the quality of test cases being executed. These test cases execute some lines of the given program and determine whether the underlying tests are passed or failed. In particular, some test cases may be well-behaved (i.e., passed) while executing faulty statements. These test cases, also known as coincidentally correct test cas…
▽ More
The performance of coverage-based fault localization greatly depends on the quality of test cases being executed. These test cases execute some lines of the given program and determine whether the underlying tests are passed or failed. In particular, some test cases may be well-behaved (i.e., passed) while executing faulty statements. These test cases, also known as coincidentally correct test cases, may negatively influence the performance of the spectra-based fault localization and thus be less helpful as a tool for the purpose of automated debugging. In other words, the involvement of these coincidentally correct test cases may introduce noises to the fault localization computation and thus cause in divergence of effectively localizing the location of possible bugs in the given code. In this paper, we propose a hybrid approach of ensemble learning combined with a supervised learning algorithm namely, Random Forests (RF) for the purpose of correctly identifying test cases that are mislabeled to be the passing test cases. A cost-effective analysis of flipping the test status or trimming (i.e., eliminating from the computation) the coincidental correct test cases is also reported.
△ Less
Submitted 14 June, 2020;
originally announced June 2020.
-
Vulnerability Coverage for Secure Configuration
Authors:
Shuvalaxmi Dass,
Akbar Siami Namin
Abstract:
We present a novel idea on adequacy testing called ``{vulnerability coverage}.'' The introduced coverage measure examines the underlying software for the presence of certain classes of vulnerabilities often found in the National Vulnerability Database (NVD) website. The thoroughness of the test input generation procedure is performed through the adaptation of evolutionary algorithms namely Genetic…
▽ More
We present a novel idea on adequacy testing called ``{vulnerability coverage}.'' The introduced coverage measure examines the underlying software for the presence of certain classes of vulnerabilities often found in the National Vulnerability Database (NVD) website. The thoroughness of the test input generation procedure is performed through the adaptation of evolutionary algorithms namely Genetic Algorithms (GA) and Particle Swarm Optimization (PSO). The methodology utilizes the Common Vulnerability Scoring System (CVSS), a free and open industry standard for assessing the severity of computer system security vulnerabilities, as a fitness measure for test inputs generation. The outcomes of these evolutionary algorithms are then evaluated in order to identify the vulnerabilities that match a class of vulnerability patterns for testing purposes.
△ Less
Submitted 14 June, 2020;
originally announced June 2020.
-
Cloud as an Attack Platform
Authors:
Moitrayee Chatterjee,
Prerit Datta,
Faranak Abri,
Akbar Siami Namin,
Keith S. Jones
Abstract:
We present an exploratory study of responses from $75$ security professionals and ethical hackers in order to understand how they abuse cloud platforms for attack purposes. The participants were recruited at the Black Hat and DEF CON conferences. We presented the participants' with various attack scenarios and asked them to explain the steps they would have carried out for launching the attack in…
▽ More
We present an exploratory study of responses from $75$ security professionals and ethical hackers in order to understand how they abuse cloud platforms for attack purposes. The participants were recruited at the Black Hat and DEF CON conferences. We presented the participants' with various attack scenarios and asked them to explain the steps they would have carried out for launching the attack in each scenario. Participants' responses were studied to understand attackers' mental models, which would improve our understanding of necessary security controls and recommendations regarding precautionary actions to circumvent the exploitation of clouds for malicious activities. We observed that in 93.78% of the responses, participants are abusing cloud services to establish their attack environment and launch attacks.
△ Less
Submitted 14 June, 2020;
originally announced June 2020.
-
Fake Reviews Detection through Ensemble Learning
Authors:
Luis Gutierrez-Espinoza,
Faranak Abri,
Akbar Siami Namin,
Keith S. Jones,
David R. W. Sears
Abstract:
Customers represent their satisfactions of consuming products by sharing their experiences through the utilization of online reviews. Several machine learning-based approaches can automatically detect deceptive and fake reviews. Recently, there have been studies reporting the performance of ensemble learning-based approaches in comparison to conventional machine learning techniques. Motivated by t…
▽ More
Customers represent their satisfactions of consuming products by sharing their experiences through the utilization of online reviews. Several machine learning-based approaches can automatically detect deceptive and fake reviews. Recently, there have been studies reporting the performance of ensemble learning-based approaches in comparison to conventional machine learning techniques. Motivated by the recent trends in ensemble learning, this paper evaluates the performance of ensemble learning-based approaches to identify bogus online information. The application of a number of ensemble learning-based approaches to a collection of fake restaurant reviews that we developed show that these ensemble learning-based approaches detect deceptive information better than conventional machine learning algorithms.
△ Less
Submitted 14 June, 2020;
originally announced June 2020.
-
Launching Stealth Attacks using Cloud
Authors:
Moitrayee Chatterjee,
Prerit Datta,
Faranak Abri,
Akbar Siami Namin,
Keith S. Jones
Abstract:
Cloud computing offers users scalable platforms and low resource cost. At the same time, the off-site location of the resources of this service model makes it more vulnerable to certain types of adversarial actions. Cloud computing has not only gained major user base, but also, it has the features that attackers can leverage to remain anonymous and stealth. With convenient access to data and techn…
▽ More
Cloud computing offers users scalable platforms and low resource cost. At the same time, the off-site location of the resources of this service model makes it more vulnerable to certain types of adversarial actions. Cloud computing has not only gained major user base, but also, it has the features that attackers can leverage to remain anonymous and stealth. With convenient access to data and technology, cloud has turned into an attack platform among other utilization. This paper reports our study to show that cyber attackers heavily abuse the public cloud platforms to setup their attack environments and launch stealth attacks. The paper first reviews types of attacks launched through cloud environment. It then reports case studies through which the processes of launching cyber attacks using clouds are demonstrated.
△ Less
Submitted 14 June, 2020;
originally announced June 2020.
-
Clustering Time Series Data through Autoencoder-based Deep Learning Models
Authors:
Neda Tavakoli,
Sima Siami-Namini,
Mahdi Adl Khanghah,
Fahimeh Mirza Soltani,
Akbar Siami Namin
Abstract:
Machine learning and in particular deep learning algorithms are the emerging approaches to data analysis. These techniques have transformed traditional data mining-based analysis radically into a learning-based model in which existing data sets along with their cluster labels (i.e., train set) are learned to build a supervised learning model and predict the cluster labels of unseen data (i.e., tes…
▽ More
Machine learning and in particular deep learning algorithms are the emerging approaches to data analysis. These techniques have transformed traditional data mining-based analysis radically into a learning-based model in which existing data sets along with their cluster labels (i.e., train set) are learned to build a supervised learning model and predict the cluster labels of unseen data (i.e., test set). In particular, deep learning techniques are capable of capturing and learning hidden features in a given data sets and thus building a more accurate prediction model for clustering and labeling problem. However, the major problem is that time series data are often unlabeled and thus supervised learning-based deep learning algorithms cannot be directly adapted to solve the clustering problems for these special and complex types of data sets. To address this problem, this paper introduces a two-stage method for clustering time series data. First, a novel technique is introduced to utilize the characteristics (e.g., volatility) of given time series data in order to create labels and thus be able to transform the problem from unsupervised learning into supervised learning. Second, an autoencoder-based deep learning model is built to learn and model both known and hidden features of time series data along with their created labels to predict the labels of unseen time series data. The paper reports a case study in which financial and stock time series data of selected 70 stock indices are clustered into distinct groups using the introduced two-stage procedure. The results show that the proposed procedure is capable of achieving 87.5\% accuracy in clustering and predicting the labels for unseen time series data.
△ Less
Submitted 11 April, 2020;
originally announced April 2020.
-
The Performance of Machine and Deep Learning Classifiers in Detecting Zero-Day Vulnerabilities
Authors:
Faranak Abri,
Sima Siami-Namini,
Mahdi Adl Khanghah,
Fahimeh Mirza Soltani,
Akbar Siami Namin
Abstract:
The detection of zero-day attacks and vulnerabilities is a challenging problem. It is of utmost importance for network administrators to identify them with high accuracy. The higher the accuracy is, the more robust the defense mechanism will be. In an ideal scenario (i.e., 100% accuracy) the system can detect zero-day malware without being concerned about mistakenly tagging benign files as malware…
▽ More
The detection of zero-day attacks and vulnerabilities is a challenging problem. It is of utmost importance for network administrators to identify them with high accuracy. The higher the accuracy is, the more robust the defense mechanism will be. In an ideal scenario (i.e., 100% accuracy) the system can detect zero-day malware without being concerned about mistakenly tagging benign files as malware or enabling disruptive malicious code running as none-malicious ones. This paper investigates different machine learning algorithms to find out how well they can detect zero-day malware. Through the examination of 34 machine/deep learning classifiers, we found that the random forest classifier offered the best accuracy. The paper poses several research questions regarding the performance of machine and deep learning algorithms when detecting zero-day malware with zero rates for false positive and false negative.
△ Less
Submitted 21 November, 2019;
originally announced November 2019.
-
A Comparative Analysis of Forecasting Financial Time Series Using ARIMA, LSTM, and BiLSTM
Authors:
Sima Siami-Namini,
Neda Tavakoli,
Akbar Siami Namin
Abstract:
Machine and deep learning-based algorithms are the emerging approaches in addressing prediction problems in time series. These techniques have been shown to produce more accurate results than conventional regression-based modeling. It has been reported that artificial Recurrent Neural Networks (RNN) with memory, such as Long Short-Term Memory (LSTM), are superior compared to Autoregressive Integra…
▽ More
Machine and deep learning-based algorithms are the emerging approaches in addressing prediction problems in time series. These techniques have been shown to produce more accurate results than conventional regression-based modeling. It has been reported that artificial Recurrent Neural Networks (RNN) with memory, such as Long Short-Term Memory (LSTM), are superior compared to Autoregressive Integrated Moving Average (ARIMA) with a large margin. The LSTM-based models incorporate additional "gates" for the purpose of memorizing longer sequences of input data. The major question is that whether the gates incorporated in the LSTM architecture already offers a good prediction and whether additional training of data would be necessary to further improve the prediction.
Bidirectional LSTMs (BiLSTMs) enable additional training by traversing the input data twice (i.e., 1) left-to-right, and 2) right-to-left). The research question of interest is then whether BiLSTM, with additional training capability, outperforms regular unidirectional LSTM. This paper reports a behavioral analysis and comparison of BiLSTM and LSTM models. The objective is to explore to what extend additional layers of training of data would be beneficial to tune the involved parameters. The results show that additional training of data and thus BiLSTM-based modeling offers better predictions than regular LSTM-based models. More specifically, it was observed that BiLSTM models provide better predictions compared to ARIMA and LSTM models. It was also observed that BiLSTM models reach the equilibrium much slower than LSTM-based models.
△ Less
Submitted 21 November, 2019;
originally announced November 2019.
-
Markov Decision Process to Enforce Moving Target Defence Policies
Authors:
Jianjun Zheng,
Akbar Siami Namin
Abstract:
Moving Target Defense (MTD) is an emerging game-changing defense strategy in cybersecurity with the goal of strengthening defenders and conversely puzzling adversaries in a network environment. The successful deployment of an MTD system can be affected by several factors including 1) the effectiveness of the employed technique, 2) the deployment strategy, 3) the cost of the MTD implementation, and…
▽ More
Moving Target Defense (MTD) is an emerging game-changing defense strategy in cybersecurity with the goal of strengthening defenders and conversely puzzling adversaries in a network environment. The successful deployment of an MTD system can be affected by several factors including 1) the effectiveness of the employed technique, 2) the deployment strategy, 3) the cost of the MTD implementation, and 4) the impact yielded by the enforced security policies. Many research efforts have been spent on introducing a variety of MTD techniques which are often evaluated through simulations. Nevertheless, this line of research needs more attention. In particular, the determination of optimal cost and policy analysis and the selection of those policies in an MTD setting is still an open research question.
To advance the state-of-the-art of this line of research, this paper introduces an approach based on control theory to model, analyze and select optimal security policies for Moving Target Defense (MTD) deployment strategies. A Markov Decision Process (MDP) scheme is presented to model states of the system from attacking point of view. The employed value iteration method is based on the Bellman optimality equation for optimal policy selection for each state defined in the system. The model is then utilized to analyze the impact of various costs on the optimal policy. The MDP model is then applied to two case studies to evaluate the performance of the model.
△ Less
Submitted 22 May, 2019;
originally announced May 2019.
-
Deep Reinforcement Learning for Detecting Malicious Websites
Authors:
Moitrayee Chatterjee,
Akbar Siami Namin
Abstract:
Phishing is the simplest form of cybercrime with the objective of baiting people into giving away delicate information such as individually recognizable data, banking and credit card details, or even credentials and passwords. This type of simple yet most effective cyber-attack is usually launched through emails, phone calls, or instant messages. The credential or private data stolen are then used…
▽ More
Phishing is the simplest form of cybercrime with the objective of baiting people into giving away delicate information such as individually recognizable data, banking and credit card details, or even credentials and passwords. This type of simple yet most effective cyber-attack is usually launched through emails, phone calls, or instant messages. The credential or private data stolen are then used to get access to critical records of the victims and can result in extensive fraud and monetary loss. Hence, sending malicious messages to victims is a stepping stone of the phishing procedure. A \textit{phisher} usually setups a deceptive website, where the victims are conned into entering credentials and sensitive information. It is therefore important to detect these types of malicious websites before causing any harmful damages to victims. Inspired by the evolving nature of the phishing websites, this paper introduces a novel approach based on deep reinforcement learning to model and detect malicious URLs. The proposed model is capable of adapting to the dynamic behavior of the phishing websites and thus learn the features associated with phishing website detection.
△ Less
Submitted 22 May, 2019;
originally announced May 2019.
-
The Sounds of Cyber Threats
Authors:
Akbar Siami Namin,
Rattikorn Hewett,
Keith S. Jones,
Rona Pogrund
Abstract:
The Internet enables users to access vast resources, but it can also expose users to harmful cyber-attacks. This paper investigates human factors issues concerning the use of sounds in a cyber-security domain. It describes a methodology, referred to as sonification, to effectively design and develop auditory cyber-security threat indicators to warn users about cyber-attacks. A case study is presen…
▽ More
The Internet enables users to access vast resources, but it can also expose users to harmful cyber-attacks. This paper investigates human factors issues concerning the use of sounds in a cyber-security domain. It describes a methodology, referred to as sonification, to effectively design and develop auditory cyber-security threat indicators to warn users about cyber-attacks. A case study is presented, along with the results, of various types of usability testing with a number of Internet users who are visually impaired. The paper concludes with a discussion of future steps to enhance this work.
△ Less
Submitted 21 May, 2018;
originally announced May 2018.
-
Forecasting Economics and Financial Time Series: ARIMA vs. LSTM
Authors:
Sima Siami-Namini,
Akbar Siami Namin
Abstract:
Forecasting time series data is an important subject in economics, business, and finance. Traditionally, there are several techniques to effectively forecast the next lag of time series data such as univariate Autoregressive (AR), univariate Moving Average (MA), Simple Exponential Smoothing (SES), and more notably Autoregressive Integrated Moving Average (ARIMA) with its many variations. In partic…
▽ More
Forecasting time series data is an important subject in economics, business, and finance. Traditionally, there are several techniques to effectively forecast the next lag of time series data such as univariate Autoregressive (AR), univariate Moving Average (MA), Simple Exponential Smoothing (SES), and more notably Autoregressive Integrated Moving Average (ARIMA) with its many variations. In particular, ARIMA model has demonstrated its outperformance in precision and accuracy of predicting the next lags of time series. With the recent advancement in computational power of computers and more importantly developing more advanced machine learning algorithms and approaches such as deep learning, new algorithms are developed to forecast time series data. The research question investigated in this article is that whether and how the newly developed deep learning-based algorithms for forecasting time series data, such as "Long Short-Term Memory (LSTM)", are superior to the traditional algorithms. The empirical studies conducted and reported in this article show that deep learning-based algorithms such as LSTM outperform traditional-based algorithms such as ARIMA model. More specifically, the average reduction in error rates obtained by LSTM is between 84 - 87 percent when compared to ARIMA indicating the superiority of LSTM to ARIMA. Furthermore, it was noticed that the number of training times, known as "epoch" in deep learning, has no effect on the performance of the trained forecast model and it exhibits a truly random behavior.
△ Less
Submitted 16 March, 2018;
originally announced March 2018.