Search | arXiv e-print repository

Search-based Selection of Metamorphic Relations for Optimized Robustness Testing of Large Language Models

Authors: Sangwon Hyun, Shaukat Ali, M. Ali Babar

Abstract: Assessing the trustworthiness of Large Language Models (LLMs), such as robustness, has garnered significant attention. Recently, metamorphic testing that defines Metamorphic Relations (MRs) has been widely applied to evaluate the robustness of LLM executions. However, the MR-based robustness testing still requires a scalable number of MRs, thereby necessitating the optimization of selecting MRs. M… ▽ More Assessing the trustworthiness of Large Language Models (LLMs), such as robustness, has garnered significant attention. Recently, metamorphic testing that defines Metamorphic Relations (MRs) has been widely applied to evaluate the robustness of LLM executions. However, the MR-based robustness testing still requires a scalable number of MRs, thereby necessitating the optimization of selecting MRs. Most extant LLM testing studies are limited to automatically generating test cases (i.e., MRs) to enhance failure detection. Additionally, most studies only considered a limited test space of single perturbation MRs in their evaluation of LLMs. In contrast, our paper proposes a search-based approach for optimizing the MR groups to maximize failure detection and minimize the LLM execution cost. Moreover, our approach covers the combinatorial perturbations in MRs, facilitating the expansion of test space in the robustness assessment. We have developed a search process and implemented four search algorithms: Single-GA, NSGA-II, SPEA2, and MOEA/D with novel encoding to solve the MR selection problem in the LLM robustness testing. We conducted comparative experiments on the four search algorithms along with a random search, using two major LLMs with primary Text-to-Text tasks. Our statistical and empirical investigation revealed two key findings: (1) the MOEA/D algorithm performed the best in optimizing the MR space for LLM robustness testing, and (2) we identified silver bullet MRs for the LLM robustness testing, which demonstrated dominant capabilities in confusing LLMs across different Text-to-Text tasks. In LLM robustness assessment, our research sheds light on the fundamental problem for optimized testing and provides insights into search-based solutions. △ Less

Submitted 7 July, 2025; originally announced July 2025.

arXiv:2507.00460 [pdf, ps, other]

Pitfalls of Evaluating Language Models with Open Benchmarks

Authors: Md. Najib Hasan, Mohammad Fakhruddin Babar, Souvika Sarkar, Monowar Hasan, Santu Karmaker

Abstract: Open Large Language Model (LLM) benchmarks, such as HELM and BIG-bench, offer standardized, transparent protocols that facilitate the fair comparison, reproducibility, and iterative advancement of Language Models (LMs). However, their openness also introduces critical and underexplored pitfalls. This study exposes these weaknesses by systematically constructing ``cheating'' models -- smaller varia… ▽ More Open Large Language Model (LLM) benchmarks, such as HELM and BIG-bench, offer standardized, transparent protocols that facilitate the fair comparison, reproducibility, and iterative advancement of Language Models (LMs). However, their openness also introduces critical and underexplored pitfalls. This study exposes these weaknesses by systematically constructing ``cheating'' models -- smaller variants of BART, T5, and GPT-2 fine-tuned directly on public test sets -- which achieve top rankings on a prominent open, holistic benchmark (HELM) despite poor generalization and limited practical utility. Our findings underscore three key insights: \ca high leaderboard performance on open benchmarks may not always reflect real-world effectiveness; \cb private or dynamic benchmarks must complement open evaluations to safeguard integrity; and \cc a fundamental reevaluation of current benchmarking practices is essential to ensure robust and trustworthy LM assessments. △ Less

Submitted 1 July, 2025; originally announced July 2025.

arXiv:2506.11600 [pdf, ps, other]

GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news

Authors: Abdul Haque, Umm e Hani, Ahmad Din, Muhammad Babar, Ali Abbas, Insaf Ullah

Abstract: GraphRAG-Causal introduces an innovative framework that combines graph-based retrieval with large language models to enhance causal reasoning in news analysis. Traditional NLP approaches often struggle with identifying complex, implicit causal links, especially in low-data scenarios. Our approach addresses these challenges by transforming annotated news headlines into structured causal knowledge g… ▽ More GraphRAG-Causal introduces an innovative framework that combines graph-based retrieval with large language models to enhance causal reasoning in news analysis. Traditional NLP approaches often struggle with identifying complex, implicit causal links, especially in low-data scenarios. Our approach addresses these challenges by transforming annotated news headlines into structured causal knowledge graphs. It then employs a hybrid retrieval system that merges semantic embeddings with graph-based structural cues leveraging Neo4j to accurately match and retrieve relevant events. The framework is built on a three-stage pipeline: First, during Data Preparation, news sentences are meticulously annotated and converted into causal graphs capturing cause, effect, and trigger relationships. Next, the Graph Retrieval stage stores these graphs along with their embeddings in a Neo4j database and utilizes hybrid Cypher queries to efficiently identify events that share both semantic and structural similarities with a given query. Finally, the LLM Inference stage utilizes these retrieved causal graphs in a few-shot learning setup with XML-based prompting, enabling robust classification and tagging of causal relationships. Experimental evaluations demonstrate that GraphRAG-Causal achieves an impressive F1-score of 82.1% on causal classification using just 20 few-shot examples. This approach significantly boosts accuracy and consistency, making it highly suitable for real-time applications in news reliability assessment, misinformation detection, and policy analysis. △ Less

Submitted 13 June, 2025; originally announced June 2025.

Comments: 18 pages, 8 figures

arXiv:2506.01991 [pdf, ps, other]

Investigating Timing-Based Information Leakage in Data Flow-Driven Real-Time Systems

Authors: Mohammad Fakhruddin Babar, Zain A. H. Hammadeh, Mohammad Hamad, Monowar Hasan

Abstract: Leaking information about the execution behavior of critical real-time tasks may lead to serious consequences, including violations of temporal constraints and even severe failures. We study information leakage for a special class of real-time tasks that have two execution modes, namely, typical execution (which invokes the majority of times) and critical execution (to tackle exceptional condition… ▽ More Leaking information about the execution behavior of critical real-time tasks may lead to serious consequences, including violations of temporal constraints and even severe failures. We study information leakage for a special class of real-time tasks that have two execution modes, namely, typical execution (which invokes the majority of times) and critical execution (to tackle exceptional conditions). The data flow-driven applications inherit such a multimode execution model. In this paper, we investigate whether a low-priority "observer" task can infer the execution patterns of a high-priority "victim" task (especially the critical executions). We develop a new statistical analysis technique and show that by analyzing the response times of the low-priority task, it becomes possible to extract the execution behavior of the high-priority task. We test our approach against a random selection technique that arbitrarily classifies a job as critical. We find that correlating the observer's response times with the victim's jobs can result in higher precision in identifying critical invocations compared to a random guess. We conduct extensive evaluations with systemically generated workloads, including a case study using a UAV autopilot (ArduPilot) taskset parameters. We found that our inference algorithm can achieve relatively low false positive rates (less than 25%) with relatively low footprint (1 MB memory and 50 ms timing overhead on a Raspberry Pi 4 platform). We further demonstrate the feasibility of inference on two cyber-physical platforms: an off-the-shelf manufacturing robot and a custom-built surveillance system. △ Less

Submitted 3 June, 2025; v1 submitted 17 May, 2025; originally announced June 2025.

arXiv:2504.04351 [pdf, other]

DDPT: Diffusion-Driven Prompt Tuning for Large Language Model Code Generation

Authors: Jinyang Li, Sangwon Hyun, M. Ali Babar

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation. However, the quality of the generated code is heavily dependent on the structure and composition of the prompts used. Crafting high-quality prompts is a challenging task that requires significant knowledge and skills of prompt engineering. To advance the automation support for the prompt engineering for LLM-… ▽ More Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation. However, the quality of the generated code is heavily dependent on the structure and composition of the prompts used. Crafting high-quality prompts is a challenging task that requires significant knowledge and skills of prompt engineering. To advance the automation support for the prompt engineering for LLM-based code generation, we propose a novel solution Diffusion-Driven Prompt Tuning (DDPT) that learns how to generate optimal prompt embedding from Gaussian Noise to automate the prompt engineering for code generation. We evaluate the feasibility of diffusion-based optimization and abstract the optimal prompt embedding as a directional vector toward the optimal embedding. We use the code generation loss given by the LLMs to help the diffusion model capture the distribution of optimal prompt embedding during training. The trained diffusion model can build a path from the noise distribution to the optimal distribution at the sampling phrase, the evaluation result demonstrates that DDPT helps improve the prompt optimization for code generation. △ Less

Submitted 6 April, 2025; originally announced April 2025.

Comments: ICSE CAIN 2025

arXiv:2502.02009 [pdf, other]

LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations

Authors: Ziyang Ye, Triet Huynh Minh Le, M. Ali Babar

Abstract: Security misconfigurations in Container Orchestrators (COs) can pose serious threats to software systems. While Static Analysis Tools (SATs) can effectively detect these security vulnerabilities, the industry currently lacks automated solutions capable of fixing these misconfigurations. The emergence of Large Language Models (LLMs), with their proven capabilities in code understanding and generati… ▽ More Security misconfigurations in Container Orchestrators (COs) can pose serious threats to software systems. While Static Analysis Tools (SATs) can effectively detect these security vulnerabilities, the industry currently lacks automated solutions capable of fixing these misconfigurations. The emergence of Large Language Models (LLMs), with their proven capabilities in code understanding and generation, presents an opportunity to address this limitation. This study introduces LLMSecConfig, an innovative framework that bridges this gap by combining SATs with LLMs. Our approach leverages advanced prompting techniques and Retrieval-Augmented Generation (RAG) to automatically repair security misconfigurations while preserving operational functionality. Evaluation of 1,000 real-world Kubernetes configurations achieved a 94\% success rate while maintaining a low rate of introducing new misconfigurations. Our work makes a promising step towards automated container security management, reducing the manual effort required for configuration maintenance. △ Less

Submitted 3 February, 2025; originally announced February 2025.

arXiv:2501.16364 [pdf, other]

Multivariate Time Series Anomaly Detection by Capturing Coarse-Grained Intra- and Inter-Variate Dependencies

Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

Abstract: Multivariate time series anomaly detection is essential for failure management in web application operations, as it directly influences the effectiveness and timeliness of implementing remedial or preventive measures. This task is often framed as a semi-supervised learning problem, where only normal data are available for model training, primarily due to the labor-intensive nature of data labeling… ▽ More Multivariate time series anomaly detection is essential for failure management in web application operations, as it directly influences the effectiveness and timeliness of implementing remedial or preventive measures. This task is often framed as a semi-supervised learning problem, where only normal data are available for model training, primarily due to the labor-intensive nature of data labeling and the scarcity of anomalous data. Existing semi-supervised methods often detect anomalies by capturing intra-variate temporal dependencies and/or inter-variate relationships to learn normal patterns, flagging timestamps that deviate from these patterns as anomalies. However, these approaches often fail to capture salient intra-variate temporal and inter-variate dependencies in time series due to their focus on excessively fine granularity, leading to suboptimal performance. In this study, we introduce MtsCID, a novel semi-supervised multivariate time series anomaly detection method. MtsCID employs a dual network architecture: one network operates on the attention maps of multi-scale intra-variate patches for coarse-grained temporal dependency learning, while the other works on variates to capture coarse-grained inter-variate relationships through convolution and interaction with sinusoidal prototypes. This design enhances the ability to capture the patterns from both intra-variate temporal dependencies and inter-variate relationships, resulting in improved performance. Extensive experiments across seven widely used datasets demonstrate that MtsCID achieves performance comparable or superior to state-of-the-art benchmark methods. △ Less

Submitted 22 January, 2025; originally announced January 2025.

Comments: 9 pages, 3 figures, Accepted to TheWebConference 2025

arXiv:2412.06166 [pdf, other]

MVD: A Multi-Lingual Software Vulnerability Detection Framework

Authors: Boyu Zhang, Triet H. M. Le, M. Ali Babar

Abstract: Software vulnerabilities can result in catastrophic cyberattacks that increasingly threaten business operations. Consequently, ensuring the safety of software systems has become a paramount concern for both private and public sectors. Recent literature has witnessed increasing exploration of learning-based approaches for software vulnerability detection. However, a key limitation of these techniqu… ▽ More Software vulnerabilities can result in catastrophic cyberattacks that increasingly threaten business operations. Consequently, ensuring the safety of software systems has become a paramount concern for both private and public sectors. Recent literature has witnessed increasing exploration of learning-based approaches for software vulnerability detection. However, a key limitation of these techniques is their primary focus on a single programming language, such as C/C++, which poses constraints considering the polyglot nature of modern software projects. Further, there appears to be an oversight in harnessing the synergies of vulnerability knowledge across varied languages, potentially underutilizing the full capabilities of these methods. To address the aforementioned issues, we introduce MVD - an innovative multi-lingual vulnerability detection framework. This framework acquires the ability to detect vulnerabilities across multiple languages by concurrently learning from vulnerability data of various languages, which are curated by our specialized pipeline. We also incorporate incremental learning to enable the detection capability of MVD to be extended to new languages, thus augmenting its practical utility. Extensive experiments on our curated dataset of more than 11K real-world multi-lingual vulnerabilities substantiate that our framework significantly surpasses state-of-the-art methods in multi-lingual vulnerability detection by 83.7% to 193.6% in PR-AUC. The results also demonstrate that MVD detects vulnerabilities well for new languages without compromising the detection performance of previously trained languages, even when training data for the older languages is unavailable. Overall, our findings motivate and pave the way for the prediction of multi-lingual vulnerabilities in modern software systems. △ Less

Submitted 8 December, 2024; originally announced December 2024.

arXiv:2408.12904 [pdf]

SecDOAR: A Software Reference Architecture for Security Data Orchestration, Analysis and Reporting

Authors: Muhammad Aufeef Chauhan, Muhammad Ali Babar, Fethi Rabhi

Abstract: A Software Reference Architecture (SRA) is a useful tool for standardising existing architectures in a specific domain and facilitating concrete architecture design, development and evaluation by instantiating SRA and using SRA as a benchmark for the development of new systems. In this paper, we have presented an SRA for Security Data Orchestration, Analysis and Reporting (SecDOAR) to provide stan… ▽ More A Software Reference Architecture (SRA) is a useful tool for standardising existing architectures in a specific domain and facilitating concrete architecture design, development and evaluation by instantiating SRA and using SRA as a benchmark for the development of new systems. In this paper, we have presented an SRA for Security Data Orchestration, Analysis and Reporting (SecDOAR) to provide standardisation of security data platforms that can facilitate the integration of security orchestration, analysis and reporting tools for security data. The SecDOAR SRA has been designed by leveraging existing scientific literature and security data standards. We have documented SecDOAR SRA in terms of design methodology, meta-models to relate to different concepts in the security data architecture, and details on different elements and components of the SRA. We have evaluated SecDOAR SRA for its effectiveness and completeness by comparing it with existing commercial solutions. We have demonstrated the feasibility of the proposed SecDOAR SRA by instantiating it as a prototype platform to support security orchestration, analysis and reporting for a selected set of tools. The proposed SecDOAR SRA consists of meta-models for security data, security events and security data management processes as well as security metrics and corresponding measurement schemes, a security data integration model, and a description of SecDOAR SRA components. The proposed SecDOAR SRA can be used by researchers and practitioners as a structured approach for designing and implementing cybersecurity monitoring, analysis and reporting systems in various domains. △ Less

Submitted 25 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

Comments: 21 pages, 17 Figures, 5 Tables

arXiv:2408.00435 [pdf, other]

A Qualitative Study on Using ChatGPT for Software Security: Perception vs. Practicality

Authors: M. Mehdi Kholoosi, M. Ali Babar, Roland Croft

Abstract: Artificial Intelligence (AI) advancements have enabled the development of Large Language Models (LLMs) that can perform a variety of tasks with remarkable semantic understanding and accuracy. ChatGPT is one such LLM that has gained significant attention due to its impressive capabilities for assisting in various knowledge-intensive tasks. Due to the knowledge-intensive nature of engineering secure… ▽ More Artificial Intelligence (AI) advancements have enabled the development of Large Language Models (LLMs) that can perform a variety of tasks with remarkable semantic understanding and accuracy. ChatGPT is one such LLM that has gained significant attention due to its impressive capabilities for assisting in various knowledge-intensive tasks. Due to the knowledge-intensive nature of engineering secure software, ChatGPT's assistance is expected to be explored for security-related tasks during the development/evolution of software. To gain an understanding of the potential of ChatGPT as an emerging technology for supporting software security, we adopted a two-fold approach. Initially, we performed an empirical study to analyse the perceptions of those who had explored the use of ChatGPT for security tasks and shared their views on Twitter. It was determined that security practitioners view ChatGPT as beneficial for various software security tasks, including vulnerability detection, information retrieval, and penetration testing. Secondly, we designed an experiment aimed at investigating the practicality of this technology when deployed as an oracle in real-world settings. In particular, we focused on vulnerability detection and qualitatively examined ChatGPT outputs for given prompts within this prominent software security task. Based on our analysis, responses from ChatGPT in this task are largely filled with generic security information and may not be appropriate for industry use. To prevent data leakage, we performed this analysis on a vulnerability dataset compiled after the OpenAI data cut-off date from real-world projects covering 40 distinct vulnerability types and 12 programming languages. We assert that the findings from this study would contribute to future research aimed at developing and evaluating LLMs dedicated to software security. △ Less

Submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted for publication at International Conference on Trust, Privacy and Security - 2024

arXiv:2407.17803 [pdf, other]

Automatic Data Labeling for Software Vulnerability Prediction Models: How Far Are We?

Authors: Triet H. M. Le, M. Ali Babar

Abstract: Background: Software Vulnerability (SV) prediction needs large-sized and high-quality data to perform well. Current SV datasets mostly require expensive labeling efforts by experts (human-labeled) and thus are limited in size. Meanwhile, there are growing efforts in automatic SV labeling at scale. However, the fitness of auto-labeled data for SV prediction is still largely unknown. Aims: We quanti… ▽ More Background: Software Vulnerability (SV) prediction needs large-sized and high-quality data to perform well. Current SV datasets mostly require expensive labeling efforts by experts (human-labeled) and thus are limited in size. Meanwhile, there are growing efforts in automatic SV labeling at scale. However, the fitness of auto-labeled data for SV prediction is still largely unknown. Aims: We quantitatively and qualitatively study the quality and use of the state-of-the-art auto-labeled SV data, D2A, for SV prediction. Method: Using multiple sources and manual validation, we curate clean SV data from human-labeled SV-fixing commits in two well-known projects for investigating the auto-labeled counterparts. Results: We discover that 50+% of the auto-labeled SVs are noisy (incorrectly labeled), and they hardly overlap with the publicly reported ones. Yet, SV prediction models utilizing the noisy auto-labeled SVs can perform up to 22% and 90% better in Matthews Correlation Coefficient and Recall, respectively, than the original models. We also reveal the promises and difficulties of applying noise-reduction methods for automatically addressing the noise in auto-labeled SV data to maximize the data utilization for SV prediction. Conclusions: Our study informs the benefits and challenges of using auto-labeled SVs, paving the way for large-scale SV prediction. △ Less

Submitted 25 July, 2024; originally announced July 2024.

Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024

arXiv:2407.17053 [pdf, other]

doi 10.1145/3674805.3686670

Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++

Authors: Anh The Nguyen, Triet Huynh Minh Le, M. Ali Babar

Abstract: Background: The C and C++ languages hold significant importance in Software Engineering research because of their widespread use in practice. Numerous studies have utilized Machine Learning (ML) and Deep Learning (DL) techniques to detect software vulnerabilities (SVs) in the source code written in these languages. However, the application of these techniques in function-level SV assessment has be… ▽ More Background: The C and C++ languages hold significant importance in Software Engineering research because of their widespread use in practice. Numerous studies have utilized Machine Learning (ML) and Deep Learning (DL) techniques to detect software vulnerabilities (SVs) in the source code written in these languages. However, the application of these techniques in function-level SV assessment has been largely unexplored. SV assessment is increasingly crucial as it provides detailed information on the exploitability, impacts, and severity of security defects, thereby aiding in their prioritization and remediation. Aims: We conduct the first empirical study to investigate and compare the performance of ML and DL models, many of which have been used for SV detection, for function-level SV assessment in C/C++. Method: Using 9,993 vulnerable C/C++ functions, we evaluated the performance of six multi-class ML models and five multi-class DL models for the SV assessment at the function level based on the Common Vulnerability Scoring System (CVSS). We further explore multi-task learning, which can leverage common vulnerable code to predict all SV assessment outputs simultaneously in a single model, and compare the effectiveness and efficiency of this model type with those of the original multi-class models. Results: We show that ML has matching or even better performance compared to the multi-class DL models for function-level SV assessment with significantly less training time. Employing multi-task learning allows the DL models to perform significantly better, with an average of 8-22% increase in Matthews Correlation Coefficient (MCC). Conclusions: We distill the practices of using data-driven techniques for function-level SV assessment in C/C++, including the use of multi-task DL to balance efficiency and effectiveness. This can establish a strong foundation for future work in this area. △ Less

Submitted 3 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024

arXiv:2407.10722 [pdf, other]

Mitigating Data Imbalance for Software Vulnerability Assessment: Does Data Augmentation Help?

Authors: Triet H. M. Le, M. Ali Babar

Abstract: Background: Software Vulnerability (SV) assessment is increasingly adopted to address the ever-increasing volume and complexity of SVs. Data-driven approaches have been widely used to automate SV assessment tasks, particularly the prediction of the Common Vulnerability Scoring System (CVSS) metrics such as exploitability, impact, and severity. SV assessment suffers from the imbalanced distribution… ▽ More Background: Software Vulnerability (SV) assessment is increasingly adopted to address the ever-increasing volume and complexity of SVs. Data-driven approaches have been widely used to automate SV assessment tasks, particularly the prediction of the Common Vulnerability Scoring System (CVSS) metrics such as exploitability, impact, and severity. SV assessment suffers from the imbalanced distributions of the CVSS classes, but such data imbalance has been hardly understood and addressed in the literature. Aims: We conduct a large-scale study to quantify the impacts of data imbalance and mitigate the issue for SV assessment through the use of data augmentation. Method: We leverage nine data augmentation techniques to balance the class distributions of the CVSS metrics. We then compare the performance of SV assessment models with and without leveraging the augmented data. Results: Through extensive experiments on 180k+ real-world SVs, we show that mitigating data imbalance can significantly improve the predictive performance of models for all the CVSS tasks, by up to 31.8% in Matthews Correlation Coefficient. We also discover that simple text augmentation like combining random text insertion, deletion, and replacement can outperform the baseline across the board. Conclusions: Our study provides the motivation and the first promising step toward tackling data imbalance for effective SV assessment. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024

arXiv:2406.19765 [pdf, other]

Systematic Literature Review on Application of Learning-based Approaches in Continuous Integration

Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, M. Ali Babar

Abstract: Context: Machine learning (ML) and deep learning (DL) analyze raw data to extract valuable insights in specific phases. The rise of continuous practices in software projects emphasizes automating Continuous Integration (CI) with these learning-based methods, while the growing adoption of such approaches underscores the need for systematizing knowledge. Objective: Our objective is to comprehensivel… ▽ More Context: Machine learning (ML) and deep learning (DL) analyze raw data to extract valuable insights in specific phases. The rise of continuous practices in software projects emphasizes automating Continuous Integration (CI) with these learning-based methods, while the growing adoption of such approaches underscores the need for systematizing knowledge. Objective: Our objective is to comprehensively review and analyze existing literature concerning learning-based methods within the CI domain. We endeavour to identify and analyse various techniques documented in the literature, emphasizing the fundamental attributes of training phases within learning-based solutions in the context of CI. Method: We conducted a Systematic Literature Review (SLR) involving 52 primary studies. Through statistical and thematic analyses, we explored the correlations between CI tasks and the training phases of learning-based methodologies across the selected studies, encompassing a spectrum from data engineering techniques to evaluation metrics. Results: This paper presents an analysis of the automation of CI tasks utilizing learning-based methods. We identify and analyze nine types of data sources, four steps in data preparation, four feature types, nine subsets of data features, five approaches for hyperparameter selection and tuning, and fifteen evaluation metrics. Furthermore, we discuss the latest techniques employed, existing gaps in CI task automation, and the characteristics of the utilized learning-based techniques. Conclusion: This study provides a comprehensive overview of learning-based methods in CI, offering valuable insights for researchers and practitioners developing CI task automation. It also highlights the need for further research to advance these methods in CI. △ Less

Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

Comments: This paper has been accepted to be published in IEEE Access

arXiv:2406.18813 [pdf, other]

Towards Secure Management of Edge-Cloud IoT Microservices using Policy as Code

Authors: Samodha Pallewatta, Muhammad Ali Babar

Abstract: IoT application providers increasingly use MicroService Architecture (MSA) to develop applications that convert IoT data into valuable information. The independently deployable and scalable nature of microservices enables dynamic utilization of edge and cloud resources provided by various service providers, thus improving performance. However, IoT data security should be ensured during multi-domai… ▽ More IoT application providers increasingly use MicroService Architecture (MSA) to develop applications that convert IoT data into valuable information. The independently deployable and scalable nature of microservices enables dynamic utilization of edge and cloud resources provided by various service providers, thus improving performance. However, IoT data security should be ensured during multi-domain data processing and transmission among distributed and dynamically composed microservices. The ability to implement granular security controls at the microservices level has the potential to solve this. To this end, edge-cloud environments require intricate and scalable security frameworks that operate across multi-domain environments to enforce various security policies during the management of microservices (i.e., initial placement, scaling, migration, and dynamic composition), considering the sensitivity of the IoT data. To address the lack of such a framework, we propose an architectural framework that uses Policy-as-Code to ensure secure microservice management within multi-domain edge-cloud environments. The proposed framework contains a "control plane" to intelligently and dynamically utilise and configure cloud-native (i.e., container orchestrators and service mesh) technologies to enforce security policies. We implement a prototype of the proposed framework using open-source cloud-native technologies such as Docker, Kubernetes, Istio, and Open Policy Agent to validate the framework. Evaluations verify our proposed framework's ability to enforce security policies for distributed microservices management, thus harvesting the MSA characteristics to ensure IoT application security needs. △ Less

Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 16 pages, 7 figures, Accepted for full paper presentation at ECSA 2024 conference

arXiv:2406.09737 [pdf, other]

A Multivocal Review of MLOps Practices, Challenges and Open Issues

Authors: Beyza Eken, Samodha Pallewatta, Nguyen Khoi Tran, Ayse Tosun, Muhammad Ali Babar

Abstract: MLOps has emerged as a key solution to address many socio-technical challenges of bringing ML models to production, such as integrating ML models with non-ML software, continuous monitoring, maintenance, and retraining of deployed models. Despite the utility of MLOps, an integrated body of knowledge regarding MLOps remains elusive because of its extensive scope due to the diversity of ML productio… ▽ More MLOps has emerged as a key solution to address many socio-technical challenges of bringing ML models to production, such as integrating ML models with non-ML software, continuous monitoring, maintenance, and retraining of deployed models. Despite the utility of MLOps, an integrated body of knowledge regarding MLOps remains elusive because of its extensive scope due to the diversity of ML productionalization challenges it addresses. Whilst the existing literature reviews provide valuable snapshots of specific practices, tools, and research prototypes related to MLOps at various times, they focus on particular facets of MLOps, thus fail to offer a comprehensive and invariant framework that can weave these perspectives into a unified understanding of MLOps. This paper presents a Multivocal Literature Review that systematically analyzes a corpus of 150 peer-reviewed and 48 grey literature to synthesize a unified conceptualization of MLOps and develop a snapshot of its best practices, adoption challenges, and solutions. △ Less

Submitted 16 April, 2025; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: 44 pages, 4 figures

arXiv:2406.04902 [pdf, other]

Beyond Data, Towards Sustainability: A Sydney Case Study on Urban Digital Twins

Authors: Ammar Sohail, Bojie Shen, Muhammad Aamir Cheema, Mohammed Eunus Ali, Anwaar Ulhaq, Muhammad Ali Babar, Asama Qureshi

Abstract: As urban areas grapple with unprecedented challenges stemming from population growth and climate change, the emergence of urban digital twins offers a promising solution. This paper presents a case study focusing on Sydney's urban digital twin, a virtual replica integrating diverse real-time and historical data, including weather, crime, emissions, and traffic. Through advanced visualization and d… ▽ More As urban areas grapple with unprecedented challenges stemming from population growth and climate change, the emergence of urban digital twins offers a promising solution. This paper presents a case study focusing on Sydney's urban digital twin, a virtual replica integrating diverse real-time and historical data, including weather, crime, emissions, and traffic. Through advanced visualization and data analysis techniques, the study explores some applications of this digital twin in urban sustainability, such as spatial ranking of suburbs and automatic identification of correlations between variables. Additionally, the research delves into predictive modeling, employing machine learning to forecast traffic crash risks using environmental data, showcasing the potential for proactive interventions. The contributions of this work lie in the comprehensive exploration of a city-scale digital twin for sustainable urban planning, offering a multifaceted approach to data-driven decision-making. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.15293 [pdf, other]

Transaction Fee Estimation in the Bitcoin System

Authors: Limeng Zhang, Rui Zhou, Qing Liu, Chengfei Liu, M. Ali Babar

Abstract: In the Bitcoin system, transaction fees serve as an incentive for blockchain confirmations. In general, a transaction with a higher fee is likely to be included in the next block mined, whereas a transaction with a smaller fee or no fee may be delayed or never processed at all. However, the transaction fee needs to be specified when submitting a transaction and almost cannot be altered thereafter.… ▽ More In the Bitcoin system, transaction fees serve as an incentive for blockchain confirmations. In general, a transaction with a higher fee is likely to be included in the next block mined, whereas a transaction with a smaller fee or no fee may be delayed or never processed at all. However, the transaction fee needs to be specified when submitting a transaction and almost cannot be altered thereafter. Hence it is indispensable to help a client set a reasonable fee, as a higher fee incurs over-spending and a lower fee could delay the confirmation. In this work, we focus on estimating the transaction fee for a new transaction to help with its confirmation within a given expected time. We identify two major drawbacks in the existing works. First, the current industry products are built on explicit analytical models, ignoring the complex interactions of different factors which could be better captured by machine learning based methods; Second, all of the existing works utilize limited knowledge for the estimation which hinders the potential of further improving the estimation quality. As a result, we propose a framework FENN, which aims to integrate the knowledge from a wide range of sources, including the transaction itself, unconfirmed transactions in the mempool and the blockchain confirmation environment, into a neural network model in order to estimate a proper transaction fee. Finally, we conduct experiments on real blockchain datasets to demonstrate the effectiveness and efficiency of our proposed framework over the state-of-the-art works evaluated by MAPE and RMSE. Each variation model in our framework can finish training within one block interval, which shows the potential of our framework to process the realtime transaction updates in the Bitcoin blockchain. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2404.17110 [pdf, other]

Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT

Authors: Triet H. M. Le, M. Ali Babar, Tung Hoang Thai

Abstract: Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and i… ▽ More Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and investigate potential solutions to enhance the performance. Method: We train and test the state-of-the-art model based on CodeBERT with and without data sampling techniques for function-level and line-level SV prediction in three low-resource languages - Kotlin, Swift, and Rust. We also assess the effectiveness of ChatGPT for low-resource SV prediction given its recent success in other domains. Results: Compared to the original work in C/C++ with large data, CodeBERT's performance of function-level and line-level SV prediction significantly declines in low-resource languages, signifying the negative impact of data scarcity. Regarding remediation, data sampling techniques fail to improve CodeBERT; whereas, ChatGPT showcases promising results, substantially enhancing predictive performance by up to 34.4% for the function level and up to 53.5% for the line level. Conclusion: We have highlighted the challenge and made the first promising step for low-resource SV prediction, paving the way for future research in this direction. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted in the 4th International Workshop on Software Security co-located with the 28th International Conference on Evaluation and Assessment in Software Engineering (EASE) 2024

arXiv:2404.11294 [pdf, other]

LogSD: Detecting Anomalies from System Logs through Self-supervised Learning and Frequency-based Masking

Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

Abstract: Log analysis is one of the main techniques that engineers use for troubleshooting large-scale software systems. Over the years, many supervised, semi-supervised, and unsupervised log analysis methods have been proposed to detect system anomalies by analyzing system logs. Among these, semi-supervised methods have garnered increasing attention as they strike a balance between relaxed labeled data re… ▽ More Log analysis is one of the main techniques that engineers use for troubleshooting large-scale software systems. Over the years, many supervised, semi-supervised, and unsupervised log analysis methods have been proposed to detect system anomalies by analyzing system logs. Among these, semi-supervised methods have garnered increasing attention as they strike a balance between relaxed labeled data requirements and optimal detection performance, contrasting with their supervised and unsupervised counterparts. However, existing semi-supervised methods overlook the potential bias introduced by highly frequent log messages on the learned normal patterns, which leads to their less than satisfactory performance. In this study, we propose LogSD, a novel semi-supervised self-supervised learning approach. LogSD employs a dual-network architecture and incorporates a frequency-based masking scheme, a global-to-local reconstruction paradigm and three self-supervised learning tasks. These features enable LogSD to focus more on relatively infrequent log messages, thereby effectively learning less biased and more discriminative patterns from historical normal data. This emphasis ultimately leads to improved anomaly detection performance. Extensive experiments have been conducted on three commonly-used datasets and the results show that LogSD significantly outperforms eight state-of-the-art benchmark methods. △ Less

Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: 23 pages with 11 figures

arXiv:2404.06043 [pdf, other]

Automatic Configuration Tuning on Cloud Database: A Survey

Authors: Limeng Zhang, M. Ali Babar

Abstract: Faced with the challenges of big data, modern cloud database management systems are designed to efficiently store, organize, and retrieve data, supporting optimal performance, scalability, and reliability for complex data processing and analysis. However, achieving good performance in modern databases is non-trivial as they are notorious for having dozens of configurable knobs, such as hardware se… ▽ More Faced with the challenges of big data, modern cloud database management systems are designed to efficiently store, organize, and retrieve data, supporting optimal performance, scalability, and reliability for complex data processing and analysis. However, achieving good performance in modern databases is non-trivial as they are notorious for having dozens of configurable knobs, such as hardware setup, software setup, database physical and logical design, etc., that control runtime behaviors and impact database performance. To find the optimal configuration for achieving optimal performance, extensive research has been conducted on automatic parameter tuning in DBMS. This paper provides a comprehensive survey of predominant configuration tuning techniques, including Bayesian optimization-based solutions, Neural network-based solutions, Reinforcement learning-based solutions, and Search-based solutions. Moreover, it investigates the fundamental aspects of parameter tuning pipeline, including tuning objective, workload characterization, feature pruning, knowledge from experience, configuration recommendation, and experimental settings. We highlight technique comparisons in each component, corresponding solutions, and introduce the experimental setting for performance evaluation. Finally, we conclude this paper and present future research opportunities. This paper aims to assist future researchers and practitioners in gaining a better understanding of automatic parameter tuning in cloud databases by providing state-of-the-art existing solutions, research directions, and evaluation benchmarks. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.03823 [pdf, other]

An Investigation into Misuse of Java Security APIs by Large Language Models

Authors: Zahra Mousavi, Chadni Islam, Kristen Moore, Alsharif Abuadbba, Muhammad Ali Babar

Abstract: The increasing trend of using Large Language Models (LLMs) for code generation raises the question of their capability to generate trustworthy code. While many researchers are exploring the utility of code generation for uncovering software vulnerabilities, one crucial but often overlooked aspect is the security Application Programming Interfaces (APIs). APIs play an integral role in upholding sof… ▽ More The increasing trend of using Large Language Models (LLMs) for code generation raises the question of their capability to generate trustworthy code. While many researchers are exploring the utility of code generation for uncovering software vulnerabilities, one crucial but often overlooked aspect is the security Application Programming Interfaces (APIs). APIs play an integral role in upholding software security, yet effectively integrating security APIs presents substantial challenges. This leads to inadvertent misuse by developers, thereby exposing software to vulnerabilities. To overcome these challenges, developers may seek assistance from LLMs. In this paper, we systematically assess ChatGPT's trustworthiness in code generation for security API use cases in Java. To conduct a thorough evaluation, we compile an extensive collection of 48 programming tasks for 5 widely used security APIs. We employ both automated and manual approaches to effectively detect security API misuse in the code generated by ChatGPT for these tasks. Our findings are concerning: around 70% of the code instances across 30 attempts per task contain security API misuse, with 20 distinct misuse types identified. Moreover, for roughly half of the tasks, this rate reaches 100%, indicating that there is a long way to go before developers can rely on ChatGPT to securely implement security API code. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by ACM ASIACCS 2024

arXiv:2403.15416 [pdf, other]

Fuzzy hyperparameters update in a second order optimization

Authors: Abdelaziz Bensadok, Muhammad Zeeshan Babar

Abstract: This research will present a hybrid approach to accelerate convergence in a second order optimization. An online finite difference approximation of the diagonal Hessian matrix will be introduced, along with fuzzy inferencing of several hyperparameters. Competitive results have been achieved This research will present a hybrid approach to accelerate convergence in a second order optimization. An online finite difference approximation of the diagonal Hessian matrix will be introduced, along with fuzzy inferencing of several hyperparameters. Competitive results have been achieved △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2401.16577 [pdf, other]

LLMs as On-demand Customizable Service

Authors: Souvika Sarkar, Mohammad Fakhruddin Babar, Monowar Hasan, Shubhra Kanti Karmaker

Abstract: Large Language Models (LLMs) have demonstrated remarkable language understanding and generation capabilities. However, training, deploying, and accessing these models pose notable challenges, including resource-intensive demands, extended training durations, and scalability issues. To address these issues, we introduce a concept of hierarchical, distributed LLM architecture that aims at enhancing… ▽ More Large Language Models (LLMs) have demonstrated remarkable language understanding and generation capabilities. However, training, deploying, and accessing these models pose notable challenges, including resource-intensive demands, extended training durations, and scalability issues. To address these issues, we introduce a concept of hierarchical, distributed LLM architecture that aims at enhancing the accessibility and deployability of LLMs across heterogeneous computing platforms, including general-purpose computers (e.g., laptops) and IoT-style devices (e.g., embedded systems). By introducing a "layered" approach, the proposed architecture enables on-demand accessibility to LLMs as a customizable service. This approach also ensures optimal trade-offs between the available computational resources and the user's application needs. We envision that the concept of hierarchical LLM will empower extensive, crowd-sourced user bases to harness the capabilities of LLMs, thereby fostering advancements in AI technology in general. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.13199 [pdf, other]

Why People Still Fall for Phishing Emails: An Empirical Investigation into How Users Make Email Response Decisions

Authors: Asangi Jayatilaka, Nalin Asanka Gamagedara Arachchilage, Muhammad Ali Babar

Abstract: Despite technical and non-technical countermeasures, humans continue to be tricked by phishing emails. How users make email response decisions is a missing piece in the puzzle to identifying why people still fall for phishing emails. We conducted an empirical study using a think-aloud method to investigate how people make 'response decisions' while reading emails. The grounded theory analysis of t… ▽ More Despite technical and non-technical countermeasures, humans continue to be tricked by phishing emails. How users make email response decisions is a missing piece in the puzzle to identifying why people still fall for phishing emails. We conducted an empirical study using a think-aloud method to investigate how people make 'response decisions' while reading emails. The grounded theory analysis of the in-depth qualitative data has enabled us to identify different elements of email users' decision-making that influence their email response decisions. Furthermore, we developed a theoretical model that explains how people could be driven to respond to emails based on the identified elements of users' email decision-making processes and the relationships uncovered from the data. The findings provide deeper insights into phishing email susceptibility due to people's email response decision-making behavior. We also discuss the implications of our findings for designers and researchers working in anti-phishing training, education, and awareness interventions △ Less

Submitted 23 January, 2024; originally announced January 2024.

Journal ref: Symposium on Usable Security and Privacy (USEC) 2024

arXiv:2401.11105 [pdf, other]

Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study

Authors: Triet H. M. Le, Xiaoning Du, M. Ali Babar

Abstract: Collecting relevant and high-quality data is integral to the development of effective Software Vulnerability (SV) prediction models. Most of the current SV datasets rely on SV-fixing commits to extract vulnerable functions and lines. However, none of these datasets have considered latent SVs existing between the introduction and fix of the collected SVs. There is also little known about the useful… ▽ More Collecting relevant and high-quality data is integral to the development of effective Software Vulnerability (SV) prediction models. Most of the current SV datasets rely on SV-fixing commits to extract vulnerable functions and lines. However, none of these datasets have considered latent SVs existing between the introduction and fix of the collected SVs. There is also little known about the usefulness of these latent SVs for SV prediction. To bridge these gaps, we conduct a large-scale study on the latent vulnerable functions in two commonly used SV datasets and their utilization for function-level and line-level SV predictions. Leveraging the state-of-the-art SZZ algorithm, we identify more than 100k latent vulnerable functions in the studied datasets. We find that these latent functions can increase the number of SVs by 4x on average and correct up to 5k mislabeled functions, yet they have a noise level of around 6%. Despite the noise, we show that the state-of-the-art SV prediction model can significantly benefit from such latent SVs. The improvements are up to 24.5% in the performance (F1-Score) of function-level SV predictions and up to 67% in the effectiveness of localizing vulnerable lines. Overall, our study presents the first promising step toward the use of latent SVs to improve the quality of SV datasets and enhance the performance of SV prediction tasks. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: Accepted as a full paper in the technical track at the 21st International Conference on Mining Software Repositories (MSR) 2024

arXiv:2312.06056 [pdf, other]

METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities

Authors: Sangwon Hyun, Mingyu Guo, M. Ali Babar

Abstract: Large-Language Models (LLMs) have shifted the paradigm of natural language data processing. However, their black-boxed and probabilistic characteristics can lead to potential risks in the quality of outputs in diverse LLM applications. Recent studies have tested Quality Attributes (QAs), such as robustness or fairness, of LLMs by generating adversarial input texts. However, existing studies have l… ▽ More Large-Language Models (LLMs) have shifted the paradigm of natural language data processing. However, their black-boxed and probabilistic characteristics can lead to potential risks in the quality of outputs in diverse LLM applications. Recent studies have tested Quality Attributes (QAs), such as robustness or fairness, of LLMs by generating adversarial input texts. However, existing studies have limited their coverage of QAs and tasks in LLMs and are difficult to extend. Additionally, these studies have only used one evaluation metric, Attack Success Rate (ASR), to assess the effectiveness of their approaches. We propose a MEtamorphic Testing for Analyzing LLMs (METAL) framework to address these issues by applying Metamorphic Testing (MT) techniques. This approach facilitates the systematic testing of LLM qualities by defining Metamorphic Relations (MRs), which serve as modularized evaluation metrics. The METAL framework can automatically generate hundreds of MRs from templates that cover various QAs and tasks. In addition, we introduced novel metrics that integrate the ASR method into the semantic qualities of text to assess the effectiveness of MRs accurately. Through the experiments conducted with three prominent LLMs, we have confirmed that the METAL framework effectively evaluates essential QAs on primary LLM tasks and reveals the quality risks in LLMs. Moreover, the newly proposed metrics can guide the optimal MRs for testing each task and suggest the most effective method for generating MRs. △ Less

Submitted 10 December, 2023; originally announced December 2023.

Comments: Accepted to International Conference on Software Testing, Verification and Validation (ICST) 2024 / Key words: Large-language models, Metamorphic testing, Quality evaluation, Text perturbations

arXiv:2310.18839 [pdf, other]

The Telehealth Chain: a protocol for secure and transparent telemedicine transactions on the blockchain

Authors: Syed Sarosh Mahdi, Zaib Ullah, Gopi Battineni, Muneer Gohar Babar, Umer Daood

Abstract: Blockchain technology provides a secure and decentralized platform for storing and transferring sensitive medical data, which can be utilized to enable remote medical consultations. This paper proposes a theoretical framework for creating a blockchain-based digital entity to facilitate telemedicine services. The proposed framework utilizes blockchain technology to provide a secure and reliable pla… ▽ More Blockchain technology provides a secure and decentralized platform for storing and transferring sensitive medical data, which can be utilized to enable remote medical consultations. This paper proposes a theoretical framework for creating a blockchain-based digital entity to facilitate telemedicine services. The proposed framework utilizes blockchain technology to provide a secure and reliable platform for medical practitioners to remotely interact with patient transactions. The blockchain will serve as a one-stop digital service to secure patient data, ensure privacy, and facilitate payments. The proposed framework leverages the existing Hyperledger Fabric platform to build a secure blockchain-assisted telemedicine platform. △ Less

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2310.06300 [pdf, other]

An Empirically Grounded Reference Architecture for Software Supply Chain Metadata Management

Authors: Nguyen Khoi Tran, Samodha Pallewatta, M. Ali Babar

Abstract: With the rapid rise in Software Supply Chain (SSC) attacks, organisations need thorough and trustworthy visibility over the entire SSC of their software inventory to detect risks early and identify compromised assets rapidly in the event of an SSC attack. One way to achieve such visibility is through SSC metadata, machine-readable and authenticated documents describing an artefact's lifecycle. Ado… ▽ More With the rapid rise in Software Supply Chain (SSC) attacks, organisations need thorough and trustworthy visibility over the entire SSC of their software inventory to detect risks early and identify compromised assets rapidly in the event of an SSC attack. One way to achieve such visibility is through SSC metadata, machine-readable and authenticated documents describing an artefact's lifecycle. Adopting SSC metadata requires organisations to procure or develop a Software Supply Chain Metadata Management system (SCM2), a suite of software tools for performing life cycle activities of SSC metadata documents such as creation, signing, distribution, and consumption. Selecting or developing an SCM2 is challenging due to the lack of a comprehensive domain model and architectural blueprint to aid practitioners in navigating the vast design space of SSC metadata terminologies, frameworks, and solutions. This paper addresses the above-mentioned challenge by presenting an empirically grounded Reference Architecture (RA) comprising of a domain model and an architectural blueprint for SCM2 systems. Our proposed RA is constructed systematically on an empirical foundation built with industry-driven and peer-reviewed SSC security frameworks. Our theoretical evaluation, which consists of an architectural mapping of five prominent SSC security tools on the RA, ensures its validity and applicability, thus affirming the proposed RA as an effective framework for analysing existing SCM2 solutions and guiding the engineering of new SCM2 systems. △ Less

Submitted 8 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted for full paper presentation at EASE 2024 conference

arXiv:2310.00635 [pdf, other]

Reinforcement Learning Based Neighbour Selection for VANET with Adaptive Trust Management

Authors: Orvila Sarker, Hong Shen, M. Ali Babar

Abstract: Successful information propagation from source to destination in Vehicular Adhoc Network (VANET) can be hampered by the presence of neighbouring attacker nodes causing unwanted packet dropping. Potential attackers change their behaviour over time and remain undetected due to the ad-hoc nature of VANET. Capturing the dynamic attacker behaviour and updating the corresponding neighbourhood informatio… ▽ More Successful information propagation from source to destination in Vehicular Adhoc Network (VANET) can be hampered by the presence of neighbouring attacker nodes causing unwanted packet dropping. Potential attackers change their behaviour over time and remain undetected due to the ad-hoc nature of VANET. Capturing the dynamic attacker behaviour and updating the corresponding neighbourhood information without compromising the quality of service requirements is an ongoing challenge. This work proposes a Reinforcement Learning (RL) based neighbour selection framework for VANET with an adaptive trust management system to capture the behavioural changes of potential attackers and to dynamically update the neighbourhood information. In contrast to existing works, we consider trust and link-life time in unison as neighbour selection criteria to achieve trustworthy communication. Our adaptive trust model takes into account the social relationship, time and confidence in trust observation to avoid four types of attackers. To update the neighbourhood information, our framework sets the learning rate of the RL agent according to the velocities of the neighbour nodes to improve the model's adaptability to network topology changes. Results demonstrate that our method can take less number of hops to the destination for large network sizes while can response is up to 54 percent faster compared to a baseline method. Also, the proposed model can outperform the other baseline method by reducing the packet dropping rate up to 57 percent caused by the attacker. △ Less

Submitted 1 October, 2023; originally announced October 2023.

Comments: This article is accepted at the 22nd IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) 2023

arXiv:2308.11862 [pdf, other]

Empirical Analysis of Software Vulnerabilities Causing Timing Side Channels

Authors: M. Mehdi Kholoosi, M. Ali Babar, Cemal Yilmaz

Abstract: Timing attacks are considered one of the most damaging side-channel attacks. These attacks exploit timing fluctuations caused by certain operations to disclose confidential information to an attacker. For instance, in asymmetric encryption, operations such as multiplication and division can cause time-varying execution times that can be ill-treated to obtain an encryption key. Whilst several effor… ▽ More Timing attacks are considered one of the most damaging side-channel attacks. These attacks exploit timing fluctuations caused by certain operations to disclose confidential information to an attacker. For instance, in asymmetric encryption, operations such as multiplication and division can cause time-varying execution times that can be ill-treated to obtain an encryption key. Whilst several efforts have been devoted to exploring the various aspects of timing attacks, particularly in cryptography, little attention has been paid to empirically studying the timing attack-related vulnerabilities in non-cryptographic software. By inspecting these software vulnerabilities, this study aims to gain an evidence-based understanding of weaknesses in non-cryptographic software that may help timing attacks succeed. We used qualitative and quantitative research approaches to systematically study the timing attack-related vulnerabilities reported in the National Vulnerability Database (NVD) from March 2003 to December 2022. Our analysis was focused on the modifications made to the code for patching the identified vulnerabilities. We found that a majority of the timing attack-related vulnerabilities were introduced due to not following known secure coding practices. The findings of this study are expected to help the software security community gain evidence-based information about the nature and causes of the vulnerabilities related to timing attacks. △ Less

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2307.04458 [pdf, other]

Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu

Authors: Victor Prokhorenko, Chadni Islam, Muhammad Ali Babar

Abstract: An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures. When a multitude of independent packages are placed together in an OS, an implicit inter-package architecture is formed. For an evolutionary effort, designers/developers of OS can greatly benefit from fully understanding the system-wide dependency focused… ▽ More An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures. When a multitude of independent packages are placed together in an OS, an implicit inter-package architecture is formed. For an evolutionary effort, designers/developers of OS can greatly benefit from fully understanding the system-wide dependency focused on individual files, specifically executable files, and dynamically loadable libraries. We propose a framework, DepEx, aimed at discovering the detailed package relations at the level of individual binary files and their associated evolutionary changes. We demonstrate the utility of DepEx by systematically investigating the evolution of a large-scale Open Source OS, Ubuntu. DepEx enabled us to systematically acquire and analyze the dependencies in different versions of Ubuntu released between 2005 (5.04) to 2023 (23.04). Our analysis revealed various evolutionary trends in package management and their implications based on the analysis of the 84 consecutive versions available for download (these include beta versions). This study has enabled us to assert that DepEx can provide researchers and practitioners with a better understanding of the implicit software dependencies in order to improve the stability, performance, and functionality of their software as well as to reduce the risk of issues arising during maintenance, updating, or migration. △ Less

Submitted 10 July, 2023; originally announced July 2023.

Comments: This paper is accepted for publication in the 17th international conference on Software Architecture

arXiv:2307.01225 [pdf, other]

Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)

Authors: Bushra Sabir, M. Ali Babar, Sharif Abuadbba

Abstract: Transformer-based text classifiers like BERT, Roberta, T5, and GPT-3 have shown impressive performance in NLP. However, their vulnerability to adversarial examples poses a security risk. Existing defense methods lack interpretability, making it hard to understand adversarial classifications and identify model vulnerabilities. To address this, we propose the Interpretability and Transparency-Driven… ▽ More Transformer-based text classifiers like BERT, Roberta, T5, and GPT-3 have shown impressive performance in NLP. However, their vulnerability to adversarial examples poses a security risk. Existing defense methods lack interpretability, making it hard to understand adversarial classifications and identify model vulnerabilities. To address this, we propose the Interpretability and Transparency-Driven Detection and Transformation (IT-DT) framework. It focuses on interpretability and transparency in detecting and transforming textual adversarial examples. IT-DT utilizes techniques like attention maps, integrated gradients, and model feedback for interpretability during detection. This helps identify salient features and perturbed words contributing to adversarial classifications. In the transformation phase, IT-DT uses pre-trained embeddings and model feedback to generate optimal replacements for perturbed words. By finding suitable substitutions, we aim to convert adversarial examples into non-adversarial counterparts that align with the model's intended behavior while preserving the text's meaning. Transparency is emphasized through human expert involvement. Experts review and provide feedback on detection and transformation results, enhancing decision-making, especially in complex scenarios. The framework generates insights and threat intelligence empowering analysts to identify vulnerabilities and improve model robustness. Comprehensive experiments demonstrate the effectiveness of IT-DT in detecting and transforming adversarial examples. The approach enhances interpretability, provides transparency, and enables accurate identification and successful transformation of adversarial inputs. By combining technical analysis and human expertise, IT-DT significantly improves the resilience and trustworthiness of transformer-based text classifiers against adversarial attacks. △ Less

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.08869 [pdf, other]

Detecting Misuse of Security APIs: A Systematic Review

Authors: Zahra Mousavi, Chadni Islam, M. Ali Babar, Alsharif Abuadbba, Kristen Moore

Abstract: Security Application Programming Interfaces (APIs) are crucial for ensuring software security. However, their misuse introduces vulnerabilities, potentially leading to severe data breaches and substantial financial loss. Complex API design, inadequate documentation, and insufficient security training often lead to unintentional misuse by developers. The software security community has devised and… ▽ More Security Application Programming Interfaces (APIs) are crucial for ensuring software security. However, their misuse introduces vulnerabilities, potentially leading to severe data breaches and substantial financial loss. Complex API design, inadequate documentation, and insufficient security training often lead to unintentional misuse by developers. The software security community has devised and evaluated several approaches to detecting security API misuse to help developers and organizations. This study rigorously reviews the literature on detecting misuse of security APIs to gain a comprehensive understanding of this critical domain. Our goal is to identify and analyze security API misuses, the detection approaches developed, and the evaluation methodologies employed along with the open research avenues to advance the state-of-the-art in this area. Employing the systematic literature review (SLR) methodology, we analyzed 69 research papers. Our review has yielded (a) identification of 6 security API types; (b) classification of 30 distinct misuses; (c) categorization of detection techniques into heuristic-based and ML-based approaches; and (d) identification of 10 performance measures and 9 evaluation benchmarks. The review reveals a lack of coverage of detection approaches in several areas. We recommend that future efforts focus on aligning security API development with developers' needs and advancing standardized evaluation methods for detection technologies. △ Less

Submitted 14 May, 2025; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted for publication in ACM Computing Surveys, 2025

arXiv:2306.06600 [pdf, other]

Enabling Spatial Digital Twins: Technologies, Challenges, and Future Research Directions

Authors: Mohammed Eunus Ali, Muhammad Aamir Cheema, Tanzima Hashem, Anwaar Ulhaq, Muhammad Ali Babar

Abstract: A Digital Twin (DT) is a virtual replica of a physical object or system, created to monitor, analyze, and optimize its behavior and characteristics. A Spatial Digital Twin (SDT) is a specific type of digital twin that emphasizes the geospatial aspects of the physical entity, incorporating precise location and dimensional attributes for a comprehensive understanding within its spatial environment.… ▽ More A Digital Twin (DT) is a virtual replica of a physical object or system, created to monitor, analyze, and optimize its behavior and characteristics. A Spatial Digital Twin (SDT) is a specific type of digital twin that emphasizes the geospatial aspects of the physical entity, incorporating precise location and dimensional attributes for a comprehensive understanding within its spatial environment. The current body of research on SDTs primarily concentrates on analyzing their potential impact and opportunities within various application domains. As building an SDT is a complex process and requires a variety of spatial computing technologies, it is not straightforward for practitioners and researchers of this multi-disciplinary domain to grasp the underlying details of enabling technologies of the SDT. In this paper, we are the first to systematically analyze different spatial technologies relevant to building an SDT in layered approach (starting from data acquisition to visualization). More specifically, we present the key components of SDTs into four layers of technologies: (i) data acquisition; (ii) spatial database management \& big data analytics systems; (iii) GIS middleware software, maps \& APIs; and (iv) key functional components such as visualizing, querying, mining, simulation and prediction. Moreover, we discuss how modern technologies such as AI/ML, blockchains, and cloud computing can be effectively utilized in enabling and enhancing SDTs. Finally, we identify a number of research challenges and opportunities in SDTs. This work serves as an important resource for SDT researchers and practitioners as it explicitly distinguishes SDTs from traditional DTs, identifies unique applications, outlines the essential technological components of SDTs, and presents a vision for their future development along with the challenges that lie ahead. △ Less

Submitted 11 June, 2023; originally announced June 2023.

Comments: 26 pages, 2 figures

arXiv:2305.12736

Mitigating ML Model Decay in Continuous Integration with Data Drift Detection: An Empirical Study

Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

Abstract: Background: Machine Learning (ML) methods are being increasingly used for automating different activities, e.g., Test Case Prioritization (TCP), of Continuous Integration (CI). However, ML models need frequent retraining as a result of changes in the CI environment, more commonly known as data drift. Also, continuously retraining ML models consume a lot of time and effort. Hence, there is an urgen… ▽ More Background: Machine Learning (ML) methods are being increasingly used for automating different activities, e.g., Test Case Prioritization (TCP), of Continuous Integration (CI). However, ML models need frequent retraining as a result of changes in the CI environment, more commonly known as data drift. Also, continuously retraining ML models consume a lot of time and effort. Hence, there is an urgent need of identifying and evaluating suitable approaches that can help in reducing the retraining efforts and time for ML models used for TCP in CI environments. Aims: This study aims to investigate the performance of using data drift detection techniques for automatically detecting the retraining points for ML models for TCP in CI environments without requiring detailed knowledge of the software projects. Method: We employed the Hellinger distance to identify changes in both the values and distribution of input data and leveraged these changes as retraining points for the ML model. We evaluated the efficacy of this method on multiple datasets and compared the APFDc and NAPFD evaluation metrics against models that were regularly retrained, with careful consideration of the statistical methods. Results: Our experimental evaluation of the Hellinger distance-based method demonstrated its efficacy and efficiency in detecting retraining points and reducing the associated costs. However, the performance of this method may vary depending on the dataset. Conclusions: Our findings suggest that data drift detection methods can assist in identifying retraining points for ML models in CI environments, while significantly reducing the required retraining time. These methods can be helpful for practitioners who lack specialized knowledge of software projects, enabling them to maintain ML model accuracy. △ Less

Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: This paper got a rejection and we need to address the comments and upload the new version with new results

arXiv:2305.12695

Systematic Literature Review on Application of Machine Learning in Continuous Integration

Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

Abstract: This research conducted a systematic review of the literature on machine learning (ML)-based methods in the context of Continuous Integration (CI) over the past 22 years. The study aimed to identify and describe the techniques used in ML-based solutions for CI and analyzed various aspects such as data engineering, feature engineering, hyper-parameter tuning, ML models, evaluation methods, and metr… ▽ More This research conducted a systematic review of the literature on machine learning (ML)-based methods in the context of Continuous Integration (CI) over the past 22 years. The study aimed to identify and describe the techniques used in ML-based solutions for CI and analyzed various aspects such as data engineering, feature engineering, hyper-parameter tuning, ML models, evaluation methods, and metrics. In this paper, we have depicted the phases of CI testing, the connection between them, and the employed techniques in training the ML method phases. We presented nine types of data sources and four taken steps in the selected studies for preparing the data. Also, we identified four feature types and nine subsets of data features through thematic analysis of the selected studies. Besides, five methods for selecting and tuning the hyper-parameters are shown. In addition, we summarised the evaluation methods used in the literature and identified fifteen different metrics. The most commonly used evaluation methods were found to be precision, recall, and F1-score, and we have also identified five methods for evaluating the performance of trained ML models. Finally, we have presented the relationship between ML model types, performance measurements, and CI phases. The study provides valuable insights for researchers and practitioners interested in ML-based methods in CI and emphasizes the need for further research in this area. △ Less

Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: This paper got a rejection and we need to address the comments and upload the new version with new results

arXiv:2305.11657 [pdf, other]

Cost Sharing Public Project with Minimum Release Delay

Authors: Mingyu Guo, Diksha Goel, Guanhua Wang, Yong Yang, Muhammad Ali Babar

Abstract: We study the excludable public project model where the decision is binary (build or not build). In a classic excludable and binary public project model, an agent either consumes the project in its whole or is completely excluded. We study a setting where the mechanism can set different project release time for different agents, in the sense that high-paying agents can consume the project earlier t… ▽ More We study the excludable public project model where the decision is binary (build or not build). In a classic excludable and binary public project model, an agent either consumes the project in its whole or is completely excluded. We study a setting where the mechanism can set different project release time for different agents, in the sense that high-paying agents can consume the project earlier than low-paying agents. The release delay, while hurting the social welfare, is implemented to incentivize payments to cover the project cost. The mechanism design objective is to minimize the maximum release delay and the total release delay among all agents. We first consider the setting where we know the prior distribution of the agents' types. Our objectives are minimizing the expected maximum release delay and the expected total release delay. We propose the single deadline mechanisms. We show that the optimal single deadline mechanism is asymptotically optimal for both objectives, regardless of the prior distribution. For small number of agents, we propose the sequential unanimous mechanisms by extending the largest unanimous mechanisms from [Ohseto 2000]. We propose an automated mechanism design approach via evolutionary computation to optimize within the sequential unanimous mechanisms. We next study prior-free mechanism design. We propose the group-based optimal deadline mechanism and show that it is competitive against an undominated mechanism under minor technical assumptions. △ Less

Submitted 19 May, 2023; originally announced May 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2204.07315

arXiv:2304.11520 [pdf, other]

doi 10.1145/3629526.3645054

Processing Natural Language on Embedded Devices: How Well Do Transformer Models Perform?

Authors: Souvika Sarkar, Mohammad Fakhruddin Babar, Md Mahadi Hassan, Monowar Hasan, Shubhra Kanti Karmaker Santu

Abstract: This paper presents a performance study of transformer language models under different hardware configurations and accuracy requirements and derives empirical observations about these resource/accuracy trade-offs. In particular, we study how the most commonly used BERT-based language models (viz., BERT, RoBERTa, DistilBERT, and TinyBERT) perform on embedded systems. We tested them on four off-the-… ▽ More This paper presents a performance study of transformer language models under different hardware configurations and accuracy requirements and derives empirical observations about these resource/accuracy trade-offs. In particular, we study how the most commonly used BERT-based language models (viz., BERT, RoBERTa, DistilBERT, and TinyBERT) perform on embedded systems. We tested them on four off-the-shelf embedded platforms (Raspberry Pi, Jetson, UP2, and UDOO) with 2 GB and 4 GB memory (i.e., a total of eight hardware configurations) and four datasets (i.e., HuRIC, GoEmotion, CoNLL, WNUT17) running various NLP tasks. Our study finds that executing complex NLP tasks (such as "sentiment" classification) on embedded systems is feasible even without any GPUs (e.g., Raspberry Pi with 2 GB of RAM). Our findings can help designers understand the deployability and performance of transformer language models, especially those based on BERT architectures. △ Less

Submitted 6 March, 2024; v1 submitted 22 April, 2023; originally announced April 2023.

Journal ref: ICPE 2024

arXiv:2304.02829 [pdf, other]

SoK: Machine Learning for Continuous Integration

Authors: Ali Kazemi Arani, Mansooreh Zahedi, Triet Huynh Minh Le, Muhammad Ali Babar

Abstract: Continuous Integration (CI) has become a well-established software development practice for automatically and continuously integrating code changes during software development. An increasing number of Machine Learning (ML) based approaches for automation of CI phases are being reported in the literature. It is timely and relevant to provide a Systemization of Knowledge (SoK) of ML-based approaches… ▽ More Continuous Integration (CI) has become a well-established software development practice for automatically and continuously integrating code changes during software development. An increasing number of Machine Learning (ML) based approaches for automation of CI phases are being reported in the literature. It is timely and relevant to provide a Systemization of Knowledge (SoK) of ML-based approaches for CI phases. This paper reports an SoK of different aspects of the use of ML for CI. Our systematic analysis also highlights the deficiencies of the existing ML-based solutions that can be improved for advancing the state-of-the-art. △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: 6 pages, 2 figures, accepted in the ICSE'23 Workshop on Cloud Intelligence / AIOps

arXiv:2301.05456 [pdf, other]

Data Quality for Software Vulnerability Datasets

Authors: Roland Croft, M. Ali Babar, Mehdi Kholoosi

Abstract: The use of learning-based techniques to achieve automated software vulnerability detection has been of longstanding interest within the software security domain. These data-driven solutions are enabled by large software vulnerability datasets used for training and benchmarking. However, we observe that the quality of the data powering these solutions is currently ill-considered, hindering the reli… ▽ More The use of learning-based techniques to achieve automated software vulnerability detection has been of longstanding interest within the software security domain. These data-driven solutions are enabled by large software vulnerability datasets used for training and benchmarking. However, we observe that the quality of the data powering these solutions is currently ill-considered, hindering the reliability and value of produced outcomes. Whilst awareness of software vulnerability data preparation challenges is growing, there has been little investigation into the potential negative impacts of software vulnerability data quality. For instance, we lack confirmation that vulnerability labels are correct or consistent. Our study seeks to address such shortcomings by inspecting five inherent data quality attributes for four state-of-the-art software vulnerability datasets and the subsequent impacts that issues can have on software vulnerability prediction models. Surprisingly, we found that all the analyzed datasets exhibit some data quality problems. In particular, we found 20-71% of vulnerability labels to be inaccurate in real-world datasets, and 17-99% of data points were duplicated. We observed that these issues could cause significant impacts on downstream models, either preventing effective model training or inflating benchmark performance. We advocate for the need to overcome such challenges. Our findings will enable better consideration and assessment of software vulnerability data quality in the future. △ Less

Submitted 13 January, 2023; originally announced January 2023.

Comments: Accepted for publication in the ICSE 23 Technical Track

arXiv:2211.08916 [pdf, other]

doi 10.1109/TSE.2023.3290237

Privacy Engineering in the Wild: Understanding the Practitioners' Mindset, Organisational Aspects, and Current Practices

Authors: Leonardo Horn Iwaya, Muhammad Ali Babar, Awais Rashid

Abstract: Privacy engineering, as an emerging field of research and practice, comprises the technical capabilities and management processes needed to implement, deploy, and operate privacy features and controls in working systems. For that, software practitioners and other stakeholders in software companies need to work cooperatively toward building privacy-preserving businesses and engineering solutions. S… ▽ More Privacy engineering, as an emerging field of research and practice, comprises the technical capabilities and management processes needed to implement, deploy, and operate privacy features and controls in working systems. For that, software practitioners and other stakeholders in software companies need to work cooperatively toward building privacy-preserving businesses and engineering solutions. Significant research has been done to understand the software practitioners' perceptions of information privacy, but more emphasis should be given to the uptake of concrete privacy engineering components. This research delves into the software practitioners' perspectives and mindset, organisational aspects, and current practices on privacy and its engineering processes. A total of 30 practitioners from nine countries and backgrounds were interviewed, sharing their experiences and voicing their opinions on a broad range of privacy topics. The thematic analysis methodology was adopted to code the interview data qualitatively and construct a rich and nuanced thematic framework. As a result, we identified three critical interconnected themes that compose our thematic framework for privacy engineering "in the wild": (1) personal privacy mindset and stance, categorised into practitioners' privacy knowledge, attitudes and behaviours; (2) organisational privacy aspects, such as decision-power and positive and negative examples of privacy climate; and, (3) privacy engineering practices, such as procedures and controls concretely used in the industry. Among the main findings, this study provides many insights about the state-of-the-practice of privacy engineering, pointing to a positive influence of privacy laws (e.g., EU General Data Protection Regulation) on practitioners' behaviours and organisations' cultures. Aspects such as organisational privacy culture and climate were also confirmed to have [...]. △ Less

Submitted 30 June, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: 26 pages, 8 figures

arXiv:2211.07585 [pdf]

An Empirical Study on Secure Usage of Mobile Health Apps: The Attack Simulation Approach

Authors: Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi, M. Ali Babar

Abstract: Mobile applications, mobile apps for short, have proven their usefulness in enhancing service provisioning across a multitude of domains that range from smart healthcare, to mobile commerce, and areas of context sensitive computing. In recent years, a number of empirically grounded, survey-based studies have been conducted to investigate secure development and usage of mHealth apps. However, such… ▽ More Mobile applications, mobile apps for short, have proven their usefulness in enhancing service provisioning across a multitude of domains that range from smart healthcare, to mobile commerce, and areas of context sensitive computing. In recent years, a number of empirically grounded, survey-based studies have been conducted to investigate secure development and usage of mHealth apps. However, such studies rely on self reported behaviors documented via interviews or survey questions that lack a practical, i.e. action based approach to monitor and synthesise users actions and behaviors in security critical scenarios. We conducted an empirical study, engaging participants with attack simulation scenarios and analyse their actions, for investigating the security awareness of mHealth app users via action-based research. We simulated some common security attack scenarios in mHealth context and engaged a total of 105 app users to monitor their actions and analyse their behavior. We analysed users data with statistical analysis including reliability and correlations tests, descriptive analysis, and qualitative data analysis. Our results indicate that whilst the minority of our participants perceived access permissions positively, the majority had negative views by indicating that such an app could violate or cost them to lose privacy. Users provide their consent, granting permissions, without a careful review of privacy policies that leads to undesired or malicious access to health critical data. The results also indicated that 73.3% of our participants had denied at least one access permission, and 36% of our participants preferred no authentication method. The study complements existing research on secure usage of mHealth apps, simulates security threats to monitor users actions, and provides empirically grounded guidelines for secure development and usage of mobile health systems. △ Less

Submitted 14 November, 2022; originally announced November 2022.

arXiv:2211.06953 [pdf, other]

Collaborative Application Security Testing for DevSecOps: An Empirical Analysis of Challenges, Best Practices and Tool Support

Authors: Roshan Namal Rajapakse, Mansooreh Zahedi, Muhammad Ali Babar

Abstract: DevSecOps is a software development paradigm that places a high emphasis on the culture of collaboration between developers (Dev), security (Sec) and operations (Ops) teams to deliver secure software continuously and rapidly. Adopting this paradigm effectively, therefore, requires an understanding of the challenges, best practices and available solutions for collaboration among these functional te… ▽ More DevSecOps is a software development paradigm that places a high emphasis on the culture of collaboration between developers (Dev), security (Sec) and operations (Ops) teams to deliver secure software continuously and rapidly. Adopting this paradigm effectively, therefore, requires an understanding of the challenges, best practices and available solutions for collaboration among these functional teams. However, collaborative aspects related to these teams have received very little empirical attention in the DevSecOps literature. Hence, we present a study focusing on a key security activity, Application Security Testing (AST), in which practitioners face difficulties performing collaborative work in a DevSecOps environment. Our study made novel use of 48 systematically selected webinars, technical talks and panel discussions as a data source to qualitatively analyse software practitioner discussions on the most recent trends and emerging solutions in this highly evolving field. We find that the lack of features that facilitate collaboration built into the AST tools themselves is a key tool-related challenge in DevSecOps. In addition, the lack of clarity related to role definitions, shared goals, and ownership also hinders Collaborative AST (CoAST). We also captured a range of best practices for collaboration (e.g., Shift-left security), emerging communication methods (e.g., ChatOps), and new team structures (e.g., hybrid teams) for CoAST. Finally, our study identified several requirements for new tool features and specific gap areas for future research to provide better support for CoAST in DevSecOps. △ Less

Submitted 25 November, 2022; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: Submitted to the Empirical Software Engineering journal_v2

arXiv:2210.06679 [pdf, other]

A Survey on UAV-enabled Edge Computing: Resource Management Perspective

Authors: Xiaoyu Xia, Sheik Mohammad Mostakim Fattah, Muhammad Ali Babar

Abstract: Edge computing facilitates low-latency services at the network's edge by distributing computation, communication, and storage resources within the geographic proximity of mobile and Internet-of-Things (IoT) devices. The recent advancement in Unmanned Aerial Vehicles (UAVs) technologies has opened new opportunities for edge computing in military operations, disaster response, or remote areas where… ▽ More Edge computing facilitates low-latency services at the network's edge by distributing computation, communication, and storage resources within the geographic proximity of mobile and Internet-of-Things (IoT) devices. The recent advancement in Unmanned Aerial Vehicles (UAVs) technologies has opened new opportunities for edge computing in military operations, disaster response, or remote areas where traditional terrestrial networks are limited or unavailable. In such environments, UAVs can be deployed as aerial edge servers or relays to facilitate edge computing services. This form of computing is also known as UAV-enabled Edge Computing (UEC), which offers several unique benefits such as mobility, line-of-sight, flexibility, computational capability, and cost-efficiency. However, the resources on UAVs, edge servers, and IoT devices are typically very limited in the context of UEC. Efficient resource management is, therefore, a critical research challenge in UEC. In this article, we present a survey on the existing research in UEC from the resource management perspective. We identify a conceptual architecture, different types of collaborations, wireless communication models, research directions, key techniques and performance indicators for resource management in UEC. We also present a taxonomy of resource management in UEC. Finally, we identify and discuss some open research challenges that can stimulate future research directions for resource management in UEC. △ Less

Submitted 26 September, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: 36 pages, Accepted to ACM CSUR

arXiv:2209.09487 [pdf, other]

Design and Implementation of Fragmented Clouds for Evaluation of Distributed Databases

Authors: Yaser Mansouri, Faheem Ullah, Shagun Dhingra, M. Ali Babar

Abstract: In this paper, we present a Fragmented Hybrid Cloud (FHC) that provides a unified view of multiple geographically distributed private cloud datacenters. FHC leverages a fragmented usage model in which outsourcing is bi-directional across private clouds that can be hosted by static and mobile entities. The mobility aspect of private cloud nodes has important impact on the FHC performance in terms o… ▽ More In this paper, we present a Fragmented Hybrid Cloud (FHC) that provides a unified view of multiple geographically distributed private cloud datacenters. FHC leverages a fragmented usage model in which outsourcing is bi-directional across private clouds that can be hosted by static and mobile entities. The mobility aspect of private cloud nodes has important impact on the FHC performance in terms of latency and network throughput that are reversely proportional to time-varying distances among different nodes. Mobility also results in intermittent interruption among computing nodes and network links of FHC infrastructure. To fully consider mobility and its consequences, we implemented a layered FHC that leverages Linux utilities and bash-shell programming. We also evaluated the impact of the mobility of nodes on the performance of distributed databases as a result of time-varying latency and bandwidth, downsizing and upsizing cluster nodes, and network accessibility. The findings from our extensive experiments provide deep insights into the performance of well-known big data databases, such as Cassandra, MongoDB, Redis, and MySQL, when deployed on a FHC. △ Less

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2209.07869 [pdf, other]

LogGD:Detecting Anomalies from System Logs by Graph Neural Networks

Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

Abstract: Log analysis is one of the main techniques engineers use to troubleshoot faults of large-scale software systems. During the past decades, many log analysis approaches have been proposed to detect system anomalies reflected by logs. They usually take log event counts or sequential log events as inputs and utilize machine learning algorithms including deep learning models to detect system anomalies.… ▽ More Log analysis is one of the main techniques engineers use to troubleshoot faults of large-scale software systems. During the past decades, many log analysis approaches have been proposed to detect system anomalies reflected by logs. They usually take log event counts or sequential log events as inputs and utilize machine learning algorithms including deep learning models to detect system anomalies. These anomalies are often identified as violations of quantitative relational patterns or sequential patterns of log events in log sequences. However, existing methods fail to leverage the spatial structural relationships among log events, resulting in potential false alarms and unstable performance. In this study, we propose a novel graph-based log anomaly detection method, LogGD, to effectively address the issue by transforming log sequences into graphs. We exploit the powerful capability of Graph Transformer Neural Network, which combines graph structure and node semantics for log-based anomaly detection. We evaluate the proposed method on four widely-used public log datasets. Experimental results show that LogGD can outperform state-of-the-art quantitative-based and sequence-based methods and achieve stable performance under different window size settings. The results confirm that LogGD is effective in log-based anomaly detection. △ Less

Submitted 16 September, 2022; originally announced September 2022.

Comments: 12 pages, 12 figures

arXiv:2209.01518 [pdf, other]

doi 10.1145/3551349.3556969

An Empirical Study of Automation in Software Security Patch Management

Authors: Nesara Dissanayake, Asangi Jayatilaka, Mansooreh Zahedi, Muhammad Ali Babar

Abstract: Several studies have shown that automated support for different activities of the security patch management process has great potential for reducing delays in installing security patches. However, it is also important to understand how automation is used in practice, its limitations in meeting real-world needs and what practitioners really need, an area that has not been empirically investigated i… ▽ More Several studies have shown that automated support for different activities of the security patch management process has great potential for reducing delays in installing security patches. However, it is also important to understand how automation is used in practice, its limitations in meeting real-world needs and what practitioners really need, an area that has not been empirically investigated in the existing software engineering literature. This paper reports an empirical study aimed at investigating different aspects of automation for security patch management using semi-structured interviews with 17 practitioners from three different organisations in the healthcare domain. The findings are focused on the role of automation in security patch management for providing insights into the as-is state of automation in practice, the limitations of current automation, how automation support can be enhanced to effectively meet practitioners' needs, and the role of the human in an automated process. Based on the findings, we have derived a set of recommendations for directing future efforts aimed at developing automated support for security patch management. △ Less

Submitted 3 September, 2022; originally announced September 2022.

Comments: 13 pages, 2 figures

arXiv:2206.10110 [pdf, other]

ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems

Authors: Nguyen Khoi Tran, Bushra Sabir, M. Ali Babar, Nini Cui, Mehran Abolhasan, Justin Lipman

Abstract: Large-scale Machine Learning (ML) based Software Systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fai… ▽ More Large-scale Machine Learning (ML) based Software Systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fairness. Unfortunately, it is challenging for ML teams to access and reconstruct such historical information of ML assets (ML provenance) because it is generally fragmented across distributed ML teams and threatened by the same adversaries that attack ML assets. This paper proposes ProML, a decentralised platform that leverages blockchain and smart contracts to empower distributed ML teams to jointly manage a single source of truth about circulated ML assets' provenance without relying on a third party, which is vulnerable to insider threats and presents a single point of failure. We propose a novel architectural approach called Artefact-as-a-State-Machine to leverage blockchain transactions and smart contracts for managing ML provenance information and introduce a user-driven provenance capturing mechanism to integrate existing scripts and tools to ProML without compromising participants' control over their assets and toolchains. We evaluate the performance and overheads of ProML by benchmarking a proof-of-concept system on a global blockchain. Furthermore, we assessed ProML's security against a threat model of a distributed ML workflow. △ Less

Submitted 21 June, 2022; originally announced June 2022.

Comments: Accepted as full paper in ECSA 2022 conference. To be presented

arXiv:2205.07204 [pdf, other]

doi 10.1145/3534526

Mod2Dash: A Framework for Model-Driven Dashboards Generation

Authors: Liuyue Jiang, Nguyen Khoi Tran, M. Ali Babar

Abstract: The construction of an interactive dashboard involves deciding on what information to present and how to display it and implementing those design decisions to create an operational dashboard. Traditionally, a dashboard's design is implied in the deployed dashboard rather than captured explicitly as a digital artifact, preventing it from being backed up, version-controlled, and shared. Moreover, pr… ▽ More The construction of an interactive dashboard involves deciding on what information to present and how to display it and implementing those design decisions to create an operational dashboard. Traditionally, a dashboard's design is implied in the deployed dashboard rather than captured explicitly as a digital artifact, preventing it from being backed up, version-controlled, and shared. Moreover, practitioners have to implement this implicit design manually by coding or configuring it on a dashboard platform. This paper proposes Mod2Dash, a software framework that enables practitioners to capture their dashboard designs as models and generate operational dashboards automatically from these models. The framework also provides a GUI-driven customization approach for practitioners to fine-tune the auto-generated dashboards and update their models. With these abilities, Mod2Dash enables practitioners to rapidly prototype and deploy dashboards for both operational and research purposes. We evaluated the framework's effectiveness in a case study on cyber security visualization for decision support. A proof-of-concept of Mod2Dash was employed to model and reconstruct 31 diverse real-world cyber security dashboards. A human-assisted comparison between the Mod2Dash-generated dashboards and the baseline dashboards shows a close matching, indicating the framework's effectiveness for real-world scenarios. △ Less

Submitted 15 May, 2022; originally announced May 2022.

Showing 1–50 of 116 results for author: Babar, M