
Showing 1–50 of 116 results for author: Babar, M

Searching in archive cs.
  1. arXiv:2507.05565

    cs.SE cs.AI cs.NE

    Search-based Selection of Metamorphic Relations for Optimized Robustness Testing of Large Language Models

    Authors: Sangwon Hyun, Shaukat Ali, M. Ali Babar

    Abstract: Assessing the trustworthiness of Large Language Models (LLMs), such as robustness, has garnered significant attention. Recently, metamorphic testing that defines Metamorphic Relations (MRs) has been widely applied to evaluate the robustness of LLM executions. However, the MR-based robustness testing still requires a scalable number of MRs, thereby necessitating the optimization of selecting MRs. M…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2507.00460

    cs.CL

    Pitfalls of Evaluating Language Models with Open Benchmarks

    Authors: Md. Najib Hasan, Mohammad Fakhruddin Babar, Souvika Sarkar, Monowar Hasan, Santu Karmaker

    Abstract: Open Large Language Model (LLM) benchmarks, such as HELM and BIG-bench, offer standardized, transparent protocols that facilitate the fair comparison, reproducibility, and iterative advancement of Language Models (LMs). However, their openness also introduces critical and underexplored pitfalls. This study exposes these weaknesses by systematically constructing "cheating" models: smaller varia…

    Submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2506.11600

    cs.IR cs.AI

    GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news

    Authors: Abdul Haque, Umm e Hani, Ahmad Din, Muhammad Babar, Ali Abbas, Insaf Ullah

    Abstract: GraphRAG-Causal introduces an innovative framework that combines graph-based retrieval with large language models to enhance causal reasoning in news analysis. Traditional NLP approaches often struggle with identifying complex, implicit causal links, especially in low-data scenarios. Our approach addresses these challenges by transforming annotated news headlines into structured causal knowledge g…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 18 pages, 8 figures

  4. arXiv:2506.01991

    cs.DC eess.SY

    Investigating Timing-Based Information Leakage in Data Flow-Driven Real-Time Systems

    Authors: Mohammad Fakhruddin Babar, Zain A. H. Hammadeh, Mohammad Hamad, Monowar Hasan

    Abstract: Leaking information about the execution behavior of critical real-time tasks may lead to serious consequences, including violations of temporal constraints and even severe failures. We study information leakage for a special class of real-time tasks that have two execution modes, namely, typical execution (which invokes the majority of times) and critical execution (to tackle exceptional condition…

    Submitted 3 June, 2025; v1 submitted 17 May, 2025; originally announced June 2025.

  5. arXiv:2504.04351

    cs.SE cs.AI cs.CL cs.LG

    DDPT: Diffusion-Driven Prompt Tuning for Large Language Model Code Generation

    Authors: Jinyang Li, Sangwon Hyun, M. Ali Babar

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation. However, the quality of the generated code is heavily dependent on the structure and composition of the prompts used. Crafting high-quality prompts is a challenging task that requires significant knowledge and skills of prompt engineering. To advance the automation support for the prompt engineering for LLM-…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: ICSE CAIN 2025

  6. arXiv:2502.02009

    cs.SE cs.AI cs.CR cs.LG

    LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations

    Authors: Ziyang Ye, Triet Huynh Minh Le, M. Ali Babar

    Abstract: Security misconfigurations in Container Orchestrators (COs) can pose serious threats to software systems. While Static Analysis Tools (SATs) can effectively detect these security vulnerabilities, the industry currently lacks automated solutions capable of fixing these misconfigurations. The emergence of Large Language Models (LLMs), with their proven capabilities in code understanding and generati…

    Submitted 3 February, 2025; originally announced February 2025.

  7. arXiv:2501.16364

    cs.LG cs.AI

    Multivariate Time Series Anomaly Detection by Capturing Coarse-Grained Intra- and Inter-Variate Dependencies

    Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

    Abstract: Multivariate time series anomaly detection is essential for failure management in web application operations, as it directly influences the effectiveness and timeliness of implementing remedial or preventive measures. This task is often framed as a semi-supervised learning problem, where only normal data are available for model training, primarily due to the labor-intensive nature of data labeling…

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: 9 pages, 3 figures, Accepted to TheWebConference 2025

  8. arXiv:2412.06166

    cs.SE cs.CR cs.LG

    MVD: A Multi-Lingual Software Vulnerability Detection Framework

    Authors: Boyu Zhang, Triet H. M. Le, M. Ali Babar

    Abstract: Software vulnerabilities can result in catastrophic cyberattacks that increasingly threaten business operations. Consequently, ensuring the safety of software systems has become a paramount concern for both private and public sectors. Recent literature has witnessed increasing exploration of learning-based approaches for software vulnerability detection. However, a key limitation of these techniqu…

    Submitted 8 December, 2024; originally announced December 2024.

  9. arXiv:2408.12904

    cs.CR cs.SE

    SecDOAR: A Software Reference Architecture for Security Data Orchestration, Analysis and Reporting

    Authors: Muhammad Aufeef Chauhan, Muhammad Ali Babar, Fethi Rabhi

    Abstract: A Software Reference Architecture (SRA) is a useful tool for standardising existing architectures in a specific domain and facilitating concrete architecture design, development and evaluation by instantiating SRA and using SRA as a benchmark for the development of new systems. In this paper, we have presented an SRA for Security Data Orchestration, Analysis and Reporting (SecDOAR) to provide stan…

    Submitted 25 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: 21 pages, 17 Figures, 5 Tables

  10. arXiv:2408.00435

    cs.SE cs.AI cs.CR

    A Qualitative Study on Using ChatGPT for Software Security: Perception vs. Practicality

    Authors: M. Mehdi Kholoosi, M. Ali Babar, Roland Croft

    Abstract: Artificial Intelligence (AI) advancements have enabled the development of Large Language Models (LLMs) that can perform a variety of tasks with remarkable semantic understanding and accuracy. ChatGPT is one such LLM that has gained significant attention due to its impressive capabilities for assisting in various knowledge-intensive tasks. Due to the knowledge-intensive nature of engineering secure…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted for publication at International Conference on Trust, Privacy and Security - 2024

  11. arXiv:2407.17803

    cs.SE cs.CR cs.LG

    Automatic Data Labeling for Software Vulnerability Prediction Models: How Far Are We?

    Authors: Triet H. M. Le, M. Ali Babar

    Abstract: Background: Software Vulnerability (SV) prediction needs large-sized and high-quality data to perform well. Current SV datasets mostly require expensive labeling efforts by experts (human-labeled) and thus are limited in size. Meanwhile, there are growing efforts in automatic SV labeling at scale. However, the fitness of auto-labeled data for SV prediction is still largely unknown. Aims: We quanti…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024

  12. arXiv:2407.17053

    cs.SE cs.CR cs.LG

    Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++

    Authors: Anh The Nguyen, Triet Huynh Minh Le, M. Ali Babar

    Abstract: Background: The C and C++ languages hold significant importance in Software Engineering research because of their widespread use in practice. Numerous studies have utilized Machine Learning (ML) and Deep Learning (DL) techniques to detect software vulnerabilities (SVs) in the source code written in these languages. However, the application of these techniques in function-level SV assessment has be…

    Submitted 3 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024

  13. arXiv:2407.10722

    cs.SE cs.CR cs.LG

    Mitigating Data Imbalance for Software Vulnerability Assessment: Does Data Augmentation Help?

    Authors: Triet H. M. Le, M. Ali Babar

    Abstract: Background: Software Vulnerability (SV) assessment is increasingly adopted to address the ever-increasing volume and complexity of SVs. Data-driven approaches have been widely used to automate SV assessment tasks, particularly the prediction of the Common Vulnerability Scoring System (CVSS) metrics such as exploitability, impact, and severity. SV assessment suffers from the imbalanced distribution…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted as a full paper in the technical track at The International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024

  14. arXiv:2406.19765

    cs.SE cs.LG

    Systematic Literature Review on Application of Learning-based Approaches in Continuous Integration

    Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, M. Ali Babar

    Abstract: Context: Machine learning (ML) and deep learning (DL) analyze raw data to extract valuable insights in specific phases. The rise of continuous practices in software projects emphasizes automating Continuous Integration (CI) with these learning-based methods, while the growing adoption of such approaches underscores the need for systematizing knowledge. Objective: Our objective is to comprehensivel…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted to be published in IEEE Access

  15. arXiv:2406.18813

    cs.CR cs.DC cs.SE

    Towards Secure Management of Edge-Cloud IoT Microservices using Policy as Code

    Authors: Samodha Pallewatta, Muhammad Ali Babar

    Abstract: IoT application providers increasingly use MicroService Architecture (MSA) to develop applications that convert IoT data into valuable information. The independently deployable and scalable nature of microservices enables dynamic utilization of edge and cloud resources provided by various service providers, thus improving performance. However, IoT data security should be ensured during multi-domai…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures, Accepted for full paper presentation at ECSA 2024 conference

  16. arXiv:2406.09737

    cs.SE

    A Multivocal Review of MLOps Practices, Challenges and Open Issues

    Authors: Beyza Eken, Samodha Pallewatta, Nguyen Khoi Tran, Ayse Tosun, Muhammad Ali Babar

    Abstract: MLOps has emerged as a key solution to address many socio-technical challenges of bringing ML models to production, such as integrating ML models with non-ML software, continuous monitoring, maintenance, and retraining of deployed models. Despite the utility of MLOps, an integrated body of knowledge regarding MLOps remains elusive because of its extensive scope due to the diversity of ML productio…

    Submitted 16 April, 2025; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 44 pages, 4 figures

  17. arXiv:2406.04902

    cs.ET

    Beyond Data, Towards Sustainability: A Sydney Case Study on Urban Digital Twins

    Authors: Ammar Sohail, Bojie Shen, Muhammad Aamir Cheema, Mohammed Eunus Ali, Anwaar Ulhaq, Muhammad Ali Babar, Asama Qureshi

    Abstract: As urban areas grapple with unprecedented challenges stemming from population growth and climate change, the emergence of urban digital twins offers a promising solution. This paper presents a case study focusing on Sydney's urban digital twin, a virtual replica integrating diverse real-time and historical data, including weather, crime, emissions, and traffic. Through advanced visualization and d…

    Submitted 7 June, 2024; originally announced June 2024.

  18. arXiv:2405.15293

    cs.CR

    Transaction Fee Estimation in the Bitcoin System

    Authors: Limeng Zhang, Rui Zhou, Qing Liu, Chengfei Liu, M. Ali Babar

    Abstract: In the Bitcoin system, transaction fees serve as an incentive for blockchain confirmations. In general, a transaction with a higher fee is likely to be included in the next block mined, whereas a transaction with a smaller fee or no fee may be delayed or never processed at all. However, the transaction fee needs to be specified when submitting a transaction and almost cannot be altered thereafter.…

    Submitted 24 May, 2024; originally announced May 2024.

  19. arXiv:2404.17110

    cs.SE cs.CR cs.LG

    Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT

    Authors: Triet H. M. Le, M. Ali Babar, Tung Hoang Thai

    Abstract: Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and i…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted in the 4th International Workshop on Software Security co-located with the 28th International Conference on Evaluation and Assessment in Software Engineering (EASE) 2024

  20. arXiv:2404.11294

    cs.SE

    LogSD: Detecting Anomalies from System Logs through Self-supervised Learning and Frequency-based Masking

    Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

    Abstract: Log analysis is one of the main techniques that engineers use for troubleshooting large-scale software systems. Over the years, many supervised, semi-supervised, and unsupervised log analysis methods have been proposed to detect system anomalies by analyzing system logs. Among these, semi-supervised methods have garnered increasing attention as they strike a balance between relaxed labeled data re…

    Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 23 pages with 11 figures

  21. arXiv:2404.06043

    cs.DB

    Automatic Configuration Tuning on Cloud Database: A Survey

    Authors: Limeng Zhang, M. Ali Babar

    Abstract: Faced with the challenges of big data, modern cloud database management systems are designed to efficiently store, organize, and retrieve data, supporting optimal performance, scalability, and reliability for complex data processing and analysis. However, achieving good performance in modern databases is non-trivial as they are notorious for having dozens of configurable knobs, such as hardware se…

    Submitted 9 April, 2024; originally announced April 2024.

  22. arXiv:2404.03823

    cs.CR cs.CL cs.CY

    An Investigation into Misuse of Java Security APIs by Large Language Models

    Authors: Zahra Mousavi, Chadni Islam, Kristen Moore, Alsharif Abuadbba, Muhammad Ali Babar

    Abstract: The increasing trend of using Large Language Models (LLMs) for code generation raises the question of their capability to generate trustworthy code. While many researchers are exploring the utility of code generation for uncovering software vulnerabilities, one crucial but often overlooked aspect is the security Application Programming Interfaces (APIs). APIs play an integral role in upholding sof…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by ACM ASIACCS 2024

  23. arXiv:2403.15416

    cs.LG cs.AI math.OC

    Fuzzy hyperparameters update in a second order optimization

    Authors: Abdelaziz Bensadok, Muhammad Zeeshan Babar

    Abstract: This research will present a hybrid approach to accelerate convergence in a second order optimization. An online finite difference approximation of the diagonal Hessian matrix will be introduced, along with fuzzy inferencing of several hyperparameters. Competitive results have been achieved.

    Submitted 8 March, 2024; originally announced March 2024.

  24. arXiv:2401.16577

    cs.CL cs.AI

    LLMs as On-demand Customizable Service

    Authors: Souvika Sarkar, Mohammad Fakhruddin Babar, Monowar Hasan, Shubhra Kanti Karmaker

    Abstract: Large Language Models (LLMs) have demonstrated remarkable language understanding and generation capabilities. However, training, deploying, and accessing these models pose notable challenges, including resource-intensive demands, extended training durations, and scalability issues. To address these issues, we introduce a concept of hierarchical, distributed LLM architecture that aims at enhancing…

    Submitted 29 January, 2024; originally announced January 2024.

  25. arXiv:2401.13199

    cs.CR cs.CY cs.HC

    Why People Still Fall for Phishing Emails: An Empirical Investigation into How Users Make Email Response Decisions

    Authors: Asangi Jayatilaka, Nalin Asanka Gamagedara Arachchilage, Muhammad Ali Babar

    Abstract: Despite technical and non-technical countermeasures, humans continue to be tricked by phishing emails. How users make email response decisions is a missing piece in the puzzle to identifying why people still fall for phishing emails. We conducted an empirical study using a think-aloud method to investigate how people make 'response decisions' while reading emails. The grounded theory analysis of t…

    Submitted 23 January, 2024; originally announced January 2024.

    Journal ref: Symposium on Usable Security and Privacy (USEC) 2024

  26. arXiv:2401.11105

    cs.SE cs.CR cs.LG

    Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study

    Authors: Triet H. M. Le, Xiaoning Du, M. Ali Babar

    Abstract: Collecting relevant and high-quality data is integral to the development of effective Software Vulnerability (SV) prediction models. Most of the current SV datasets rely on SV-fixing commits to extract vulnerable functions and lines. However, none of these datasets have considered latent SVs existing between the introduction and fix of the collected SVs. There is also little known about the useful…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted as a full paper in the technical track at the 21st International Conference on Mining Software Repositories (MSR) 2024

  27. arXiv:2312.06056

    cs.SE cs.AI cs.CL

    METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities

    Authors: Sangwon Hyun, Mingyu Guo, M. Ali Babar

    Abstract: Large-Language Models (LLMs) have shifted the paradigm of natural language data processing. However, their black-boxed and probabilistic characteristics can lead to potential risks in the quality of outputs in diverse LLM applications. Recent studies have tested Quality Attributes (QAs), such as robustness or fairness, of LLMs by generating adversarial input texts. However, existing studies have l…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Accepted to International Conference on Software Testing, Verification and Validation (ICST) 2024 / Key words: Large-language models, Metamorphic testing, Quality evaluation, Text perturbations

  28. arXiv:2310.18839

    cs.CR

    The Telehealth Chain: a protocol for secure and transparent telemedicine transactions on the blockchain

    Authors: Syed Sarosh Mahdi, Zaib Ullah, Gopi Battineni, Muneer Gohar Babar, Umer Daood

    Abstract: Blockchain technology provides a secure and decentralized platform for storing and transferring sensitive medical data, which can be utilized to enable remote medical consultations. This paper proposes a theoretical framework for creating a blockchain-based digital entity to facilitate telemedicine services. The proposed framework utilizes blockchain technology to provide a secure and reliable pla…

    Submitted 28 October, 2023; originally announced October 2023.

  29. arXiv:2310.06300

    cs.CR cs.SE

    An Empirically Grounded Reference Architecture for Software Supply Chain Metadata Management

    Authors: Nguyen Khoi Tran, Samodha Pallewatta, M. Ali Babar

    Abstract: With the rapid rise in Software Supply Chain (SSC) attacks, organisations need thorough and trustworthy visibility over the entire SSC of their software inventory to detect risks early and identify compromised assets rapidly in the event of an SSC attack. One way to achieve such visibility is through SSC metadata, machine-readable and authenticated documents describing an artefact's lifecycle. Ado…

    Submitted 8 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted for full paper presentation at EASE 2024 conference

  30. arXiv:2310.00635

    cs.NI

    Reinforcement Learning Based Neighbour Selection for VANET with Adaptive Trust Management

    Authors: Orvila Sarker, Hong Shen, M. Ali Babar

    Abstract: Successful information propagation from source to destination in Vehicular Adhoc Network (VANET) can be hampered by the presence of neighbouring attacker nodes causing unwanted packet dropping. Potential attackers change their behaviour over time and remain undetected due to the ad-hoc nature of VANET. Capturing the dynamic attacker behaviour and updating the corresponding neighbourhood informatio…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: This article is accepted at the 22nd IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) 2023

  31. arXiv:2308.11862

    cs.CR cs.SE

    Empirical Analysis of Software Vulnerabilities Causing Timing Side Channels

    Authors: M. Mehdi Kholoosi, M. Ali Babar, Cemal Yilmaz

    Abstract: Timing attacks are considered one of the most damaging side-channel attacks. These attacks exploit timing fluctuations caused by certain operations to disclose confidential information to an attacker. For instance, in asymmetric encryption, operations such as multiplication and division can cause time-varying execution times that can be ill-treated to obtain an encryption key. Whilst several effor…

    Submitted 22 August, 2023; originally announced August 2023.

  32. arXiv:2307.04458

    cs.SE

    Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu

    Authors: Victor Prokhorenko, Chadni Islam, Muhammad Ali Babar

    Abstract: An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures. When a multitude of independent packages are placed together in an OS, an implicit inter-package architecture is formed. For an evolutionary effort, designers/developers of OS can greatly benefit from fully understanding the system-wide dependency focused…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: This paper is accepted for publication in the 17th international conference on Software Architecture

  33. arXiv:2307.01225

    cs.CL cs.AI cs.LG

    Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)

    Authors: Bushra Sabir, M. Ali Babar, Sharif Abuadbba

    Abstract: Transformer-based text classifiers like BERT, Roberta, T5, and GPT-3 have shown impressive performance in NLP. However, their vulnerability to adversarial examples poses a security risk. Existing defense methods lack interpretability, making it hard to understand adversarial classifications and identify model vulnerabilities. To address this, we propose the Interpretability and Transparency-Driven…

    Submitted 2 July, 2023; originally announced July 2023.

  34. arXiv:2306.08869

    cs.CR cs.SE

    Detecting Misuse of Security APIs: A Systematic Review

    Authors: Zahra Mousavi, Chadni Islam, M. Ali Babar, Alsharif Abuadbba, Kristen Moore

    Abstract: Security Application Programming Interfaces (APIs) are crucial for ensuring software security. However, their misuse introduces vulnerabilities, potentially leading to severe data breaches and substantial financial loss. Complex API design, inadequate documentation, and insufficient security training often lead to unintentional misuse by developers. The software security community has devised and…

    Submitted 14 May, 2025; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted for publication in ACM Computing Surveys, 2025

  35. arXiv:2306.06600

    cs.CY

    Enabling Spatial Digital Twins: Technologies, Challenges, and Future Research Directions

    Authors: Mohammed Eunus Ali, Muhammad Aamir Cheema, Tanzima Hashem, Anwaar Ulhaq, Muhammad Ali Babar

    Abstract: A Digital Twin (DT) is a virtual replica of a physical object or system, created to monitor, analyze, and optimize its behavior and characteristics. A Spatial Digital Twin (SDT) is a specific type of digital twin that emphasizes the geospatial aspects of the physical entity, incorporating precise location and dimensional attributes for a comprehensive understanding within its spatial environment.…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: 26 pages, 2 figures

  36. arXiv:2305.12736   

    cs.SE

    Mitigating ML Model Decay in Continuous Integration with Data Drift Detection: An Empirical Study

    Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: Background: Machine Learning (ML) methods are being increasingly used for automating different activities, e.g., Test Case Prioritization (TCP), of Continuous Integration (CI). However, ML models need frequent retraining as a result of changes in the CI environment, more commonly known as data drift. Also, continuously retraining ML models consumes a lot of time and effort. Hence, there is an urgen…

    Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: This paper got a rejection and we need to address the comments and upload the new version with new results

  37. arXiv:2305.12695   

    cs.SE cs.LG

    Systematic Literature Review on Application of Machine Learning in Continuous Integration

    Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: This research conducted a systematic review of the literature on machine learning (ML)-based methods in the context of Continuous Integration (CI) over the past 22 years. The study aimed to identify and describe the techniques used in ML-based solutions for CI and analyzed various aspects such as data engineering, feature engineering, hyper-parameter tuning, ML models, evaluation methods, and metr…

    Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: This paper got a rejection and we need to address the comments and upload the new version with new results

  38. arXiv:2305.11657

    cs.GT

    Cost Sharing Public Project with Minimum Release Delay

    Authors: Mingyu Guo, Diksha Goel, Guanhua Wang, Yong Yang, Muhammad Ali Babar

    Abstract: We study the excludable public project model where the decision is binary (build or not build). In a classic excludable and binary public project model, an agent either consumes the project in its whole or is completely excluded. We study a setting where the mechanism can set different project release time for different agents, in the sense that high-paying agents can consume the project earlier t…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2204.07315

  39. Processing Natural Language on Embedded Devices: How Well Do Transformer Models Perform?

    Authors: Souvika Sarkar, Mohammad Fakhruddin Babar, Md Mahadi Hassan, Monowar Hasan, Shubhra Kanti Karmaker Santu

    Abstract: This paper presents a performance study of transformer language models under different hardware configurations and accuracy requirements and derives empirical observations about these resource/accuracy trade-offs. In particular, we study how the most commonly used BERT-based language models (viz., BERT, RoBERTa, DistilBERT, and TinyBERT) perform on embedded systems. We tested them on four off-the-…

    Submitted 6 March, 2024; v1 submitted 22 April, 2023; originally announced April 2023.

    Journal ref: ICPE 2024

  40. arXiv:2304.02829

    cs.SE cs.LG

    SoK: Machine Learning for Continuous Integration

    Authors: Ali Kazemi Arani, Mansooreh Zahedi, Triet Huynh Minh Le, Muhammad Ali Babar

    Abstract: Continuous Integration (CI) has become a well-established software development practice for automatically and continuously integrating code changes during software development. An increasing number of Machine Learning (ML) based approaches for automation of CI phases are being reported in the literature. It is timely and relevant to provide a Systemization of Knowledge (SoK) of ML-based approaches…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: 6 pages, 2 figures, accepted in the ICSE'23 Workshop on Cloud Intelligence / AIOps

  41. arXiv:2301.05456

    cs.SE

    Data Quality for Software Vulnerability Datasets

    Authors: Roland Croft, M. Ali Babar, Mehdi Kholoosi

    Abstract: The use of learning-based techniques to achieve automated software vulnerability detection has been of longstanding interest within the software security domain. These data-driven solutions are enabled by large software vulnerability datasets used for training and benchmarking. However, we observe that the quality of the data powering these solutions is currently ill-considered, hindering the reli…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: Accepted for publication in the ICSE 23 Technical Track

  42. Privacy Engineering in the Wild: Understanding the Practitioners' Mindset, Organisational Aspects, and Current Practices

    Authors: Leonardo Horn Iwaya, Muhammad Ali Babar, Awais Rashid

    Abstract: Privacy engineering, as an emerging field of research and practice, comprises the technical capabilities and management processes needed to implement, deploy, and operate privacy features and controls in working systems. For that, software practitioners and other stakeholders in software companies need to work cooperatively toward building privacy-preserving businesses and engineering solutions. S…

    Submitted 30 June, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: 26 pages, 8 figures

  43. arXiv:2211.07585  [pdf]

    cs.CY cs.SE

    An Empirical Study on Secure Usage of Mobile Health Apps: The Attack Simulation Approach

    Authors: Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi, M. Ali Babar

    Abstract: Mobile applications, mobile apps for short, have proven their usefulness in enhancing service provisioning across a multitude of domains that range from smart healthcare, to mobile commerce, and areas of context sensitive computing. In recent years, a number of empirically grounded, survey-based studies have been conducted to investigate secure development and usage of mHealth apps. However, such…

    Submitted 14 November, 2022; originally announced November 2022.

  44. arXiv:2211.06953  [pdf, other]

    cs.SE

    Collaborative Application Security Testing for DevSecOps: An Empirical Analysis of Challenges, Best Practices and Tool Support

    Authors: Roshan Namal Rajapakse, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: DevSecOps is a software development paradigm that places a high emphasis on the culture of collaboration between developers (Dev), security (Sec) and operations (Ops) teams to deliver secure software continuously and rapidly. Adopting this paradigm effectively, therefore, requires an understanding of the challenges, best practices and available solutions for collaboration among these functional te…

    Submitted 25 November, 2022; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: Submitted to the Empirical Software Engineering journal_v2

  45. arXiv:2210.06679  [pdf, other]

    cs.DC

    A Survey on UAV-enabled Edge Computing: Resource Management Perspective

    Authors: Xiaoyu Xia, Sheik Mohammad Mostakim Fattah, Muhammad Ali Babar

    Abstract: Edge computing facilitates low-latency services at the network's edge by distributing computation, communication, and storage resources within the geographic proximity of mobile and Internet-of-Things (IoT) devices. The recent advancement in Unmanned Aerial Vehicles (UAVs) technologies has opened new opportunities for edge computing in military operations, disaster response, or remote areas where…

    Submitted 26 September, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: 36 pages, Accepted to ACM CSUR

  46. arXiv:2209.09487  [pdf, other]

    cs.DB

    Design and Implementation of Fragmented Clouds for Evaluation of Distributed Databases

    Authors: Yaser Mansouri, Faheem Ullah, Shagun Dhingra, M. Ali Babar

    Abstract: In this paper, we present a Fragmented Hybrid Cloud (FHC) that provides a unified view of multiple geographically distributed private cloud datacenters. FHC leverages a fragmented usage model in which outsourcing is bi-directional across private clouds that can be hosted by static and mobile entities. The mobility aspect of private cloud nodes has important impact on the FHC performance in terms o…

    Submitted 20 September, 2022; originally announced September 2022.

  47. arXiv:2209.07869  [pdf, other]

    cs.SE cs.LG

    LogGD: Detecting Anomalies from System Logs by Graph Neural Networks

    Authors: Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

    Abstract: Log analysis is one of the main techniques engineers use to troubleshoot faults of large-scale software systems. During the past decades, many log analysis approaches have been proposed to detect system anomalies reflected by logs. They usually take log event counts or sequential log events as inputs and utilize machine learning algorithms including deep learning models to detect system anomalies.…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: 12 pages, 12 figures

  48. An Empirical Study of Automation in Software Security Patch Management

    Authors: Nesara Dissanayake, Asangi Jayatilaka, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: Several studies have shown that automated support for different activities of the security patch management process has great potential for reducing delays in installing security patches. However, it is also important to understand how automation is used in practice, its limitations in meeting real-world needs and what practitioners really need, an area that has not been empirically investigated i…

    Submitted 3 September, 2022; originally announced September 2022.

    Comments: 13 pages, 2 figures

  49. arXiv:2206.10110  [pdf, other]

    cs.SE

    ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems

    Authors: Nguyen Khoi Tran, Bushra Sabir, M. Ali Babar, Nini Cui, Mehran Abolhasan, Justin Lipman

    Abstract: Large-scale Machine Learning (ML) based Software Systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed to assess their quality attributes such as security, safety, and fai…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted as full paper in ECSA 2022 conference. To be presented

  50. Mod2Dash: A Framework for Model-Driven Dashboards Generation

    Authors: Liuyue Jiang, Nguyen Khoi Tran, M. Ali Babar

    Abstract: The construction of an interactive dashboard involves deciding on what information to present and how to display it and implementing those design decisions to create an operational dashboard. Traditionally, a dashboard's design is implied in the deployed dashboard rather than captured explicitly as a digital artifact, preventing it from being backed up, version-controlled, and shared. Moreover, pr…

    Submitted 15 May, 2022; originally announced May 2022.