arXiv:2505.02015 [pdf, other]

Requirements-Based Test Generation: A Comprehensive Survey

Authors: Zhenzhen Yang, Rubing Huang, Chenhui Cui, Nan Niu, Dave Towey

Abstract: As an important way of assuring software quality, software testing generates and executes test cases to identify software failures. Many strategies have been proposed to guide test-case generation, such as source-code-based approaches and methods based on bug reports. Requirements-based test generation (RBTG) constructs test cases based on specified requirements, aligning with user needs and expectations, without requiring access to the source code. Since its introduction in 1994, there have been many contributions to the development of RBTG, including various approaches, implementations, tools, assessment and evaluation methods, and applications. This paper provides a comprehensive survey on RBTG, categorizing requirement types, classifying approaches, investigating types of test cases, summarizing available tools, and analyzing experimental evaluations. This paper also summarizes the domains and industrial applications of RBTG, and discusses some open research challenges and potential future work.

Submitted 6 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

arXiv:2504.16833 [pdf, other]

LRASGen: LLM-based RESTful API Specification Generation

Authors: Sida Deng, Rubing Huang, Man Zhang, Chenhui Cui, Dave Towey, Rongcun Wang

Abstract: REpresentational State Transfer (REST) is an architectural style for designing web applications that enables scalable, stateless communication between clients and servers via common HTTP techniques. Web APIs that employ the REST style are known as RESTful (or REST) APIs. When using or testing a RESTful API, developers may need to employ its specification, which is often defined by open-source standards such as the OpenAPI Specification (OAS). However, it can be very time-consuming and error-prone to write and update these specifications, which may negatively impact the use of RESTful APIs, especially when the software requirements change. Many tools and methods have been proposed to solve this problem, such as Respector and Swagger Core. OAS generation can be regarded as a common text-generation task that creates a formal description of API endpoints derived from the source code. A potential solution for this may involve using Large Language Models (LLMs), which have strong capabilities in both code understanding and text generation. Motivated by this, we propose a novel approach for generating the OASs of RESTful APIs using LLMs: LLM-based RESTful API-Specification Generation (LRASGen). To the best of our knowledge, this is the first use of LLMs and API source code to generate OASs for RESTful APIs. Unlike existing tools and methods, LRASGen can generate OASs even when the implementation is incomplete (with partial code, and/or missing annotations/comments, etc.).
To evaluate LRASGen's performance, we conducted a series of empirical studies on 20 real-world RESTful APIs. The results show that two LLMs (GPT-4o mini and DeepSeek V3) can both support LRASGen in generating accurate specifications, and that LRASGen-generated specifications cover, on average, 48.85% more missed entities than the developer-provided specifications.

Submitted 23 April, 2025; originally announced April 2025.

arXiv:2503.22141 [pdf, other]

Integrating Artificial Intelligence with Human Expertise: An In-depth Analysis of ChatGPT's Capabilities in Generating Metamorphic Relations

Authors: Yifan Zhang, Dave Towey, Matthew Pike, Quang-Hung Luu, Huai Liu, Tsong Yueh Chen

Abstract: Context: This paper provides an in-depth examination of the generation and evaluation of Metamorphic Relations (MRs) using GPT models developed by OpenAI, with a particular focus on the capabilities of GPT-4 in software testing environments. Objective: The aim is to examine the quality of MRs produced by GPT-3.5 and GPT-4 for a specific System Under Test (SUT) adopted from an earlier study, and to introduce and apply an improved set of evaluation criteria for a diverse range of SUTs. Method: The initial phase evaluates MRs generated by GPT-3.5 and GPT-4 using criteria from a prior study, followed by an application of an enhanced evaluation framework on MRs created by GPT-4 for a diverse range of nine SUTs, varying from simple programs to complex systems incorporating AI/ML components. A custom-built GPT evaluator, alongside human evaluators, assessed the MRs, enabling a direct comparison between automated and human evaluation methods. Results: The study finds that GPT-4 outperforms GPT-3.5 in generating accurate and useful MRs. With the advanced evaluation criteria, GPT-4 demonstrates a significant ability to produce high-quality MRs across a wide range of SUTs, including complex systems incorporating AI/ML components. Conclusions: GPT-4 exhibits advanced capabilities in generating MRs suitable for various applications. The research underscores the growing potential of AI in software testing, particularly in the generation and evaluation of MRs, and points towards the complementarity of human and AI skills in this domain.

Submitted 28 March, 2025; originally announced March 2025.

Comments: Submitted to Information and Software Technology

arXiv:2412.10476 [pdf, other]

A Survey on Web Application Testing: A Decade of Evolution

Authors: Tao Li, Rubing Huang, Chenhui Cui, Dave Towey, Lei Ma, Yuan-Fang Li, Wen Xia

Abstract: A web application, one of the most popular kinds of software application, is a program, accessible through the web, that dynamically generates content based on user interactions or contextual data. Examples include online shopping platforms, social networking sites, and financial services. Web applications operate in diverse environments and leverage web technologies such as HTML, CSS, JavaScript, and Ajax, often incorporating features like asynchronous operations to enhance user experience. Due to the increasing use and popularity of web applications, approaches to assuring their quality have become increasingly important. Web Application Testing (WAT) plays a vital role in ensuring web applications' functionality, security, and reliability, and the speed with which web technologies are evolving makes WAT especially important. Over the last decade, various WAT approaches have been developed. The diversity of approaches reflects the many aspects of web applications, such as dynamic content, asynchronous operations, and diverse user environments. This paper provides a comprehensive overview of the main achievements during the past decade: It examines the main steps involved in WAT, including test-case generation and execution, and evaluation and assessment. The currently available tools for WAT are also examined. The paper also discusses some open research challenges and potential future WAT work.

Submitted 25 April, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

arXiv:2410.00077 [pdf, other]

RNA-Protein Interaction Prediction Based on Deep Learning: A Comprehensive Survey

Authors: Danyu Li, Rubing Huang, Chenhui Cui, Dave Towey, Ling Zhou, Jinyu Tian, Bin Zou

Abstract: The interaction between Ribonucleic Acids (RNAs) and proteins, also called RNA-Protein Interaction (RPI), plays an important role in the life activities of organisms, including in various regulatory processes, such as gene splicing, gene localization, and disease pathogenesis. RPI Prediction (RPIP) predicts the interactions between RNAs and proteins, which includes determining whether interactions exist, locating their binding sites, and adding RNA-protein functional annotations (such as immunity regulation, neuroprotection, etc.). Due to the huge amounts of complex biological data, Deep Learning-based RPIP (DL-based RPIP) has been widely investigated, as it can extract high-dimensional features from data and make accurate predictions. Over the last decade, there have been many achievements and contributions in DL-based RPIP. Although some previous studies review DL-based RPIP, to the best of our knowledge, a comprehensive survey is still lacking. In this paper, we extensively survey DL-based RPIP in terms of its entire process, including feature encoding, deep learning modeling, results evaluation, RPIP application domains, and available websites and software. We also identify some open research challenges, and discuss potential future work for DL-based RPIP.

Submitted 30 September, 2024; originally announced October 2024.
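Feature encoding is the first step of the DL-based RPIP pipeline surveyed above. The following Python sketch is an illustration only. It shows k-mer frequency encoding, one widely used way of turning an RNA sequence into a fixed-length numeric vector before the sequence reaches a model. The function name `kmer_frequencies` and its defaults (k = 3, alphabet ACGU) are hypothetical choices and are not taken from the survey.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer counts over all len(alphabet)**k possible k-mers."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        if kmer in counts:  # skip k-mers containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    # A fixed vocabulary order keeps vector positions comparable across sequences.
    return [counts[v] / total if total else 0.0 for v in vocab]
```

For example, `kmer_frequencies("AUGGCU", k=2)` yields a 16-element vector in which each of the five observed 2-mers (AU, UG, GG, GC, CU) has frequency 0.2. Protein sequences can be encoded the same way by passing the 20-letter amino-acid alphabet, although the vector then grows as 20**k.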

arXiv:2408.16202 [pdf, other]

doi 10.1016/j.engappai.2025.110980

Short-Term Electricity-Load Forecasting by Deep Learning: A Comprehensive Survey

Authors: Qi Dong, Rubing Huang, Chenhui Cui, Dave Towey, Ling Zhou, Jinyu Tian, Jianzhou Wang

Abstract: Short-Term Electricity-Load Forecasting (STELF) refers to the prediction of the immediate demand (in the next few hours to several days) for the power system. Various external factors, such as weather changes and the emergence of new electricity consumption scenarios, can impact electricity demand, causing load data to fluctuate and become non-linear, which increases the complexity and difficulty of STELF. In the past decade, deep learning has been applied to STELF, modeling and predicting electricity demand with high accuracy, and contributing significantly to the development of STELF. This paper provides a comprehensive survey on deep-learning-based STELF over the past ten years. It examines the entire forecasting process, including data pre-processing, feature extraction, deep-learning modeling and optimization, and results evaluation. This paper also identifies some research challenges and potential research directions to be further investigated in future work.

Submitted 18 May, 2025; v1 submitted 28 August, 2024; originally announced August 2024.

Comments: To be published in Engineering Applications of Artificial Intelligence
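The following minimal sketch makes the data pre-processing and results-evaluation steps more concrete. It assumes an hourly load series held in a plain Python list. The series is framed as supervised (lookback window, future value) pairs, and forecasts are scored with mean absolute percentage error (MAPE), one commonly reported STELF metric. The helper names are hypothetical. A naive persistence forecast, which repeats the last observed value, stands in for a deep-learning model.

```python
def make_windows(series, lookback, horizon=1):
    """Pair each lookback-length window with the value `horizon` steps after it."""
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        pairs.append((window, target))
    return pairs


def persistence_forecast(window):
    """Naive baseline: predict that the next load equals the last observed load."""
    return window[-1]


def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

With `make_windows([100, 110, 120, 130, 125], lookback=3)`, the persistence forecasts are 120 and 130 against actual loads of 130 and 125, giving a MAPE of about 5.85%. A real pipeline would also normalize the series and add exogenous features such as weather before training.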

arXiv:2406.05397 [pdf, other]

Metamorphic Relation Generation: State of the Art and Visions for Future Research

Authors: Rui Li, Huai Liu, Pak-Lok Poon, Dave Towey, Chang-Ai Sun, Zheng Zheng, Zhi Quan Zhou, Tsong Yueh Chen

Abstract: Metamorphic testing has become a mainstream technique for addressing the notorious oracle problem in software testing, thanks to its great success in revealing real-life bugs in a wide variety of software systems. Metamorphic relations, the core component of metamorphic testing, have continuously attracted research interest from both academia and industry. In the last decade, a rapidly increasing number of studies have been conducted to systematically generate metamorphic relations from various sources and for different application domains. In this article, based on a systematic review of the state of the art in metamorphic relation generation, we summarize and highlight visions for further advancing the theory and techniques for identifying and constructing metamorphic relations, and discuss potential research trends in related areas.

Submitted 10 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

Comments: Accepted by International Workshop on Software Engineering in 2030

arXiv:2405.09965 [pdf, other]

doi 10.1145/3735553

Large Language Models for Automated Web-Form-Test Generation: An Empirical Study

Authors: Tao Li, Chenhui Cui, Rubing Huang, Dave Towey, Lei Ma

Abstract: Testing web forms is an essential activity for ensuring the quality of web applications. It typically involves evaluating the interactions between users and forms. Automated test-case generation remains a challenge for web-form testing: Due to the complex, multi-level structure of web pages, it can be difficult to automatically capture their inherent contextual information for inclusion in the tests. Large Language Models (LLMs) have shown great potential for contextual text generation. This motivated us to explore how they could generate automated tests for web forms, making use of the contextual information within form elements. To the best of our knowledge, no comparative study examining different LLMs has yet been reported for web-form-test generation. To address this gap in the literature, we conducted a comprehensive empirical study investigating the effectiveness of 11 LLMs on 146 web forms from 30 open-source Java web applications. In addition, we propose three HTML-structure-pruning methods to extract key contextual information. The experimental results show that different LLMs can achieve different testing effectiveness. Compared with GPT-4, the other LLMs had difficulty generating appropriate tests for the web forms: Their successfully-submitted rates (SSRs) decreased by 9.10% to 74.15%. Our findings also show that, for all LLMs, when the designed prompts included complete and clear contextual information about the web forms, more effective web-form tests were generated.
Specifically, when using Parser-Processed HTML for Task Prompt (PH-P), the SSR averaged 70.63%, higher than the 60.21% for Raw HTML for Task Prompt (RH-P) and 50.27% for LLM-Processed HTML for Task Prompt (LH-P). Finally, this paper also highlights strategies for selecting LLMs based on performance metrics, and for optimizing the prompt design to improve the quality of the web-form tests.

Submitted 18 May, 2025; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: To be published in ACM Transactions on Software Engineering and Methodology

arXiv:2404.17587 [pdf, other]

Uncovering the Metaverse within Everyday Environments: a Coarse-to-Fine Approach

Authors: Liming Xu, Dave Towey, Andrew P. French, Steve Benford

Abstract: The recent release of the Apple Vision Pro has reignited interest in the metaverse, showcasing the intensified efforts of technology giants in developing platforms and devices to facilitate its growth. As the metaverse continues to proliferate, it is foreseeable that everyday environments will become increasingly saturated with its presence. Consequently, uncovering links to these metaverse items will be a crucial first step to interacting with this new augmented world. In this paper, we address the problem of establishing connections with virtual worlds within everyday environments, especially those that are not readily discernible through direct visual inspection. We introduce a vision-based approach leveraging Artcode visual markers to uncover hidden metaverse links embedded in our ambient surroundings. This approach progressively localises the access points to the metaverse, transitioning from coarse to fine localisation, thus facilitating an exploratory interaction process. Detailed experiments are conducted to study the performance of the proposed approach, demonstrating its effectiveness in Artcode localisation and enabling new interaction opportunities.

Submitted 11 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by The 48th IEEE International Conference on Computers, Software, and Applications (COMPSAC 2024) for publication. It includes around 5600 words, 11 pages, 15 figures, and 1 table

arXiv:2404.08948 [pdf, other]

Large Language Models for Mobile GUI Text Input Generation: An Empirical Study

Authors: Chenhui Cui, Tao Li, Junjie Wang, Chunyang Chen, Dave Towey, Rubing Huang

Abstract: Mobile applications have become an essential part of our daily lives, making quality assurance for them an important activity. Graphical User Interface (GUI) testing is a quality assurance method that has frequently been used for mobile apps. When conducting GUI testing, it is important to generate effective text inputs for the text-input components. Some GUIs require valid text inputs before the user can move from one page to the next: This can be a challenge to achieving complete UI exploration. Recently, Large Language Models (LLMs) have demonstrated excellent text-generation capabilities. To the best of our knowledge, there has not yet been any empirical study to evaluate different pre-trained LLMs' effectiveness at generating text inputs for mobile GUI testing. This paper reports on a large-scale empirical study that extensively investigates the effectiveness of nine state-of-the-art LLMs in Android text-input generation for UI pages. We collected 114 UI pages from 62 open-source Android apps and extracted contextual information from the UI pages to construct prompts for LLMs to generate text inputs. The experimental results show that some LLMs can generate more effective and higher-quality text inputs, achieving a 50.58% to 66.67% page-pass-through rate (PPTR). We also found that using more complete UI contextual information can increase the PPTRs of LLMs for generating text inputs. We conducted an experiment to evaluate the bug-detection capabilities of LLMs by directly generating invalid text inputs. We collected 37 real-world bugs related to text inputs.
The results show that using LLMs to directly generate invalid text inputs for bug detection is insufficient: The bug-detection rates of the nine LLMs are all less than 23%. In addition, we also describe six insights gained regarding the use of LLMs for Android testing: These insights will benefit the Android testing community.

Submitted 26 February, 2025; v1 submitted 13 April, 2024; originally announced April 2024.

arXiv:2311.08157 [pdf, other]

doi 10.1109/TSE.2024.3393419

TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation

Authors: Zixiang Xian, Rubing Huang, Dave Towey, Chunrong Fang, Zhenyu Chen

Abstract: Artificial intelligence (AI) has revolutionized software engineering (SE) by enhancing software development efficiency. The advent of pre-trained models (PTMs) leveraging transfer learning has significantly advanced AI for SE. However, existing PTMs that operate on individual code tokens suffer from several limitations: They are costly to train and fine-tune; and they rely heavily on labeled data for fine-tuning on task-specific datasets. In this paper, we present TransformCode, a novel framework that learns code embeddings in a contrastive learning manner. Our framework is encoder-agnostic and language-agnostic, which means that it can leverage any encoder model and handle any programming language. We also propose a novel data-augmentation technique called abstract syntax tree (AST) transformation, which applies syntactic and semantic transformations to the original code snippets, to generate more diverse and robust samples for contrastive learning.
Our framework has several advantages over existing methods: (1) It is flexible and adaptable, because it can easily be extended to other downstream tasks that require code representation (such as code-clone detection and classification); (2) it is efficient and scalable, because it does not require a large model or a large amount of training data, and it can support any programming language; (3) it is not limited to unsupervised learning, but can also be applied to some supervised learning tasks by incorporating task-specific labels or objectives; and (4) it can also adjust the number of encoder parameters based on computing resources. We evaluate our framework on several code-related tasks, and demonstrate its effectiveness and superiority over the state-of-the-art methods such as SourcererCC, Code2vec, and InferCode.

Submitted 23 April, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: To be published in IEEE Transactions on Software Engineering
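TransformCode trains its encoder contrastively. An original snippet and its AST-transformed variant form a positive pair, while other snippets act as negatives. The abstract does not state which loss is used. The sketch below therefore assumes the common InfoNCE objective, with plain Python lists standing in for encoder outputs. The names `info_nce` and `cosine` and the temperature value are illustrative assumptions, not the paper's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log(exp(s_pos / t) / sum_j exp(s_j / t))."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating (log-sum-exp trick)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

The loss is never negative, because the positive term also appears in the denominator. It shrinks as the anchor moves closer to its positive than to the negatives. In a batch setting, each snippet in turn serves as the anchor and the per-anchor losses are averaged.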

arXiv:2309.06444 [pdf, other]

doi 10.1109/COMPSAC54236.2022.00063

Connecting Everyday Objects with the Metaverse: A Unified Recognition Framework

Authors: Liming Xu, Dave Towey, Andrew P. French, Steve Benford

Abstract: The recent Facebook rebranding to Meta has drawn renewed attention to the metaverse. Technology giants, amongst others, are increasingly embracing the vision and opportunities of a hybrid social experience that mixes physical and virtual interactions. As the metaverse gains in traction, it is expected that everyday objects may soon connect more closely with virtual elements. However, discovering this "hidden" virtual world will be a crucial first step to interacting with it in this new augmented world. In this paper, we address the problem of connecting physical objects with their virtual counterparts, especially through connections built upon visual markers. We propose a unified recognition framework that guides approaches to the metaverse access points. We illustrate the use of the framework through experimental studies under different conditions, in which an interactive and visually attractive decoration pattern, an Artcode, is used as the approach to enable the connection. This paper will be of interest to, amongst others, researchers working in Interaction Design or Augmented Reality who are seeking techniques or guidelines for augmenting physical objects in an unobtrusive, complementary manner.

Submitted 11 September, 2023; originally announced September 2023.

Comments: This paper includes 6 pages, 4 figures, and 1 table, and has been accepted to be published by the 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), Los Alamitos, CA, USA

arXiv:2305.18865 [pdf, other]

Elongated Physiological Structure Segmentation via Spatial and Scale Uncertainty-aware Network

Authors: Yinglin Zhang, Ruiling Xi, Huazhu Fu, Dave Towey, RuiBin Bai, Risa Higashita, Jiang Liu

Abstract: Robust and accurate segmentation of elongated physiological structures is challenging, especially in ambiguous regions, such as in corneal endothelium microscope images with uneven illumination or fundus images with disease interference. In this paper, we present a spatial and scale uncertainty-aware network (SSU-Net) that fully uses both spatial and scale uncertainty to highlight ambiguous regions and integrate hierarchical structure contexts. First, we estimate epistemic and aleatoric spatial uncertainty maps using Monte Carlo dropout to approximate Bayesian networks. Based on these spatial uncertainty maps, we propose the gated soft uncertainty-aware (GSUA) module to guide the model to focus on ambiguous regions. Second, we extract the uncertainty under different scales and propose the multi-scale uncertainty-aware (MSUA) fusion module to integrate structure contexts from hierarchical predictions, strengthening the final prediction. Finally, we visualize the uncertainty map of the final prediction, providing interpretability for segmentation results. Experimental results show that SSU-Net performs best on corneal endothelial cell and retinal vessel segmentation tasks. Moreover, compared with counterpart uncertainty-based methods, SSU-Net is more accurate and robust.

Submitted 30 May, 2023; originally announced May 2023.
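SSU-Net estimates epistemic and aleatoric uncertainty with Monte Carlo dropout. The sketch below shows, for a single pixel, one common way to do this. Dropout is kept active at inference, several stochastic forward passes are run, and the predictive entropy is split into expected entropy (aleatoric) and mutual information (epistemic). The toy model, its weights, and the number of passes are hypothetical, and the paper's exact estimators may differ.

```python
import math
import random


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli(p) foreground probability."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def toy_pixel_model(features, rng, weights=(2.0, -1.0, 0.5), drop=0.5):
    """Hypothetical per-pixel classifier with inverted dropout kept on at inference."""
    z = 0.0
    for f, w in zip(features, weights):
        if rng.random() >= drop:        # this unit survives dropout
            z += w * f / (1 - drop)     # inverted-dropout rescaling
    return 1 / (1 + math.exp(-z))


def mc_dropout_uncertainty(predict, x, passes=50, seed=0):
    """Return (mean probability, aleatoric, epistemic) from `passes` stochastic runs."""
    rng = random.Random(seed)
    probs = [predict(x, rng) for _ in range(passes)]
    mean_p = sum(probs) / passes
    total = binary_entropy(mean_p)                               # predictive entropy
    aleatoric = sum(binary_entropy(p) for p in probs) / passes   # expected entropy
    epistemic = total - aleatoric                                # mutual information
    return mean_p, aleatoric, epistemic
```

Applying `mc_dropout_uncertainty(toy_pixel_model, (1.0, 1.0, 1.0))` to every pixel would produce per-pixel aleatoric and epistemic maps of the kind that uncertainty-guided modules consume. A deterministic predictor (no dropout) yields zero epistemic uncertainty, which is a useful sanity check.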

arXiv:2305.17496 [pdf, ps, other]

doi 10.1109/TSE.2024.3379592

Toward Cost-effective Adaptive Random Testing: An Approximate Nearest Neighbor Approach

Authors: Rubing Huang, Chenhui Cui, Junlong Lian, Dave Towey, Weifeng Sun, Haibo Chen

Abstract: Adaptive Random Testing (ART) enhances the testing effectiveness (including fault-detection capability) of Random Testing (RT) by increasing the diversity of the random test cases throughout the input domain. Many ART algorithms have been investigated, such as Fixed-Size-Candidate-Set ART (FSCS) and Restricted Random Testing (RRT), and have been widely used in many practical applications. Despite its popularity, ART suffers from high computational costs during test-case generation, especially as the number of test cases increases. Although several strategies have been proposed to enhance ART testing efficiency, such as the forgetting strategy and the k-dimensional tree strategy, these algorithms still face some challenges, including: (1) although they can reduce the computation time, their execution costs are still very high, especially when the number of test cases is large; and (2) to achieve low computational costs, they may sacrifice some fault-detection capability. In this paper, we propose an approach based on Approximate Nearest Neighbors (ANNs), called Locality-Sensitive Hashing ART (LSH-ART). When calculating distances among different test inputs, LSH-ART identifies the approximate (not necessarily exact) nearest neighbors for candidates in an efficient way. LSH-ART attempts to balance ART testing effectiveness and efficiency.

Submitted 19 March, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: To be published in IEEE Transactions on Software Engineering
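LSH-ART replaces the exact nearest-neighbor search in ART with an approximate one based on Locality-Sensitive Hashing. The abstract does not specify the hash family. This Python sketch therefore assumes the standard p-stable (Gaussian projection) scheme for Euclidean distance, and the class name, number of tables, and bucket width are hypothetical. Points that hash to the same bucket in any table are treated as near-neighbor candidates, and an exact scan is used only when no bucket matches.

```python
import math
import random


class EuclideanLSH:
    """Hypothetical approximate nearest-neighbor index using Gaussian random projections."""

    def __init__(self, dims, num_tables=4, width=0.25, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One projection vector a and offset b per table: h(v) = floor((a . v + b) / w).
        self.projections = [([rng.gauss(0, 1) for _ in range(dims)], rng.uniform(0, width))
                            for _ in range(num_tables)]
        self.tables = [dict() for _ in range(num_tables)]
        self.points = []

    def _keys(self, v):
        return [math.floor((sum(a * x for a, x in zip(proj, v)) + b) / self.width)
                for proj, b in self.projections]

    def insert(self, v):
        self.points.append(v)
        for table, key in zip(self.tables, self._keys(v)):
            table.setdefault(key, []).append(v)

    def approx_nn_distance(self, v):
        """Distance to the closest point sharing a bucket; needs >= 1 inserted point."""
        near = [p for table, key in zip(self.tables, self._keys(v))
                for p in table.get(key, [])]
        pool = near if near else self.points   # fall back to an exact scan
        return min(math.dist(v, p) for p in pool)
```

In an FSCS-style loop, each candidate would be scored by `approx_nn_distance` instead of by its exact distance to every executed test, and the chosen test would then be added with `insert`. Because the search covers only a subset of executed tests, the returned distance can overestimate the true nearest-neighbor distance. This is the effectiveness/efficiency trade-off that the abstract describes.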

arXiv:2108.02694 [pdf, other]

Using Metamorphic Relations to Verify and Enhance Artcode Classification

Authors: Liming Xu, Dave Towey, Andrew French, Steve Benford, Zhi Quan Zhou, Tsong Yueh Chen

Abstract: Software testing is often hindered where it is impossible or impractical to determine the correctness of the behaviour or output of the software under test (SUT), a situation known as the oracle problem. An example of an area facing the oracle problem is automatic image classification, using machine learning to classify an input image as one of a set of predefined classes. An approach to software… ▽ More Software testing is often hindered where it is impossible or impractical to determine the correctness of the behaviour or output of the software under test (SUT), a situation known as the oracle problem. An example of an area facing the oracle problem is automatic image classification, using machine learning to classify an input image as one of a set of predefined classes. An approach to software testing that alleviates the oracle problem is metamorphic testing (MT). While traditional software testing examines the correctness of individual test cases, MT instead examines the relations amongst multiple executions of test cases and their outputs. These relations are called metamorphic relations (MRs): if an MR is found to be violated, then a fault must exist in the SUT. This paper examines the problem of classifying images containing visually hidden markers called Artcodes, and applies MT to verify and enhance the trained classifiers. This paper further examines two MRs, Separation and Occlusion, and reports on their capability in verifying the image classification using one-way analysis of variance (ANOVA) in conjunction with three other statistical analysis methods: t-test (for unequal variances), Kruskal-Wallis test, and Dunnett's test. In addition to our previously-studied classifier, that used Random Forests, we introduce a new classifier that uses a support vector machine, and present its MR-augmented version. 
Experimental evaluations across a number of performance metrics show that the augmented classifiers can achieve better performance than non-augmented classifiers. This paper also analyses how the enhanced performance is obtained.
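As a hedged illustration of the general MT idea only (not the Separation or Occlusion MRs studied in the paper), the Python sketch below checks the textbook relation sin(x) = sin(pi - x), which needs no oracle for any individual output. The functions `mr_violated`, `count_violations`, and the deliberately faulty `buggy_sin` are hypothetical names introduced for this example.

```python
import math
import random
from typing import Callable


def mr_violated(f: Callable[[float], float], x: float, tol: float = 1e-9) -> bool:
    """Check the MR f(x) == f(pi - x) on one source/follow-up input pair."""
    source_out = f(x)                 # output of the source test case
    follow_up_out = f(math.pi - x)    # output of the follow-up test case
    # No expected value is needed: only the relation between the two outputs.
    return abs(source_out - follow_up_out) > tol


def count_violations(f: Callable[[float], float], n: int = 1000, seed: int = 0) -> int:
    """Run n random source inputs and count how many violate the MR."""
    rng = random.Random(seed)
    return sum(mr_violated(f, rng.uniform(-100.0, 100.0)) for _ in range(n))


def buggy_sin(x: float) -> float:
    # Hypothetical faulty implementation with an injected linear error term.
    return math.sin(x) + 0.01 * x
```

With the defaults, `count_violations(math.sin)` should report no violations, whereas `buggy_sin` violates the relation for almost every sampled input, exposing its fault even though no expected output was ever computed.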

Submitted 5 August, 2021; originally announced August 2021.

Comments: 32 pages, 11 figures

arXiv:2105.06056 [pdf, other]

VPP-ART: An Efficient Implementation of Fixed-Size-Candidate-Set Adaptive Random Testing using Vantage Point Partitioning

Authors: Rubing Huang, Chenhui Cui, Dave Towey, Weifeng Sun, Junlong Lian

Abstract: Adaptive Random Testing (ART) is an enhancement of Random Testing (RT) that aims to improve RT's failure-detection effectiveness by distributing test cases more evenly across the input domain. Many ART algorithms have been proposed, with Fixed-Size-Candidate-Set ART (FSCS-ART) being one of the most effective and popular. FSCS-ART achieves high failure-detection effectiveness by selecting, as the next test case, the candidate farthest from the previously executed test cases. Despite this effectiveness, FSCS-ART also faces some challenges, including heavy computational overheads. In this paper, we propose an enhanced version of FSCS-ART, Vantage Point Partitioning ART (VPP-ART). VPP-ART addresses the FSCS-ART computational overhead problem using vantage point partitioning, while maintaining failure-detection effectiveness. VPP-ART partitions the input domain space using a modified Vantage Point tree (VP-tree) and finds the approximate nearest executed test cases of a candidate test case in the partitioned sub-domains, thereby significantly reducing the time overheads compared with the searches required by FSCS-ART. To support the dynamic insertion process of FSCS-ART, we modify the traditional VP-tree to handle dynamic data. The simulation results show that VPP-ART has a much lower time overhead than FSCS-ART, while delivering similar (or better) failure-detection effectiveness, especially in higher-dimensional input domains.
According to statistical analyses, VPP-ART can improve on the FSCS-ART failure-detection effectiveness by approximately 50% to 58%. VPP-ART also compares favorably with the KDFC-ART algorithms (a series of enhanced ART algorithms based on the KD-tree). Our experiments also show that VPP-ART is more cost-effective than FSCS-ART and KDFC-ART.
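The FSCS-ART selection rule that VPP-ART accelerates can be sketched in Python as follows. This assumes a unit hypercube input domain, Euclidean distance, and a candidate set size `k` of 10, all chosen for illustration; the function name `fscs_art` is hypothetical. The sketch covers test-case generation only, with a brute-force nearest-neighbour search; it does not execute tests and is not VPP-ART's VP-tree-based implementation.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def fscs_art(n_tests: int, k: int = 10, dim: int = 2, seed: int = 0) -> List[Point]:
    """Generate n_tests points in [0, 1)^dim using the FSCS-ART selection rule."""
    if n_tests <= 0:
        return []
    rng = random.Random(seed)

    def random_point() -> Point:
        return tuple(rng.random() for _ in range(dim))

    executed: List[Point] = [random_point()]  # the first test case is purely random

    def nearest_distance(c: Point) -> float:
        # Brute force: compares the candidate with every executed test case.
        return min(math.dist(c, e) for e in executed)

    while len(executed) < n_tests:
        candidates = [random_point() for _ in range(k)]
        # Pick the candidate farthest from its nearest executed test case.
        executed.append(max(candidates, key=nearest_distance))
    return executed
```

The inner `min` over all executed test cases is the step whose cost grows with every new test case; VPP-ART and KDFC-ART replace it with a search over a VP-tree or a KD-tree, respectively.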

Submitted 6 December, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

Comments: We have polished the previous version, to remove some potential problems

arXiv:2105.05490 [pdf, other]

doi 10.1016/j.jss.2021.111008

SWFC-ART: A Cost-effective Approach for Fixed-Size-Candidate-Set Adaptive Random Testing through Small World Graphs

Authors: Muhammad Ashfaq, Rubing Huang, Dave Towey, Michael Omari, Dmitry Yashunin, Patrick Kwaku Kudjo, Tao Zhang

Abstract: Adaptive random testing (ART) improves the failure-detection effectiveness of random testing by exploiting the tendency of failure-causing inputs in most faulty programs to cluster: ART uses a sampling mechanism that spreads test cases evenly within a software's input domain. The widely used Fixed-Size-Candidate-Set ART (FSCS-ART) sampling strategy incurs a quadratic time cost, which worsens as the dimensionality of the software input domain increases. In this paper, we propose SWFC-ART, an approach based on small world graphs that improves the computational efficiency of FSCS-ART. To efficiently perform nearest neighbor queries for candidate test cases, SWFC-ART incrementally constructs a hierarchical navigable small world graph of the previously executed, non-failure-causing test cases. Moreover, SWFC-ART performs consistently for programs with high-dimensional input domains. Our simulation and empirical studies show that SWFC-ART reduces the computational overhead of FSCS-ART from quadratic to log-linear order while maintaining the failure-detection effectiveness of FSCS-ART, and remaining consistent in high-dimensional input domains. We recommend using SWFC-ART in practical software testing scenarios, where real-life programs often have high-dimensional input domains and low failure rates.
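A back-of-the-envelope count makes the quadratic-to-log-linear claim concrete. It assumes a candidate set of size k, a randomly chosen first test case, and, as a simplifying assumption that ignores index construction costs, that a nearest neighbor query against i executed test cases costs i distance computations by brute force but roughly log i in a hierarchical navigable small world index:

```latex
C_{\mathrm{FSCS}}(n) = \sum_{i=1}^{n-1} k\, i = \frac{k\, n (n-1)}{2} = O(k\, n^2)

C_{\mathrm{SWFC}}(n) \approx \sum_{i=1}^{n-1} k \log i = k \log\big((n-1)!\big) = O(k\, n \log n)
```

For example, with n = 1000 and k = 10, the brute-force count is 10 x 1000 x 999 / 2 = 4,995,000 distance computations.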

Submitted 12 May, 2021; originally announced May 2021.

Comments: 26 Pages

ACM Class: D.2.5

arXiv:2007.03885 [pdf, ps, other]

A Survey on Adaptive Random Testing

Authors: Rubing Huang, Weifeng Sun, Yinyin Xu, Haibo Chen, Dave Towey, Xin Xia

Abstract: Random testing (RT) is a well-studied testing method that has been widely applied to the testing of many applications, including embedded software systems, SQL database systems, and Android applications. Adaptive random testing (ART) aims to enhance RT's failure-detection ability by spreading test cases more evenly over the input domain. Since its introduction in 2001, there have been many contributions to the development of ART, including various approaches, implementations, assessment and evaluation methods, and applications. This paper provides a comprehensive survey on ART, classifying techniques, summarizing application areas, and analyzing experimental evaluations. This paper also addresses some misconceptions about ART, and identifies open research challenges to be investigated in future work.

Submitted 14 July, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

arXiv:2007.00370 [pdf, other]

Regression Test Case Prioritization by Code Combinations Coverage

Authors: Rubing Huang, Quanjun Zhang, Dave Towey, Weifeng Sun, Jinfu Chen

Abstract: Regression test case prioritization (RTCP) aims to improve the rate of fault detection by executing more important test cases as early as possible. Various RTCP techniques have been proposed based on different coverage criteria. Most of them leverage code coverage information to guide the prioritization process, considering code units individually and in isolation. In this paper, we propose a new coverage criterion, code combinations coverage, that combines the concepts of code coverage and combination coverage. We apply this criterion to RTCP as a new prioritization technique, code combinations coverage based prioritization (CCCP). We report on empirical studies conducted to compare the testing effectiveness and efficiency of CCCP with four popular RTCP techniques: total, additional, adaptive random, and search-based test prioritization. The experimental results show that even when the lowest combination strength is assigned, the overall CCCP fault detection rates are higher than those of the other four prioritization techniques. The CCCP prioritization costs are also found to be comparable to those of the additional test prioritization technique. Moreover, our results show that when the combination strength is increased, CCCP provides higher fault detection rates than the state of the art, regardless of the level of code coverage.
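A minimal Python sketch of one plausible reading of code combinations coverage: every t-way combination of the code units a test covers is treated as a coverage requirement, and tests are ordered with the greedy "additional" strategy. The names `combos`, `prioritize`, `coverage`, and the strength parameter `t` are hypothetical, and this is not the authors' CCCP implementation.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def combos(units: Set[str], t: int) -> Set[Tuple[str, ...]]:
    """All t-way combinations of the code units covered by one test."""
    return set(combinations(sorted(units), t))


def prioritize(coverage: Dict[str, Set[str]], t: int = 2) -> List[str]:
    """Order tests by how many not-yet-covered t-way combinations each adds."""
    pending = {name: combos(units, t) for name, units in coverage.items()}
    covered: Set[Tuple[str, ...]] = set()
    order: List[str] = []
    while pending:
        # Greedy choice; iterating over sorted names makes ties deterministic.
        best = max(sorted(pending), key=lambda n: len(pending[n] - covered))
        if not pending[best] - covered:
            if covered:
                # No remaining test adds anything new: reset, as "additional" does.
                covered = set()
                continue
            # Remaining tests cover no combinations at all: keep name order.
            order.extend(sorted(pending))
            break
        order.append(best)
        covered |= pending.pop(best)
    return order
```

For instance, with coverage `{"t1": {"a", "b"}, "t2": {"a", "b", "c"}, "t3": {"c", "d"}}` and `t=2`, the sketch schedules `t2` first because it covers three pairs, then `t3`, which adds the new pair `("c", "d")`, and finally `t1`.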

Submitted 1 July, 2020; originally announced July 2020.

Showing 1–19 of 19 results for author: Towey, D