Search | arXiv e-print repository

Unit Testing Past vs. Present: Examining LLMs' Impact on Defect Detection and Efficiency

Authors: Rudolf Ramler, Philipp Straubinger, Reinhold Plösch, Dietmar Winkler

Abstract: The integration of Large Language Models (LLMs), such as ChatGPT and GitHub Copilot, into software engineering workflows has shown potential to enhance productivity, particularly in software testing. This paper investigates whether LLM support improves defect detection effectiveness during unit testing. Building on prior studies comparing manual and tool-supported testing, we replicated and extend… ▽ More The integration of Large Language Models (LLMs), such as ChatGPT and GitHub Copilot, into software engineering workflows has shown potential to enhance productivity, particularly in software testing. This paper investigates whether LLM support improves defect detection effectiveness during unit testing. Building on prior studies comparing manual and tool-supported testing, we replicated and extended an experiment where participants wrote unit tests for a Java-based system with seeded defects within a time-boxed session, supported by LLMs. Comparing LLM supported and manual testing, results show that LLM support significantly increases the number of unit tests generated, defect detection rates, and overall testing efficiency. These findings highlight the potential of LLMs to improve testing and defect detection outcomes, providing empirical insights into their practical application in software testing. △ Less

Submitted 13 February, 2025; originally announced February 2025.

arXiv:2408.07816 [pdf, other]

doi 10.1109/SEAA64295.2024.00062

How Industry Tackles Anomalies during Runtime: Approaches and Key Monitoring Parameters

Authors: Monika Steidl, Benedikt Dornauer, Michael Felderer, Rudolf Ramler, Mircea-Cristian Racasan, Marko Gattringer

Abstract: Deviations from expected behavior during runtime, known as anomalies, have become more common due to the systems' complexity, especially for microservices. Consequently, analyzing runtime monitoring data, such as logs, traces for microservices, and metrics, is challenging due to the large volume of data collected. Developing effective rules or AI algorithms requires a deep understanding of this da… ▽ More Deviations from expected behavior during runtime, known as anomalies, have become more common due to the systems' complexity, especially for microservices. Consequently, analyzing runtime monitoring data, such as logs, traces for microservices, and metrics, is challenging due to the large volume of data collected. Developing effective rules or AI algorithms requires a deep understanding of this data to reliably detect unforeseen anomalies. This paper seeks to comprehend anomalies and current anomaly detection approaches across diverse industrial sectors. Additionally, it aims to pinpoint the parameters necessary for identifying anomalies via runtime monitoring data. Therefore, we conducted semi-structured interviews with fifteen industry participants who rely on anomaly detection during runtime. Additionally, to supplement information from the interviews, we performed a literature review focusing on anomaly detection approaches applied to industrial real-life datasets. Our paper (1) demonstrates the diversity of interpretations and examples of software anomalies during runtime and (2) explores the reasons behind choosing rule-based approaches in the industry over self-developed AI approaches. AI-based approaches have become prominent in published industry-related papers in the last three years. Furthermore, we (3) identified key monitoring parameters collected during runtime (logs, traces, and metrics) that assist practitioners in detecting anomalies during runtime without introducing bias in their anomaly detection approach due to inconclusive parameters. △ Less

Submitted 14 August, 2024; originally announced August 2024.

Comments: accepted at 2024 50th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)

arXiv:2407.04017 [pdf, other]

Contemporary Software Modernization: Perspectives and Challenges to Deal with Legacy Systems

Authors: Wesley K. G. Assunção, Luciano Marchezan, Alexander Egyed, Rudolf Ramler

Abstract: Software modernization is an inherent activity of software engineering, as technology advances and systems inevitably become outdated. The term "software modernization" emerged as a research topic in the early 2000s, with a differentiation from traditional software evolution. Studies on this topic became popular due to new programming paradigms, technologies, and architectural styles. Given the pe… ▽ More Software modernization is an inherent activity of software engineering, as technology advances and systems inevitably become outdated. The term "software modernization" emerged as a research topic in the early 2000s, with a differentiation from traditional software evolution. Studies on this topic became popular due to new programming paradigms, technologies, and architectural styles. Given the pervasive nature of software today, modernizing legacy systems is paramount to provide users with competitive and innovative products and services. Despite the large amount of work available in the literature, there are significant limitations: (i) proposed approaches are strictly specific to one scenario or technology, lacking flexibility; (ii) most of the proposed approaches are not aligned with the current modern software development scenario; and (iii) due to a myriad of proposed modernization approaches, practitioners may be misguided on how to modernize legacies. In this work, our goal is to call attention to the need for advances in research and practices toward a well-defined software modernization domain. The focus is on enabling organizations to preserve the knowledge represented in legacy systems while taking advantages of disruptive and emerging technologies. Based on this goal, we put the different perspectives of software modernization in the context of contemporary software development. We also present a research agenda with 10 challenges to motivate new studies. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2403.16639 [pdf, other]

doi 10.1007/s10664-023-10390-z

Investigating the Readability of Test Code: Combining Scientific and Practical Views

Authors: Dietmar Winkler, Pirmin Urbanke, Rudolf Ramler

Abstract: The readability of source code is key for understanding and maintaining software systems and tests. Several studies investigate the readability of source code, but there is limited research on the readability of test code and related influence factors. We investigate the factors that influence the readability of test code from an academic perspective complemented by practical views. First, we perf… ▽ More The readability of source code is key for understanding and maintaining software systems and tests. Several studies investigate the readability of source code, but there is limited research on the readability of test code and related influence factors. We investigate the factors that influence the readability of test code from an academic perspective complemented by practical views. First, we perform a Systematic Mapping Study (SMS) with a focus on scientific literature. Second, we extend this study by reviewing grey literature sources for practical aspects on test code readability and understandability. Finally, we conduct a controlled experiment on the readability of a selected set of test cases to collect additional knowledge on influence factors discussed in practice. The result set of the SMS includes 19 primary studies from the scientific literature. The grey literature search reveals 62 sources for information on test code readability. Based on an analysis of these sources, we identified a combined set of 14 factors that influence the readability of test code. 7 of these factors were found in scientific and grey literature, while some factors were mainly discussed in academia (2) or industry (5) with limited overlap. The controlled experiment on practically relevant influence factors showed that the investigated factors have a significant impact on readability for half of the selected test cases. Our review of scientific and grey literature showed that test code readability is of interest for academia and industry with a consensus on key influence factors. However, we also found factors only discussed by practitioners. For some of these factors we were able to confirm an impact on readability in a first experiment. Therefore, we see the need to bring together academic and industry viewpoints to achieve a common view on the readability of software test code. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Journal ref: Empir Software Eng 29, 53 (2024)

arXiv:2309.07067 [pdf, other]

Data Pipeline Quality: Influencing Factors, Root Causes of Data-related Issues, and Processing Problem Areas for Developers

Authors: Harald Foidl, Valentina Golendukhina, Rudolf Ramler, Michael Felderer

Abstract: Data pipelines are an integral part of various modern data-driven systems. However, despite their importance, they are often unreliable and deliver poor-quality data. A critical step toward improving this situation is a solid understanding of the aspects contributing to the quality of data pipelines. Therefore, this article first introduces a taxonomy of 41 factors that influence the ability of da… ▽ More Data pipelines are an integral part of various modern data-driven systems. However, despite their importance, they are often unreliable and deliver poor-quality data. A critical step toward improving this situation is a solid understanding of the aspects contributing to the quality of data pipelines. Therefore, this article first introduces a taxonomy of 41 factors that influence the ability of data pipelines to provide quality data. The taxonomy is based on a multivocal literature review and validated by eight interviews with experts from the data engineering domain. Data, infrastructure, life cycle management, development & deployment, and processing were found to be the main influencing themes. Second, we investigate the root causes of data-related issues, their location in data pipelines, and the main topics of data pipeline processing issues for developers by mining GitHub projects and Stack Overflow posts. We found data-related issues to be primarily caused by incorrect data types (33%), mainly occurring in the data cleaning stage of pipelines (35%). Data integration and ingestion tasks were found to be the most asked topics of developers, accounting for nearly half (47%) of all questions. Compatibility issues were found to be a separate problem area in addition to issues corresponding to the usual data pipeline processing areas (i.e., data loading, ingestion, integration, cleaning, and transformation). These findings suggest that future research efforts should focus on analyzing compatibility and data type issues in more depth and assisting developers in data integration and ingestion tasks. The proposed taxonomy is valuable to practitioners in the context of quality assurance activities and fosters future research into data pipeline quality. △ Less

Submitted 13 September, 2023; originally announced September 2023.

Comments: To be published by The Journal of Systems & Software

arXiv:2307.15527 [pdf, other]

doi 10.1109/SEAA60479.2023.00058

Towards Automatic Generation of Amplified Regression Test Oracles

Authors: Alejandra Duque-Torres, Claus Klammer, Dietmar Pfahl, Stefan Fischer, Rudolf Ramler

Abstract: Regression testing is crucial in ensuring that pure code refactoring does not adversely affect existing software functionality, but it can be expensive, accounting for half the cost of software maintenance. Automated test case generation reduces effort but may generate weak test suites. Test amplification is a promising solution that enhances tests by generating additional or improving existing on… ▽ More Regression testing is crucial in ensuring that pure code refactoring does not adversely affect existing software functionality, but it can be expensive, accounting for half the cost of software maintenance. Automated test case generation reduces effort but may generate weak test suites. Test amplification is a promising solution that enhances tests by generating additional or improving existing ones, increasing test coverage, but it faces the test oracle problem. To address this, we propose a test oracle derivation approach that uses object state data produced during System Under Test (SUT) test execution to amplify regression test oracles. The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour. Our preliminary evaluation shows that the proposed approach can enhance the detection of behaviour changes substantially, providing initial evidence of its effectiveness. △ Less

Submitted 28 July, 2023; originally announced July 2023.

Comments: 8 pages, 1 figure

ACM Class: D.2

Journal ref: SEAA 2023

arXiv:2305.10910 [pdf, other]

doi 10.1109/MOBILSoft59058.2023.00010

Analysis of Library Dependency Networks of Package Managers Used in iOS Development

Authors: Kristiina Rahkema, Dietmar Pfahl, Rudolf Ramler

Abstract: Reusing existing solutions in the form of third-party libraries is common practice when writing software. Package managers are used to manage dependencies to third-party libraries by automating the process of installing and updating the libraries. Library dependencies themselves can have dependencies to other libraries creating a dependency network with several levels of indirections. The library… ▽ More Reusing existing solutions in the form of third-party libraries is common practice when writing software. Package managers are used to manage dependencies to third-party libraries by automating the process of installing and updating the libraries. Library dependencies themselves can have dependencies to other libraries creating a dependency network with several levels of indirections. The library dependency network in the Swift ecosystem encompasses libraries from CocoaPods, Carthage and Swift Package Manager (PM). These package managers are used when developing, for example, iOS or Mac OS applications in Swift and Objective-C. We provide the first analysis of the library dependency network evolution in the Swift ecosystem. Although CocoaPods is the package manager with the biggest set of libraries, the difference to other package managers is not as big as expected. The youngest package manager and official package manager for Swift, Swift PM, is becoming more and more popular, resulting in a gradual slow-down of the growth of the other two package managers. When analyzing direct and transitive dependencies, we found that the mean total number of dependencies is lower in the Swift ecosystem compared to many other ecosystems. Still, the total number of dependencies shows a clear growing trend over the last five years. △ Less

Submitted 18 May, 2023; originally announced May 2023.

Comments: 5 pages, 4 figures

ACM Class: D.2.0

arXiv:2301.09001 [pdf, other]

doi 10.1016/j.jss.2023.111615

The Pipeline for the Continuous Development of Artificial Intelligence Models -- Current State of Research and Practice

Authors: Monika Steidl, Michael Felderer, Rudolf Ramler

Abstract: Companies struggle to continuously develop and deploy AI models to complex production systems due to AI characteristics while assuring quality. To ease the development process, continuous pipelines for AI have become an active research area where consolidated and in-depth analysis regarding the terminology, triggers, tasks, and challenges is required. This paper includes a Multivocal Literature Re… ▽ More Companies struggle to continuously develop and deploy AI models to complex production systems due to AI characteristics while assuring quality. To ease the development process, continuous pipelines for AI have become an active research area where consolidated and in-depth analysis regarding the terminology, triggers, tasks, and challenges is required. This paper includes a Multivocal Literature Review where we consolidated 151 relevant formal and informal sources. In addition, nine-semi structured interviews with participants from academia and industry verified and extended the obtained information. Based on these sources, this paper provides and compares terminologies for DevOps and CI/CD for AI, MLOps, (end-to-end) lifecycle management, and CD4ML. Furthermore, the paper provides an aggregated list of potential triggers for reiterating the pipeline, such as alert systems or schedules. In addition, this work uses a taxonomy creation strategy to present a consolidated pipeline comprising tasks regarding the continuous development of AI. This pipeline consists of four stages: Data Handling, Model Learning, Software Development and System Operations. Moreover, we map challenges regarding pipeline implementation, adaption, and usage for the continuous development of AI to these four stages. △ Less

Submitted 21 January, 2023; originally announced January 2023.

Comments: accepted in the Journal Systems and Software

arXiv:2205.15780 [pdf, other]

doi 10.1109/SANER53432.2022.00088

A Replication Study on Predicting Metamorphic Relations at Unit Testing Level

Authors: Alejandra Duque-Torres, Dietmar Pfahl, Rudolf Ramler, Claus Klammer

Abstract: Metamorphic Testing (MT) addresses the test oracle problem by examining the relations between inputs and outputs of test executions. Such relations are known as Metamorphic Relations (MRs). In current practice, identifying and selecting suitable MRs is usually a challenging manual task, requiring a thorough grasp of the SUT and its application domain. Thus, Kanewala et al. proposed the Predicting… ▽ More Metamorphic Testing (MT) addresses the test oracle problem by examining the relations between inputs and outputs of test executions. Such relations are known as Metamorphic Relations (MRs). In current practice, identifying and selecting suitable MRs is usually a challenging manual task, requiring a thorough grasp of the SUT and its application domain. Thus, Kanewala et al. proposed the Predicting Metamorphic Relations (PMR) approach to automatically suggest MRs from a list of six pre-defined MRs for testing newly developed methods. PMR is based on a classification model trained on features extracted from the control-flow graph (CFG) of 100 Java methods. In our replication study, we explore the generalizability of PMR. First, we rebuild the entire preprocessing and training pipeline and repeat the original study in a close replication to verify the reported results and establish the basis for further experiments. Second, we perform a conceptual replication to explore the reusability of the PMR model trained on CFGs from Java methods in the first step for functionally identical methods implemented in Python and C++. Finally, we retrain the model on the CFGs from the Python and C++ methods to investigate the dependence on programming language and implementation details. We were able to successfully replicate the original study achieving comparable results for the Java methods set. However, the prediction performance of the Java-based classifiers significantly decreases when applied to functionally equivalent Python and C++ methods despite using only CFG features to abstract from language details. Since the performance improved again when the classifiers were retrained on the CFGs of the methods written in Python and C++, we conclude that the PMR approach can be generalized, but only when classifiers are developed starting from code artefacts in the used programming language. △ Less

Submitted 31 May, 2022; originally announced May 2022.

Comments: 11 pages, 10 tables, 2 figures

ACM Class: D.2.5

Journal ref: 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)

arXiv:2203.10384 [pdf, other]

Data Smells: Categories, Causes and Consequences, and Detection of Suspicious Data in AI-based Systems

Authors: Harald Foidl, Michael Felderer, Rudolf Ramler

Abstract: High data quality is fundamental for today's AI-based systems. However, although data quality has been an object of research for decades, there is a clear lack of research on potential data quality issues (e.g., ambiguous, extraneous values). These kinds of issues are latent in nature and thus often not obvious. Nevertheless, they can be associated with an increased risk of future problems in AI-b… ▽ More High data quality is fundamental for today's AI-based systems. However, although data quality has been an object of research for decades, there is a clear lack of research on potential data quality issues (e.g., ambiguous, extraneous values). These kinds of issues are latent in nature and thus often not obvious. Nevertheless, they can be associated with an increased risk of future problems in AI-based systems (e.g., technical debt, data-induced faults). As a counterpart to code smells in software engineering, we refer to such issues as Data Smells. This article conceptualizes data smells and elaborates on their causes, consequences, detection, and use in the context of AI-based systems. In addition, a catalogue of 36 data smells divided into three categories (i.e., Believability Smells, Understandability Smells, Consistency Smells) is presented. Moreover, the article outlines tool support for detecting data smells and presents the result of an initial smell detection on more than 240 real-world datasets. △ Less

Submitted 16 June, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

arXiv:2102.05351 [pdf, other]

doi 10.1007/978-3-030-65854-0_3

Quality Assurance for AI-based Systems: Overview and Challenges

Authors: Michael Felderer, Rudolf Ramler

Abstract: The number and importance of AI-based systems in all domains is growing. With the pervasive use and the dependence on AI-based systems, the quality of these systems becomes essential for their practical usage. However, quality assurance for AI-based systems is an emerging area that has not been well explored and requires collaboration between the SE and AI research communities. This paper discusse… ▽ More The number and importance of AI-based systems in all domains is growing. With the pervasive use and the dependence on AI-based systems, the quality of these systems becomes essential for their practical usage. However, quality assurance for AI-based systems is an emerging area that has not been well explored and requires collaboration between the SE and AI research communities. This paper discusses terminology and challenges on quality assurance for AI-based systems to set a baseline for that purpose. Therefore, we define basic concepts and characterize AI-based systems along the three dimensions of artifact type, process, and quality characteristics. Furthermore, we elaborate on the key challenges of (1) understandability and interpretability of AI models, (2) lack of specifications and defined requirements, (3) need for validation data and test input generation, (4) defining expected outcomes as test oracles, (5) accuracy and correctness measures, (6) non-functional properties of AI-based systems, (7) self-adaptive and self-learning characteristics, and (8) dynamic and frequently changing environments. △ Less

Submitted 10 February, 2021; originally announced February 2021.

arXiv:1912.07936 [pdf, other]

Probabilistic Software Modeling: A Data-driven Paradigm for Software Analysis

Authors: Hannes Thaller, Lukas Linsbauer, Rudolf Ramler, Alexander Egyed

Abstract: Software systems are complex, and behavioral comprehension with the increasing amount of AI components challenges traditional testing and maintenance strategies.The lack of tools and methodologies for behavioral software comprehension leaves developers to testing and debugging that work in the boundaries of known scenarios. We present Probabilistic Software Modeling (PSM), a data-driven modeling p… ▽ More Software systems are complex, and behavioral comprehension with the increasing amount of AI components challenges traditional testing and maintenance strategies.The lack of tools and methodologies for behavioral software comprehension leaves developers to testing and debugging that work in the boundaries of known scenarios. We present Probabilistic Software Modeling (PSM), a data-driven modeling paradigm for predictive and generative methods in software engineering. PSM analyzes a program and synthesizes a network of probabilistic models that can simulate and quantify the original program's behavior. The approach extracts the type, executable, and property structure of a program and copies its topology. Each model is then optimized towards the observed runtime leading to a network that reflects the system's structure and behavior. The resulting network allows for the full spectrum of statistical inferential analysis with which rich predictive and generative applications can be built. Applications range from the visualization of states, inferential queries, test case generation, and anomaly detection up to the stochastic execution of the modeled system. In this work, we present the modeling methodologies, an empirical study of the runtime behavior of software systems, and a comprehensive study on PSM modeled systems. Results indicate that PSM is a solid foundation for structural and behavioral software comprehension applications. △ Less

Submitted 18 December, 2019; v1 submitted 17 December, 2019; originally announced December 2019.

Comments: 10 pages, 7 figures, 4 tables, 64 references, closed source until full publication

arXiv:1911.06594 [pdf, other]

doi 10.1145/3360664.3362698

Integrating Threat Modeling and Automated Test Case Generation into Industrialized Software Security Testing

Authors: Stefan Marksteiner, Rudolf Ramler, Hannes Sochor

Abstract: Industrial Internet of Things (IIoT) application provide a whole new set of possibilities to drive efficiency of industrial production forward. However, with the higher degree of integration among systems, comes a plethora of newthreats to the latter, as they are not yet designed to be broadly reachable and interoperable. To mitigate these vast amount of new threats, systematic and automated test… ▽ More Industrial Internet of Things (IIoT) application provide a whole new set of possibilities to drive efficiency of industrial production forward. However, with the higher degree of integration among systems, comes a plethora of newthreats to the latter, as they are not yet designed to be broadly reachable and interoperable. To mitigate these vast amount of new threats, systematic and automated test methods are necessary. This comprehensiveness can be achieved by thorough threat modeling. In order to automate security test, we present an approach to automate the testing process from threat modeling onward, closing the gap between threat modeling and automated test case generation. △ Less

Submitted 15 November, 2019; originally announced November 2019.

Comments: 3 pages, 1 figure, Central European Cybersecurity Conference 2019 (CECC2019), Munich

ACM Class: D.2.4; D.2.5

arXiv:1706.03934 [pdf, other]

doi 10.1109/ETFA.2017.8247574

Exploring Code Clones in Programmable Logic Controller Software

Authors: Hannes Thaller, Rudolf Ramler, Josef Pichler, Alexander Egyed

Abstract: The reuse of code fragments by copying and pasting is widely practiced in software development and results in code clones. Cloning is considered an anti-pattern as it negatively affects program correctness and increases maintenance efforts. Programmable Logic Controller (PLC) software is no exception in the code clone discussion as reuse in development and maintenance is frequently achieved throug… ▽ More The reuse of code fragments by copying and pasting is widely practiced in software development and results in code clones. Cloning is considered an anti-pattern as it negatively affects program correctness and increases maintenance efforts. Programmable Logic Controller (PLC) software is no exception in the code clone discussion as reuse in development and maintenance is frequently achieved through copy, paste, and modification. Even though the presence of code clones may not necessary be a problem per se, it is important to detect, track and manage clones as the software system evolves. Unfortunately, tool support for clone detection and management is not commonly available for PLC software systems or limited to generic tools with a reduced set of features. In this paper, we investigate code clones in a real-world PLC software system based on IEC 61131-3 Structured Text and C/C++. We extended a widely used tool for clone detection with normalization support. Furthermore, we evaluated the different types and natures of code clones in the studied system and their relevance for refactoring. Results shed light on the applicability and usefulness of clone detection in the context of industrial automation systems and it demonstrates the benefit of adapting detection and management tools for IEC 611313-3 languages. △ Less

Submitted 13 June, 2017; originally announced June 2017.

Comments: 8 pages, 2 figures, 2 tables, COMET Center SCCH (FFG #844597), etfa2018

Report number: FFG #844597

Showing 1–14 of 14 results for author: Ramler, R