Unit Testing Past vs. Present: Examining LLMs' Impact on Defect Detection and Efficiency

Authors: Rudolf Ramler, Philipp Straubinger, Reinhold Plösch, Dietmar Winkler

Abstract: The integration of Large Language Models (LLMs), such as ChatGPT and GitHub Copilot, into software engineering workflows has shown potential to enhance productivity, particularly in software testing. This paper investigates whether LLM support improves defect detection effectiveness during unit testing. Building on prior studies comparing manual and tool-supported testing, we replicated and extended an experiment where participants wrote unit tests for a Java-based system with seeded defects within a time-boxed session, supported by LLMs. Comparing LLM supported and manual testing, results show that LLM support significantly increases the number of unit tests generated, defect detection rates, and overall testing efficiency. These findings highlight the potential of LLMs to improve testing and defect detection outcomes, providing empirical insights into their practical application in software testing.

Submitted 13 February, 2025; originally announced February 2025.

arXiv:2403.16639 [pdf, other]

doi 10.1007/s10664-023-10390-z

Investigating the Readability of Test Code: Combining Scientific and Practical Views

Authors: Dietmar Winkler, Pirmin Urbanke, Rudolf Ramler

Abstract: The readability of source code is key for understanding and maintaining software systems and tests. Several studies investigate the readability of source code, but there is limited research on the readability of test code and related influence factors. We investigate the factors that influence the readability of test code from an academic perspective complemented by practical views. First, we perform a Systematic Mapping Study (SMS) with a focus on scientific literature. Second, we extend this study by reviewing grey literature sources for practical aspects on test code readability and understandability. Finally, we conduct a controlled experiment on the readability of a selected set of test cases to collect additional knowledge on influence factors discussed in practice. The result set of the SMS includes 19 primary studies from the scientific literature. The grey literature search reveals 62 sources for information on test code readability. Based on an analysis of these sources, we identified a combined set of 14 factors that influence the readability of test code. 7 of these factors were found in scientific and grey literature, while some factors were mainly discussed in academia (2) or industry (5) with limited overlap. The controlled experiment on practically relevant influence factors showed that the investigated factors have a significant impact on readability for half of the selected test cases.
Our review of scientific and grey literature showed that test code readability is of interest for academia and industry with a consensus on key influence factors. However, we also found factors only discussed by practitioners. For some of these factors we were able to confirm an impact on readability in a first experiment. Therefore, we see the need to bring together academic and industry viewpoints to achieve a common view on the readability of software test code.

Submitted 25 March, 2024; originally announced March 2024.

Journal ref: Empir Software Eng 29, 53 (2024)

arXiv:2208.04343 [pdf, other]

EFI: A Toolbox for Feature Importance Fusion and Interpretation in Python

Authors: Aayush Kumar, Jimiama Mafeni Mase, Divish Rengasamy, Benjamin Rothwell, Mercedes Torres Torres, David A. Winkler, Grazziela P. Figueredo

Abstract: This paper presents an open-source Python toolbox called Ensemble Feature Importance (EFI) to provide machine learning (ML) researchers, domain experts, and decision makers with robust and accurate feature importance quantification and more reliable mechanistic interpretation of feature importance for prediction problems using fuzzy sets. The toolkit was developed to address uncertainties in feature importance quantification and lack of trustworthy feature importance interpretation due to the diverse availability of machine learning algorithms, feature importance calculation methods, and dataset dependencies. EFI merges results from multiple machine learning models with different feature importance calculation approaches using data bootstrapping and decision fusion techniques, such as mean, majority voting and fuzzy logic. The main attributes of the EFI toolbox are: (i) automatic optimisation of ML algorithms, (ii) automatic computation of a set of feature importance coefficients from optimised ML algorithms and feature importance calculation techniques, (iii) automatic aggregation of importance coefficients using multiple decision fusion techniques, and (iv) fuzzy membership functions that show the importance of each feature to the prediction task. The key modules and functions of the toolbox are described, and a simple example of their application is presented using the popular Iris dataset.

Submitted 8 August, 2022; originally announced August 2022.

Comments: 16 pages, 5 tables, 9 figures

arXiv:2110.11713 [pdf, other]

Mechanistic Interpretation of Machine Learning Inference: A Fuzzy Feature Importance Fusion Approach

Authors: Divish Rengasamy, Jimiama M. Mase, Mercedes Torres Torres, Benjamin Rothwell, David A. Winkler, Grazziela P. Figueredo

Abstract: With the widespread use of machine learning to support decision-making, it is increasingly important to verify and understand the reasons why a particular output is produced. Although post-training feature importance approaches assist this interpretation, there is an overall lack of consensus regarding how feature importance should be quantified, making explanations of model predictions unreliable. In addition, many of these explanations depend on the specific machine learning approach employed and on the subset of data used when calculating feature importance. A possible solution to improve the reliability of explanations is to combine results from multiple feature importance quantifiers from different machine learning approaches coupled with re-sampling. Current state-of-the-art ensemble feature importance fusion uses crisp techniques to fuse results from different approaches. There is, however, significant loss of information as these approaches are not context-aware and reduce several quantifiers to a single crisp output. More importantly, their representation of 'importance' as coefficients is misleading and incomprehensible to end-users and decision makers. Here we show how the use of fuzzy data fusion methods can overcome some of the important limitations of crisp fusion methods.

Submitted 22 October, 2021; originally announced October 2021.

Comments: 12 pages, 11 figures, 8 tables

arXiv:2109.11435 [pdf]

doi 10.1109/TSE.2021.3099532.

What Makes Agile Software Development Agile?

Authors: Marco Kuhrmann, Paolo Tell, Regina Hebig, Jil Klünder, Jürgen Münch, Oliver Linssen, Dietmar Pfahl, Michael Felderer, Christian R. Prause, Stephen G. MacDonell, Joyce Nakatumba-Nabende, David Raffo, Sarah Beecham, Eray Tüzün, Gustavo López, Nicolas Paez, Diego Fontdevila, Sherlock A. Licorish, Steffen Küpper, Günther Ruhe, Eric Knauss, Özden Özcan-Top, Paul Clarke, Fergal McCaffery, Marcela Genero, et al. (22 additional authors not shown)

Abstract: Together with many success stories, promises such as the increase in production speed and the improvement in stakeholders' collaboration have contributed to making agile a transformation in the software industry in which many companies want to take part. However, driven either by a natural and expected evolution or by contextual factors that challenge the adoption of agile methods as prescribed by their creator(s), software processes in practice mutate into hybrids over time. Are these still agile? In this article, we investigate the question: what makes a software development method agile? We present an empirical study grounded in a large-scale international survey that aims to identify software development methods and practices that improve or tame agility. Based on 556 data points, we analyze the perceived degree of agility in the implementation of standard project disciplines and its relation to used development methods and practices. Our findings suggest that only a small number of participants operate their projects in a purely traditional or agile manner (under 15%). That said, most project disciplines and most practices show a clear trend towards increasing degrees of agility. Compared to the methods used to develop software, the selection of practices has a stronger effect on the degree of agility of a given discipline. Finally, there are no methods or practices that explicitly guarantee or prevent agility. We conclude that agility cannot be defined solely at the process level.
Additional factors need to be taken into account when trying to implement or improve agility in a software company. Finally, we discuss the field of software process-related research in the light of our findings and present a roadmap for future research.

Submitted 23 September, 2021; originally announced September 2021.

Comments: Journal paper, 17 pages, 14 figures

Journal ref: IEEE Transactions on Software Engineering (2021), pp.TBC

arXiv:2004.12819 [pdf, other]

Detecting and Tracking Communal Bird Roosts in Weather Radar Data

Authors: Zezhou Cheng, Saadia Gabriel, Pankaj Bhambhani, Daniel Sheldon, Subhransu Maji, Andrew Laughlin, David Winkler

Abstract: The US weather radar archive holds detailed information about biological phenomena in the atmosphere over the last 20 years. Communally roosting birds congregate in large numbers at nighttime roosting locations, and their morning exodus from the roost is often visible as a distinctive pattern in radar images. This paper describes a machine learning system to detect and track roost signatures in weather radar data. A significant challenge is that labels were collected opportunistically from previous research studies and there are systematic differences in labeling style. We contribute a latent variable model and EM algorithm to learn a detection model together with models of labeling styles for individual annotators. By properly accounting for these variations we learn a significantly more accurate detector. The resulting system detects previously unknown roosting locations and provides comprehensive spatio-temporal data about roosts across the US. This data will provide biologists important information about the poorly understood phenomena of broad-scale habitat use and movements of communally roosting birds during the non-breeding season.

Submitted 23 April, 2020; originally announced April 2020.

Comments: 9 pages, 6 figures, AAAI 2020 (AI for Social Impact Track)

arXiv:1911.11559 [pdf, other]

Impressive computational acceleration by using machine learning for 2-dimensional super-lubricant materials discovery

Authors: Marco Fronzi, Mutaz Abu Ghazaleh, Olexandr Isayev, David A. Winkler, Joe Shapter, Michael J. Ford

Abstract: The screening of novel materials is an important topic in the field of materials science. Although traditional computational modeling, especially first-principles approaches, is a very useful and accurate tool to predict the properties of novel materials, it still demands extensive and expensive state-of-the-art computational resources. Additionally, such approaches can often be extremely time-consuming. We describe a time- and resource-efficient machine learning approach to create a large dataset of structural properties of van der Waals layered structures. In particular, we focus on the interlayer energy and the elastic constant of layered materials composed of two different 2-dimensional (2D) structures, which are important for novel solid lubricant and super-lubricant materials. We show that machine learning models can recapitulate results of computationally expensive approaches (i.e. density functional theory) with high accuracy.

Submitted 29 July, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

arXiv:1901.05704 [pdf, other]

doi 10.1038/s42256-018-0009-9

Evolving embodied intelligence from materials to machines

Authors: David Howard, Agoston E. Eiben, Danielle Frances Kennedy, Jean-Baptiste Mouret, Philip Valencia, Dave Winkler

Abstract: Natural lifeforms specialise to their environmental niches across many levels; from low-level features such as DNA and proteins, through to higher-level artefacts including eyes, limbs, and overarching body plans. We propose Multi-Level Evolution (MLE), a bottom-up automatic process that designs robots across multiple levels and niches them to tasks and environmental conditions. MLE concurrently explores constituent molecular and material 'building blocks', as well as their possible assemblies into specialised morphological and sensorimotor configurations. MLE provides a route to fully harness a recent explosion in available candidate materials and ongoing advances in rapid manufacturing processes. We outline a feasible MLE architecture that realises this vision, highlight the main roadblocks and how they may be overcome, and show robotic applications to which MLE is particularly suited. By forming a research agenda to stimulate discussion between researchers in related fields, we hope to inspire the pursuit of multi-level robotic design all the way from material to machine.

Submitted 17 January, 2019; originally announced January 2019.

Journal ref: Nature Machine Intelligence. Vol. 1, Number 1, pages 12--19. 2019

arXiv:1805.07951 [pdf, other]

doi 10.1145/3306607

Status Quo in Requirements Engineering: A Theory and a Global Family of Surveys

Authors: Stefan Wagner, Daniel Méndez Fernández, Michael Felderer, Antonio Vetró, Marcos Kalinowski, Roel Wieringa, Dietmar Pfahl, Tayana Conte, Marie-Therese Christiansson, Desmond Greer, Casper Lassenius, Tomi Männistö, Maleknaz Nayebi, Markku Oivo, Birgit Penzenstadler, Rafael Prikladnicki, Guenther Ruhe, André Schekelmann, Sagar Sen, Rodrigo Spínola, Ahmed Tuzcu, Jose Luis de la Vara, Dietmar Winkler

Abstract: Requirements Engineering (RE) has established itself as a software engineering discipline during the past decades. While researchers have been investigating the RE discipline with a plethora of empirical studies, attempts to systematically derive an empirically-based theory in the context of the RE discipline have just recently been started. However, such a theory is needed if we are to define and motivate guidance in performing high quality RE research and practice. We aim at providing an empirical and valid foundation for a theory of RE, which helps software engineers establish effective and efficient RE processes. We designed a survey instrument and theory that has now been replicated in 10 countries world-wide. We evaluate the propositions of the theory with bootstrapped confidence intervals and derive potential explanations for the propositions. We report on the underlying theory and the full results obtained from the replication studies with participants from 228 organisations. Our results represent a substantial step forward towards developing an empirically-based theory of RE giving insights into current practices with RE processes.
The results reveal, for example, that there are no strong differences between organisations in different countries and regions, that interviews, facilitated meetings and prototyping are the most used elicitation techniques, that requirements are often documented textually, that traces between requirements and code or design documents are common, that requirements specifications themselves are rarely changed and that requirements engineering (process) improvement endeavours are mostly intrinsically motivated. Our study establishes a theory that can be used as starting point for many further studies for more detailed investigations. Practitioners can use the results as theory-supported guidance on selecting suitable RE methods and techniques.

Submitted 17 December, 2018; v1 submitted 21 May, 2018; originally announced May 2018.

Comments: 47 pages, 19 figures, accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM)

Journal ref: ACM Transactions on Software Engineering and Methodology, Volume 28, Issue 2, March 2019

arXiv:1802.07257 [pdf, other]

Predicting Natural Hazards with Neuronal Networks

Authors: Matthias Rauter, Daniel Winkler

Abstract: Gravitational mass flows, such as avalanches, debris flows and rockfalls are common events in alpine regions with high impact on transport routes. Within the last few decades, hazard zone maps have been developed to systematically approach this threat. These maps mark vulnerable zones in habitable areas to allow effective planning of hazard mitigation measures and development of settlements. Hazard zone maps have been shown to be an effective tool to reduce fatalities during extreme events. They are created in a complex process, based on experience, empirical models, physical simulations and historical data. The generation of such maps is therefore expensive and limited to crucially important regions, e.g. permanently inhabited areas. In this work we interpret the task of hazard zone mapping as a classification problem. Every point in a specific area has to be classified according to its vulnerability. On a regional scale this leads to a segmentation problem, where the total area has to be divided into the respective hazard zones. The recent developments in artificial intelligence, namely convolutional neuronal networks, have led to major improvement in a very similar task, image classification and semantic segmentation, i.e. computer vision. We use a convolutional neuronal network to identify terrain formations with the potential for catastrophic snow avalanches and label points in their reach as vulnerable. Repeating this procedure for all points allows us to generate an artificial hazard zone map.
We demonstrate that the approach is feasible and promising based on the hazard zone map of the Tirolean Oberland. However, more training data and further improvement of the method are required before such techniques can be applied reliably.

Submitted 21 February, 2018; originally announced February 2018.

arXiv:1612.00163 [pdf]

doi 10.1007/978-3-319-27033-3_5

Preventing Incomplete/Hidden Requirements: Reflections on Survey Data from Austria and Brazil

Authors: M. Kalinowski, M. Felderer, T. Conte, R. Spínola, R. Prikladnicki, D. Winkler, D. Méndez Fernández, S. Wagner

Abstract: Many software projects fail due to problems in requirements engineering (RE). The goal of this paper is to analyze a specific and relevant RE problem in detail: incomplete/hidden requirements. We replicated a global family of RE surveys with representatives of software organizations in Austria and Brazil. We used the data to (a) characterize the criticality of the selected RE problem, and to (b) analyze the reported main causes and mitigation actions. Based on the analysis, we discuss how to prevent the problem. The survey includes 14 different organizations in Austria and 74 in Brazil, including small, medium and large sized companies, conducting both plan-driven and agile development processes. Respondents from both countries cited the incomplete/hidden requirements problem as one of the most critical RE problems. We identified and graphically represented the main causes and documented solution options to address these causes. Further, we compiled a list of reported mitigation actions. From a practical point of view, this paper provides further insights into common causes of incomplete/hidden requirements and on how to prevent this problem.

Submitted 1 December, 2016; originally announced December 2016.

Comments: in Proceedings of the Software Quality Days, 2015

Showing 1–11 of 11 results for author: Winkler, D