-
Why you shouldn't fully trust ChatGPT: A synthesis of this AI tool's error rates across disciplines and the software engineering lifecycle
Authors:
Vahid Garousi
Abstract:
Context: ChatGPT and other large language models (LLMs) are widely used across healthcare, business, economics, engineering, and software engineering (SE). Despite their popularity, concerns persist about their reliability, especially their error rates across domains and the software development lifecycle (SDLC).
Objective: This study synthesizes and quantifies ChatGPT's reported error rates across major domains and SE tasks aligned with SDLC phases. It provides an evidence-based view of where ChatGPT excels, where it fails, and how reliability varies by task, domain, and model version (GPT-3.5, GPT-4, GPT-4-turbo, GPT-4o).
Method: A Multivocal Literature Review (MLR) was conducted, gathering data from academic studies, reports, benchmarks, and grey literature up to 2025. Factual, reasoning, coding, and interpretive errors were considered. Data were grouped by domain and SE phase and visualized using boxplots to show error distributions.
Results: Error rates vary across domains and versions. In healthcare, rates ranged from 8% to 83%. Business and economics saw error rates drop from ~50% with GPT-3.5 to 15-20% with GPT-4. Engineering tasks averaged 20-30%. Programming success reached 87.5%, though complex debugging still showed over 50% errors. In SE, requirements and design phases showed lower error rates (~5-20%), while coding, testing, and maintenance phases had higher variability (10-50%). Upgrades from GPT-3.5 to GPT-4 improved reliability.
Conclusion: Despite improvements, ChatGPT still exhibits non-negligible error rates varying by domain, task, and SDLC phase. Full reliance without human oversight remains risky, especially in critical settings. Continuous evaluation and critical validation are essential to ensure reliability and trustworthiness.
Submitted 26 April, 2025;
originally announced April 2025.
-
AI-powered software testing tools: A systematic review and empirical assessment of their features and limitations
Authors:
Vahid Garousi,
Nithin Joy,
Zafar Jafarov,
Alper Buğra Keleş,
Sevde Değirmenci,
Ece Özdemir,
Ryan Zarringhalami
Abstract:
Context: The rise of Artificial Intelligence (AI) in software engineering has led to the development of AI-powered test automation tools, promising improved efficiency, reduced maintenance effort, and enhanced defect detection. However, a systematic evaluation of these tools is needed to understand their capabilities, benefits, and limitations. Objective: This study has two objectives: (1) a systematic review of AI-assisted test automation tools, categorizing their key AI features; (2) an empirical study of two selected AI-powered tools on two software systems under test, to investigate the effectiveness and limitations of the tools. Method: A systematic review of 55 AI-based test automation tools was conducted, classifying them based on their AI-assisted capabilities such as self-healing tests, visual testing, and AI-powered test generation. In the second phase, two representative tools were selected for the empirical study, in which we applied them to test two open-source software systems. Their performance was compared with traditional test automation approaches to evaluate efficiency and adaptability. Results: The review provides a comprehensive taxonomy of AI-driven testing tools, highlighting common features and trends. The empirical evaluation demonstrates that AI-powered automation enhances test execution efficiency and reduces maintenance effort, but it also exposes limitations in handling complex UI changes and in contextual understanding. Conclusion: AI-driven test automation tools show strong potential in improving software quality and reducing manual testing effort. However, their current limitations, such as false positives, lack of domain knowledge, and dependency on predefined models, indicate the need for further refinement. Future research should focus on advancing AI models to improve adaptability, reliability, and robustness in software testing.
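Self-healing tests are one of the AI-assisted capabilities used above to classify the tools. The sketch below illustrates only the basic idea under simplifying assumptions: page elements are plain dictionaries of attributes, and a simple attribute-overlap score stands in for the machine-learning models that commercial tools use. All names (`Locator`, `similarity`, `find_element`) and the 0.5 threshold are hypothetical and are not taken from any of the reviewed tools.

```python
from dataclasses import dataclass


@dataclass
class Locator:
    # Attributes recorded when the test was authored (hypothetical format).
    attrs: dict


def similarity(recorded: dict, candidate: dict) -> float:
    """Fraction of recorded attributes that the candidate element still has unchanged."""
    if not recorded:
        return 0.0
    matches = sum(1 for key, value in recorded.items() if candidate.get(key) == value)
    return matches / len(recorded)


def find_element(page: list, locator: Locator, threshold: float = 0.5) -> dict:
    """Look up an element by id; if that fails, 'heal' by picking the most similar element."""
    # 1) Conventional lookup: exact match on the recorded id.
    wanted_id = locator.attrs.get("id")
    for element in page:
        if wanted_id is not None and element.get("id") == wanted_id:
            return element
    # 2) Healing step: rank all elements by attribute overlap with the recorded locator.
    best = max(page, key=lambda el: similarity(locator.attrs, el), default=None)
    if best is None or similarity(locator.attrs, best) < threshold:
        raise LookupError("element not found and no healing candidate above threshold")
    # Persist the healed attributes so the next run finds the element directly.
    locator.attrs = dict(best)
    return best


login = Locator({"id": "submit-btn", "tag": "button", "text": "Log in", "class": "primary"})
page_after_ui_change = [
    {"id": "login-submit", "tag": "button", "text": "Log in", "class": "primary"},
    {"id": "cancel", "tag": "button", "text": "Cancel", "class": "secondary"},
]
element = find_element(page_after_ui_change, login)  # heals to the "login-submit" button
```

The threshold is the central design choice here: if it is set too low, the healing step can silently bind to the wrong element, which is one plausible route to the false positives listed among the tools' limitations.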
Submitted 1 May, 2025; v1 submitted 31 August, 2024;
originally announced September 2024.
-
Coverage measurement in model-based testing of web applications: Tool support and an industrial experience report
Authors:
Vahid Garousi,
Alper Buğra Keleş,
Yunus Balaman,
Alper Mermer,
Zeynep Özdemir Güler
Abstract:
There are many widely used tools for measuring test coverage and code coverage. Test coverage is the ratio of requirements or other non-code artifacts covered by a test suite, while code coverage is the ratio of source code covered by tests. Almost all coverage tools report only a certain subset of coverage values, and almost always either test-coverage or code-coverage measures rather than both. In a large-scale industrial web-application-testing setting, we were faced with the need to "integrate" several types of coverage data (including front-end and back-end code coverage with requirements coverage), and to see all of them "live" as large model-based test suites were running. Unable to find any off-the-shelf toolset that addressed this need, we developed an open-source test coverage tool specific to model-based testing (MBT), named MBTCover. In addition to code coverage, the tool measures and reports requirements and model coverage "live" as a given MBT test suite is executing. In this paper, we present the features of the MBTCover tool and our experience from using it in multiple large test-automation projects in practice. Other software test engineers who conduct web application testing and MBT may find the tool useful in their projects.
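The two ratios defined above can be tracked side by side while a suite runs. The following is a minimal sketch, assuming that each executed test step can report which requirement IDs and which source lines it exercised; the class and method names (`LiveCoverage`, `record_step`, `snapshot`) and the ID formats are hypothetical and do not describe MBTCover's actual implementation or API.

```python
def coverage_percent(covered: set, universe: set) -> float:
    """Percentage of items in `universe` that were hit at least once."""
    if not universe:
        return 0.0  # avoid division by zero for an empty universe
    return 100.0 * len(covered & universe) / len(universe)


class LiveCoverage:
    """Accumulates coverage while a test suite runs (hypothetical design, not MBTCover's API)."""

    def __init__(self, requirements: set, executable_lines: set):
        self.requirements = requirements          # e.g. requirement IDs from a spec
        self.executable_lines = executable_lines  # e.g. "file:line" strings
        self.hit_requirements = set()
        self.hit_lines = set()

    def record_step(self, requirement_ids: set, lines: set) -> None:
        """Merge what one executed test step exercised into the running totals."""
        self.hit_requirements |= requirement_ids
        self.hit_lines |= lines

    def snapshot(self) -> dict:
        """Current test (requirements) coverage and code coverage, in percent."""
        return {
            "requirements": coverage_percent(self.hit_requirements, self.requirements),
            "code": coverage_percent(self.hit_lines, self.executable_lines),
        }


cov = LiveCoverage({"R1", "R2", "R3", "R4"},
                   {"login.js:3", "login.js:7", "api.py:12", "api.py:15"})
cov.record_step({"R1"}, {"login.js:3", "api.py:12"})
print(cov.snapshot())  # {'requirements': 25.0, 'code': 50.0}
cov.record_step({"R2", "R3"}, {"login.js:7"})
print(cov.snapshot())  # {'requirements': 75.0, 'code': 75.0}
```

Because `snapshot()` can be called at any point during execution, a dashboard could poll it to show coverage "live"; model coverage (visited vertices and edges) could be accumulated the same way.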
Submitted 12 August, 2024;
originally announced August 2024.
-
A pragmatic look at education and training of software test engineers: Further cooperation of academia and industry is needed
Authors:
Vahid Garousi,
Alper Buğra Keleş
Abstract:
Alongside software testing education in universities, a great deal of effort and resources is spent on software-testing training activities in industry. For example, there are several international certification schemes in testing, such as those provided by the International Software Testing Qualifications Board (ISTQB), under which certificates have been issued to more than 914K testers so far. To train the highly qualified test engineers of tomorrow, it is important for both university educators and trainers in industry to be aware of the status of software testing education in academia versus its training in industry, to analyze the relationships between these two approaches, and to assess ways to improve the education / training landscape. For that purpose, this paper provides a pragmatic overview of the issue, presents several recommendations, and aims to trigger further discussion in the community, between industry and academia, on how to improve the status quo and to find further best practices for more effective education and training of software testers. The paper is based on the two authors' combined ~40 years of technical experience in test engineering and their ~30 years of experience in providing testing education and training in more than six countries.
Submitted 12 August, 2024;
originally announced August 2024.
-
Model-based testing in practice: An experience report from the web applications domain
Authors:
Vahid Garousi,
Alper Buğra Keleş,
Yunus Balaman,
Zeynep Özdemir Güler,
Andrea Arcuri
Abstract:
In the context of a large software testing company, we have deployed the model-based testing (MBT) approach to take the company's test automation practices to higher levels of maturity and capability. From a set of open-source and commercial MBT tools, we chose an open-source tool named GraphWalker, and have pragmatically used MBT for end-to-end test automation of several large web and mobile applications under test. So far in our project, the MBT approach has provided various tangible and intangible benefits in terms of improved test coverage (number of paths tested), improved test-design practices, and improved real-fault detection effectiveness. The goal of this experience report (applied research report), based on "action research", is to share our experience of applying and evaluating MBT as a software technology (technique and tool) in a real industrial setting. We aim to contribute to the body of empirical evidence on the industrial application of MBT by sharing our industry-academia project on applying MBT in practice, the insights that we have gained, and the challenges and questions that we have faced and tackled so far. We give an overview of the industrial setting, provide motivation, explain the events leading to the outcomes, discuss the challenges faced, summarize the outcomes, and conclude with lessons learned, take-away messages, and practical advice based on the described experience. By learning from the best practices in this paper, other test engineers could conduct more mature MBT in their test projects.
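In MBT, the system under test is described as a graph of states (vertices) and actions (edges), and a path generator walks the graph until a stop condition holds; GraphWalker expresses this with generator and stop-condition pairs such as `random(edge_coverage(100))`. The Python sketch below imitates that idea on a hypothetical three-state login model. It illustrates random-walk path generation in general and is not GraphWalker's implementation; the model, vertex/edge names, and parameters are assumptions.

```python
import random

# Hypothetical state model of a login flow: vertex -> [(edge name, target vertex), ...].
MODEL = {
    "v_LoginPage": [("e_ValidLogin", "v_Home"), ("e_InvalidLogin", "v_LoginError")],
    "v_LoginError": [("e_Retry", "v_LoginPage")],
    "v_Home": [("e_Logout", "v_LoginPage")],
}


def random_walk(model, start, target_edge_coverage=1.0, seed=0, max_steps=10_000):
    """Follow random outgoing edges until the requested fraction of edges has been visited.

    Returns the generated path (the test sequence) and the edge coverage it achieved.
    """
    rng = random.Random(seed)  # a fixed seed makes the generated test reproducible
    all_edges = {name for outgoing in model.values() for name, _ in outgoing}
    visited, path, vertex = set(), [], start
    for _ in range(max_steps):
        if len(visited) / len(all_edges) >= target_edge_coverage:
            break  # stop condition reached
        name, vertex = rng.choice(model[vertex])
        visited.add(name)
        path.append(name)  # in a real MBT setup, each edge would trigger test code
    return path, len(visited) / len(all_edges)


path, coverage = random_walk(MODEL, "v_LoginPage")
```

Changing the seed yields a different route through the same model, which is how a generator can exercise many distinct paths without any additional hand-written test scripts.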
Submitted 5 April, 2021;
originally announced April 2021.
-
Mining user reviews of COVID contact-tracing apps: An exploratory analysis of nine European apps
Authors:
Vahid Garousi,
David Cutting,
Michael Felderer
Abstract:
Context: More than 50 countries have developed COVID contact-tracing apps to limit the spread of coronavirus. However, many experts and scientists have cast doubt on the effectiveness of those apps. For each app, a large number of reviews have been entered by end-users in app stores. Objective: Our goal is to gain insights into the user reviews of those apps and to find out the main problems that users have reported. Our focus is to assess the "software in society" aspects of the apps, based on user reviews. Method: We selected nine European national apps for our analysis and used a commercial app-review analytics tool to extract and mine the user reviews. For all the apps combined, our dataset includes 39,425 user reviews. Results: Results show that users are generally dissatisfied with the nine apps under study, except for the Scottish ("Protect Scotland") app. Some of the major issues that users have complained about are high battery drainage and doubts about whether the apps are really working. Conclusion: Our results show that more work is needed by the stakeholders behind the apps (e.g., app developers, decision-makers, public health experts) to improve the public adoption, software quality and public perception of these apps.
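The study relied on a commercial app-review analytics tool whose internals are not described here. As a much simpler, hypothetical stand-in, the sketch below tags each review with issue categories using hand-written keyword patterns and counts how many reviews mention each issue. The two categories mirror complaints named in the abstract; the patterns, names, and sample reviews are assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical keyword patterns for two complaint types named in the abstract.
ISSUE_PATTERNS = {
    "battery drainage": re.compile(r"\bbattery\b|\bdrain", re.IGNORECASE),
    "doubts it works": re.compile(
        r"\bnot working\b|\b(?:does|is|do|did)n'?t (?:seem to )?work|\bdoes nothing\b",
        re.IGNORECASE,
    ),
}


def count_issues(reviews):
    """Count how many reviews mention each issue (a review counts at most once per issue)."""
    counts = Counter()
    for text in reviews:
        for issue, pattern in ISSUE_PATTERNS.items():
            if pattern.search(text):
                counts[issue] += 1
    return counts


sample = [
    "Drains my battery in a few hours",
    "It doesn't seem to work at all",
    "Easy to install, no problems",
    "Battery is dead and it's not working",
]
print(count_issues(sample))  # both issues are counted twice
```

Keyword matching is transparent but brittle (it misses paraphrases and sarcasm), which is one reason dedicated review-analytics tools are typically used at the scale of tens of thousands of reviews.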
Submitted 25 December, 2020;
originally announced December 2020.
-
Retrieving and mining professional experience of software practice from grey literature: an exploratory review
Authors:
Austen Rainer,
Ashley Williams,
Vahid Garousi,
Michael Felderer
Abstract:
Background: Retrieving and mining practitioners' self-reports of their professional experience of software practice could provide valuable evidence for research. We are, however, unaware of any existing reviews of research conducted in this area. Objective: To review and classify previous research, and to identify insights into the challenges research confronts when retrieving and mining practitioners' self-reports of their experience of software practice. Method: We conduct an exploratory review to identify and classify 42 articles. We analyse a selection of those articles for insights on challenges to mining professional experience. Results: We identify only one directly relevant article. Even then, this article concerns the software professional's emotional experiences rather than the professional's reporting of behaviour and events occurring during software practice. We discuss challenges concerning: the prevalence of professional experience; definitions, models and theories; the sparseness of data; units of discourse analysis; annotator agreement; evaluation of the performance of algorithms; and the lack of replications. Conclusion: No directly relevant prior research appears to have been conducted in this area. We discuss the value of reporting negative results in secondary studies. There is a range of research opportunities but also considerable challenges. We formulate a set of guiding questions for further research in this area.
Submitted 30 September, 2020;
originally announced September 2020.
-
Assessing the maturity of software testing services using CMMI-SVC: An industrial case study
Authors:
Vahid Garousi,
Seyfettin Arkan,
Gökhan Urul,
Çağrı Murat Karapıçak,
Michael Felderer
Abstract:
Context: While many companies conduct their software testing activities in-house, many others outsource their software testing needs to firms that act as software testing service providers. As a result, Testing as a Service (TaaS) has emerged as a strong service industry in the last several decades. In the context of software testing services, there can be various challenges (e.g., during the planning and service delivery phases) and, as a result, the quality of testing services is not always as expected. Objective: It is important for both providers and customers of testing services to assess the quality and maturity of test services and subsequently improve them. Method: Motivated by a real industrial need of several testing service providers to assess the maturity of their software testing services, we chose the existing CMMI for Services maturity model (CMMI-SVC) and conducted a case study using it in the context of two Turkish testing service providers. Results: The case-study results show that maturity appraisal of testing services using CMMI-SVC was helpful for both companies and their test management teams, by enabling them to objectively assess the maturity of their testing services and by pinpointing potential improvement areas. Conclusion: We empirically observed that, after some minor customization, CMMI-SVC is indeed a suitable model for maturity appraisal of testing services.
Submitted 30 May, 2020; v1 submitted 26 May, 2020;
originally announced May 2020.
-
Visual GUI testing in practice: An extended industrial case study
Authors:
Vahid Garousi,
Wasif Afzal,
Adem Çağlar,
İhsan Berk Işık,
Berker Baydan,
Seçkin Çaylak,
Ahmet Zeki Boyraz,
Burak Yolaçan,
Kadir Herkiloğlu
Abstract:
Context: Visual GUI testing (VGT) is referred to as the latest generation of GUI-based testing. It is a tool-driven technique that uses image recognition to interact with and assert the behavior of the system under test (SUT). Motivated by the industrial need of a large Turkish software and systems company providing solutions in the defense and IT sectors, an action-research project was recently initiated to implement VGT in several teams and projects in the company.
Objective: To address the above needs, we planned and carried out an empirical investigation with the goal of assessing VGT using two tools (Sikuli and JAutomate). The purpose was to determine a suitable approach and tool for VGT of a given project (software product) in the company, and to increase the know-how in the company's test teams.
Method: Using an action-research case-study design, we investigated the use of VGT in the studied organization. Specifically, using the two selected VGT tools, we conducted a quantitative and a qualitative evaluation of VGT.
Results: By assessing the list of Challenges, Problems and Limitations (CPL) proposed in previous work in the context of our empirical study, we found that test-tool- and SUT-related CPLs were quite comparable to those of a previous empirical study, e.g., the synchronization between the SUT and the test tools was not always robust, and there were failures in the test tools' image recognition features. When we executed the automated test cases on the next versions of the SUTs to assess the types of test maintenance activities needed, we found that about half of the test cases (59.1% and 47.8% for the two tools) failed in the next version.
Conclusion: Our results confirm some of the previously reported issues in conducting VGT. Further, we highlight some additional challenges in test maintenance when using VGT.
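VGT tools locate widgets by searching a screenshot for a stored reference image. A minimal sketch of that idea, assuming grayscale screenshots given as small 2D lists of pixel intensities, is exhaustive template matching with a sum-of-squared-differences (SSD) score. Production tools such as Sikuli rely on optimized computer-vision libraries and similarity thresholds rather than this naive loop, and the `locate` function and its `max_ssd` tolerance are hypothetical. The last two calls show how a one-pixel rendering change in the next SUT version can break a strict match, one form of the image-recognition failures reported above.

```python
def locate(screen, template, max_ssd=0):
    """Return the centre (x, y) of the best match of `template` in `screen`, or None.

    Both images are 2D lists of grayscale intensities. The match score is the sum of
    squared differences (SSD); lower is better, and 0 means a pixel-perfect match.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best_pos, best_ssd = None, None
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            ssd = sum(
                (screen[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if best_ssd is None or ssd < best_ssd:
                best_pos, best_ssd = (r, c), ssd
    if best_pos is None or best_ssd > max_ssd:
        return None  # nothing close enough: the test step would fail here
    r, c = best_pos
    return (c + tw // 2, r + th // 2)  # click point in (x, y) screen coordinates


button = [[9, 9],
          [9, 0]]
screen_v1 = [[0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0],
             [0, 9, 9, 0, 0],
             [0, 9, 0, 0, 0],
             [0, 0, 0, 0, 0]]
# Next SUT version: one pixel of the button renders slightly differently.
screen_v2 = [row[:] for row in screen_v1]
screen_v2[3][1] = 5

print(locate(screen_v1, button))              # (2, 3)
print(locate(screen_v2, button))              # None: a strict match fails
print(locate(screen_v2, button, max_ssd=20))  # (2, 3) with a tolerance
```

A tolerance makes the script survive small rendering changes, but too loose a tolerance risks matching the wrong widget, so the choice trades maintenance effort against false matches.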
Submitted 20 May, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Software-testing education: A systematic literature mapping
Authors:
Vahid Garousi,
Austen Rainer,
Per Lauvås jr,
Andrea Arcuri
Abstract:
Context: With the rising complexity and scale of software systems, there is an ever-increasing demand for sophisticated and cost-effective software testing. To meet such a demand, there is a need for a highly skilled software testing workforce (test engineers) in industry. To address that need, many university educators worldwide have included software-testing education in their software engineering (SE) or computer science (CS) programs. Objective: Our objective in this paper is to summarize the body of experience and knowledge in the area of software-testing education to benefit readers (both educators and researchers) in designing and delivering software testing courses in university settings, and to support further education research in this area. Method: To address the above need, we conducted a systematic literature mapping (SLM) to synthesize what the community of educators has published on this topic. After compiling a candidate pool of 307 papers and applying a set of inclusion/exclusion criteria, our final pool included 204 papers published between 1992 and 2019. Results: The topic of software-testing education is becoming more active, as shown by the increasing number of papers. Many pedagogical approaches (how to best teach testing), courseware, and specific tools for testing education have been proposed. Many challenges in testing education, and insights on how to overcome those challenges, have been reported. Conclusion: This paper provides educators and researchers with a classification of existing studies within software-testing education. We further synthesize challenges and insights reported when teaching software testing. The paper also provides a reference ("index") to the vast body of knowledge and experience on teaching software testing.
Submitted 8 March, 2020;
originally announced March 2020.
-
Experience in engineering of scientific software: The case of an optimization software for oil pipelines
Authors:
Vahid Garousi,
Ehsan Abbasi,
Bedir Tekinerdogan
Abstract:
Development of scientific and engineering software usually differs from, and can be more challenging than, the development of conventional enterprise software. The authors were involved in a technology-transfer project between academia and industry that focused on the engineering, development and testing of software for optimizing the pumping energy costs of oil pipelines. Experts with different skillsets (mechanical, power and software engineers) were involved. Given the complex nature of the software (a sophisticated underlying optimization model) and the involvement of experts from different fields, there were challenges in various software engineering aspects of the system (e.g., requirements and testing). We report our observations and experience in addressing those challenges during our technology-transfer project, and aim to add to the existing body of experience and evidence in engineering of scientific and engineering software. We believe that our observations, experience and lessons learnt could be useful for other researchers and practitioners engineering other scientific and engineering software systems.
Submitted 1 March, 2020;
originally announced March 2020.
-
Benefitting from the Grey Literature in Software Engineering Research
Authors:
Vahid Garousi,
Michael Felderer,
Mika V. Mäntylä,
Austen Rainer
Abstract:
Researchers generally place the most trust in peer-reviewed, published information, such as journals and conference papers. By contrast, software engineering (SE) practitioners typically do not have the time, access or expertise to review and benefit from such publications. As a result, practitioners are more likely to turn to other sources of information that they trust, e.g., trade magazines, online blog-posts, survey results or technical reports, collectively referred to as Grey Literature (GL). Furthermore, practitioners also share their ideas and experiences as GL, which can serve as a valuable data source for research. While GL itself is not a new topic in SE, using, benefitting from, and synthesizing knowledge from GL in SE is a contemporary topic in empirical SE research, and researchers are increasingly benefitting from the knowledge available within GL. The goal of this chapter is to provide an overview of GL in SE, together with insights on how SE researchers can effectively use and benefit from the knowledge and evidence available in the vast amount of GL.
Submitted 27 November, 2019;
originally announced November 2019.
-
Maturity assessment and maturity models in healthcare: A multivocal literature review
Authors:
Ayça Kolukısa Tarhan,
Vahid Garousi,
Oktay Turetken,
Mehmet Söylemez,
Sonia Garossi
Abstract:
Context: The maturity of practices and infrastructure in the healthcare domain directly impacts the quality and efficiency of healthcare services. Therefore, various healthcare administrations (e.g., from hospital management to nationwide health authorities) need to assess and improve their operational maturity.
Objective: This study aims to review and classify studies that propose or use maturity assessment or maturity models (MMs) as a vehicle to achieve operational excellence in the healthcare domain.
Method: To achieve this objective, we performed a Multivocal Literature Review (MLR), a form of systematic review that includes data from the grey literature (e.g., white papers and online documents) in addition to formal, peer-reviewed literature.
Results: Based on 101 sources, 80 of which are from the peer-reviewed literature and 21 are from the grey literature, we identified 68 different MMs on, e.g., telemedicine, care pathways, and digital imaging. We reviewed them with respect to various aspects including: types of research and contribution; list of MMs proposed/used with their subject focuses; elements of maturity/capability; and application scope or scale. In the synthesis of empirical benefits of using MMs, two were found significant: (1) Identifying issues and providing guidance for improvement in healthcare contexts; (2) Improving efficiency, effectiveness, performance, and productivity.
Conclusion: This MLR provides an overview of the landscape and serves as an index to the vast body of knowledge in this area. Our review creates an opportunity to cope with the challenges in getting an overview of the state-of-the-art and practice, choosing the most suitable models, or developing new models with further specialties.
Submitted 5 October, 2019;
originally announced October 2019.
-
Citations in Software Engineering -- Paper-related, Journal-related, and Author-related Factors
Authors:
Mika Mäntylä,
Vahid Garousi
Abstract:
Many factors could affect the number of citations to a paper. Citations have an important role in research policy and in measuring the excellence of research and researchers. This work is the first study in software engineering (SE) to assess multiple factors affecting the number of citations to SE papers. We use (a) negative binomial regression and (b) quantile regression to study the arithmetic mean and the median expected citations of a paper, respectively. Our dataset includes all 25,113 papers published in a set of 16 main SE journals between 1970 and 2018. Our results indicate that publication venue, the author team's past citations, paper length, the number of references, and the recency of references are the most influential factors on the number of citations to SE papers. Based on our empirical findings, we present several implications and pieces of advice for researchers seeking more citations for their papers, beyond the obvious one of conducting high-quality technical research, e.g. (1) aim for high-profile venues, (2) build a high-quality author team with highly cited past papers, and (3) aim for high-quality work that has comprehensive content (thus longer paper length and reference list).
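The abstract pairs a model of mean citations (negative binomial regression) with a model of median citations (quantile regression). Quantile regression minimizes the pinball (check) loss; the simplest, intercept-only case sketched below shows why a median-based model is far less sensitive than the mean to a single highly cited paper. The citation counts are made-up illustrative values, not data from the study, no covariates (venue, team, length) are modelled, and the negative binomial side is not shown.

```python
from statistics import mean


def pinball_loss(y: float, q: float, tau: float) -> float:
    """Quantile (check) loss for predicting q when the observed value is y."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1) * diff


def best_constant(ys: list, tau: float) -> float:
    """Intercept-only quantile regression: the constant minimizing total pinball loss.

    The total loss is piecewise linear with kinks at the data points,
    so searching over the observed values is enough to find a minimizer.
    """
    return min(sorted(ys), key=lambda q: sum(pinball_loss(y, q, tau) for y in ys))


# Hypothetical citation counts for seven papers, one of them highly cited.
citations = [0, 1, 2, 3, 4, 5, 120]
print(mean(citations))                # roughly 19.3, pulled upward by the single outlier
print(best_constant(citations, 0.5))  # 3, the median
```

With `tau=0.5` the minimizer is the median; other values of `tau` give other quantiles, which is how quantile regression can describe the whole citation distribution rather than only its centre.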
Submitted 13 August, 2019; v1 submitted 12 August, 2019;
originally announced August 2019.
-
Video Game Development in a Rush: A Survey of the Global Game Jam Participants
Authors:
Markus Borg,
Vahid Garousi,
Anas Mahmoud,
Thomas Olsson,
Oskar Stålberg
Abstract:
Video game development is a complex endeavor, often involving complex software, large organizations, and aggressive release deadlines. Several studies have reported that periods of "crunch time" are prevalent in the video game industry, but there are few studies on the effects of time pressure. We conducted a survey with participants of the Global Game Jam (GGJ), a 48-hour hackathon. Based on 198 responses, the results suggest that: (1) iterative brainstorming is the most popular method for conceptualizing initial requirements; (2) continuous integration, minimum viable product, scope management, version control, and stand-up meetings are frequently applied development practices; (3) regular communication, internal playtesting, and dynamic and proactive planning are the most common quality assurance activities; and (4) familiarity with agile development has a weak correlation with perception of success in GGJ. We conclude that GGJ teams rely on ad hoc approaches to development and face-to-face communication, and recommend some complementary practices with limited overhead. Furthermore, as our findings are similar to recommendations for software startups, we posit that game jams and the startup scene share contextual similarities. Finally, we discuss the drawbacks of systemic "crunch time" and argue that game jam organizers are in a good position to problematize the phenomenon.
Submitted 31 March, 2019;
originally announced April 2019.
-
Closing the gap between software engineering education and industrial needs
Authors:
Vahid Garousi,
Görkem Giray,
Eray Tüzün,
Cagatay Catal,
Michael Felderer
Abstract:
According to several reports, many recent software engineering graduates face difficulties when beginning their professional careers because the skills they learned at university are misaligned with what industry needs. To address this gap, many studies have sought to align software engineering education with industry needs. To synthesize that body of knowledge, we present in this paper a systematic literature review (SLR) which summarizes the findings of 33 studies in this area. By conducting a meta-analysis of all those studies, using data from 12 countries and over 4,000 data points, this study will enable educators and hiring managers to adapt their education and hiring efforts to best prepare the software engineering workforce.
Submitted 5 December, 2018;
originally announced December 2018.
-
Practical relevance of software engineering research: Synthesizing the community's voice
Authors:
Vahid Garousi,
Markus Borg,
Markku Oivo
Abstract:
Software engineering (SE) research should be relevant to industrial practice. There have been regular discussions in the SE community on this issue since the 1980s, led by pioneers such as Robert Glass. As we recently passed the milestone of "50 years of software engineering", some positive efforts have been made in this direction, e.g., establishing "industrial" tracks in several SE conferences. However, many researchers and practitioners believe that we, as a community, are still struggling with research relevance and utility. The goal of this paper is to synthesize the evidence and experience-based opinions shared on this topic so far in the SE community, and to encourage the community to further reflect and act on research relevance. For this purpose, we have conducted a Multi-vocal Literature Review (MLR) of 54 systematically selected sources (papers and non-peer-reviewed articles). Rather than relying on the individual opinions on research relevance expressed in each source, the MLR aims to synthesize them into a "holistic" view of the topic. The highlights of our MLR findings are as follows. The top three root causes of low relevance discussed in the community are: (1) Researchers having simplistic views (or wrong assumptions) about SE in practice; (2) Lack of connection with industry; and (3) Wrong identification of research problems. The top three suggestions for improving research relevance are: (1) Using appropriate research approaches such as action research; (2) Choosing relevant research problems; and (3) Collaborating with industry. By synthesizing all the discussions on this important topic so far, this paper aims to encourage further discussion and action in the community to increase our collective efforts to improve research relevance.
Submitted 21 January, 2020; v1 submitted 4 December, 2018;
originally announced December 2018.
-
NLP-assisted software testing: A systematic mapping of the literature
Authors:
Vahid Garousi,
Sara Bauer,
Michael Felderer
Abstract:
Context: To reduce the manual effort of extracting test cases from natural-language requirements, many approaches based on Natural Language Processing (NLP) have been proposed in the literature. Given the large number of approaches in this area, and since many practitioners are eager to utilize such techniques, it is important to synthesize and provide an overview of the state of the art in this area. Objective: Our objective is to summarize the state of the art in NLP-assisted software testing, which could help practitioners adopt those NLP-based techniques. Moreover, this can benefit researchers by providing an overview of the research landscape. Method: To address the above need, we conducted a survey in the form of a systematic literature mapping (classification). After compiling an initial pool of 95 papers, we conducted a systematic voting process, and our final pool included 67 technical papers. Results: This review paper provides an overview of the contribution types presented in the papers, the types of NLP approaches used to assist software testing, the types of required input requirements, and a review of tool support in this area. Some key results we have detected are: (1) only four of the 38 tools (11%) presented in the papers are available for download; (2) a large share of the papers (30 of 67) provided only shallow exposure to the NLP aspects (almost no details). Conclusion: This paper would benefit both practitioners and researchers by serving as an "index" to the body of knowledge in this area. The results could help practitioners utilize the existing NLP-based techniques, which in turn reduces the cost of test-case design and the amount of human resources spent on test activities. After we shared this review with some of our industrial collaborators, initial insights show that it can indeed be useful and beneficial to practitioners.
Submitted 21 March, 2020; v1 submitted 2 June, 2018;
originally announced June 2018.
-
A survey on software testability
Authors:
Vahid Garousi,
Michael Felderer,
Feyza Nur Kilicaslan
Abstract:
Context: Software testability is the degree to which a software system or a unit under test supports its own testing. To predict and improve software testability, a large number of techniques and metrics have been proposed by both practitioners and researchers over the last several decades. Reviewing and getting an overview of the entire state of the art and state of the practice in this area is often challenging for a practitioner or a new researcher. Objective: Our objective is to summarize the body of knowledge in this area and to help readers (both practitioners and researchers) prepare, measure, and improve software testability. Method: To address the above need, the authors conducted a survey in the form of a systematic literature mapping (classification) to find out what we as a community know about this topic. After compiling an initial pool of 303 papers and applying a set of inclusion/exclusion criteria, our final pool included 208 papers. Results: The area of software testability has been comprehensively studied by researchers and practitioners. Approaches for measuring and improving testability are the most frequently addressed in the papers. The two most frequently mentioned factors affecting testability are observability and controllability. Common ways to improve testability are testability transformation, improving observability, adding assertions, and improving controllability. Conclusion: This paper serves both researchers and practitioners as an "index" to the vast body of knowledge in the area of testability. The results could help practitioners measure and improve software testability in their projects.
Submitted 6 December, 2018; v1 submitted 7 January, 2018;
originally announced January 2018.
-
Guidelines for including grey literature and conducting multivocal literature reviews in software engineering
Authors:
Vahid Garousi,
Michael Felderer,
Mika V. Mäntylä
Abstract:
Context: A Multivocal Literature Review (MLR) is a form of a Systematic Literature Review (SLR) which includes the grey literature (e.g., blog posts and white papers) in addition to the published (formal) literature (e.g., journal and conference papers). MLRs are useful for both researchers and practitioners since they provide summaries of both the state of the art and the state of the practice in a given area. Objective: There are several guidelines for conducting SLR studies in SE. However, several phases of MLRs differ from those of traditional SLRs, for instance with respect to the search process and source quality assessment; therefore, SLR guidelines are only partially useful for conducting MLR studies. Our goal in this paper is to present guidelines on how to conduct MLR studies in SE. Method: To develop the MLR guidelines, we benefit from three inputs: (1) existing SLR guidelines in SE, (2) a literature survey of MLR guidelines and experience papers in other fields, and (3) our own experiences in conducting several MLRs in SE. All derived guidelines are discussed in the context of three example MLRs used as running examples (two from SE and one from the medical sciences). Results: The resulting guidelines cover all phases of conducting and reporting MLRs in SE, from planning, through conducting the review, to final reporting. In particular, we believe that incorporating and adapting a vast set of recommendations from MLR guidelines and experience papers in other fields has enabled us to propose a set of guidelines with solid foundations. Conclusion: Having been developed on the basis of three types of solid experience and evidence, the provided MLR guidelines support researchers in effectively and efficiently conducting new MLRs in any area of SE.
Submitted 18 September, 2018; v1 submitted 9 July, 2017;
originally announced July 2017.
-
A Survey of Software Engineering Practices in Turkey (extended version)
Authors:
Vahid Garousi,
Ahmet Coşkunçay,
Aysu Betin-Can,
Onur Demirörs
Abstract:
Context: Understanding the types of software engineering practices and techniques used in industry is important. There is a wide spectrum in the types and maturity of software engineering practices conducted in each software team and company. To characterize the software engineering practices conducted in software firms, a variety of surveys have been conducted in different countries and regions. Turkey has a vibrant software industry, and it is important to characterize and understand the state of software engineering practices in this industry. Objective: Our objective is to characterize and obtain a high-level view of the types of software engineering practices in the Turkish software industry. The software engineering practices we surveyed in this study include the following: software requirements, design, development, testing, maintenance, configuration management, release planning, and support practices. The current survey is the most comprehensive of its type ever conducted in the context of the Turkish software industry. Method: To achieve the above objective, we systematically designed an online survey with 46 questions based on our past experience in the Canadian and Turkish contexts and on the Software Engineering Body of Knowledge (SWEBOK). A total of 202 practicing software engineers from the Turkish software industry participated in the survey. In this paper, we analyze and report the results of the questions. Whenever possible, we also compare the trends and results of our survey with the results of a similar 2010 survey conducted in the Canadian software industry.
Submitted 16 December, 2014; v1 submitted 15 December, 2014;
originally announced December 2014.