Data-Driven Extract Method Recommendations: A Study at ING

Authors: David van der Leij, Jasper Binda, Robbert van Dalen, Pieter Vallen, Yaping Luo, Maurício Aniche

Abstract: The sound identification of refactoring opportunities is still an open problem in software engineering. Recent studies have shown the effectiveness of machine learning models in recommending methods that should undergo different refactoring operations. In this work, we experiment with such approaches to identify methods that should undergo an Extract Method refactoring, in the context of ING, a large financial organization. More specifically, we (i) compare the code metrics distributions, which are used as features by the models, between open-source and ING systems, (ii) measure the accuracy of different machine learning models in recommending Extract Method refactorings, (iii) compare the recommendations given by the models with the opinions of ING experts. Our results show that the feature distributions of ING systems and open-source systems are somewhat different, that machine learning models can recommend Extract Method refactorings with high accuracy, and that experts tend to agree with most of the recommendations of the model.

Submitted 22 July, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

arXiv:2105.02023 [pdf, other]

Interactive Static Software Performance Analysis in the IDE

Authors: Aaron Beigelbeck, Maurício Aniche, Jürgen Cito

Abstract: Detecting performance issues due to suboptimal code during the development process can be a daunting task, especially when it comes to localizing them after noticing performance degradation after deployment. Static analysis has the potential to provide early feedback on performance problems to developers without having to run profilers with expensive (and often unavailable) performance tests. We develop a VSCode tool that integrates the static performance analysis results from Infer via code annotations and decorations (surfacing complexity analysis results in context) and side panel views showing details and overviews (enabling explainability of the results). Additionally, we design our system for interactivity to allow for more responsiveness to code changes as they happen. We evaluate the efficacy of our tool by measuring the overhead that the static performance analysis integration introduces in the development workflow. Further, we report on a case study that illustrates how our system can be used to reason about software performance in the context of a real performance bug in the ElasticSearch open-source project. Demo video: https://www.youtube.com/watch?v=-GqPb_YZMOs Repository: https://github.com/ipa-lab/vscode-infer-performance

Submitted 4 May, 2021; originally announced May 2021.

arXiv:2104.03476 [pdf, other]

Secure Software Engineering in the Financial Services: A Practitioners' Perspective

Authors: Vivek Arora, Enrique Larios Vargas, Maurício Aniche, Arie van Deursen

Abstract: Secure software engineering is a fundamental activity in modern software development. However, while the field of security research has been advancing quite fast, in practice, there is still a vast knowledge gap between the security experts and the software development teams. After all, we cannot expect developers and other software practitioners to be security experts. Understanding how software development teams incorporate security in their processes and the challenges they face is a step towards reducing this gap. In this paper, we study how financial services companies ensure the security of their software systems. To that aim, we performed a qualitative study based on semi-structured interviews with 16 software practitioners from 11 different financial companies in three continents. Our results shed light on the security considerations that practitioners take during the different phases of their software development processes, the different security practices that software teams make use of to ensure the security of their software systems, the improvements that practitioners perceive as important in existing state-of-the-practice security tools, the different knowledge-sharing and learning practices that developers use to learn more about software security, and the challenges that software practitioners currently face when it comes to secure their systems.

Submitted 7 April, 2021; originally announced April 2021.

arXiv:2104.02513 [pdf, other]

Logging Practices with Mobile Analytics: An Empirical Study on Firebase

Authors: Julian Harty, Haonan Zhang, Lili Wei, Luca Pascarella, Mauricio Aniche, Weiyi Shang

Abstract: Software logs are of great value in both industrial and open-source projects. Mobile analytics logging enables developers to collect logs remotely from their apps running on end user devices at the cost of recording and transmitting logs across the Internet to a centralised infrastructure. This paper makes a first step in characterising logging practices with a widely adopted mobile analytics logging library, namely Firebase Analytics. We provide an empirical evaluation of the use of Firebase Analytics in 57 open-source Android applications by studying the evolution of code-bases to understand: a) the needs-in-common that push practitioners to adopt logging practices on mobile devices, and b) the differences in the ways developers use local and remote logging. Our results indicate mobile analytics logs are less pervasive and less maintained than traditional logging code. Based on our analysis, we believe logging using mobile analytics is more user centered compared to traditional logging, where the latter is mainly used to record information for debugging purposes.

Submitted 6 April, 2021; originally announced April 2021.

Comments: 8th IEEE/ACM International Conference on Mobile Software Engineering and Systems 2021

arXiv:2103.05424 [pdf, other]

Atoms of Confusion in Java

Authors: Chris Langhout, Maurício Aniche

Abstract: Although writing code seems trivial at times, problems arise when humans misinterpret what the code actually does. One of the potential causes are "atoms of confusion", the smallest possible patterns of misinterpretable source code. Previous research has investigated the impact of atoms of confusion in C code. Results show that developers make significantly more mistakes in code where atoms are present. In this paper, we replicate the work of Gopstein et al. to the Java language. After deriving a set of atoms of confusion for Java, we perform a two-phase experiment with 132 computer science students (i.e., novice developers). Our results show that participants are 2.7 up to 56 times more likely to make mistakes in code snippets affected by 7 out of the 14 studied atoms of confusion, and when faced with both versions of the code snippets, participants perceived the version affected by the atom of confusion to be more confusing and/or less readable in 10 out of the 14 studied atoms of confusion.

Submitted 10 March, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

arXiv:2103.04146 [pdf, other]

The Prevalence of Code Smells in Machine Learning projects

Authors: Bart van Oort, Luís Cruz, Maurício Aniche, Arie van Deursen

Abstract: Artificial Intelligence (AI) and Machine Learning (ML) are pervasive in the current computer science landscape. Yet, there still exists a lack of software engineering experience and best practices in this field. One such best practice, static code analysis, can be used to find code smells, i.e., (potential) defects in the source code, refactoring opportunities, and violations of common coding standards. Our research set out to discover the most prevalent code smells in ML projects. We gathered a dataset of 74 open-source ML projects, installed their dependencies and ran Pylint on them. This resulted in a top 20 of all detected code smells, per category. Manual analysis of these smells mainly showed that code duplication is widespread and that the PEP8 convention for identifier naming style may not always be applicable to ML code due to its resemblance with mathematical notation. More interestingly, however, we found several major obstructions to the maintainability and reproducibility of ML projects, primarily related to the dependency management of Python projects. We also found that Pylint cannot reliably check for correct usage of imported dependencies, including prominent ML libraries such as PyTorch.

Submitted 6 March, 2021; originally announced March 2021.

Comments: Submitted and accepted to 2021 IEEE/ACM 1st Workshop on AI Engineering - Software Engineering for AI (WAIN)

arXiv:2103.01783 [pdf, other]

How Developers Engineer Test Cases: An Observational Study

Authors: Maurício Aniche, Christoph Treude, Andy Zaidman

Abstract: One of the main challenges that developers face when testing their systems lies in engineering test cases that are good enough to reveal bugs. And while our body of knowledge on software testing and automated test case generation is already quite significant, in practice, developers are still the ones responsible for engineering test cases manually. Therefore, understanding the developers' thought- and decision-making processes while engineering test cases is a fundamental step in making developers better at testing software. In this paper, we observe 13 developers thinking-aloud while testing different real-world open-source methods, and use these observations to explain how developers engineer test cases. We then challenge and augment our main findings by surveying 72 software developers on their testing practices. We discuss our results from three different angles. First, we propose a general framework that explains how developers reason about testing. Second, we propose and describe in detail the three different overarching strategies that developers apply when testing. Third, we compare and relate our observations with the existing body of knowledge and propose future studies that would advance our knowledge on the topic.

Submitted 6 November, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

arXiv:2103.01755 [pdf, other]

An Exploratory Study of Log Placement Recommendation in an Enterprise System

Authors: Jeanderson Cândido, Jan Haesen, Maurício Aniche, Arie van Deursen

Abstract: Logging is a development practice that plays an important role in the operations and monitoring of complex systems. Developers place log statements in the source code and use log data to understand how the system behaves in production. Unfortunately, anticipating where to log during development is challenging. Previous studies show the feasibility of leveraging machine learning to recommend log placement despite the data imbalance since logging is a fraction of the overall code base. However, it remains unknown how those techniques apply to an industry setting, and little is known about the effect of imbalanced data and sampling techniques. In this paper, we study the log placement problem in the code base of Adyen, a large-scale payment company. We analyze 34,526 Java files and 309,527 methods that sum up +2M SLOC. We systematically measure the effectiveness of five models based on code metrics, explore the effect of sampling techniques, understand which features models consider to be relevant for the prediction, and evaluate whether we can exploit 388,086 methods from 29 Apache projects to learn where to log in an industry setting. Our best performing model achieves 79% of balanced accuracy, 81% of precision, 60% of recall. While sampling techniques improve recall, they penalize precision at a prohibitive cost. Experiments with open-source data yield under-performing models over Adyen's test set; nevertheless, they are useful due to their low rate of false positives. Our supporting scripts and tools are available to the community.

Submitted 10 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

arXiv:2102.12429 [pdf, other]

Learning Off-By-One Mistakes: An Empirical Study

Authors: Hendrig Sellik, Onno van Paridon, Georgios Gousios, Maurício Aniche

Abstract: Mistakes in binary conditions are a source of error in many software systems. They happen when developers use, e.g., < or > instead of <= or >=. These boundary mistakes are hard to find and impose manual, labor-intensive work for software developers. While previous research has been proposing solutions to identify errors in boundary conditions, the problem remains open. In this paper, we explore the effectiveness of deep learning models in learning and predicting mistakes in boundary conditions. We train different models on approximately 1.6M examples with faults in different boundary conditions. We achieve a precision of 85% and a recall of 84% on a balanced dataset, but lower numbers in an imbalanced dataset. We also perform tests on 41 real-world boundary condition bugs found from GitHub, where the model shows only a modest performance. Finally, we test the model on a large-scale Java code base from Adyen, our industrial partner. The model reported 36 buggy methods, but none of them were confirmed by developers.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2102.00871 [pdf, other]

Automatically Identifying Parameter Constraints in Complex Web APIs: A Case Study at Adyen

Authors: Henk Grent, Aleksei Akimov, Maurício Aniche

Abstract: Web APIs may have constraints on parameters, such that not all parameters are either always required or always optional. Moreover, the presence or value of one parameter could cause another parameter to be required, or parameters could have restrictions on what kinds of values are valid. Having a clear overview of the constraints helps API consumers to integrate without the need for additional support and with fewer integration faults. We made use of existing documentation and code analysis approaches for identifying parameter constraints in complex web APIs. In this paper, we report our case study of several APIs at Adyen, a large-scale payment company that offers complex Web APIs to its customers. Our results show that the documentation- and code-based approach can identify 23% and 53% of the constraints respectively and, when combined, 68% of them. We also reflect on the current challenges that these approaches face. In particular, the absence of information that explicitly describes the constraints in the documentation (in the documentation analysis), and the engineering of a sound static code analyser that is sensitive to data-flow, maintains longer parameter references throughout the API's code, and that is able to symbolically execute the several libraries and frameworks used by the API (in the static analysis).

Submitted 1 February, 2021; originally announced February 2021.

Journal ref: Software Engineering in Practice of the 43rd International Conference on Software Engineering (ICSE-SEIP), 2021

arXiv:2102.00701 [pdf, other]

Search-Based Software Re-Modularization: A Case Study at Adyen

Authors: Casper Schröder, Adriaan van der Feltz, Annibale Panichella, Maurício Aniche

Abstract: Deciding what constitutes a single module, what classes belong to which module or the right set of modules for a specific software system has always been a challenging task. The problem is even harder in large-scale software systems composed of thousands of classes and hundreds of modules. Over the years, researchers have been proposing different techniques to support developers in re-modularizing their software systems. In particular, the search-based software re-modularization is an active research topic within the software engineering community for more than 20 years. This paper describes our efforts in applying search-based software re-modularization approaches at Adyen, a large-scale payment company. Adyen's code base has 5.5M+ lines of code, split into around hundreds of modules. We leveraged the existing body of knowledge in the field to devise our own search algorithm and applied it to our code base. Our results show that search-based approaches scale to large code bases as ours. Our algorithm can find solutions that improve the code base according to the metrics we optimize for, and developers see value in the recommendations. Based on our experiences, we then list a set of challenges and opportunities for future researchers, aiming at making search-based software re-modularization more efficient for large-scale software companies.

Submitted 9 April, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

Journal ref: Software Engineering in Practice of the 43rd International Conference on Software Engineering (ICSE-SEIP), 2021

arXiv:2005.12574 [pdf, other]

doi 10.1145/3368089.3409711

Selecting third-party libraries: The practitioners' perspective

Authors: Enrique Larios-Vargas, Maurício Aniche, Christoph Treude, Magiel Bruntink, Georgios Gousios

Abstract: The selection of third-party libraries is an essential element of virtually any software development project. However, deciding which libraries to choose is a challenging practical problem. Selecting the wrong library can severely impact a software project in terms of cost, time, and development effort, with the severity of the impact depending on the role of the library in the software architecture, among others. Despite the importance of following a careful library selection process, in practice, the selection of third-party libraries is still conducted in an ad-hoc manner, where dozens of factors play an influential role in the decision. In this paper, we study the factors that influence the selection process of libraries, as perceived by industry developers. To that aim, we perform a cross-sectional interview study with 16 developers from 11 different businesses and survey 115 developers that are involved in the selection of libraries. We systematically devised a comprehensive set of 26 technical, human, and economic factors that developers take into consideration when selecting a software library. Eight of these factors are new to the literature. We explain each of these factors and how they play a role in the decision. Finally, we discuss the implications of our work to library maintainers, potential library users, package manager developers, and empirical software engineering researchers.

Submitted 9 September, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

arXiv:2001.03338 [pdf, other]

The Effectiveness of Supervised Machine Learning Algorithms in Predicting Software Refactoring

Authors: Maurício Aniche, Erick Maziero, Rafael Durelli, Vinicius Durelli

Abstract: Refactoring is the process of changing the internal structure of software to improve its quality without modifying its external behavior. Empirical studies have repeatedly shown that refactoring has a positive impact on the understandability and maintainability of software systems. However, before carrying out refactoring activities, developers need to identify refactoring opportunities. Currently, refactoring opportunity identification heavily relies on developers' expertise and intuition. In this paper, we investigate the effectiveness of machine learning algorithms in predicting software refactorings. More specifically, we train six different machine learning algorithms (i.e., Logistic Regression, Naive Bayes, Support Vector Machine, Decision Trees, Random Forest, and Neural Network) with a dataset comprising over two million refactorings from 11,149 real-world projects from the Apache, F-Droid, and GitHub ecosystems. The resulting models predict 20 different refactorings at class, method, and variable-levels with an accuracy often higher than 90%. Our results show that (i) Random Forests are the best models for predicting software refactoring, (ii) process and ownership metrics seem to play a crucial role in the creation of better models, and (iii) models generalize well in different contexts.

Submitted 11 September, 2020; v1 submitted 10 January, 2020; originally announced January 2020.

Comments: To appear in TSE

arXiv:1912.05878 [pdf, other]

Log-based software monitoring: a systematic mapping study

Authors: Jeanderson Barros Cândido, Maurício Finavaro Aniche, Arie van Deursen

Abstract: Modern software development and operations rely on monitoring to understand how systems behave in production. The data provided by application logs and runtime environment are essential to detect and diagnose undesired behavior and improve system reliability. However, despite the rich ecosystem around industry-ready log solutions, monitoring complex systems and getting insights from log data remains a challenge. Researchers and practitioners have been actively working to address several challenges related to logs, e.g., how to effectively provide better tooling support for logging decisions to developers, how to effectively process and store log data, and how to extract insights from log data. A holistic view of the research effort on logging practices and automated log analysis is key to provide directions and disseminate the state-of-the-art for technology transfer. In this paper, we study 108 papers (72 research track papers, 24 journals, and 12 industry track papers) from different communities (e.g., machine learning, software engineering, and systems) and structure the research field in light of the life-cycle of log data.
Our analysis shows that (1) logging is challenging not only in open-source projects but also in industry, (2) machine learning is a promising approach to enable a contextual analysis of source code for log recommendation but further investigation is required to assess the usability of those tools in practice, (3) few studies approached efficient persistence of log data, and (4) there are open opportunities to analyze application logs and to evaluate state-of-the-art log analysis techniques in a DevOps context.

Submitted 5 March, 2021; v1 submitted 12 December, 2019; originally announced December 2019.

arXiv:1907.13365 [pdf, ps, other]

Comprehending Test Code: An Empirical Study

Authors: Chak Shun Yu, Christoph Treude, Maurício Aniche

Abstract: Developers spend a large portion of their time and effort on comprehending source code. While many studies have investigated how developers approach these comprehension tasks and what factors influence their success, less is known about how developers comprehend test code specifically, despite the undisputed importance of testing. In this paper, we report on the results of an empirical study with 44 developers to understand which factors influence developers when comprehending Java test code. We measured three dependent variables: the total time spent reading a test suite, the ability to identify the overall purpose of a test suite, and the ability to produce additional test cases to extend a test suite. The main findings of our study, with several implications for future research and practitioners, are that (i) prior knowledge of the software project decreases the total reading time, (ii) experience with Java affects the proportion of time spent on the Arrange and Assert sections of test cases, (iii) experience with Java and prior knowledge of the software project positively influence the ability to produce additional test cases of certain categories, and (iv) experience with automated tests is an influential factor towards understanding and extending an automated test suite.

Submitted 31 July, 2019; originally announced July 2019.

Comments: to appear as full paper at ICSME 2019, the 35th International Conference on Software Maintenance and Evolution

arXiv:1710.01943 [pdf, other]

Unusual Events in GitHub Repositories

Authors: Christoph Treude, Larissa Leite, Maurício Aniche

Abstract: In large and active software projects, it becomes impractical for a developer to stay aware of all project activity. While it might not be necessary to know about each commit or issue, it is arguably important to know about the ones that are unusual. To investigate this hypothesis, we identified unusual events in 200 GitHub projects using a comprehensive list of ways in which an artifact can be unusual and asked 140 developers responsible for or affected by these events to comment on the usefulness of the corresponding information. Based on 2,096 answers, we identify the subset of unusual events that developers consider particularly useful, including large code modifications and unusual amounts of reviewing activity, along with qualitative evidence on the reasons behind these answers. Our findings provide a means for reducing the amount of information that developers need to parse in order to stay up to date with development activity in their projects.

Submitted 30 April, 2018; v1 submitted 5 October, 2017; originally announced October 2017.

Comments: Accepted for publication in Journal of Systems and Software

Showing 1–16 of 16 results for author: Aniche, M