Showing 1–50 of 60 results for author: Bavota, G

  1. arXiv:2506.04464  [pdf, ps, other]

    cs.SE cs.LG

    Leveraging Reward Models for Guiding Code Review Comment Generation

    Authors: Oussama Ben Sghaier, Rosalia Tufano, Gabriele Bavota, Houari Sahraoui

    Abstract: Code review is a crucial component of modern software development, involving the evaluation of code quality, providing feedback on potential issues, and refining the code to address identified problems. Despite these benefits, code review can be rather time consuming, and influenced by subjectivity and human factors. For these reasons, techniques to (partially) automate the code review process hav…

    Submitted 4 June, 2025; originally announced June 2025.

  2. arXiv:2504.06808  [pdf, ps, other]

    cs.SE

    How do Copilot Suggestions Impact Developers' Frustration and Productivity?

    Authors: Emanuela Guglielmi, Venera Arnoudova, Gabriele Bavota, Rocco Oliveto, Simone Scalabrino

    Abstract: Context. AI-based development tools, such as GitHub Copilot, are transforming the software development process by offering real-time code suggestions. These tools promise to improve productivity by reducing cognitive load and speeding up task completion. Previous exploratory studies, however, show that developers sometimes perceive the automatic suggestions as intrusive. As a result, they feel…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 8 pages

  3. arXiv:2503.14201  [pdf, other]

    cs.SE

    Why Personalizing Deep Learning-Based Code Completion Tools Matters

    Authors: Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota

    Abstract: Deep learning (DL)-based code completion tools have transformed software development by enabling advanced code generation. These tools leverage models trained on vast amounts of code from numerous repositories, capturing general coding patterns. However, the impact of fine-tuning these models for specific organizations or developers to boost their performance on such subjects remains unexplored. I…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: Accepted for publication at ACM TOSEM

  4. arXiv:2503.11402  [pdf, other]

    cs.SE

    Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation

    Authors: Cristina Improta, Rosalia Tufano, Pietro Liguori, Domenico Cotroneo, Gabriele Bavota

    Abstract: Deep Learning-based code generators have seen significant advancements in recent years. Tools such as GitHub Copilot are used by thousands of developers with the main promise of a boost in productivity. However, researchers have recently questioned their impact on code quality showing, for example, that code generated by DL-based tools may be affected by security vulnerabilities. Since DL models a…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: Accepted to the 33rd IEEE/ACM International Conference on Program Comprehension (ICPC 2025)

  5. arXiv:2503.09510  [pdf, other]

    cs.SE

    Automating Code Review: A Systematic Literature Review

    Authors: Rosalia Tufano, Gabriele Bavota

    Abstract: Code Review consists of assessing the code written by teammates with the goal of increasing code quality. Empirical studies documented the benefits brought by such a practice that, however, has its cost to pay in terms of developers' time. For this reason, researchers have proposed techniques and tools to automate code review tasks such as reviewer selection (i.e., identifying suitable review…

    Submitted 12 March, 2025; originally announced March 2025.

  6. arXiv:2503.08228  [pdf, other]

    cs.SE cs.AI cs.CL cs.PF

    Investigating Execution-Aware Language Models for Code Optimization

    Authors: Federico Di Menna, Luca Traini, Gabriele Bavota, Vittorio Cortellessa

    Abstract: Code optimization is the process of enhancing code efficiency, while preserving its intended functionality. This process often requires a deep understanding of the code execution behavior at run-time to identify and address inefficiencies effectively. Recent studies have shown that language models can play a significant role in automating code optimization. However, these models may have insuffici…

    Submitted 11 March, 2025; originally announced March 2025.

  7. arXiv:2503.07103  [pdf, other]

    cs.SE

    Quantizing Large Language Models for Code Generation: A Differentiated Replication

    Authors: Alessandro Giagnorio, Antonio Mastropaolo, Saima Afrin, Massimiliano Di Penta, Gabriele Bavota

    Abstract: Large Language Models (LLMs) have shown an impressive capability in code generation and, specifically, to automatically implement requirements described in natural language. An LLM's effectiveness generally increases with its size: The higher the number of the LLM's trainable parameters, the better its ability to implement code. However, when it comes to deploying LLM-based code generators, larger LLMs…

    Submitted 10 March, 2025; originally announced March 2025.

  8. arXiv:2501.19085  [pdf, ps, other]

    cs.SE

    Enhancing Code Generation for Low-Resource Languages: No Silver Bullet

    Authors: Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota

    Abstract: The advent of Large Language Models (LLMs) has significantly advanced the field of automated code generation. LLMs rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource languages (i.e., niche programming languages characterized by the scarcity of training data), the limited availability of such data hampers the models' ability…

    Submitted 31 January, 2025; originally announced January 2025.

    Comments: Accepted at ICPC'25

  9. arXiv:2501.05062  [pdf, other]

    cs.SE

    Deep Learning-based Code Completion: On the Impact on Performance of Contextual Information

    Authors: Matteo Ciniselli, Luca Pascarella, Gabriele Bavota

    Abstract: Code completion aims at speeding up code writing by recommending to developers the next tokens they are likely to type. Deep Learning (DL) models pushed the boundaries of code completion by redefining what these coding assistants can do: We moved from predicting a few code tokens to automatically generating entire functions. One important factor impacting the performance of DL-based code completion…

    Submitted 9 January, 2025; originally announced January 2025.

  10. arXiv:2501.05051  [pdf, other]

    cs.SE

    On the Generalizability of Transformer Models to Code Completions of Different Lengths

    Authors: Nathan Cooper, Rosalia Tufano, Gabriele Bavota, Denys Poshyvanyk

    Abstract: The programming landscape is nowadays being reshaped by the advent of Large Language Models (LLMs) able to automate code-related tasks related to code implementation (e.g., code completion) and comprehension (e.g., code summarization). Such a paradigm shift comes with a number of implications related to how software will be written, maintained, and evolved. Also, these LLMs are extremely expensive…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: Accepted for publication at ICSME 2024

  11. arXiv:2411.11401  [pdf, other]

    cs.SE

    Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword?

    Authors: Rosalia Tufano, Alberto Martin-Lopez, Ahmad Tayeb, Ozren Dabić, Sonia Haiduc, Gabriele Bavota

    Abstract: Several techniques have been proposed to automate code review. Early support consisted of recommending the most suitable reviewer for a given change or prioritizing the review tasks. With the advent of deep learning in software engineering, the level of automation has been pushed to new heights, with approaches able to provide feedback on source code in natural language as a human reviewer would…

    Submitted 29 November, 2024; v1 submitted 18 November, 2024; originally announced November 2024.

  12. arXiv:2409.18658  [pdf, other]

    cs.SE

    SEART Data Hub: Streamlining Large-Scale Source Code Mining and Pre-Processing

    Authors: Ozren Dabić, Rosalia Tufano, Gabriele Bavota

    Abstract: Large-scale code datasets have acquired an increasingly central role in software engineering (SE) research. This is the result of (i) the success of the mining software repositories (MSR) community, that pushed the standards of empirical studies in SE; and (ii) the recent advent of deep learning (DL) in software engineering, with models trained and tested on large source code datasets. While there…

    Submitted 27 September, 2024; originally announced September 2024.

  13. arXiv:2409.11826  [pdf, other]

    cs.SE

    A Taxonomy of Self-Admitted Technical Debt in Deep Learning Systems

    Authors: Federica Pepe, Fiorella Zampetti, Antonio Mastropaolo, Gabriele Bavota, Massimiliano Di Penta

    Abstract: The development of Machine Learning (ML)- and, more recently, of Deep Learning (DL)-intensive systems requires suitable choices, e.g., in terms of technology, algorithms, and hyper-parameters. Such choices depend on developers' experience, as well as on proper experimentation. Due to limited time availability, developers may adopt suboptimal, sometimes temporary choices, leading to a technical deb…

    Submitted 18 September, 2024; originally announced September 2024.

    Journal ref: Proceedings of the 40th IEEE International Conference on Software Maintenance and Evolution (ICSME 2024)

  14. arXiv:2404.17896  [pdf, other]

    cs.SE

    How the Training Procedure Impacts the Performance of Deep Learning-based Vulnerability Patching

    Authors: Antonio Mastropaolo, Vittoria Nardone, Gabriele Bavota, Massimiliano Di Penta

    Abstract: Generative deep learning (DL) models have been successfully adopted for vulnerability patching. However, such models require the availability of a large dataset of patches to learn from. To overcome this issue, researchers have proposed to start from models pre-trained with general knowledge, either on the programming language or on similar tasks such as bug fixing. Despite the efforts in the area…

    Submitted 27 April, 2024; originally announced April 2024.

  15. arXiv:2403.15149  [pdf, other]

    cs.SE

    On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions

    Authors: Matteo Ciniselli, Alberto Martin-Lopez, Gabriele Bavota

    Abstract: Code completion is a key feature of Integrated Development Environments (IDEs), aimed at predicting the next tokens a developer is likely to write, helping them write code faster and with less effort. Modern code completion approaches are often powered by deep learning (DL) models. However, the swift evolution of programming languages poses a critical challenge to the performance of DL-based code…

    Submitted 22 March, 2024; originally announced March 2024.

  16. arXiv:2402.16480  [pdf, other]

    cs.SE

    Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study

    Authors: Rosalia Tufano, Antonio Mastropaolo, Federica Pepe, Ozren Dabić, Massimiliano Di Penta, Gabriele Bavota

    Abstract: Large Language Models (LLMs) have gained significant attention in the software engineering community. Nowadays developers have the possibility to exploit these models through industrial-grade tools providing a handy interface toward LLMs, such as OpenAI's ChatGPT. While the potential of LLMs in assisting developers across several tasks has been documented in the literature, there is a lack of empi…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Paper accepted for publication at the 21st International Conference on Mining Software Repositories (MSR'24)

  17. arXiv:2402.00519  [pdf, other]

    cs.SE

    Towards Summarizing Code Snippets Using Pre-Trained Transformers

    Authors: Antonio Mastropaolo, Matteo Ciniselli, Luca Pascarella, Rosalia Tufano, Emad Aghajani, Gabriele Bavota

    Abstract: When comprehending code, a helping hand may come from the natural language comments documenting it that, unfortunately, are not always there. To support developers in such a scenario, several techniques have been presented to automatically generate natural language summaries for a given code. Most recent approaches exploit deep learning (DL) to automatically document classes or functions, while li…

    Submitted 1 February, 2024; originally announced February 2024.

  18. arXiv:2401.05136  [pdf, other]

    cs.SE

    Code Review Automation: Strengths and Weaknesses of the State of the Art

    Authors: Rosalia Tufano, Ozren Dabić, Antonio Mastropaolo, Matteo Ciniselli, Gabriele Bavota

    Abstract: The automation of code review has been tackled by several researchers with the goal of reducing its cost. The adoption of deep learning in software engineering pushed the automation to new boundaries, with techniques imitating developers in generative tasks, such as commenting on a code change as a reviewer would do or addressing a reviewer's comment by modifying code. The performance of these tec…

    Submitted 10 January, 2024; originally announced January 2024.

  19. arXiv:2312.15475  [pdf, other]

    cs.SE

    Evaluating Code Summarization Techniques: A New Metric and an Empirical Characterization

    Authors: Antonio Mastropaolo, Matteo Ciniselli, Massimiliano Di Penta, Gabriele Bavota

    Abstract: Several code summarization techniques have been proposed in the literature to automatically document a code snippet or a function. Ideally, software developers should be involved in assessing the quality of the generated summaries. However, in most cases, researchers rely on automatic evaluation metrics such as BLEU, ROUGE, and METEOR. These metrics are all based on the same assumption: The higher…

    Submitted 24 December, 2023; originally announced December 2023.

  20. arXiv:2311.04587  [pdf, other]

    cs.SE

    Log Statements Generation via Deep Learning: Widening the Support Provided to Developers

    Authors: Antonio Mastropaolo, Valentina Ferrari, Luca Pascarella, Gabriele Bavota

    Abstract: Logging assists in monitoring events that transpire during the execution of software. Previous research has highlighted the challenges confronted by developers when it comes to logging, including dilemmas such as where to log, what data to record, and which log level to employ (e.g., info, fatal). In this context, we introduced LANCE, an approach rooted in deep learning (DL) that has demonstrated…

    Submitted 8 November, 2023; originally announced November 2023.

  21. arXiv:2308.16774  [pdf, other]

    cs.SE

    Toward Automatically Completing GitHub Workflows

    Authors: Antonio Mastropaolo, Fiorella Zampetti, Gabriele Bavota, Massimiliano Di Penta

    Abstract: Continuous integration and delivery (CI/CD) are nowadays at the core of software development. Their benefits come at the cost of setting up and maintaining the CI/CD pipeline, which requires knowledge and skills often orthogonal to those entailed in other software-related tasks. While several recommender systems have been proposed to support developers across a variety of tasks, little automated s…

    Submitted 6 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

  22. arXiv:2308.08943  [pdf, other]

    cs.SE cs.AI

    Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?

    Authors: Antonio Mastropaolo, Massimiliano Di Penta, Gabriele Bavota

    Abstract: Upon evolving their software, organizations and individual developers have to spend a substantial effort to pay back technical debt, i.e., the fact that software is released in a shape not as good as it should be, e.g., in terms of functionality, reliability, or maintainability. This paper empirically investigates the extent to which technical debt can be automatically paid back by neural-based ge…

    Submitted 17 August, 2023; originally announced August 2023.

  23. arXiv:2307.14749  [pdf, other]

    cs.SE

    Using Gameplay Videos for Detecting Issues in Video Games

    Authors: Emanuela Guglielmi, Simone Scalabrino, Gabriele Bavota, Rocco Oliveto

    Abstract: Context. The game industry has been growing steadily in recent years. Every day, millions of people play video games, not only as a hobby, but also for professional competitions (e.g., e-sports or speed-running) or for making business by entertaining others (e.g., streamers). The latter daily produce a large amount of gameplay videos in which they also comment live on what they experience. But no softw…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted at Empirical Software Engineering journal (EMSE). arXiv admin note: text overlap with arXiv:2204.04182

  24. arXiv:2303.15990  [pdf, other]

    cs.SE

    Automatically Generating Dockerfiles via Deep Learning: Challenges and Promises

    Authors: Giovanni Rosa, Antonio Mastropaolo, Simone Scalabrino, Gabriele Bavota, Rocco Oliveto

    Abstract: Containerization allows developers to define the execution environment in which their software needs to be installed. Docker is the leading platform in this field, and developers that use it are required to write a Dockerfile for their software. Writing Dockerfiles is far from trivial, especially when the system has unusual requirements for its execution environment. Despite several tools exist to…

    Submitted 28 March, 2023; originally announced March 2023.

  25. arXiv:2302.04098  [pdf, other]

    cs.SE

    Source Code Recommender Systems: The Practitioners' Perspective

    Authors: Matteo Ciniselli, Luca Pascarella, Emad Aghajani, Simone Scalabrino, Rocco Oliveto, Gabriele Bavota

    Abstract: The automatic generation of source code is one of the long-lasting dreams in software engineering research. Several techniques have been proposed to speed up the writing of new code. For example, code completion techniques can recommend to developers the next few tokens they are likely to type, while retrieval-based approaches can suggest code snippets relevant for the task at hand. Also, deep lea…

    Submitted 8 February, 2023; originally announced February 2023.

  26. arXiv:2302.04048  [pdf, other]

    cs.SE cs.LG

    Automating Code-Related Tasks Through Transformers: The Impact of Pre-training

    Authors: Rosalia Tufano, Luca Pascarella, Gabriele Bavota

    Abstract: Transformers have gained popularity in the software engineering (SE) literature. These deep learning models are usually pre-trained through a self-supervised objective, meant to provide the model with basic knowledge about a language of interest (e.g., Java). A classic pre-training objective is the masked language model (MLM), in which a percentage of tokens from the input (e.g., a Java method) is…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: Paper accepted at ICSE'23

  27. arXiv:2302.00438  [pdf, other]

    cs.SE

    On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot

    Authors: Antonio Mastropaolo, Luca Pascarella, Emanuela Guglielmi, Matteo Ciniselli, Simone Scalabrino, Rocco Oliveto, Gabriele Bavota

    Abstract: Software engineering research has always been concerned with the improvement of code completion approaches, which suggest the next tokens a developer will likely type while coding. The release of GitHub Copilot constitutes a big step forward, also because of its unprecedented ability to automatically generate even entire functions from their natural language description. While the usefulness of C…

    Submitted 1 February, 2023; originally announced February 2023.

  28. arXiv:2212.05738  [pdf, other]

    cs.SE

    Automated Variable Renaming: Are We There Yet?

    Authors: Antonio Mastropaolo, Emad Aghajani, Luca Pascarella, Gabriele Bavota

    Abstract: Identifiers, such as method and variable names, form a large portion of source code. Therefore, low-quality identifiers can substantially hinder code comprehension. To support developers in using meaningful identifiers, several (semi-)automatic techniques have been proposed, mostly being data-driven (e.g. statistical language models, deep learning models) or relying on static code analysis. Still,…

    Submitted 12 December, 2022; originally announced December 2022.

  29. arXiv:2208.07624  [pdf, other]

    cs.SE

    Don't Reinvent the Wheel: Towards Automatic Replacement of Custom Implementations with APIs

    Authors: Rosalia Tufano, Emad Aghajani, Gabriele Bavota

    Abstract: Reusing code is a common practice in software development: It helps developers speed up the implementation task while also reducing the chances of introducing bugs, given the assumption that the reused code has been tested, possibly in production. Despite these benefits, opportunities for reuse are not always in plain sight and, thus, developers may miss them. We present our preliminary steps in bu…

    Submitted 16 August, 2022; originally announced August 2022.

  30. Detecting Connectivity Issues in Android Apps

    Authors: Alejandro Mazuera-Rozo, Camilo Escobar-Velásquez, Juan Espitia-Acero, Mario Linares-Vásquez, Gabriele Bavota

    Abstract: Android is the most popular mobile operating system in the world, running on more than 70% of mobile devices. This implies a gigantic and very competitive market for Android apps. Being successful in such a market is far from trivial and requires, besides the tackling of a problem or need felt by a vast audience, the development of high-quality apps. As recently shown in the literature, connectiv…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in SANER 2022

  31. arXiv:2206.08574  [pdf, other]

    cs.SE

    Using Transfer Learning for Code-Related Tasks

    Authors: Antonio Mastropaolo, Nathan Cooper, David Nader Palacio, Simone Scalabrino, Denys Poshyvanyk, Rocco Oliveto, Gabriele Bavota

    Abstract: Deep learning (DL) techniques have been used to support several code-related tasks such as code summarization and bug-fixing. In particular, pre-trained transformer models are on the rise, also thanks to the excellent results they achieved in Natural Language Processing (NLP) tasks. The basic idea behind these models is to first pre-train them on a generic dataset using a self-supervised task (e.g…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.02017

  32. AI-driven Development Is Here: Should You Worry?

    Authors: Neil Ernst, Gabriele Bavota

    Abstract: AI-Driven Development Environments (AIDEs) integrate the power of modern AI into IDEs like Visual Studio Code and JetBrains IntelliJ. By leveraging massive language models and the plethora of openly available source code, AIDEs promise to automate many of the obvious, routine tasks in programming. At the same time, AIDEs come with new challenges to think about, such as bias, legal compliance, secu…

    Submitted 15 April, 2022; originally announced April 2022.

    Journal ref: in IEEE Software, vol. 39, no. 2, pp. 106-110, March-April 2022

  33. arXiv:2204.06894  [pdf, other]

    cs.SE

    To What Extent do Deep Learning-based Code Recommenders Generate Predictions by Cloning Code from the Training Set?

    Authors: Matteo Ciniselli, Luca Pascarella, Gabriele Bavota

    Abstract: Deep Learning (DL) models have been widely used to support code completion. These models, once properly trained, can take as input an incomplete code component (e.g., an incomplete function) and predict the missing tokens to finalize it. GitHub Copilot is an example of code recommender built by training a DL model on millions of open source repositories: The source code of these repositories acts…

    Submitted 14 April, 2022; originally announced April 2022.

  34. arXiv:2204.04182  [pdf, other]

    cs.SE

    Towards Using Gameplay Videos for Detecting Issues in Video Games

    Authors: Emanuela Guglielmi, Simone Scalabrino, Gabriele Bavota, Rocco Oliveto

    Abstract: Context. The game industry has been growing steadily in recent years. Every day, millions of people play video games, not only as a hobby, but also for professional competitions (e.g., e-sports or speed-running) or for making business by entertaining others (e.g., streamers). The latter daily produce a large amount of gameplay videos in which they also comment live on what they experience. Since no sof…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted at MSR 2022 Registered Reports Track

  35. arXiv:2201.11807  [pdf, other]

    cs.CR cs.SE

    Taxonomy of Security Weaknesses in Java and Kotlin Android Apps

    Authors: Alejandro Mazuera-Rozo, Camilo Escobar-Velásquez, Juan Espitia-Acero, David Vega-Guzmán, Catia Trubiani, Mario Linares-Vásquez, Gabriele Bavota

    Abstract: Android is nowadays the most popular operating system in the world, not only in the realm of mobile devices, but also when considering desktop and laptop computers. Such a popularity makes it an attractive target for security attacks, also due to the sensitive information often manipulated by mobile apps. The latter are going through a transition in which the Android ecosystem is moving from the u…

    Submitted 27 January, 2022; originally announced January 2022.

    Comments: Accepted to JSS journal

  36. arXiv:2201.06865  [pdf, other]

    cs.SE

    Using Reinforcement Learning for Load Testing of Video Games

    Authors: Rosalia Tufano, Simone Scalabrino, Luca Pascarella, Emad Aghajani, Rocco Oliveto, Gabriele Bavota

    Abstract: Different from what happens for most types of software systems, testing video games has largely remained a manual activity performed by human testers. This is mostly due to the continuous and intelligent user interaction video games require. Recently, reinforcement learning (RL) has been exploited to partially automate functional testing. RL enables training smart agents that can even achieve supe…

    Submitted 18 January, 2022; originally announced January 2022.

    Comments: Accepted for publication at ICSE 2022

  37. arXiv:2201.06850  [pdf, other]

    cs.SE

    Using Pre-Trained Models to Boost Code Review Automation

    Authors: Rosalia Tufano, Simone Masiero, Antonio Mastropaolo, Luca Pascarella, Denys Poshyvanyk, Gabriele Bavota

    Abstract: Code review is a practice widely adopted in open source and industrial projects. Given the non-negligible cost of such a process, researchers started investigating the possibility of automating specific code review tasks. We recently proposed Deep Learning (DL) models targeting the automation of two tasks: the first model takes as input a code submitted for review and implements in it changes like…

    Submitted 18 January, 2022; originally announced January 2022.

    Comments: Accepted for publication at ICSE 2022

  38. arXiv:2201.04837  [pdf, other]

    cs.SE

    Using Deep Learning to Generate Complete Log Statements

    Authors: Antonio Mastropaolo, Luca Pascarella, Gabriele Bavota

    Abstract: Logging is a practice widely adopted in several phases of the software lifecycle. For example, during software development log statements allow engineers to verify and debug the system by exposing fine-grained information of the running software. While the benefits of logging are undisputed, taking proper decisions about where to inject log statements, what information to log, and at which log lev…

    Submitted 14 January, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

  39. Studying Eventual Connectivity Issues in Android Apps

    Authors: Camilo Escobar-Velásquez, Alejandro Mazuera-Rozo, Claudia Bedoya, Michael Osorio-Riaño, Mario Linares-Vásquez, Gabriele Bavota

    Abstract: Mobile apps have become indispensable for daily life, not only for individuals but also for companies/organizations that offer their services digitally. Owing to the mobility of devices, there are no limitations regarding the locations or conditions in which apps are being used. For example, apps can be used where no internet connection is available. Therefore, offline-first is a highly desire…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: 41 pages, accepted to EMSE journal

  40. An Empirical Study on the Usage of Transformer Models for Code Completion

    Authors: Matteo Ciniselli, Nathan Cooper, Luca Pascarella, Antonio Mastropaolo, Emad Aghajani, Denys Poshyvanyk, Massimiliano Di Penta, Gabriele Bavota

    Abstract: Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field focused on improving the accuracy of the generated predictions, with substantial leaps forward made possible by deep learning (DL) models. However, code completion techniques are mostly evaluated in the scenario of predicting the next token to type, with few e…

    Submitted 18 November, 2021; v1 submitted 3 August, 2021; originally announced August 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2103.07115

  41. arXiv:2107.10544  [pdf, other]

    cs.SE

    An Empirical Study on Code Comment Completion

    Authors: Antonio Mastropaolo, Emad Aghajani, Luca Pascarella, Gabriele Bavota

    Abstract: Code comments play a prominent role in program comprehension activities. However, source code is not always documented and code and comments not always co-evolve. To deal with these issues, researchers have proposed techniques to automatically generate comments documenting a given code at hand. The most recent works in the area applied deep learning (DL) techniques to support such a task. Despite…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: Accepted for publication at the 37th International Conference on Software Maintenance and Evolution (ICSME 2021)

  42. arXiv:2103.11940  [pdf, other]

    cs.SE

    Shallow or Deep? An Empirical Study on Detecting Vulnerabilities using Deep Learning

    Authors: Alejandro Mazuera-Rozo, Anamaria Mojica-Hanke, Mario Linares-Vásquez, Gabriele Bavota

    Abstract: Deep learning (DL) techniques are on the rise in the software engineering research community. More and more approaches have been developed on top of DL models, also due to the unprecedented amount of software-related data that can be used to train these models. One of the recent applications of DL in the software engineering domain concerns the automatic detection of software vulnerabilities. Whil…

    Submitted 22 March, 2021; originally announced March 2021.

  43. arXiv:2103.07115  [pdf, other]

    cs.SE

    An Empirical Study on the Usage of BERT Models for Code Completion

    Authors: Matteo Ciniselli, Nathan Cooper, Luca Pascarella, Denys Poshyvanyk, Massimiliano Di Penta, Gabriele Bavota

    Abstract: Code completion is one of the main features of modern Integrated Development Environments (IDEs). Its objective is to speed up code writing by predicting the next code token(s) the developer is likely to write. Research in this area has substantially bolstered the predictive performance of these techniques. However, the support to developers is still limited to the prediction of the next few token…

    Submitted 12 March, 2021; originally announced March 2021.

    Comments: Accepted to the 18th International Conference on Mining Software Repositories (MSR 2021)

  44. arXiv:2103.04682  [pdf, other]

    cs.SE

    Sampling Projects in GitHub for MSR Studies

    Authors: Ozren Dabic, Emad Aghajani, Gabriele Bavota

    Abstract: Almost every Mining Software Repositories (MSR) study requires, as first step, the selection of the subject software repositories. These repositories are usually collected from hosting services like GitHub using specific selection criteria dictated by the study goal. For example, a study related to licensing might be interested in selecting projects explicitly declaring a license. Once the selecti…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: Accepted to the 18th International Conference on Mining Software Repositories (MSR 2021)

  45. arXiv:2103.04586  [pdf, other]

    cs.SE

    Siri, Write the Next Method

    Authors: Fengcai Wen, Emad Aghajani, Csaba Nagy, Michele Lanza, Gabriele Bavota

    Abstract: Code completion is one of the killer features of Integrated Development Environments (IDEs), and researchers have proposed different methods to improve its accuracy. While these techniques are valuable to speed up code writing, they are limited to recommendations related to the next few tokens a developer is likely to type given the current context. In the best case, they can recommend a few APIs…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: Accepted to the 43rd International Conference on Software Engineering (ICSE 2021)

  46. arXiv:2102.03300  [pdf, other]

    cs.SE

    Evaluating SZZ Implementations Through a Developer-informed Oracle

    Authors: Giovanni Rosa, Luca Pascarella, Simone Scalabrino, Rosalia Tufano, Gabriele Bavota, Michele Lanza, Rocco Oliveto

    Abstract: The SZZ algorithm for identifying bug-inducing changes has been widely used to evaluate defect prediction techniques and to empirically investigate when, how, and by whom bugs are introduced. Over the years, researchers have proposed several heuristics to improve the SZZ accuracy, providing various implementations of SZZ. However, fairly evaluating those implementations on a reliable oracle is an…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: Accepted to the 43rd International Conference on Software Engineering (ICSE 2021)

  47. arXiv:2102.02017  [pdf, other]

    cs.SE

    Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks

    Authors: Antonio Mastropaolo, Simone Scalabrino, Nathan Cooper, David Nader Palacio, Denys Poshyvanyk, Rocco Oliveto, Gabriele Bavota

    Abstract: Deep learning (DL) techniques are gaining more and more attention in the software engineering community. They have been used to support several code-related tasks, such as automatic bug fixing and code comments generation. Recent studies in the Natural Language Processing (NLP) field have shown that the Text-To-Text Transfer Transformer (T5) architecture can achieve state-of-the-art performance fo…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted to the 43rd International Conference on Software Engineering (ICSE 2021)

  48. arXiv:2101.02518  [pdf, other]

    cs.SE

    Towards Automating Code Review Activities

    Authors: Rosalia Tufano, Luca Pascarella, Michele Tufano, Denys Poshyvanyk, Gabriele Bavota

    Abstract: Code reviews are popular in both industrial and open source projects. The benefits of code reviews are widely recognized and include better code quality and lower likelihood of introducing bugs. However, since code review is a manual activity it comes at the cost of spending developers' time on reviewing their teammates' code. Our goal is to make the first step towards partially automating the c…

    Submitted 19 May, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

    Comments: Accepted to the 43rd International Conference on Software Engineering (ICSE 2021)

  49. Why Developers Refactor Source Code: A Mining-based Study

    Authors: Jevgenija Pantiuchina, Fiorella Zampetti, Simone Scalabrino, Valentina Piantadosi, Rocco Oliveto, Gabriele Bavota, Massimiliano Di Penta

    Abstract: Refactoring aims at improving code non-functional attributes without modifying its external behavior. Previous studies investigated the motivations behind refactoring by surveying developers. With the aim of generalizing and complementing their findings, we present a large-scale study quantitatively and qualitatively investigating why developers perform refactoring in open source projects. First,…

    Submitted 5 January, 2021; originally announced January 2021.

    Comments: Accepted to the ACM Transactions on Software Engineering and Methodology

  50. arXiv:2009.13113  [pdf, other]

    cs.SE

    Automated Identification of On-hold Self-admitted Technical Debt

    Authors: Rungroj Maipradit, Bin Lin, Csaba Nagy, Gabriele Bavota, Michele Lanza, Hideaki Hata, Kenichi Matsumoto

    Abstract: Modern software is developed under considerable time pressure, which implies that developers more often than not have to resort to compromises when it comes to code that is well written and code that just does the job. This has led over the past decades to the concept of "technical debt", a short-term hack that potentially generates long-term maintenance problems. Self-admitted technical debt (SAT…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: 11 pages, 20th IEEE International Working Conference on Source Code Analysis and Manipulation