Showing 1–28 of 28 results for author: Tufano, M

  1. arXiv:2502.01821  [pdf, other]

    cs.SE cs.AI

    Agentic Bug Reproduction for Effective Automated Program Repair at Google

    Authors: Runxiang Cheng, Michele Tufano, Jürgen Cito, José Cambronero, Pat Rondon, Renyao Wei, Aaron Sun, Satish Chandra

    Abstract: Bug reports often lack sufficient detail for developers to reproduce and fix the underlying defects. Bug Reproduction Tests (BRTs), tests that fail when the bug is present and pass when it has been resolved, are crucial for debugging, but they are rarely included in bug reports, both in open-source and in industrial settings. Thus, automatically generating BRTs from bug reports has the potential t…

    Submitted 10 March, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  2. arXiv:2501.07531  [pdf, other]

    cs.SE cs.AI

    Evaluating Agent-based Program Repair at Google

    Authors: Pat Rondon, Renyao Wei, José Cambronero, Jürgen Cito, Aaron Sun, Siddhant Sanyam, Michele Tufano, Satish Chandra

    Abstract: Agent-based program repair offers to automatically resolve complex bugs end-to-end by combining the planning, tool use, and code generation abilities of modern LLMs. Recent work has explored the use of agent-based repair approaches on the popular open-source SWE-Bench, a collection of bugs from highly-rated GitHub Python projects. In addition, various agentic approaches such as SWE-Agent have been…

    Submitted 13 January, 2025; originally announced January 2025.

  3. arXiv:2412.14308

    cs.SE cs.LG

    Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

    Authors: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy

    Abstract: Software testing is a crucial but time-consuming aspect of software development, and recently, Large Language Models (LLMs) have gained popularity for automated test case generation. However, because LLMs are trained on vast amounts of open-source code, they often generate test cases that do not adhere to best practices and may even contain test smells (anti-patterns). To address this issue, we pr…

    Submitted 6 January, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: This work was intended as a replacement of arXiv:2310.02368 and any subsequent updates will appear there

  4. arXiv:2403.08299  [pdf, other]

    cs.SE cs.AI

    AutoDev: Automated AI-Driven Development

    Authors: Michele Tufano, Anisha Agarwal, Jinu Jang, Roshanak Zilouchian Moghaddam, Neel Sundaresan

    Abstract: The landscape of software development has witnessed a paradigm shift with the advent of AI-powered assistants, exemplified by GitHub Copilot. However, existing solutions are not leveraging all the potential capabilities available in an IDE such as building, testing, executing code, git operations, etc. Therefore, they are constrained by their limited capabilities, primarily focusing on suggesting…

    Submitted 13 March, 2024; originally announced March 2024.

  5. arXiv:2402.14261  [pdf, other]

    cs.SE cs.AI

    Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

    Authors: Anisha Agarwal, Aaron Chan, Shubham Chandel, Jinu Jang, Shaun Miller, Roshanak Zilouchian Moghaddam, Yevhen Mohylevskyy, Neel Sundaresan, Michele Tufano

    Abstract: The integration of Large Language Models (LLMs) into Development Environments (IDEs) has become a focal point in modern software development. LLMs such as OpenAI GPT-3.5/4 and Code Llama offer the potential to significantly augment developer productivity by serving as intelligent, chat-driven programming assistants. However, utilizing LLMs out of the box is unlikely to be optimal for any given sce…

    Submitted 21 February, 2024; originally announced February 2024.

  6. arXiv:2310.02368  [pdf, other]

    cs.SE cs.LG

    Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

    Authors: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy

    Abstract: Software testing is a crucial aspect of software development, and the creation of high-quality tests that adhere to best practices is essential for effective maintenance. Recently, Large Language Models (LLMs) have gained popularity for code generation, including the automated creation of test cases. However, these LLMs are often trained on vast amounts of publicly available code, which may includ…

    Submitted 6 January, 2025; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted to DeepTest 2025 (ICSE Workshop). Previously this version appeared as arXiv:2412.14308 which was submitted as a new work by accident

  7. arXiv:2307.13383  [pdf, other]

    cs.SE cs.AI

    Predicting Code Coverage without Execution

    Authors: Michele Tufano, Shubham Chandel, Anisha Agarwal, Neel Sundaresan, Colin Clement

    Abstract: Code coverage is a widely used metric for quantifying the extent to which program elements, such as statements or branches, are executed during testing. Calculating code coverage is resource-intensive, requiring code building and execution with additional overhead for the instrumentation. Furthermore, computing coverage of any snippet of code requires the whole program context. Using Machine Learn…

    Submitted 25 July, 2023; originally announced July 2023.

  8. arXiv:2303.07263  [pdf, other]

    cs.SE

    InferFix: End-to-End Program Repair with LLMs

    Authors: Matthew Jin, Syed Shahriar, Michele Tufano, Xin Shi, Shuai Lu, Neel Sundaresan, Alexey Svyatkovskiy

    Abstract: Software development life cycle is profoundly influenced by bugs: their introduction, identification, and eventual resolution account for a significant portion of software cost. This has motivated software engineering researchers and practitioners to propose different approaches for automating the identification and repair of software defects. Large language models have been adapted to the program…

    Submitted 13 March, 2023; originally announced March 2023.

  9. arXiv:2301.01224  [pdf, other]

    cs.SE cs.AI cs.CV cs.LG

    An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation

    Authors: Kevin Moran, Ali Yachnes, George Purnell, Junayed Mahmud, Michele Tufano, Carlos Bernal-Cárdenas, Denys Poshyvanyk, Zach H'Doubler

    Abstract: Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently e…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Published in the Proceedings of the 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER'22), Honolulu, Hawaii, March 15-18, 2022, pp. 514-525

  10. arXiv:2208.13928  [pdf, other]

    cs.SE cs.CL cs.LG

    Exploring and Evaluating Personalized Models for Code Generation

    Authors: Andrei Zlotchevski, Dawn Drain, Alexey Svyatkovskiy, Colin Clement, Neel Sundaresan, Michele Tufano

    Abstract: Large Transformer models achieved the state-of-the-art status for Natural Language Understanding tasks and are increasingly becoming the baseline model architecture for modeling source code. Transformers are usually pre-trained on large unsupervised corpora, learning token representations and transformations relevant to modeling generally available text, and are then fine-tuned on a particular dow…

    Submitted 19 September, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: Accepted to the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022), Industry Track - Singapore, November 14-18, 2022, to appear 9 pages

  11. Methods2Test: A dataset of focal methods mapped to test cases

    Authors: Michele Tufano, Shao Kun Deng, Neel Sundaresan, Alexey Svyatkovskiy

    Abstract: Unit testing is an essential part of the software development process, which helps to identify issues with source code in early stages of development and prevent regressions. Machine learning has emerged as viable approach to help software developers generate automated unit tests. However, generating reliable unit test cases that are semantically correct and capable of catching software bugs or un…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted for publication in the proceedings of The 2022 Mining Software Repositories Conference (MSR 2022) - Data and Tool track

  12. arXiv:2109.08780  [pdf, other]

    cs.LG cs.SE

    Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy

    Authors: Colin B. Clement, Shuai Lu, Xiaoyu Liu, Michele Tufano, Dawn Drain, Nan Duan, Neel Sundaresan, Alexey Svyatkovskiy

    Abstract: Statistical language modeling and translation with transformers have found many successful applications in program understanding and generation tasks, setting high benchmarks for tools in modern software development environments. The finite context window of these neural models means, however, that they will be unable to leverage the entire relevant context of large files and packages for any give…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021 camera ready

  13. arXiv:2102.04664  [pdf, other]

    cs.SE cs.CL

    CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

    Authors: Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu

    Abstract: Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems,…

    Submitted 16 March, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: 14 pages; Revise CodeBLEU scores for all models on text-to-code task

  14. arXiv:2101.02518  [pdf, other]

    cs.SE

    Towards Automating Code Review Activities

    Authors: Rosalia Tufano, Luca Pascarella, Michele Tufano, Denys Poshyvanyk, Gabriele Bavota

    Abstract: Code reviews are popular in both industrial and open source projects. The benefits of code reviews are widely recognized and include better code quality and lower likelihood of introducing bugs. However, since code review is a manual activity it comes at the cost of spending developers' time on reviewing their teammates' code. Our goal is to make the first step towards partially automating the c…

    Submitted 19 May, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

    Comments: Accepted to the 43rd International Conference on Software Engineering (ICSE 2021)

  15. arXiv:2009.08366  [pdf, other]

    cs.SE cs.CL

    GraphCodeBERT: Pre-training Code Representations with Data Flow

    Authors: Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou

    Abstract: Pre-trained models for programming language have achieved dramatic empirical improvements on a variety of code-related tasks such as code search, code completion, code summarization, etc. However, existing pre-trained models regard a code snippet as a sequence of tokens, while ignoring the inherent structure of code, which provides crucial code semantics and would enhance the code understanding pr…

    Submitted 13 September, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted by ICLR2021

  16. Generating Accurate Assert Statements for Unit Test Cases using Pretrained Transformers

    Authors: Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Neel Sundaresan

    Abstract: Unit testing represents the foundational basis of the software testing pyramid, beneath integration and end-to-end testing. Automated software testing researchers have proposed a variety of techniques to assist developers in this time-consuming task. In this paper we present an approach to support developers in writing unit test cases by generating accurate and useful assert statements. Our approa…

    Submitted 11 September, 2020; originally announced September 2020.

  17. arXiv:2009.05617  [pdf, other]

    cs.SE cs.CL cs.LG

    Unit Test Case Generation with Transformers and Focal Context

    Authors: Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, Neel Sundaresan

    Abstract: Automated unit test case generation tools facilitate test-driven development and support developers by suggesting tests intended to identify flaws in their code. Existing approaches are usually guided by the test coverage criteria, generating synthetic test cases that are often difficult for developers to read or understand. In this paper we propose AthenaTest, an approach that aims to generate un…

    Submitted 20 May, 2021; v1 submitted 11 September, 2020; originally announced September 2020.

  18. On Learning Meaningful Assert Statements for Unit Test Cases

    Authors: Cody Watson, Michele Tufano, Kevin Moran, Gabriele Bavota, Denys Poshyvanyk

    Abstract: Software testing is an essential part of the software lifecycle and requires a substantial amount of time and effort. It has been estimated that software developers spend close to 50% of their time on testing the code they write. For these reasons, a long standing goal within the research community is to (partially) automate software testing. While several techniques and tools have been proposed t…

    Submitted 18 February, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

  19. arXiv:2002.04760  [pdf, other]

    cs.SE cs.CL cs.LG

    DeepMutation: A Neural Mutation Tool

    Authors: Michele Tufano, Jason Kimko, Shiya Wang, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Denys Poshyvanyk

    Abstract: Mutation testing can be used to assess the fault-detection capabilities of a given test suite. To this aim, two characteristics of mutation testing frameworks are of paramount importance: (i) they should generate mutants that are representative of real faults; and (ii) they should provide a complete tool chain able to automatically generate, inject, and test the mutants. To address the first point…

    Submitted 12 February, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: Accepted to the 42nd ACM/IEEE International Conference on Software Engineering (ICSE 2020), Demonstrations Track - Seoul, South Korea, May 23-29, 2020, 4 pages

  20. arXiv:1901.09102  [pdf, other]

    cs.SE cs.CL cs.LG

    On Learning Meaningful Code Changes via Neural Machine Translation

    Authors: Michele Tufano, Jevgenija Pantiuchina, Cody Watson, Gabriele Bavota, Denys Poshyvanyk

    Abstract: Recent years have seen the rise of Deep Learning (DL) techniques applied to source code. Researchers have exploited DL to automate several development and maintenance tasks, such as writing commit messages, generating comments and detecting vulnerabilities among others. One of the long lasting dreams of applying DL to source code is the possibility to automate non-trivial coding activities. While…

    Submitted 25 January, 2019; originally announced January 2019.

    Comments: Accepted to the 41st ACM/IEEE International Conference on Software Engineering (ICSE 2019) - Montreal, QC, Canada, May 25-31, 2019, 12 pages

  21. arXiv:1901.07142  [pdf, ps, other]

    cs.SE

    Towards Predicting the Impact of Software Changes on Building Activities

    Authors: Michele Tufano, Hitesh Sajnani, Kim Herzig

    Abstract: The pervasive adoption of Continuous Integration practices -- both in industry and open source projects -- has led software building to become a daily activity for thousands of developers around the world. Companies such as Microsoft have invested in in-house infrastructures with the goal of optimizing the build process. CloudBuild, a distributed and caching build service developed internally by M…

    Submitted 25 January, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

    Comments: Accepted to the 41st ACM/IEEE International Conference on Software Engineering (ICSE 2019), New Ideas and Emerging Results - Montreal, QC, Canada, May 25-31, 2019, to appear 4 pages

  22. arXiv:1901.01808  [pdf, other]

    cs.SE cs.LG stat.ML

    SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair

    Authors: Zimin Chen, Steve Kommrusch, Michele Tufano, Louis-Noël Pouchet, Denys Poshyvanyk, Martin Monperrus

    Abstract: This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a system, called SequenceR, for fixing bugs based on sequence-to-sequence learning on source code. This approach uses the copy mechanism to overcome the unlimited vocabulary problem that occurs with big code. Our system is data-driven; we train it on 35,578 s…

    Submitted 9 September, 2019; v1 submitted 24 December, 2018; originally announced January 2019.

    Comments: 21 pages, 15 figures

    Journal ref: IEEE Transactions on Software Engineering, 2019

  23. arXiv:1901.00891  [pdf, other]

    cs.SE

    Guigle: A GUI Search Engine for Android Apps

    Authors: Carlos Bernal-Cardenas, Kevin Moran, Michele Tufano, Zichang Liu, Linyong Nan, Zhehan Shi, Denys Poshyvanyk

    Abstract: The process of developing a mobile application typically starts with the ideation and conceptualization of its user interface. This concept is then translated into a set of mock-ups to help determine how well the user interface embodies the intended features of the app. After the creation of mock-ups developers then translate it into an app that runs in a mobile device. In this paper we propose an…

    Submitted 3 January, 2019; originally announced January 2019.

    Comments: Accepted to 41st ACM/IEEE International Conference on Software Engineering, Formal Tool Demonstrations Track

  24. arXiv:1812.10772  [pdf, other]

    cs.SE

    Learning How to Mutate Source Code from Bug-Fixes

    Authors: Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk

    Abstract: Mutation testing has been widely accepted as an approach to guide test case generation or to assess the effectiveness of test suites. Empirical studies have shown that mutants are representative of real faults; yet they also indicated a clear need for better, possibly customized, mutation operators and strategies. While methods to devise domain-specific or general-purpose mutation operators from r…

    Submitted 29 July, 2019; v1 submitted 27 December, 2018; originally announced December 2018.

    Comments: Accepted to the 35th IEEE International Conference on Software Maintenance and Evolution (ICSME 2019) - Cleveland, OH, USA, October 2-4, 2019, to appear 12 pages

  25. arXiv:1812.08693  [pdf, other]

    cs.SE

    An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation

    Authors: Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk

    Abstract: Millions of open-source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learning bug-fixing patches for real defects. First, we…

    Submitted 20 May, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Comments: Accepted to the ACM Transactions on Software Engineering and Methodology

  26. MDroid+: A Mutation Testing Framework for Android

    Authors: Kevin Moran, Michele Tufano, Carlos Bernal-Cárdenas, Mario Linares-Vásquez, Gabriele Bavota, Christopher Vendome, Massimiliano Di Penta, Denys Poshyvanyk

    Abstract: Mutation testing has shown great promise in assessing the effectiveness of test suites while exhibiting additional applications to test-case generation, selection, and prioritization. Traditional mutation testing typically utilizes a set of simple language specific source code transformations, called operators, to introduce faults. However, empirical studies have shown that for mutation testing to…

    Submitted 13 February, 2018; originally announced February 2018.

    Comments: 4 Pages, Accepted to the Formal Tool Demonstration Track at the 40th International Conference on Software Engineering (ICSE'18)

  27. Enabling Mutation Testing for Android Apps

    Authors: Mario Linares-Vásquez, Gabriele Bavota, Michele Tufano, Kevin Moran, Massimiliano Di Penta, Christopher Vendome, Carlos Bernal-Cárdenas, Denys Poshyvanyk

    Abstract: Mutation testing has been widely used to assess the fault-detection effectiveness of a test suite, as well as to guide test case generation or prioritization. Empirical studies have shown that, while mutants are generally representative of real faults, an effective application of mutation testing requires "traditional" operators designed for programming languages to be augmented with operators spe…

    Submitted 31 July, 2017; v1 submitted 27 July, 2017; originally announced July 2017.

    Comments: Accepted at 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 17)

  28. Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities

    Authors: Martin White, Michele Tufano, Matias Martinez, Martin Monperrus, Denys Poshyvanyk

    Abstract: In the field of automated program repair, the redundancy assumption claims large programs contain the seeds of their own repair. However, most redundancy-based program repair techniques do not reason about the repair ingredients---the code that is reused to craft a patch. We aim to reason about the repair ingredients by using code similarities to prioritize and transform statements in a codebase f…

    Submitted 30 December, 2018; v1 submitted 15 July, 2017; originally announced July 2017.

    Comments: camera-ready paper for SANER 2019

    Journal ref: Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering, 2019