Showing 1–13 of 13 results for author: Buratti, L

  1. arXiv:2506.08311

    cs.SE cs.AI

    Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study

    Authors: Ira Ceka, Saurabh Pujar, Shyam Ramji, Luca Buratti, Gail Kaiser, Baishakhi Ray

    Abstract: With the advent of large language models (LLMs), software engineering agents (SWE agents) have emerged as a powerful paradigm for automating a range of software tasks -- from code generation and repair to test case synthesis. These agents operate autonomously by interpreting user input and responding to environmental feedback. While various agent architectures have demonstrated strong empirical pe…

    Submitted 9 June, 2025; originally announced June 2025.

  2. arXiv:2504.08696

    cs.SE cs.LG

    SeaView: Software Engineering Agent Visual Interface for Enhanced Workflow

    Authors: Timothy Bula, Saurabh Pujar, Luca Buratti, Mihaela Bornea, Avirup Sil

    Abstract: Auto-regressive LLM-based software engineering (SWE) agents, henceforth SWE agents, have made tremendous progress (>60% on SWE-Bench Verified) on real-world coding challenges including GitHub issue resolution. SWE agents use a combination of reasoning, environment interaction and self-reflection to resolve issues thereby generating "trajectories". Analysis of SWE agent trajectories is difficult, n…

    Submitted 14 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: 8 pages, 5 figures

  3. arXiv:2406.14712

    quant-ph cs.AI

    Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models

    Authors: Sanjay Vishwakarma, Francis Harkins, Siddharth Golecha, Vishal Sharathchandra Bajpe, Nicolas Dupuis, Luca Buratti, David Kremer, Ismael Faro, Ruchir Puri, Juan Cruz-Benito

    Abstract: Quantum programs are typically developed using quantum Software Development Kits (SDKs). The rapid advancement of quantum computing necessitates new tools to streamline this development process, and one such tool could be Generative Artificial intelligence (GenAI). In this study, we introduce and use the Qiskit HumanEval dataset, a hand-curated collection of tasks designed to benchmark the ability…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2405.19495

    quant-ph cs.AI

    Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code

    Authors: Nicolas Dupuis, Luca Buratti, Sanjay Vishwakarma, Aitana Viudes Forrat, David Kremer, Ismael Faro, Ruchir Puri, Juan Cruz-Benito

    Abstract: Code Large Language Models (Code LLMs) have emerged as powerful tools, revolutionizing the software development landscape by automating the coding process and reducing time and effort required to build applications. This paper focuses on training Code LLMs to specialize in the field of quantum computing. We begin by discussing the unique needs of quantum computing programming, which differ signifi…

    Submitted 29 May, 2024; originally announced May 2024.

  5. arXiv:2402.17442

    cs.SE cs.AI cs.PL

    Insights from the Usage of the Ansible Lightspeed Code Completion Service

    Authors: Priyam Sahoo, Saurabh Pujar, Ganesh Nalawade, Richard Gebhardt, Louis Mandel, Luca Buratti

    Abstract: The availability of Large Language Models (LLMs) which can generate code, has made it possible to create tools that improve developer productivity. Integrated development environments or IDEs which developers use to write software are often used as an interface to interact with LLMs. Although many such tools have been released, almost all of them focus on general-purpose programming languages. Dom…

    Submitted 22 October, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: This paper has been published at the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024), Industry Showcase under the title "Ansible Lightspeed: A Code Generation Service for IT Automation"

  6. arXiv:2310.16937

    cs.CL

    Cross-lingual Transfer in Programming Languages: An Extensive Empirical Study

    Authors: Razan Baltaji, Saurabh Pujar, Louis Mandel, Martin Hirzel, Luca Buratti, Lav Varshney

    Abstract: Large language models (LLMs) have achieved state-of-the-art performance in various software engineering tasks, including error detection, clone detection, and code translation, primarily leveraging high-resource programming languages like Python and Java. However, many critical languages, such as COBOL, as well as emerging languages, such as Rust and Swift, remain low-resource due to limited openl…

    Submitted 10 June, 2025; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Published in Transactions on Machine Learning Research (06/2025); 26 pages, 5 figures, 10 tables

    ACM Class: I.2.7; I.2.5

  7. arXiv:2310.14053

    cs.LG cs.CL cs.SE

    Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

    Authors: Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray

    Abstract: Code Large Language Models (Code LLMs) are being increasingly employed in real-life applications, so evaluating them is critical. While the conventional accuracy evaluates the performance of Code LLMs on a set of individual tasks, their self-consistency across different tasks is overlooked. Intuitively, a trustworthy model should be self-consistent when generating natural language specifications f…

    Submitted 26 February, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

    MSC Class: 68

    ACM Class: I.2; D.2

  8. arXiv:2306.03234

    cs.SE

    CONCORD: Clone-aware Contrastive Learning for Source Code

    Authors: Yangruibo Ding, Saikat Chakraborty, Luca Buratti, Saurabh Pujar, Alessandro Morari, Gail Kaiser, Baishakhi Ray

    Abstract: Deep Learning (DL) models to analyze source code have shown immense promise during the past few years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks, such as clone and bug detection. While previous work successfully learned from different code abstractions (e.g., token, AST, graph), we argue that it…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Camera-ready for 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 23)

  9. arXiv:2305.02783

    cs.SE cs.AI cs.CL cs.PL

    Automated Code generation for Information Technology Tasks in YAML through Large Language Models

    Authors: Saurabh Pujar, Luca Buratti, Xiaojie Guo, Nicolas Dupuis, Burn Lewis, Sahil Suneja, Atin Sood, Ganesh Nalawade, Matthew Jones, Alessandro Morari, Ruchir Puri

    Abstract: The recent improvement in code generation capabilities due to the use of large language models has mainly benefited general purpose programming languages. Domain specific languages, such as the ones used for IT Automation, have received far less attention, despite involving many active developers and being an essential component of modern cloud platforms. This work focuses on the generation of Ans…

    Submitted 23 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

  10. arXiv:2110.03868

    cs.PL cs.AI cs.LG cs.SE

    Towards Learning (Dis)-Similarity of Source Code from Program Contrasts

    Authors: Yangruibo Ding, Luca Buratti, Saurabh Pujar, Alessandro Morari, Baishakhi Ray, Saikat Chakraborty

    Abstract: Understanding the functional (dis)-similarity of source code is significant for code modeling tasks such as software vulnerability and code clone detection. We present DISCO (DIS-similarity of COde), a novel self-supervised model focusing on identifying (dis)similar functionalities of source code. Different from existing works, our approach does not require a huge amount of randomly collected datas…

    Submitted 20 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: ACL 2022 Camera-Ready

  11. arXiv:2105.12655

    cs.SE cs.AI

    CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks

    Authors: Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, Veronika Thost, Luca Buratti, Saurabh Pujar, Shyam Ramji, Ulrich Finkler, Susan Malaika, Frederick Reiss

    Abstract: Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and code infrastructure of enterprise applications ages, it is now more critical than ever to increase software development productivity and modernize legacy applications. Advances in deep learning and machine learning algorithms have enabled numerous breakthroughs,…

    Submitted 29 August, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

    Comments: 22 pages including references

  12. arXiv:2102.07995

    cs.SE cs.AI cs.LG

    D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis

    Authors: Yunhui Zheng, Saurabh Pujar, Burn Lewis, Luca Buratti, Edward Epstein, Bo Yang, Jim Laredo, Alessandro Morari, Zhong Su

    Abstract: Static analysis tools are widely used for vulnerability detection as they understand programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of Machine Learning models to understand programming languages opens new possibilities when applied to static analysis. However, exist…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted to the 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '21)

  13. arXiv:2006.12641

    cs.CL cs.LG cs.PL

    Exploring Software Naturalness through Neural Language Models

    Authors: Luca Buratti, Saurabh Pujar, Mihaela Bornea, Scott McCarley, Yunhui Zheng, Gaetano Rossiello, Alessandro Morari, Jim Laredo, Veronika Thost, Yufan Zhuang, Giacomo Domeniconi

    Abstract: The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing. We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks. Present approaches to code analysis depend heavily on features derived from the Abstract Syntax Tree (AST) while our trans…

    Submitted 24 June, 2020; v1 submitted 22 June, 2020; originally announced June 2020.