-
Sharp Tools: How Developers Wield Agentic AI in Real Software Engineering Tasks
Authors:
Aayush Kumar,
Yasharth Bajpai,
Sumit Gulwani,
Gustavo Soares,
Emerson Murphy-Hill
Abstract:
Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to allow interactivity with developers, enabling collaborative problem-solving. To understand how developers collaborate with SWE agents and the communication cha…
▽ More
Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to allow interactivity with developers, enabling collaborative problem-solving. To understand how developers collaborate with SWE agents and the communication challenges that arise in such interactions, we observed 19 developers using an in-IDE agent to resolve 33 open issues in repositories to which they had previously contributed. Participants successfully resolved about half of these issues, with participants solving issues incrementally having greater success than those using a one-shot approach. Participants who actively collaborated with the agent and iterated on its outputs were also more successful, though they faced challenges in trusting the agent's responses and collaborating on debugging and testing. These results have implications for successful developer-agent collaborations, and for the design of more effective SWE agents.
△ Less
Submitted 17 June, 2025; v1 submitted 14 June, 2025;
originally announced June 2025.
-
TableTalk: Scaffolding Spreadsheet Development with a Language Agent
Authors:
Jenny T. Liang,
Aayush Kumar,
Yasharth Bajpai,
Sumit Gulwani,
Vu Le,
Chris Parnin,
Arjun Radhakrishna,
Ashish Tiwari,
Emerson Murphy-Hill,
Guastavo Soares
Abstract:
Despite its ubiquity in the workforce, spreadsheet programming remains challenging as programmers need both spreadsheet-specific knowledge (e.g., APIs to write formulas) and problem-solving skills to create complex spreadsheets. Large language models (LLMs) can help automate aspects of this process, and recent advances in planning and reasoning have enabled language agents, which dynamically plan,…
▽ More
Despite its ubiquity in the workforce, spreadsheet programming remains challenging as programmers need both spreadsheet-specific knowledge (e.g., APIs to write formulas) and problem-solving skills to create complex spreadsheets. Large language models (LLMs) can help automate aspects of this process, and recent advances in planning and reasoning have enabled language agents, which dynamically plan, use tools, and take iterative actions to complete complex tasks. These agents observe, plan, and act, making them well-suited to scaffold spreadsheet programming by following expert processes.
We present TableTalk, a language agent that helps programmers build spreadsheets conversationally. Its design reifies three design principles -- scaffolding, flexibility, and incrementality -- which we derived from two studies of seven programmers and 62 Excel templates. TableTalk structures spreadsheet development by generating step-by-step plans and suggesting three next steps users can choose from. It also integrates tools that enable incremental spreadsheet construction. A user study with 20 programmers shows that TableTalk produces spreadsheets 2.3 times more likely to be preferred over a baseline agent, while reducing cognitive load and time spent reasoning about spreadsheet actions by 12.6%. TableTalk's approach has implications for human-agent collaboration. This includes providing persistent direct manipulation interfaces for stopping or undoing agent actions, while ensuring that such interfaces for accepting actions can be deactivated.
△ Less
Submitted 13 February, 2025;
originally announced February 2025.
-
Automatic Web Security Unit Testing: XSS Vulnerability Detection
Authors:
Mahmoud Mohammadi,
Bill Chu,
Heather Richter Lipford,
Emerson Murphy-Hill
Abstract:
Integrating security testing into the workflow of software developers not only can save resources for separate security testing but also reduce the cost of fixing security vulnerabilities by detecting them early in the development cycle. We present an automatic testing approach to detect a common type of Cross Site Scripting (XSS) vulnerability caused by improper encoding of untrusted data. We aut…
▽ More
Integrating security testing into the workflow of software developers not only can save resources for separate security testing but also reduce the cost of fixing security vulnerabilities by detecting them early in the development cycle. We present an automatic testing approach to detect a common type of Cross Site Scripting (XSS) vulnerability caused by improper encoding of untrusted data. We automatically extract encoding functions used in a web application to sanitize untrusted inputs and then evaluate their effectiveness by automatically generating XSS attack strings. Our evaluations show that this technique can detect 0-day XSS vulnerabilities that cannot be found by static analysis tools. We will also show that our approach can efficiently cover a common type of XSS vulnerability. This approach can be generalized to test for input validation against other types injections such as command line injection.
△ Less
Submitted 2 April, 2018;
originally announced April 2018.
-
How the Sando Search Tool Recommends Queries
Authors:
Xi Ge,
David Shepherd,
Kostadin Damevski,
Emerson Murphy-Hill
Abstract:
Developers spend a significant amount of time searching their local codebase. To help them search efficiently, researchers have proposed novel tools that apply state-of-the-art information retrieval algorithms to retrieve relevant code snippets from the local codebase. However, these tools still rely on the developer to craft an effective query, which requires that the developer is familiar with t…
▽ More
Developers spend a significant amount of time searching their local codebase. To help them search efficiently, researchers have proposed novel tools that apply state-of-the-art information retrieval algorithms to retrieve relevant code snippets from the local codebase. However, these tools still rely on the developer to craft an effective query, which requires that the developer is familiar with the terms contained in the related code snippets. Our empirical data from a state-of-the-art local code search tool, called Sando, suggests that developers are sometimes unacquainted with their local codebase. In order to bridge the gap between developers and their ever-increasing local codebase, in this paper we demonstrate the recommendation techniques integrated in Sando.
△ Less
Submitted 27 January, 2014;
originally announced January 2014.