-
What do professional software developers need to know to succeed in an age of Artificial Intelligence?
Authors:
Matthew Kam,
Cody Miller,
Miaoxin Wang,
Abey Tidwell,
Irene A. Lee,
Joyce Malyn-Smith,
Beatriz Perez,
Vikram Tiwari,
Joshua Kenitzer,
Andrew Macvean,
Erin Barrar
Abstract:
Generative AI is showing early evidence of productivity gains for software developers, but concerns persist regarding workforce disruption and deskilling. We describe our research with 21 developers at the cutting edge of using AI, summarizing 12 of their work goals we uncovered, together with 75 associated tasks and the skills & knowledge for each, illustrating how developers use AI at work. From all of these, we distilled our findings in the form of 5 insights. We found that the skills & knowledge needed to be a successful AI-enhanced developer are organized into four domains (using Generative AI effectively, core software engineering, adjacent engineering, and adjacent non-engineering) deployed at critical junctures throughout a 6-step task workflow. In order to "future proof" developers for this age of AI, on-the-job learning initiatives and computer science degree programs will need to target both "soft" skills and the technical skills & knowledge in all four domains to reskill, upskill and safeguard against deskilling.
Submitted 23 June, 2025; v1 submitted 30 May, 2025;
originally announced June 2025.
-
Responsive Parallelism with Synchronization
Authors:
Stefan K. Muller,
Kyle Singer,
Devyn Terra Keeney,
Andrew Neth,
Kunal Agrawal,
I-Ting Angelina Lee,
Umut A. Acar
Abstract:
Many concurrent programs assign priorities to threads to improve responsiveness. When used in conjunction with synchronization mechanisms such as mutexes and condition variables, however, priorities can lead to priority inversions, in which high-priority threads are delayed by low-priority ones. Priority inversions in the use of mutexes are easily handled using dynamic techniques such as priority inheritance, but priority inversions in the use of condition variables are not well-studied and dynamic techniques are not suitable.
In this work, we use a combination of static and dynamic techniques to prevent priority inversion in code that uses mutexes and condition variables. A type system ensures that condition variables are used safely, even while dynamic techniques change thread priorities at runtime to eliminate priority inversions in the use of mutexes. We prove the soundness of our system, using a model of priority inversions based on cost models for parallel programs. To show that the type system is practical to implement, we encode it within the type systems of Rust and C++, and show that the restrictions are not overly burdensome by writing sizeable case studies using these encodings, including porting the Memcached object server to use our C++ implementation.
Submitted 7 April, 2023;
originally announced April 2023.
-
Responsive Parallelism with Futures and State
Authors:
Stefan K. Muller,
Kyle Singer,
Noah Goldstein,
Umut A. Acar,
Kunal Agrawal,
I-Ting Angelina Lee
Abstract:
Motivated by the increasing shift to multicore computers, recent work has developed language support for responsive parallel applications that mix compute-intensive tasks with latency-sensitive, usually interactive, tasks. These developments include calculi that allow assigning priorities to threads, type systems that can rule out priority inversions, and accompanying cost models for predicting responsiveness. These advances share one important limitation: all of this work assumes purely functional programming. This is a significant restriction, because many realistic interactive applications, from games to robots to web servers, use mutable state, e.g., for communication between threads.
In this paper, we lift the restriction concerning the use of state. We present $λ_i^4$, a calculus with implicit parallelism in the form of prioritized futures and mutable state in the form of references. Because both futures and references are first-class values, $λ_i^4$ programs can exhibit complex dependencies, including interaction between threads and with the external world (users, network, etc). To reason about the responsiveness of $λ_i^4$ programs, we extend traditional graph-based cost models for parallelism to account for dependencies created via mutable state, and we present a type system to outlaw priority inversions that can lead to unbounded blocking. We show that these techniques are practical by implementing them in C++ and present an empirical evaluation.
Submitted 6 April, 2020;
originally announced April 2020.
-
Reduced I/O Latency with Futures
Authors:
Kyle Singer,
Kunal Agrawal,
I-Ting Angelina Lee
Abstract:
Task parallelism research has traditionally focused on optimizing computation-intensive applications. Due to the proliferation of commodity parallel processors, there has been recent interest in supporting interactive applications. Such interactive applications frequently rely on I/O operations that may incur significant latency. In order to increase performance, when a particular thread of control is blocked on an I/O operation, ideally we would like to hide this latency by using the processing resources to do other ready work instead of blocking or spin-waiting on this I/O. There has been limited prior work on hiding this latency and only one result that provides a theoretical bound for interactive applications that use I/Os. In this work, we propose a method for hiding the latency of I/O operations by using the futures abstraction. We provide a theoretical analysis showing that our algorithm provides better execution time guarantees than prior work. We also implemented the algorithm in a practically efficient prototype library that runs on top of the Cilk-F runtime, a runtime system that supports futures within the context of the Cilk Plus language, and performed experiments that demonstrate the efficiency of our implementation.
Submitted 19 June, 2019;
originally announced June 2019.
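The idea in the abstract above, starting an I/O operation, continuing with other ready work, and blocking only when the value is needed, can be illustrated with a minimal Python sketch. It is not the paper's algorithm or the Cilk-F API: it uses Python's standard `concurrent.futures` module, and `slow_read`, `compute`, and `overlapped` are hypothetical names introduced only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_read(key, delay=0.2):
    """Hypothetical stand-in for a blocking I/O call (assumption, not from the paper)."""
    time.sleep(delay)  # simulates I/O latency
    return f"value-for-{key}"


def compute(n):
    """CPU-side work that does not depend on the I/O result."""
    return sum(i * i for i in range(n))


def overlapped(key, n):
    """Start the I/O as a future, do ready work, then wait for the value."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(slow_read, key)  # I/O begins; a future is returned at once
        partial = compute(n)               # ready work runs while the I/O is in flight
        return fut.result(), partial       # block only at the point the value is needed
```

Calling `overlapped("k", 1000)` returns the simulated I/O value together with the computed sum; the waiting time of `slow_read` overlaps with `compute` rather than being added to it.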
-
Efficient Race Detection with Futures
Authors:
Robert Utterback,
Kunal Agrawal,
Jeremy Fineman,
I-Ting Angelina Lee
Abstract:
This paper addresses the problem of provably efficient and practically good on-the-fly determinacy race detection in task parallel programs that use futures. Prior work on determinacy race detection has mostly focused on either task parallel programs that follow a series-parallel dependence structure or ones with unrestricted use of futures that generate arbitrary dependences. In this work, we consider a restricted use of futures and show that it can be race detected more efficiently than general use of futures.
Specifically, we present two algorithms: MultiBags and MultiBags+. MultiBags targets programs that use futures in a restricted fashion and runs in time $O(T_1 α(m,n))$, where $T_1$ is the sequential running time of the program, $α$ is the inverse Ackermann function, $m$ is the total number of memory accesses, and $n$ is the dynamic count of places at which parallelism is created. Since $α$ is a very slowly growing function (upper bounded by $4$ for all practical purposes), it can be treated as a close-to-constant overhead. MultiBags+ is an extension of MultiBags that targets programs with general use of futures. It runs in time $O((T_1+k^2)α(m,n))$, where $T_1$, $α$, $m$ and $n$ are defined as before, and $k$ is the number of future operations in the computation. We implemented both algorithms and empirically demonstrate their efficiency.
Submitted 3 January, 2019;
originally announced January 2019.
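An $α(m,n)$ factor like the one in the bounds above is the familiar amortized cost of a disjoint-set (union-find) forest with union by rank and path compression. The abstract does not say which data structure MultiBags uses, so the following Python sketch is only an illustration, under that assumption, of where such a factor typically comes from; the class and method names are hypothetical.

```python
class DisjointSet:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        """Create a singleton set for x if it is not already present."""
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x):
        """Return the representative of x's set, compressing the path."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return the new representative."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra  # attach the shallower tree under the deeper one
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return ra
```

With both heuristics in place, a sequence of $m$ operations on $n$ elements takes $O(m\,α(m,n))$ time in total, which is why the overhead is close to constant in practice.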
-
A NUMA-Aware Provably-Efficient Task-Parallel Platform Based on the Work-First Principle
Authors:
Justin Deters,
Jiaye Wu,
Yifan Xu,
I-Ting Angelina Lee
Abstract:
Task parallelism is designed to simplify the task of parallel programming. When executing a task parallel program on modern NUMA architectures, it can fail to scale due to the phenomenon called work inflation, where the overall processing time that multiple cores spend on doing useful work is higher compared to the time required to do the same amount of work on one core, due to effects experienced only during parallel executions such as additional cache misses, remote memory accesses, and memory bandwidth issues. It is possible to mitigate work inflation by co-locating the computation with the data, but this is nontrivial to do with task parallel programs. First, by design, the scheduling for task parallel programs is automated, giving the user little control over where the computation is performed. Second, the platforms tend to employ work stealing, which provides strong theoretical guarantees, but its randomized protocol for load balancing does not discern between work items that are far away versus ones that are closer. In this work, we propose NUMA-WS, a NUMA-aware task parallel platform engineered based on the work-first principle. By abiding by the work-first principle, we are able to obtain a platform that is work efficient, provides the same theoretical guarantees as the classic work stealing scheduler, and mitigates work inflation. Furthermore, we implemented a prototype platform by modifying Intel's Cilk Plus runtime system and empirically demonstrate that the resulting system is work efficient and scalable.
Submitted 7 January, 2019; v1 submitted 28 June, 2018;
originally announced June 2018.