-
Generating Planning Feedback for Open-Ended Programming Exercises with LLMs
Authors:
Mehmet Arif Demirtaş,
Claire Zheng,
Max Fowler,
Kathryn Cunningham
Abstract:
To complete an open-ended programming exercise, students need to both plan a high-level solution and implement it using the appropriate syntax. However, these problems are often autograded on the correctness of the final submission through test cases, and students cannot get feedback on their planning process. Large language models (LLMs) may be able to generate this feedback by detecting the overall code structure even for submissions with syntax errors. To this end, we propose an approach that detects which high-level goals and patterns (i.e., programming plans) exist in a student program with LLMs. We show that both the full GPT-4o model and a small variant (GPT-4o-mini) can detect these plans with remarkable accuracy, outperforming baselines inspired by conventional approaches to code analysis. We further show that the smaller, cost-effective variant (GPT-4o-mini) achieves results on par with the state-of-the-art model (GPT-4o) after fine-tuning, creating promising implications for smaller models for real-time grading. These smaller models can be incorporated into autograders for open-ended code-writing exercises to provide feedback for students' implicit planning skills, even when their program is syntactically incorrect. Furthermore, LLMs may be useful in providing feedback for problems in other domains where students start with a set of high-level solution steps and iteratively compute the output, such as math and physics problems.
Submitted 11 April, 2025;
originally announced April 2025.
-
Counting the Trees in the Forest: Evaluating Prompt Segmentation for Classifying Code Comprehension Level
Authors:
David H. Smith IV,
Max Fowler,
Paul Denny,
Craig Zilles
Abstract:
Reading and understanding code are fundamental skills for novice programmers, and especially important with the growing prevalence of AI-generated code and the need to evaluate its accuracy and reliability. "Explain in Plain English" questions are a widely used approach for assessing code comprehension, but providing automated feedback, particularly on comprehension levels, is a challenging task. This paper introduces a novel method for automatically assessing the comprehension level of responses to "Explain in Plain English" questions. Central to this is the ability to distinguish between two response types: multi-structural, where students describe the code line-by-line, and relational, where they explain the code's overall purpose. Using a Large Language Model (LLM) to segment both the student's description and the code, we aim to determine whether the student describes each line individually (many segments) or the code as a whole (fewer segments). We evaluate this approach's effectiveness by comparing segmentation results with human classifications, achieving substantial agreement. We conclude with how this approach, which we release as an open source Python package, could be used as a formative feedback mechanism.
Submitted 15 March, 2025;
originally announced March 2025.
-
ReDefining Code Comprehension: Function Naming as a Mechanism for Evaluating Code Comprehension
Authors:
David H. Smith IV,
Max Fowler,
Paul Denny,
Craig Zilles
Abstract:
"Explain in Plain English" (EiPE) questions are widely used to assess code comprehension skills but are challenging to grade automatically. Recent approaches like Code Generation Based Grading (CGBG) leverage large language models (LLMs) to generate code from student explanations and validate its equivalence to the original code using unit tests. However, this approach does not differentiate between high-level, purpose-focused responses and low-level, implementation-focused ones, limiting its effectiveness in assessing comprehension level. We propose a modified approach where students generate function names, emphasizing the function's purpose over implementation details. We evaluate this method in an introductory programming course and analyze it using Item Response Theory (IRT) to understand its effectiveness as exam items and its alignment with traditional EiPE grading standards. We also publish this work as an open source Python package for autograding EiPE questions, providing a scalable solution for adoption.
Submitted 15 March, 2025;
originally announced March 2025.
-
Mining Hierarchies with Conviction: Constructing the CS1 Skill Hierarchy with Pairwise Comparisons over Skill Distributions
Authors:
Dip Kiran Pradhan Newar,
Max Fowler,
David H. Smith IV,
Seth Poulsen
Abstract:
Background and Context: Some skills taught in introductory programming courses are categorized into 1) explaining code, 2) arranging lines of code in correct sequence, 3) tracing through the execution of a program, and 4) writing code from scratch. Objective: Knowing if a programming skill is a prerequisite to another would benefit teachers in properly planning the course and structuring the order in which they present activities relating to new content. Prior attempts to establish a skill hierarchy have suffered from methodological issues. Method: In this study, we used the conviction measure from association rule mining to perform pair-wise comparisons of five skills: Write, Trace, Reverse trace, Sequence, and Explain code. We used the data from four exams with more than 600 participants where students solved programming assignments of different skills for several programming topics. Findings: Our findings matched the previous finding that tracing is a prerequisite for students to learn to write code. Contradicting previous claims, our analysis showed that, using the mean threshold, writing code is a prerequisite to explaining code. However, there is no clear relationship when we change the threshold to the median. Unlike prior work, we did not find a clear prerequisite relationship between sequencing code and writing or explaining code. Implications: Our research can help instructors by systematically arranging the skills students exercise when encountering a new topic. The goal is to help instructors properly teach and assess programming in a fashion most effective for learning by leveraging the relationship between skills.
Submitted 7 February, 2025; v1 submitted 16 October, 2024;
originally announced October 2024.
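The conviction measure named in the abstract above has a standard definition in association rule mining: conviction(A -> B) = (1 - supp(B)) / (1 - conf(A -> B)), where conf(A -> B) = supp(A and B) / supp(A). The following is a minimal sketch of that textbook formula only. The function name, the toy data, and the reading of each pair as (passed a tracing item, passed a writing item) are assumptions for illustration; the listing does not say how the study binarizes scores with its mean or median thresholds, or how it maps rule direction onto prerequisites.

```python
from math import inf


def conviction(pairs):
    """Conviction of the rule A -> B from a list of (a, b) boolean pairs.

    Standard definition: (1 - supp(B)) / (1 - conf(A -> B)),
    with conf(A -> B) = supp(A and B) / supp(A).
    """
    n = len(pairs)
    n_a = sum(1 for a, _ in pairs if a)
    if n == 0 or n_a == 0:
        raise ValueError("need at least one observation where A holds")
    n_b = sum(1 for _, b in pairs if b)
    n_ab = sum(1 for a, b in pairs if a and b)

    confidence = n_ab / n_a
    if confidence == 1.0:
        # The rule is never violated in the data, so conviction is unbounded.
        return inf
    return (1 - n_b / n) / (1 - confidence)


# Hypothetical data (not from the paper): (passed_trace, passed_write).
toy_results = (
    [(True, True)] * 6
    + [(True, False)] * 2
    + [(False, True)]
    + [(False, False)]
)

print(round(conviction(toy_results), 3))  # prints 1.2
```

A value of 1 means A and B look independent; values above 1 mean the rule A -> B is violated less often than independence would predict. Because conviction is not symmetric, comparing conviction(A -> B) with conviction(B -> A) gives a directional signal, which is presumably what makes it usable for pairwise skill comparisons.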
-
Explaining Code with a Purpose: An Integrated Approach for Developing Code Comprehension and Prompting Skills
Authors:
Paul Denny,
David H. Smith IV,
Max Fowler,
James Prather,
Brett A. Becker,
Juho Leinonen
Abstract:
Reading, understanding and explaining code have traditionally been important skills for novices learning programming. As large language models (LLMs) become prevalent, these foundational skills are more important than ever given the increasing need to understand and evaluate model-generated code. Brand new skills are also needed, such as the ability to formulate clear prompts that can elicit intended code from an LLM. Thus, there is great interest in integrating pedagogical approaches for the development of both traditional coding competencies and the novel skills required to interact with LLMs. One effective way to develop and assess code comprehension ability is with "Explain in plain English" (EiPE) questions, where students succinctly explain the purpose of a fragment of code. However, grading EiPE questions has always been difficult given the subjective nature of evaluating written explanations and this has stifled their uptake. In this paper, we explore a natural synergy between EiPE questions and code-generating LLMs to overcome this limitation. We propose using an LLM to generate code based on students' responses to EiPE questions, not only enabling EiPE responses to be assessed automatically, but helping students develop essential code comprehension and prompt crafting skills in parallel. We investigate this idea in an introductory programming course and report student success in creating effective prompts for solving EiPE questions. We also examine student perceptions of this activity and how it influences their views on the use of LLMs for aiding and assessing learning.
Submitted 9 March, 2024;
originally announced March 2024.
-
An Orbital Solution for WASP-12 b: Updated Ephemeris and Evidence for Decay Leveraging Citizen Science Data
Authors:
Avinash S. Nediyedath,
Martin J. Fowler,
A. Norris,
Shivaraj R. Maidur,
Kyle A. Pearson,
S. Dixon,
P. Lewin,
Andre O. Kovacs,
A. Odasso,
K. Davis,
M. Primm,
P. Das,
Bryan E. Martin,
D. Lalla
Abstract:
NASA Citizen Scientists have used Exoplanet Transit Interpretation Code (EXOTIC) to reduce 40 sets of time-series images of WASP-12 taken by privately owned telescopes and a 6-inch telescope operated by the Center for Astrophysics | Harvard & Smithsonian MicroObservatory (MOBs). Of these sets, 24 result in clean transit light curves of WASP-12 b, which are included in the NASA Exoplanet Watch website. We use priors from the NASA Exoplanet Archive to calculate the ephemeris of the planet and combine it with ETD (Exoplanet Transit Database), ExoClock, and TESS (Transiting Exoplanet Survey Satellite) observations. Combining these datasets gives an updated ephemeris for the WASP-12 b system of 2454508.97923 +/- 0.000051 BJD_TDB with an orbital period of 1.09141935 +/- 2.16e-08 days, which can be used to inform the efficient scheduling of future space telescope observations. The orbital decay of the planet was found to be -6.89e-10 +/- 4.01e-11 days/epoch. These results show the benefits of long-term observations by amateur astronomers that citizen scientists can analyze to augment the field of Exoplanet research.
Submitted 10 November, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
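To illustrate how the ephemeris values quoted in the abstract above might be used for scheduling, here is a minimal sketch that predicts mid-transit times with the commonly used quadratic (orbital decay) model t(E) = T0 + P*E + 0.5*(dP/dE)*E^2. The three constants are copied from the abstract; treating the quoted decay value as dP/dE in that model is an assumption, as are the function names, and uncertainties are ignored.

```python
# Values quoted in the abstract (uncertainties omitted in this sketch).
T0_BJD_TDB = 2454508.97923        # reference mid-transit time, BJD_TDB
PERIOD_DAYS = 1.09141935          # orbital period, days
DECAY_DAYS_PER_EPOCH = -6.89e-10  # assumed to be dP/dE, days per epoch


def decay_shift_days(epoch):
    """Timing shift from the quadratic term, 0.5 * dP/dE * E**2, in days."""
    return 0.5 * DECAY_DAYS_PER_EPOCH * epoch ** 2


def mid_transit_time(epoch, with_decay=True):
    """Predicted mid-transit time (BJD_TDB) for an integer epoch number."""
    linear = T0_BJD_TDB + PERIOD_DAYS * epoch
    return linear + decay_shift_days(epoch) if with_decay else linear


# Epoch 1000 is an arbitrary choice for illustration.
epoch = 1000
print(mid_transit_time(epoch, with_decay=False))
print(mid_transit_time(epoch))
print(decay_shift_days(epoch) * 86400.0, "seconds")
```

Under these assumptions the quadratic term moves the transit at epoch 1000 about 30 seconds earlier than the linear prediction, and the shift grows with the square of the epoch number.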
-
Deep Learning for RF Signal Classification in Unknown and Dynamic Spectrum Environments
Authors:
Yi Shi,
Kemal Davaslioglu,
Yalin E. Sagduyu,
William C. Headley,
Michael Fowler,
Gilbert Green
Abstract:
Dynamic spectrum access (DSA) benefits from detection and classification of interference sources including in-network users, out-network users, and jammers that may all coexist in a wireless network. We present a deep learning based signal (modulation) classification solution in a realistic wireless network setting, where 1) signal types may change over time; 2) some signal types may be unknown for which there is no training data; 3) signals may be spoofed such as the smart jammers replaying other signal types; and 4) different signal types may be superimposed due to the interference from concurrent transmissions. For case 1, we apply continual learning and train a Convolutional Neural Network (CNN) using an Elastic Weight Consolidation (EWC) based loss. For case 2, we detect unknown signals via outlier detection applied to the outputs of convolutional layers using Minimum Covariance Determinant (MCD) and k-means clustering methods. For case 3, we extend the CNN structure to capture phase shifts due to radio hardware effects to identify the spoofing signal sources. For case 4, we apply blind source separation using Independent Component Analysis (ICA) to separate interfering signals. We utilize the signal classification results in a distributed scheduling protocol, where in-network (secondary) users employ signal classification scores to make channel access decisions and share the spectrum with each other while avoiding interference with out-network (primary) users and jammers. Compared with benchmark TDMA-based schemes, we show that distributed scheduling constructed upon signal classification results provides major improvements to in-network user throughput and out-network user success ratio.
Submitted 25 September, 2019;
originally announced September 2019.
-
Intelligent Knowledge Distribution: Constrained-Action POMDPs for Resource-Aware Multi-Agent Communication
Authors:
Michael C. Fowler,
T. Charles Clancy,
Ryan K. Williams
Abstract:
This paper addresses a fundamental question of multi-agent knowledge distribution: what information should be sent to whom and when, with the limited resources available to each agent? Communication requirements for multi-agent systems can be rather high when an accurate picture of the environment and the state of other agents must be maintained. To reduce the impact of multi-agent coordination on networked systems, e.g., power and bandwidth, this paper introduces two concepts for partially observable Markov decision processes (POMDPs): 1) action-based constraints which yield constrained-action POMDPs (CA-POMDPs); and 2) soft probabilistic constraint satisfaction for the resulting infinite-horizon controllers. To enable constraint analysis over an infinite horizon, an unconstrained policy is first represented as a Finite State Controller (FSC) and optimized with policy iteration. The FSC representation then allows for a combination of Markov chain Monte Carlo and discrete optimization to improve the probabilistic constraint satisfaction of the controller while minimizing the impact to the value function. Within the CA-POMDP framework we then propose Intelligent Knowledge Distribution (IKD) which yields per-agent policies for distributing knowledge between agents subject to interaction constraints. Finally, the CA-POMDP and IKD concepts are validated using an asset tracking problem where multiple unmanned aerial vehicles (UAVs) with heterogeneous sensors collaborate to localize a ground asset to assist in avoiding unseen obstacles in a disaster area. The IKD model was able to maintain asset tracking through multi-agent communications while only violating soft power and bandwidth constraints 3% of the time, while greedy and naive approaches violated constraints more than 60% of the time.
Submitted 7 March, 2019;
originally announced March 2019.
-
Application of Cybernetics and Control Theory for a New Paradigm in Cybersecurity
Authors:
Michael D. Adams,
Seth D. Hitefield,
Bruce Hoy,
Michael C. Fowler,
T. Charles Clancy
Abstract:
A significant limitation of current cyber security research and techniques is their reactive and applied nature. This leads to a continuous 'cyber cycle' of attackers scanning networks, developing exploits and attacking systems, with defenders detecting attacks, analyzing exploits and patching systems. This reactive nature leaves sensitive systems highly vulnerable to attack due to un-patched systems and undetected exploits. Some current research attempts to address this major limitation by introducing systems that implement moving target defense. However, these ideas are typically based on the intuition that a moving target defense will make it much harder for attackers to find and scan vulnerable systems, and not on theoretical mathematical foundations. The continuing lack of fundamental science and principles for developing more secure systems has drawn increased interest in establishing a 'science of cyber security'. This paper introduces the concept of using cybernetics, an interdisciplinary approach of control theory, systems theory, information theory and game theory applied to regulatory systems, as a foundational approach for developing cyber security principles. It explores potential applications of cybernetics to cyber security from a defensive perspective, while suggesting the potential use for offensive applications. Additionally, this paper introduces the fundamental principles for building non-stationary systems, which is a more general solution than moving target defenses. Lastly, the paper discusses related works concerning the limitations of moving target defense and one implementation based on non-stationary principles.
Submitted 1 November, 2013;
originally announced November 2013.