-
Predicting At-Risk Programming Students in Small Imbalanced Datasets using Synthetic Data
Authors:
Daniel Flood,
Matthew England,
Beate Grawemeyer
Abstract:
This study is part of a larger project focused on measuring, understanding, and improving student engagement in programming education. We investigate whether synthetic data generation can help identify at-risk students earlier in a small, imbalanced dataset from an introductory programming module. The analysis used anonymised records from 379 students, with 15% marked as failing, and applied several machine learning algorithms. The first experiments showed poor recall for the failing group. However, using synthetic data generation methods led to a significant improvement in performance. Our results suggest that machine learning can help identify at-risk students early in programming courses when combined with synthetic data. This research lays the groundwork for validating and using these models with live student cohorts in the future, to allow for timely and effective interventions that can improve student outcomes. It also includes feature importance analysis to refine formative tasks. Overall, this study contributes to developing practical workflows that help detect disengagement early and improve student success in programming education.
Submitted 21 May, 2025;
originally announced May 2025.
-
Investigating Estimated Kolmogorov Complexity as a Means of Regularization for Link Prediction
Authors:
Paris D. L. Flood,
Ramon Viñas,
Pietro Liò
Abstract:
Link prediction in graphs is an important task in the fields of network science and machine learning. We investigate a flexible means of regularization for link prediction based on an approximation of the Kolmogorov complexity of graphs that is differentiable and compatible with recent advances in link prediction algorithms. Informally, the Kolmogorov complexity of an object is the length of the shortest computer program that produces the object. Complex networks are often generated, in part, by simple mechanisms; for example, many citation networks and social networks are approximately scale-free and can be explained by preferential attachment. A preference for predicting graphs with simpler generating mechanisms motivates our choice of Kolmogorov complexity as a regularization term. In our experiments, the regularization method shows good performance on many diverse real-world networks; however, we determine that this is likely due to an aggregation method rather than any actual estimation of Kolmogorov complexity.
Submitted 23 February, 2021; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Spreadsheets on the Move: An Evaluation of Mobile Spreadsheets
Authors:
Derek Flood,
Rachel Harrison,
Kevin McDaid
Abstract:
The power of mobile devices has increased dramatically in the last few years. These devices are becoming more sophisticated, allowing users to accomplish a wide variety of tasks while on the move. The increasingly mobile nature of business has meant that more users will need access to spreadsheets while away from their desktop and laptop computers. Existing mobile applications suffer from a number of usability issues that make using spreadsheets in this way more difficult. This work represents the first evaluation of mobile spreadsheet applications. Through a pilot survey, the needs and experiences of experienced spreadsheet users were examined. The range of spreadsheet apps available for the iOS platform was also evaluated in light of these users' needs.
Submitted 18 December, 2011;
originally announced December 2011.
-
NLP-SIR: A Natural Language Approach for Spreadsheet Information Retrieval
Authors:
Derek Flood,
Kevin Mc Daid,
Fergal Mc Caffery
Abstract:
Spreadsheets are a ubiquitous software tool, used for a wide variety of tasks such as financial modelling, statistical analysis and inventory management. Extracting meaningful information from such data can be a difficult task, especially for novice users unfamiliar with the advanced data processing features of many spreadsheet applications. We believe that through the use of Natural Language Processing (NLP) techniques this task can be made considerably easier. This paper introduces NLP-SIR, a natural language interface for spreadsheet information retrieval. The results of a recent evaluation, which compared NLP-SIR with existing information retrieval tools, are also outlined. This evaluation has shown that NLP-SIR is a more effective method of spreadsheet information retrieval.
Submitted 8 August, 2009;
originally announced August 2009.
-
Evaluation of an Intelligent Assistive Technology for Voice Navigation of Spreadsheets
Authors:
Derek Flood,
Kevin Mc Daid,
Fergal Mc Caffery,
Brian Bishop
Abstract:
An integral part of spreadsheet auditing is navigation. For sufferers of Repetitive Strain Injury who need to use voice recognition technology, this navigation can be highly problematic. To counter this, the authors have developed an intelligent voice navigation system, iVoice, which replicates common spreadsheet auditing behaviours through simple voice commands. This paper outlines the iVoice system and summarizes the results of a study evaluating iVoice against a leading voice recognition technology.
Submitted 21 September, 2008;
originally announced September 2008.
-
Voice-controlled Debugging of Spreadsheets
Authors:
Derek Flood,
Kevin Mc Daid
Abstract:
Developments in Mobile Computing are putting pressure on the software industry to research new modes of interaction that do not rely on the traditional keyboard and mouse combination. Computer users suffering from Repetitive Strain Injury also seek an alternative to keyboard and mouse devices to reduce suffering in wrist and finger joints. Voice-control is an alternative approach to spreadsheet development and debugging that has been researched and used successfully in other domains. While voice-control technology for spreadsheets is available, its effectiveness has not been investigated. This study is the first to compare the performance of a set of expert spreadsheet developers who debugged a spreadsheet using voice-control technology with that of another set who debugged the same spreadsheet using keyboard and mouse. The study showed that voice, despite its advantages, proved to be slower and less accurate. However, it also revealed ways in which the technology might be improved to redress this imbalance.
Submitted 23 February, 2008;
originally announced February 2008.