-
Exploratory Data Analysis on Code-mixed Misogynistic Comments
Authors:
Sargam Yadav,
Abhishek Kaushik,
Kevin McDaid
Abstract:
The problems of online hate speech and cyberbullying have significantly worsened since the increase in popularity of social media platforms such as YouTube and Twitter (X). Natural Language Processing (NLP) techniques have proven to provide a great advantage in automatically filtering such toxic content. Women are disproportionately more likely to be victims of online abuse. However, there appears to be a lack of studies that tackle misogyny detection in under-resourced languages. In this short paper, we present a novel dataset of YouTube comments in code-mixed Hinglish, which have been weakly labelled as 'Misogynistic' and 'Non-misogynistic'. Pre-processing and Exploratory Data Analysis (EDA) techniques have been applied to the dataset to gain insights into its characteristics. The process has provided a better understanding of the dataset through sentiment scores, word clouds, etc.
Submitted 9 March, 2024;
originally announced March 2024.
-
Leveraging Weakly Annotated Data for Hate Speech Detection in Code-Mixed Hinglish: A Feasibility-Driven Transfer Learning Approach with Large Language Models
Authors:
Sargam Yadav,
Abhishek Kaushik,
Kevin McDaid
Abstract:
The advent of Large Language Models (LLMs) has advanced the benchmark in various Natural Language Processing (NLP) tasks. However, large amounts of labelled training data are required to train LLMs. Furthermore, data annotation and training are computationally expensive and time-consuming. Zero-shot and few-shot learning have recently emerged as viable options for labelling data using large pre-trained models. Hate speech detection in code-mixed low-resource languages is an active problem area where the use of LLMs has proven beneficial. In this study, we have compiled a dataset of 100 YouTube comments, and weakly labelled them for coarse and fine-grained misogyny classification in code-mixed Hinglish. Weak annotation was applied due to the labor-intensive annotation process. Zero-shot learning, one-shot learning, and few-shot learning and prompting approaches have then been applied to assign labels to the comments and compare them to human-assigned labels. Out of all the approaches, zero-shot classification using the Bidirectional Auto-Regressive Transformers (BART) large model and few-shot prompting using Generative Pre-trained Transformer 3 (ChatGPT-3) achieve the best results.
Submitted 4 March, 2024;
originally announced March 2024.
-
Spreadsheets on the Move: An Evaluation of Mobile Spreadsheets
Authors:
Derek Flood,
Rachel Harrison,
Kevin McDaid
Abstract:
The power of mobile devices has increased dramatically in the last few years. These devices are becoming more sophisticated, allowing users to accomplish a wide variety of tasks while on the move. The increasingly mobile nature of business has meant that more users will need access to spreadsheets while away from their desktop and laptop computers. Existing mobile applications suffer from a number of usability issues that make using spreadsheets in this way more difficult. This work represents the first evaluation of mobile spreadsheet applications. Through a pilot survey, the needs and experiences of experienced spreadsheet users were examined. The range of spreadsheet apps available for the iOS platform was also evaluated in light of these users' needs.
Submitted 18 December, 2011;
originally announced December 2011.
-
Effect of Range Naming Conventions on Reliability and Development Time for Simple Spreadsheet Formulas
Authors:
Ruth McKeever,
Kevin McDaid
Abstract:
Practitioners often argue that range names make spreadsheets easier to understand and use, akin to the role of good variable names in traditional programming languages, yet there is no supporting scientific evidence. The authors previously published experiments that disproved this theory in relation to debugging, and now turn their focus to development. This paper presents the results of two iterations of a new experiment, which measure the effect of range names on the correctness of, and the time it takes to develop, simple summation formulas. Our findings, supported by statistically significant results, show that formulas developed by non-experts using range names are more likely to contain errors and take longer to develop. Taking these findings with the findings from previous experiments, we conclude that range names do not improve the quality of spreadsheets developed by novice and intermediate users. This paper is important in that it finds that the choice of naming convention can have a significant impact on novice and intermediate users' performance in formula development, with less structured naming conventions resulting in poorer performance by users.
Submitted 29 November, 2011;
originally announced November 2011.
-
Spreadsheets in Financial Departments: An Automated Analysis of 65,000 Spreadsheets using the Luminous Technology
Authors:
Kevin McDaid,
Ronan MacRuairi,
Neil Clynch,
Kevin Logue,
Cian Clancy,
Shane Hayes
Abstract:
Spreadsheet technology is a cornerstone of IT systems in most organisations. It is often the glue that binds more structured transaction-based systems together. Financial operations are a case in point where spreadsheets fill the gaps left by dedicated accounting systems, particularly covering reporting and business process operations. However, little is understood as to the nature of spreadsheet usage in organisations and the contents and structure of these spreadsheets as they relate to key business functions, with few, if any, comprehensive analyses of spreadsheet repositories in real organisations. As such, this paper represents an important attempt at profiling real and substantial spreadsheet repositories.
Using the Luminous technology, an analysis of 65,000 spreadsheets for the financial departments of both a government and a private commercial organisation was conducted. This provides an important insight into the nature and structure of these spreadsheets, the links between them, the existence and nature of macros and the level of repetitive processes performed through the spreadsheets. Furthermore, it highlights the organisational dependence on spreadsheets and the range and number of spreadsheets dealt with by individuals on a daily basis. In so doing, this paper prompts important questions that can frame future research in the domain.
Submitted 29 November, 2011;
originally announced November 2011.
-
How do Range Names Hinder Novice Spreadsheet Debugging Performance?
Authors:
Ruth McKeever,
Kevin McDaid
Abstract:
Although experts diverge on how best to improve spreadsheet quality, it is generally agreed that more time needs to be spent testing spreadsheets. Ideally, experienced and trained spreadsheet engineers would carry this out, but quite often this is neither practical nor possible. Many spreadsheets are a legacy, developed by staff who have since moved on, or indeed modified by many staff no longer employed by the organisation. When such spreadsheets fall into the hands of inexperienced non-experts, any features that reduce error visibility may become a risk. Range names are one such feature, and this paper, building on previous research, investigates in a more structured and controlled manner the effect they have on the debugging performance of novice spreadsheet users.
Submitted 14 September, 2010;
originally announced September 2010.
-
Error Estimation in Large Spreadsheets using Bayesian Statistics
Authors:
Leslie Bradley,
Kevin McDaid
Abstract:
Spreadsheets are ubiquitous in business, with the financial sector particularly heavily reliant on the technology. It is known that the level of spreadsheet error can be high and that it is often necessary to review spreadsheets based on a structured methodology which includes a cell-by-cell examination of the spreadsheet. This paper outlines the early research that has been carried out into the use of Bayesian statistical methods to estimate the level of error in large spreadsheets during cell-by-cell examination based on expert knowledge and partial spreadsheet test data. The estimate can aid in the decision as to the quality of the spreadsheet and the necessity to conduct further testing or not.
Submitted 8 August, 2009;
originally announced August 2009.
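As an illustration of the general idea in the abstract above (combining expert knowledge with partial test data to estimate an error level), the following is a minimal sketch using a conjugate Beta-Binomial update. It is an assumption chosen for simplicity, not the model used in the paper: the abstract does not specify the prior or likelihood. The function names and all numbers are hypothetical.

```python
def beta_posterior(prior_alpha, prior_beta, cells_checked, errors_found):
    """Update a Beta(prior_alpha, prior_beta) prior on the per-cell error rate.

    The prior stands in for expert knowledge; the checked cells are the
    partial test data. Returns the posterior (alpha, beta) parameters.
    """
    if not 0 <= errors_found <= cells_checked:
        raise ValueError("errors_found must lie between 0 and cells_checked")
    return prior_alpha + errors_found, prior_beta + (cells_checked - errors_found)


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def expected_remaining_errors(alpha, beta, unchecked_cells):
    """Expected number of erroneous cells among those not yet examined."""
    return unchecked_cells * posterior_mean(alpha, beta)


# Hypothetical scenario: the expert's prior is roughly a 5% cell error rate,
# encoded as Beta(1, 19); 200 of 1,000 cells have been examined, 6 had errors.
alpha, beta = beta_posterior(1.0, 19.0, cells_checked=200, errors_found=6)
rate = posterior_mean(alpha, beta)
remaining = expected_remaining_errors(alpha, beta, unchecked_cells=800)
print(f"posterior error rate: {rate:.4f}, expected remaining errors: {remaining:.1f}")
```

In this sketch the posterior mean moves from the prior towards the observed error rate as more cells are examined, which is the kind of running estimate that could inform a decision on whether further testing is needed.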
-
An Exploratory Analysis of the Impact of Named Ranges on the Debugging Performance of Novice Users
Authors:
Ruth McKeever,
Kevin McDaid,
Brian Bishop
Abstract:
This paper describes an exploratory empirical study of the effect of named ranges on spreadsheet debugging performance. Named ranges are advocated in both academia and industry, yet no experimental evidence has been cited to back up these recommendations. This paper describes an exploratory experiment involving 21 participants that assesses the performance of novices debugging a spreadsheet containing named ranges. The results are compared with the performance of a different set of novices debugging the same spreadsheet without named ranges. The findings suggest that novice users debug on average significantly fewer errors if the spreadsheet contains named ranges. The purpose of the investigative study is to derive a detailed and coherent set of research questions regarding the impact of range names on the debugging performance and behaviour of spreadsheet users. These will be answered through future controlled experiments.
Submitted 6 August, 2009;
originally announced August 2009.
-
Spreadsheet End-User Behaviour Analysis
Authors:
Brian Bishop,
Kevin McDaid
Abstract:
To aid the development of spreadsheet debugging tools, a knowledge of end-users' natural behaviour within the Excel environment would be advantageous. This paper details the design and application of a novel data acquisition tool, which can be used for the unobtrusive recording of end-users' mouse, keyboard and Excel-specific actions during the debugging of Excel spreadsheets. A debugging experiment was conducted using this data acquisition tool, and based on analysis of end-users' performance and behaviour data, the authors developed a "spreadsheet cell coverage feedback" debugging tool. Results from the debugging experiment are presented in terms of end-user debugging performance and behaviour, and the outcomes of an evaluation experiment with the debugging tool are detailed.
Submitted 21 September, 2008;
originally announced September 2008.
-
An Empirical Study of End-User Behaviour in Spreadsheet Error Detection & Correction
Authors:
Brian Bishop,
Kevin McDaid
Abstract:
Very little is known about the process by which end-user developers detect and correct spreadsheet errors. Any research pertaining to the development of spreadsheet testing methodologies or auditing tools would benefit from information on how end-users perform the debugging process in practice. Thirteen industry-based professionals and thirty-four accounting & finance students took part in an ongoing experiment designed to record and analyse end-user behaviour in spreadsheet error detection and correction. Professionals significantly outperformed students in correcting certain error types. Time-based cell activity analysis showed that a strong correlation exists between the percentage of cells inspected and the number of errors corrected. The cell activity data was gathered through a purpose-written VBA Excel plug-in that records the time and detail of all cell selection and cell change actions of individuals.
Submitted 23 February, 2008;
originally announced February 2008.
-
Investigating the Potential of Test-Driven Development for Spreadsheet Engineering
Authors:
Alan Rust,
Brian Bishop,
Kevin McDaid
Abstract:
It is widely documented that the absence of a structured approach to spreadsheet engineering is a key factor in the high level of spreadsheet errors. In this paper we propose and investigate the application of Test-Driven Development to the creation of spreadsheets. Test-Driven Development is an emerging development technique in software engineering that has been shown to result in better quality software code. It has also been shown that this code requires less testing and is easier to maintain. Through a pair of case studies we demonstrate that Test-Driven Development can be applied to the development of spreadsheets. We present the detail of these studies preceded by a clear explanation of the technique and its application to spreadsheet engineering. A supporting tool under development by the authors is also documented along with proposed research to determine the effectiveness of the methodology and the associated tool.
Submitted 30 January, 2008;
originally announced January 2008.