-
Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers
Authors:
Martin Ruskov
Abstract:
Finding balanced ways to employ Large Language Models (LLMs) in education is a challenge due to inherent risks of poor understanding of the technology and of a susceptible audience. This is particularly so with younger children, who are known to have difficulties with pervasive screen time. Working with a tangible programming robot called Cubetto, we propose an approach to benefit from the capabilities of LLMs by employing such models in the preparation of personalised storytelling, necessary for preschool children to get accustomed to the practice of commanding the robot. We engage in action research to develop an early version of a formalised process to rapidly prototype game stories for Cubetto. Our approach is both reproducible, because it employs open-weight models, and model-agnostic, because we test it with 5 different LLMs. We document, on the one hand, the process and the materials and prompts used and, on the other, the learning experience and outcomes. We deem the generation successful for the intended purposes of using the results as a teacher aid. Testing the models on 4 different task scenarios, we encounter issues of consistency and hallucinations and document the corresponding evaluation process and attempts (some successful and some not) to overcome these issues. Importantly, the process does not expose children to LLMs directly. Rather, the technology is used to help teachers easily develop personalised narratives on children's preferred topics. We believe our method is adequate for preschool classes and we are planning to further experiment in real-world educational settings.
Submitted 25 June, 2025;
originally announced June 2025.
-
Understanding Learner-LLM Chatbot Interactions and the Impact of Prompting Guidelines
Authors:
Cansu Koyuturk,
Emily Theophilou,
Sabrina Patania,
Gregor Donabauer,
Andrea Martinenghi,
Chiara Antico,
Alessia Telari,
Alessia Testa,
Sathya Bursic,
Franca Garzotto,
Davinia Hernandez-Leo,
Udo Kruschwitz,
Davide Taibi,
Simona Amenta,
Martin Ruskov,
Dimitri Ognibene
Abstract:
Large Language Models (LLMs) have transformed human-computer interaction by enabling natural language-based communication with AI-powered chatbots. These models are designed to be intuitive and user-friendly, allowing users to articulate requests with minimal effort. However, despite their accessibility, studies reveal that users often struggle with effective prompting, resulting in inefficient responses. Existing research has highlighted both the limitations of LLMs in interpreting vague or poorly structured prompts and the difficulties users face in crafting precise queries. This study investigates learner-AI interactions through an educational experiment in which participants receive structured guidance on effective prompting. We introduce and compare three types of prompting guidelines: a task-specific framework developed through a structured methodology and two baseline approaches. To assess user behavior and prompting efficacy, we analyze a dataset of 642 interactions from 107 users. Using Von NeuMidas, an extended pragmatic annotation schema for LLM interaction analysis, we categorize common prompting errors and identify recurring behavioral patterns. We then evaluate the impact of different guidelines by examining changes in user behavior, adherence to prompting strategies, and the overall quality of AI-generated responses. Our findings provide a deeper understanding of how users engage with LLMs and the role of structured prompting guidance in enhancing AI-assisted communication. By comparing different instructional frameworks, we offer insights into more effective approaches for improving user competency in AI interactions, with implications for AI literacy, chatbot usability, and the design of more responsive AI systems.
Submitted 11 May, 2025; v1 submitted 10 April, 2025;
originally announced April 2025.
-
Use Me Wisely: AI-Driven Assessment for LLM Prompting Skills Development
Authors:
Dimitri Ognibene,
Gregor Donabauer,
Emily Theophilou,
Cansu Koyuturk,
Mona Yavari,
Sathya Bursic,
Alessia Telari,
Alessia Testa,
Raffaele Boiano,
Davide Taibi,
Davinia Hernandez-Leo,
Udo Kruschwitz,
Martin Ruskov
Abstract:
The use of large language model (LLM)-powered chatbots, such as ChatGPT, has become popular across various domains, supporting a range of tasks and processes. However, due to the intrinsic complexity of LLMs, effective prompting is more challenging than it may seem. This highlights the need for innovative educational and support strategies that are both widely accessible and seamlessly integrated into task workflows. Yet, LLM prompting is highly task- and domain-dependent, limiting the effectiveness of generic approaches. In this study, we explore whether LLM-based methods can facilitate learning assessments by using ad-hoc guidelines and a minimal number of annotated prompt samples. Our framework transforms these guidelines into features that can be identified within learners' prompts. Using these feature descriptions and annotated examples, we create few-shot learning detectors. We then evaluate different configurations of these detectors, testing three state-of-the-art LLMs and ensembles. We run experiments with cross-validation on a sample of original prompts, as well as tests on prompts collected from task-naive learners. Our results show how LLMs perform on feature detection. Notably, GPT-4 demonstrates strong performance on most features, while closely related models, such as GPT-3 and GPT-3.5 Turbo (Instruct), show inconsistent behaviors in feature classification. These differences highlight the need for further research into how design choices impact feature selection and prompt detection. Our findings contribute to the fields of generative AI literacy and computer-supported learning assessment, offering valuable insights for both researchers and practitioners.
Submitted 4 March, 2025;
originally announced March 2025.
-
Values That Are Explicitly Present in Fairy Tales: Comparing Samples from German, Italian and Portuguese Traditions
Authors:
Alba Morollon Diaz-Faes,
Carla Sofia Ribeiro Murteira,
Martin Ruskov
Abstract:
Looking at how social values are represented in fairy tales can give insights about the variations in communication of values across cultures. We study how values are communicated in fairy tales from Portugal, Italy and Germany using a technique called word embedding with a compass to quantify vocabulary differences and commonalities. We study how these three national traditions differ in their explicit references to values. To do this, we specify a list of value-charged tokens, consider their word stems and analyse the distance between these in a bespoke pre-trained Word2Vec model. We triangulate and critically discuss the validity of the resulting hypotheses emerging from this quantitative model. Our claim is that this is a reusable and reproducible method for the study of the values explicitly referenced in historical corpora. Finally, our preliminary findings hint at a shared cultural understanding and the expression of values such as Benevolence, Conformity, and Universalism across the studied cultures, suggesting the potential existence of a pan-European cultural memory.
Submitted 6 May, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
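As a rough illustration of the distance analysis mentioned in the abstract above, the sketch below computes a cosine distance between two word vectors. The choice of cosine distance, the function name `cosine_distance` and the toy vectors are assumptions made for illustration only; the abstract does not say which distance measure the authors use, and real vectors would come from their Word2Vec model.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors standing in for embeddings of
# value-charged word stems; they are not taken from any trained model.
toy_vectors = {
    "kind": [0.9, 0.1, 0.0],
    "obey": [0.2, 0.8, 0.1],
}

print(cosine_distance(toy_vectors["kind"], toy_vectors["obey"]))
```

A smaller distance means the two stems appear in more similar contexts in the embedding space; identical directions give 0 and orthogonal vectors give 1.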
-
How BERT Speaks Shakespearean English? Evaluating Historical Bias in Contextual Language Models
Authors:
Miriam Cuscito,
Alfio Ferrara,
Martin Ruskov
Abstract:
In this paper, we explore the idea of analysing the historical bias of contextual language models based on BERT by measuring their adequacy with respect to Early Modern (EME) and Modern (ME) English. In our preliminary experiments, we perform fill-in-the-blank tests with 60 masked sentences (20 EME-specific, 20 ME-specific and 20 generic) and three different models (i.e., BERT Base, MacBERTh, English HLM). We then rate the model predictions according to a 5-point bipolar scale between the two language varieties and derive a weighted score to measure the adequacy of each model to EME and ME varieties of English.
Submitted 7 February, 2024;
originally announced February 2024.
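The weighted score described in the abstract above could take a form like the following sketch. Everything specific here is an assumption: the abstract gives neither the numeric coding of the 5-point bipolar scale nor the weighting scheme, so the coding from -2 (clearly Early Modern English) to +2 (clearly Modern English), the rank-based weights and the name `adequacy_score` are hypothetical.

```python
def adequacy_score(ratings, weights):
    """Weighted mean of bipolar ratings.

    ratings: values on an assumed scale from -2 (EME) to +2 (ME).
    weights: non-negative importance of each rated prediction,
             e.g. higher for higher-ranked model predictions (assumed).
    """
    if len(ratings) != len(weights):
        raise ValueError("ratings and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(r * w for r, w in zip(ratings, weights)) / total


# Hypothetical ratings for the top three predictions of one masked sentence,
# weighted 3, 2, 1 by rank.
print(adequacy_score([-2, -1, 0], [3, 2, 1]))
```

Under this assumed coding, a negative score would suggest that a model's predictions lean towards Early Modern English and a positive score towards Modern English.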
-
Learning to Prompt in the Classroom to Understand AI Limits: A pilot study
Authors:
Emily Theophilou,
Cansu Koyuturk,
Mona Yavari,
Sathya Bursic,
Gregor Donabauer,
Alessia Telari,
Alessia Testa,
Raffaele Boiano,
Davinia Hernandez-Leo,
Martin Ruskov,
Davide Taibi,
Alessandro Gabbiadini,
Dimitri Ognibene
Abstract:
Artificial intelligence's (AI) progress holds great promise in tackling pressing societal concerns such as health and climate. Large Language Models (LLMs) and the chatbots derived from them, like ChatGPT, have greatly improved the natural language processing capabilities of AI systems, allowing them to process an unprecedented amount of unstructured data. However, the ensuing excitement has led to negative sentiments, even as AI methods demonstrate remarkable contributions (e.g. in health and genetics). A key factor contributing to this sentiment is the misleading perception that LLMs can effortlessly provide solutions across domains, ignoring their limitations such as hallucinations and reasoning constraints. Acknowledging AI fallibility is crucial to address the impact of dogmatic overconfidence in possibly erroneous suggestions generated by LLMs. At the same time, it can reduce fear and other negative attitudes toward AI. This necessitates comprehensive AI literacy interventions that educate the public about LLM constraints and effective usage techniques, i.e. prompting strategies. With this aim, a pilot educational intervention was performed in a high school with 21 students. It involved presenting high-level concepts about intelligence, AI, and LLMs, followed by practical exercises involving ChatGPT in creating natural educational conversations and applying established prompting strategies. Encouraging preliminary results emerged, including high appreciation of the activity, improved interaction quality with the LLM, reduced negative AI sentiments, and a better grasp of limitations, specifically unreliability, limited understanding of commands leading to unsatisfactory responses, and limited presentation flexibility. Our aim is to explore AI acceptance factors and refine this approach for more controlled future studies.
Submitted 1 September, 2023; v1 submitted 4 July, 2023;
originally announced July 2023.
-
Developing Effective Educational Chatbots with ChatGPT prompts: Insights from Preliminary Tests in a Case Study on Social Media Literacy (with appendix)
Authors:
Cansu Koyuturk,
Mona Yavari,
Emily Theophilou,
Sathya Bursic,
Gregor Donabauer,
Alessia Telari,
Alessia Testa,
Raffaele Boiano,
Alessandro Gabbiadini,
Davinia Hernandez-Leo,
Martin Ruskov,
Dimitri Ognibene
Abstract:
Educational chatbots come with a promise of interactive and personalized learning experiences, yet their development has been limited by the restricted free interaction capabilities of available platforms and the difficulty of encoding knowledge in a suitable format. Recent advances in large language models with zero-shot learning capabilities, such as ChatGPT, suggest a new possibility for developing educational chatbots using a prompt-based approach. We present a case study with a simple system that enables mixed-turn chatbot interactions and discuss the insights and preliminary guidelines obtained from initial tests. We examine ChatGPT's ability to pursue multiple interconnected learning objectives, adapt the educational activity to users' characteristics, such as culture, age, and level of education, and its ability to use diverse educational strategies and conversational styles. Although the results are encouraging, challenges are posed by the limited history maintained for the conversation and the highly structured form of responses by ChatGPT, as well as their variability, which can lead to an unexpected switch of the chatbot's role from a teacher to a therapist. We provide some initial guidelines to address these issues and to facilitate the development of effective educational chatbots.
Submitted 10 August, 2023; v1 submitted 18 June, 2023;
originally announced June 2023.
-
Grimm in Wonderland: Prompt Engineering with Midjourney to Illustrate Fairytales
Authors:
Martin Ruskov
Abstract:
The quality of text-to-image generation is continuously improving, yet the boundaries of its applicability are still unclear. In particular, refinement of the text input with the objective of achieving better results - commonly called prompt engineering - so far seems to have not been geared towards work with pre-existing texts. We investigate whether text-to-image generation and prompt engineering could be used to generate basic illustrations of popular fairytales. Using Midjourney v4, we engage in action research with a dual aim: to attempt to generate 5 believable illustrations for each of 5 popular fairytales, and to define a prompt engineering process that starts from a pre-existing text and arrives at an illustration of it. We arrive at a tentative 4-stage process: i) initial prompt, ii) composition adjustment, iii) style refinement, and iv) variation selection. We also discuss three reasons why the generation model struggles with certain illustrations: difficulties with counts, bias from stereotypical configurations and inability to depict overly fantastic situations. Our findings are not limited to the specific generation model and are intended to be generalisable to future ones.
Submitted 25 August, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Computer-Aided Modelling of the Bilingual Word Indices to the Ninth-Century Uchitel'noe evangelie
Authors:
Martin Ruskov,
Lora Taseva
Abstract:
The development of bilingual dictionaries to medieval translations presents diverse difficulties. These result from two types of philological circumstances: a) the asymmetry between the source language and the target language; and b) the varying available sources of both the original and translated texts. In particular, Tihova's full critical edition of Constantine of Preslav's Uchitel'noe evangelie ('Didactic Gospel') gives a relatively good idea of the Old Church Slavonic translation but not of its Greek source text. This is due to the fact that Cramer's edition of the catenae - used as the parallel text in it - is based on several codices whose text does not fully coincide with the Slavonic. This leads to the addition of the newly-discovered parallels from Byzantine manuscripts and John Chrysostom's homilies. Our approach to these issues is a step-wise process with two main goals: a) to facilitate the philological annotation of input data and b) to consider the manifestations of the mentioned challenges, first, separately in order to simplify their resolution, and, then, in their combination. We demonstrate how we model various types of asymmetric translation correlates and the variability resulting from the pluralism of sources. We also demonstrate how all these constructions are being modelled and processed into the final indices. Our approach is designed with generalisation in mind and is intended to be also applicable to other translations from Greek into Old Church Slavonic.
Submitted 25 October, 2022;
originally announced November 2022.
-
Getting Users Smart Quick about Security: Results from 90 Minutes of Using a Persuasive Toolkit for Facilitating Information Security Problem Solving by Non-Professionals
Authors:
Martin Ruskov,
Paul Ekblom,
M. Angela Sasse
Abstract:
There is a conflict between the need for security compliance by users and the fact that commonly they cannot afford to dedicate much of their time and energy to that security. A balanced level of user engagement in security is difficult to achieve due to the difference in priorities between the business perspective and the security perspective. We sought to find a way to engage users minimally, yet efficiently, so that they would both improve their security awareness and provide necessary feedback for improvement purposes to security designers. We have developed a persuasive software toolkit to engage users in structured discussions about security vulnerabilities in their company and potential interventions addressing these. In the toolkit we have adapted and integrated an established framework from conventional crime prevention. In the research reported here we examine how non-professionals perceived security problems through a short-term use of the toolkit. We present perceptions from a pilot lab study in which randomly recruited participants had to analyze a crafted insider threat problem using the toolkit. Results demonstrate that study participants were able to successfully identify causes, propose interventions and engage in providing feedback on proposed interventions. Subsequent interviews show that participants have developed greater awareness of information security issues and the framework to address these, which in a real setting would lead ultimately to significant benefits for the organization. These results indicate that, when well structured, such short-term engagement is sufficient for users to meaningfully take part in complex security discussions and develop in-depth understanding of theoretical principles of security.
Submitted 6 September, 2022;
originally announced September 2022.