-
RETUYT-INCO at BEA 2025 Shared Task: How Far Can Lightweight Models Go in AI-powered Tutor Evaluation?
Authors:
Santiago Góngora,
Ignacio Sastre,
Santiago Robaina,
Ignacio Remersaro,
Luis Chiruzzo,
Aiala Rosá
Abstract:
In this paper, we present the RETUYT-INCO participation at the BEA 2025 shared task. Our participation was characterized by the decision to use relatively small models, with fewer than 1B parameters. This self-imposed restriction is meant to reflect the conditions faced by many research labs and institutions in the Global South, where computational power is not easily accessible due to its prohibitive cost. Even under this restrictive self-imposed setting, our models managed to stay competitive with the other teams that participated in the shared task. According to the $exact\ F_1$ scores published by the organizers, the performance gaps between our models and the winners were as follows: $6.46$ in Track 1; $10.24$ in Track 2; $7.85$ in Track 3; $9.56$ in Track 4; and $13.13$ in Track 5. Considering that the minimum difference with a winning team is $6.46$ points -- and the maximum difference is $13.13$ -- according to the $exact\ F_1$ score, we find that models with a size smaller than 1B parameters are competitive for these tasks, all of which can be run on computers with a low-budget GPU or even without a GPU.
Submitted 12 June, 2025;
originally announced June 2025.
-
A Platform for Generating Educational Activities to Teach English as a Second Language
Authors:
Aiala Rosá,
Santiago Góngora,
Juan Pablo Filevich,
Ignacio Sastre,
Laura Musto,
Brian Carpenter,
Luis Chiruzzo
Abstract:
We present a platform for the generation of educational activities oriented to teaching English as a foreign language. The different activities -- games and language practice exercises -- are strongly based on Natural Language Processing techniques. The platform offers the possibility of playing out-of-the-box games, generated from resources created semi-automatically and then manually curated. It can also generate games or exercises of greater complexity from texts entered by teachers, providing a stage for reviewing and editing the generated content before use. As a way of expanding the variety of activities in the platform, we are currently experimenting with image and text generation. In order to integrate them and improve the performance of other neural tools already integrated, we are working on migrating the platform to a more powerful server. In this paper we describe the development of our platform and its deployment for end users, discuss the challenges faced and how we overcame them, and detail our future work plans.
Submitted 28 April, 2025;
originally announced April 2025.
-
PAYADOR: A Minimalist Approach to Grounding Language Models on Structured Data for Interactive Storytelling and Role-playing Games
Authors:
Santiago Góngora,
Luis Chiruzzo,
Gonzalo Méndez,
Pablo Gervás
Abstract:
Every time an Interactive Storytelling (IS) system gets a player input, it faces the world-update problem. Classical approaches to this problem consist in mapping that input to known preprogrammed actions, which can severely constrain the free will of the player. When the expected experience has a strong focus on improvisation, as in Role-playing Games (RPGs), this problem is critical. In this paper we present PAYADOR, a different approach that focuses on predicting the outcomes of the actions instead of representing the actions themselves. To implement this approach, we ground a Large Language Model to a minimal representation of the fictional world, obtaining promising results. We make this contribution open-source, so it can be adapted and used for other related research on unleashing the co-creativity power of RPGs.
Submitted 9 April, 2025;
originally announced April 2025.
-
Why AI Is WEIRD and Should Not Be This Way: Towards AI For Everyone, With Everyone, By Everyone
Authors:
Rada Mihalcea,
Oana Ignat,
Longju Bai,
Angana Borah,
Luis Chiruzzo,
Zhijing Jin,
Claude Kwizera,
Joan Nwatu,
Soujanya Poria,
Thamar Solorio
Abstract:
This paper presents a vision for creating AI systems that are inclusive at every stage of development, from data collection to model design and evaluation. We address key limitations in the current AI pipeline and its WEIRD representation, such as lack of data diversity, biases in model performance, and narrow evaluation metrics. We also focus on the need for diverse representation among the developers of these systems, as well as incentives that are not skewed toward certain groups. We highlight opportunities to develop AI systems that are for everyone (with diverse stakeholders in mind), with everyone (inclusive of diverse data and annotators), and by everyone (designed and developed by a globally diverse workforce).
Submitted 9 October, 2024;
originally announced October 2024.
-
Skill Check: Some Considerations on the Evaluation of Gamemastering Models for Role-playing Games
Authors:
Santiago Góngora,
Luis Chiruzzo,
Gonzalo Méndez,
Pablo Gervás
Abstract:
In role-playing games, a Game Master (GM) is the player in charge of the game, who must design the challenges the players face and narrate the outcomes of their actions. In this work we discuss some challenges in modeling GMs from an Interactive Storytelling and Natural Language Processing perspective. Following those challenges, we propose three test categories to evaluate such dialogue systems, and we use them to test ChatGPT, Bard and OpenAssistant as out-of-the-box GMs.
Submitted 30 September, 2023; v1 submitted 24 September, 2023;
originally announced September 2023.
-
Overview of GUA-SPA at IberLEF 2023: Guarani-Spanish Code Switching Analysis
Authors:
Luis Chiruzzo,
Marvin Agüero-Torales,
Gustavo Giménez-Lugo,
Aldo Alvarez,
Yliana Rodríguez,
Santiago Góngora,
Thamar Solorio
Abstract:
We present the first shared task for detecting and analyzing code-switching in Guarani and Spanish, GUA-SPA at IberLEF 2023. The challenge consisted of three tasks: identifying the language of a token, NER, and a novel task of classifying the way a Spanish span is used in the code-switched context. We annotated a corpus of 1500 texts extracted from news articles and tweets, around 25 thousand tokens, with the information for the tasks. Three teams took part in the evaluation phase, obtaining in general good results for Task 1, and more mixed results for Tasks 2 and 3.
Submitted 12 September, 2023;
originally announced September 2023.
-
Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models
Authors:
Abteen Ebrahimi,
Arya D. McCarthy,
Arturo Oncevay,
Luis Chiruzzo,
John E. Ortega,
Gustavo A. Giménez-Lugo,
Rolando Coto-Solano,
Katharina Kann
Abstract:
Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribute gold-standard alignments for Bribri--Spanish, Guarani--Spanish, Quechua--Spanish, and Shipibo-Konibo--Spanish. With these, we evaluate state-of-the-art aligners with and without model adaptation to the target language. Finally, we also evaluate the resulting alignments extrinsically through two downstream tasks: named entity recognition and part-of-speech tagging. We find that although transformer-based methods generally outperform traditional models, the two classes of approach remain competitive with each other.
Submitted 15 February, 2023;
originally announced February 2023.
-
Don't Take it Personally: Analyzing Gender and Age Differences in Ratings of Online Humor
Authors:
J. A. Meaney,
Steven R. Wilson,
Luis Chiruzzo,
Walid Magdy
Abstract:
Computational humor detection systems rarely model the subjectivity of humor responses, or consider alternative reactions to humor - namely offense. We analyzed a large dataset of humor and offense ratings by male and female annotators of different age groups. We find that women link these two concepts more strongly than men, and they tend to give lower humor ratings and higher offense scores. We also find that the correlation between humor and offense increases with age. Although there were no gender or age differences in humor detection, women and older annotators signalled that they did not understand joke texts more often than men. We discuss implications for computational humor detection and downstream tasks.
Submitted 23 August, 2022;
originally announced August 2022.
-
AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages
Authors:
Abteen Ebrahimi,
Manuel Mager,
Arturo Oncevay,
Vishrav Chaudhary,
Luis Chiruzzo,
Angela Fan,
John Ortega,
Ricardo Ramos,
Annette Rios,
Ivan Meza-Ruiz,
Gustavo A. Giménez-Lugo,
Elisabeth Mager,
Graham Neubig,
Alexis Palmer,
Rolando Coto-Solano,
Ngoc Thang Vu,
Katharina Kann
Abstract:
Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear if zero-shot learning of high-level, semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, an extension of XNLI (Conneau et al., 2018) to 10 indigenous languages of the Americas. We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches. Additionally, we explore model adaptation via continued pretraining and provide an analysis of the dataset by considering hypothesis-only models. We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%. Continued pretraining offers improvements, with an average accuracy of 44.05%. Surprisingly, training on poorly translated data by far outperforms all other methods with an accuracy of 48.72%.
Submitted 16 March, 2022; v1 submitted 18 April, 2021;
originally announced April 2021.
-
RETUYT in TASS 2017: Sentiment Analysis for Spanish Tweets using SVM and CNN
Authors:
Aiala Rosá,
Luis Chiruzzo,
Mathias Etcheverry,
Santiago Castro
Abstract:
This article presents classifiers based on SVM and Convolutional Neural Networks (CNN) for the TASS 2017 challenge on tweet sentiment analysis. The classifier with the best overall performance uses a combination of SVM and CNN. The use of word embeddings was particularly useful for improving the classifiers' performance.
Submitted 17 October, 2017;
originally announced October 2017.
-
A Crowd-Annotated Spanish Corpus for Humor Analysis
Authors:
Santiago Castro,
Luis Chiruzzo,
Aiala Rosá,
Diego Garat,
Guillermo Moncecchi
Abstract:
Computational Humor involves several tasks, such as humor recognition, humor generation, and humor scoring, for which it is useful to have human-curated data. In this work we present a corpus of 27,000 tweets written in Spanish and crowd-annotated by their humor value and funniness score, with about four annotations per tweet, tagged by 1,300 people over the Internet. It is equally divided between tweets coming from humorous and non-humorous accounts. The inter-annotator agreement Krippendorff's alpha value is 0.5710. The dataset is available for general use and can serve as a basis for humor detection and as a first step to tackle subjectivity.
Submitted 19 July, 2018; v1 submitted 2 October, 2017;
originally announced October 2017.