-
Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations
Authors:
Daniele Barolo,
Chiara Valentin,
Fariba Karimi,
Luis Galárraga,
Gonzalo G. Méndez,
Lisette Espín-Noboa
Abstract:
This paper evaluates the performance of six open-weight LLMs (llama3-8b, llama3.1-8b, gemma2-9b, mixtral-8x7b, llama3-70b, llama3.1-70b) in recommending experts in physics across five tasks: top-k experts by field, influential scientists by discipline, epoch, seniority, and scholar counterparts. The evaluation examines consistency, factuality, and biases related to gender, ethnicity, academic popularity, and scholar similarity. Using ground-truth data from the American Physical Society and OpenAlex, we establish scholarly benchmarks by comparing model outputs to real-world academic records. Our analysis reveals inconsistencies and biases across all models. mixtral-8x7b produces the most stable outputs, while llama3.1-70b shows the highest variability. Many models exhibit duplication, and some, particularly gemma2-9b and llama3.1-8b, struggle with formatting errors. LLMs generally recommend real scientists, but accuracy drops in field-, epoch-, and seniority-specific queries, consistently favoring senior scholars. Representation biases persist, replicating gender imbalances (reflecting male predominance), under-representing Asian scientists, and over-representing White scholars. Despite some diversity in institutional and collaboration networks, models favor highly cited and productive scholars, reinforcing the rich-get-richer effect while offering limited geographical representation. These findings highlight the need to improve LLMs for more reliable and equitable scholarly recommendations.
Submitted 29 May, 2025;
originally announced June 2025.
-
PAYADOR: A Minimalist Approach to Grounding Language Models on Structured Data for Interactive Storytelling and Role-playing Games
Authors:
Santiago Góngora,
Luis Chiruzzo,
Gonzalo Méndez,
Pablo Gervás
Abstract:
Every time an Interactive Storytelling (IS) system gets a player input, it faces the world-update problem. Classical approaches to this problem consist of mapping that input to known preprogrammed actions, which can severely constrain the free will of the player. When the expected experience has a strong focus on improvisation, as in Role-playing Games (RPGs), this problem is critical. In this paper we present PAYADOR, a different approach that focuses on predicting the outcomes of the actions instead of representing the actions themselves. To implement this approach, we ground a Large Language Model to a minimal representation of the fictional world, obtaining promising results. We make this contribution open-source, so it can be adapted and used for other related research on unleashing the co-creativity power of RPGs.
Submitted 9 April, 2025;
originally announced April 2025.
-
Simulation, Modelling and Classification of Wiki Contributors: Spotting The Good, The Bad, and The Ugly
Authors:
Silvia García Méndez,
Fátima Leal,
Benedita Malheiro,
Juan Carlos Burguillo Rial,
Bruno Veloso,
Adriana E. Chis,
Horacio González Vélez
Abstract:
Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage - a free worldwide wiki travel guide open to contribution from the general public - as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.
Submitted 29 May, 2024;
originally announced May 2024.
-
A System for Automatic English Text Expansion
Authors:
Silvia García Méndez,
Milagros Fernández Gavilanes,
Enrique Costa Montenegro,
Jonathan Juncal Martínez,
Francisco Javier González Castaño,
Ehud Reiter
Abstract:
We present an automatic text expansion system to generate English sentences, which performs automatic Natural Language Generation (NLG) by combining linguistic rules with statistical approaches. Here, "automatic" means that the system can generate coherent and correct sentences from a minimum set of words. From its inception, the design is modular and adaptable to other languages. This adaptability is one of its greatest advantages. For English, we have created the highly precise aLexiE lexicon with wide coverage, which represents a contribution on its own. We have evaluated the resulting NLG library in an Augmentative and Alternative Communication (AAC) proof of concept, both directly (by regenerating corpus sentences) and manually (from annotations) using a popular corpus in the NLG field. We performed a second analysis by comparing the quality of text expansion in English to Spanish, using an ad-hoc Spanish-English parallel corpus. The system might also be applied to other domains such as report and news generation.
Submitted 28 May, 2024;
originally announced May 2024.
-
Interpretable classification of wiki-review streams
Authors:
Silvia García Méndez,
Fátima Leal,
Benedita Malheiro,
Juan Carlos Burguillo Rial
Abstract:
Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real-time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90 % values for all evaluation metrics (accuracy, precision, recall, and F-measure).
Submitted 28 May, 2024;
originally announced May 2024.
-
Skill Check: Some Considerations on the Evaluation of Gamemastering Models for Role-playing Games
Authors:
Santiago Góngora,
Luis Chiruzzo,
Gonzalo Méndez,
Pablo Gervás
Abstract:
In role-playing games a Game Master (GM) is the player in charge of the game, who must design the challenges the players face and narrate the outcomes of their actions. In this work we discuss some challenges to model GMs from an Interactive Storytelling and Natural Language Processing perspective. Following those challenges we propose three test categories to evaluate such dialogue systems, and we use them to test ChatGPT, Bard and OpenAssistant as out-of-the-box GMs.
Submitted 30 September, 2023; v1 submitted 24 September, 2023;
originally announced September 2023.
-
Showing Academic Performance Predictions during Term Planning: Effects on Students' Decisions, Behaviors, and Preferences
Authors:
Gonzalo Gabriel Méndez,
Luis Galárraga,
Katherine Chiluiza
Abstract:
Course selection is a crucial activity for students as it directly impacts their workload and performance. It is also time-consuming, prone to subjectivity, and often carried out based on incomplete information. This task can, nevertheless, be assisted with computational tools, for instance, by predicting performance based on historical data. We investigate the effects of showing grade predictions to students through an interactive visualization tool. A qualitative study suggests that in the presence of predictions, students may focus too much on maximizing their performance, to the detriment of other factors such as the workload. A follow-up quantitative study explored whether these effects are mitigated by changing how predictions are conveyed. Our observations suggest the presence of a framing effect that induces students to put more effort into course selection when faced with more specific predictions. We discuss these and other findings and outline considerations for designing better data-driven course selection tools.
Submitted 31 March, 2021;
originally announced April 2021.
-
An approach to Beethoven's 10th Symphony
Authors:
Paula Muñoz-Lago,
Gonzalo Méndez
Abstract:
Ludwig van Beethoven composed his symphonies between 1799 and 1825, when he was writing his Tenth symphony. As a great amount of data from his work is available, the purpose of this paper is to investigate the possibility of extracting patterns of his compositional model from symbolic data and generating what would have been his last symphony, the Tenth. A neural network model has been built based on Long Short-Term Memory (LSTM) neural networks. After training the model, the generated music has been analysed by comparing the input data with the results, and establishing differences between the generated outputs based on the training data used to obtain them. The structure of the outputs strongly depends on the symphonies used to train the network.
Submitted 21 May, 2020;
originally announced May 2020.
-
ReConstructor: A Scalable Constructive Visualization Tool
Authors:
Gonzalo Gabriel Méndez,
Jagoda Walny,
Søren Knudsen,
Charles Perin,
Samuel Huron,
Jo Vermeulen,
Richard Pusch,
Sheelagh Carpendale
Abstract:
Constructive approaches to visualization authoring have been shown to offer advantages such as providing options for flexible outputs, scaffolding and ideation of new data mappings, personalized exploration of data, as well as supporting data understanding and literacy. However, visualization authoring tools based on a constructive approach do not scale well to larger datasets. As construction often involves manipulating small pieces of data and visuals, it requires a significant amount of time, effort, and repetitive steps. We present ReConstructor, an authoring tool in which a visualization is constructed by instantiating its structural and functional components through four interaction elements (objects, modifiers, activators, and tools). This design preserves most of the benefits of a constructive process while avoiding scalability issues by allowing designers to propagate individual mapping steps to all the elements of a visualization. We also discuss the perceived benefits of our approach and propose avenues for future research in this area.
Submitted 1 August, 2019;
originally announced August 2019.
-
Knowledge And The Action Description Language A
Authors:
Jorge Lobo,
Gisela Mendez,
Stuart R. Taylor
Abstract:
We introduce Ak, an extension of the action description language A (Gelfond and Lifschitz, 1993) to handle actions which affect knowledge. We use sensing actions to increase an agent's knowledge of the world and non-deterministic actions to remove knowledge. We include complex plans involving conditionals and loops in our query language for hypothetical reasoning. We also present a translation of Ak domain descriptions into epistemic logic programs.
Submitted 24 April, 2004;
originally announced April 2004.