-
Flow-Lenia: Emergent evolutionary dynamics in mass conservative continuous cellular automata
Authors:
Erwan Plantec,
Gautier Hamon,
Mayalen Etcheverry,
Bert Wang-Chak Chan,
Pierre-Yves Oudeyer,
Clément Moulin-Frier
Abstract:
Central to the artificial life endeavour is the creation of artificial systems spontaneously generating properties found in the living world such as autopoiesis, self-replication, evolution and open-endedness. While numerous models and paradigms have been proposed, cellular automata (CA) have taken a very important place in the field, notably as they enable the study of phenomena like self-reproduction and autopoiesis. Continuous CA like Lenia have been shown to produce life-like patterns reminiscent, from an aesthetic and ontological point of view, of biological organisms we call creatures. We propose in this paper Flow-Lenia, a mass conservative extension of Lenia. We present experiments demonstrating its effectiveness in generating spatially-localized patterns (SLPs) with complex behaviors and show that the update rule parameters can be optimized to generate complex creatures showing behaviors of interest. Furthermore, we show that Flow-Lenia allows us to embed the parameters of the model, defining the properties of the emerging patterns, within its own dynamics, thus allowing for multispecies simulations. By using the evolutionary activity framework as well as other metrics, we shed light on the emergent evolutionary dynamics taking place in this system.
Submitted 10 June, 2025;
originally announced June 2025.
-
WorldLLM: Improving LLMs' world modeling using curiosity-driven theory-making
Authors:
Guillaume Levy,
Cedric Colas,
Pierre-Yves Oudeyer,
Thomas Carta,
Clement Romac
Abstract:
Large Language Models (LLMs) possess general world knowledge but often struggle to generate precise predictions in structured, domain-specific contexts such as simulations. These limitations arise from their inability to ground their broad, unstructured understanding in specific environments. To address this, we present WorldLLM, a framework that enhances LLM-based world modeling by combining Bayesian inference and autonomous active exploration with reinforcement learning. WorldLLM leverages the in-context learning abilities of LLMs to guide an LLM-based world model's predictions using natural language hypotheses given in its prompt. These hypotheses are iteratively refined through a Bayesian inference framework that leverages a second LLM as the proposal distribution given collected evidence. This evidence is collected using a curiosity-driven reinforcement learning policy that explores the environment to find transitions with a low log-likelihood under our LLM-based predictive model using the current hypotheses. By alternating between refining hypotheses and collecting new evidence, our framework autonomously drives continual improvement of the predictions. Our experiments demonstrate the effectiveness of WorldLLM in a textual game environment that requires agents to manipulate and combine objects. The framework not only enhances predictive accuracy, but also generates human-interpretable theories of environment dynamics.
Submitted 7 June, 2025;
originally announced June 2025.
-
Exploring Flow-Lenia Universes with a Curiosity-driven AI Scientist: Discovering Diverse Ecosystem Dynamics
Authors:
Thomas Michel,
Marko Cvjetko,
Gautier Hamon,
Pierre-Yves Oudeyer,
Clément Moulin-Frier
Abstract:
We present a method for the automated discovery of system-level dynamics in Flow-Lenia, a continuous cellular automaton (CA) with mass conservation and parameter localization, using a curiosity-driven AI scientist. This method aims to uncover processes leading to self-organization of evolutionary and ecosystemic dynamics in CAs. We build on previous work which uses diversity search algorithms in Lenia to find self-organized individual patterns, and extend it to large environments that support distinct interacting patterns. We adapt Intrinsically Motivated Goal Exploration Processes (IMGEPs) to drive exploration of diverse Flow-Lenia environments using simulation-wide metrics, such as evolutionary activity, compression-based complexity, and multi-scale entropy. We test our method in two experiments, showcasing its ability to illuminate significantly more diverse dynamics compared to random search. We show qualitative results illustrating how ecosystemic simulations enable self-organization of complex collective behaviors not captured by previous individual pattern search and analysis. We complement automated discovery with an interactive exploration tool, creating an effective human-AI collaborative workflow for scientific investigation. Though demonstrated specifically with Flow-Lenia, this methodology provides a framework potentially applicable to other parameterizable complex systems where understanding emergent collective properties is of interest.
Submitted 2 June, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
-
Investigating Middle School Students Question-Asking and Answer-Evaluation Skills When Using ChatGPT for Science Investigation
Authors:
Rania Abdelghani,
Kou Murayama,
Celeste Kidd,
Hélène Sauzéon,
Pierre-Yves Oudeyer
Abstract:
Generative AI (GenAI) tools such as ChatGPT allow users, including school students without prior AI expertise, to explore and address a wide range of tasks. Surveys show that most students aged eleven and older already use these tools for school-related activities. However, little is known about how they actually use GenAI and how it impacts their learning.
This study addresses this gap by examining middle school students' ability to ask effective questions and critically evaluate ChatGPT responses, two essential skills for active learning and productive interactions with GenAI. 63 students aged 14 to 15 were tasked with solving science investigation problems using ChatGPT. We analyzed their interactions with the model, as well as their resulting learning outcomes.
Findings show that students often over-relied on ChatGPT in both the question-asking and answer-evaluation phases. Many struggled to use clear questions aligned with task goals and had difficulty judging the quality of responses or knowing when to seek clarification. As a result, their learning performance remained moderate: their explanations of the scientific concepts tended to be vague, incomplete, or inaccurate, even after unrestricted use of ChatGPT. This pattern held even in domains where students reported strong prior knowledge.
Furthermore, students' self-reported understanding and use of ChatGPT were negatively associated with their ability to select effective questions and evaluate responses, suggesting misconceptions about the tool and its limitations. In contrast, higher metacognitive skills were positively linked to better QA-related skills.
These findings underscore the need for educational interventions that promote AI literacy and foster question-asking strategies to support effective learning with GenAI.
Submitted 2 May, 2025;
originally announced May 2025.
-
Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
Authors:
Grgur Kovač,
Jérémy Perez,
Rémy Portelas,
Peter Ford Dominey,
Pierre-Yves Oudeyer
Abstract:
Large language models (LLMs) are increasingly used in the creation of online content, creating feedback loops as subsequent generations of models will be trained on this synthetic data. Such loops were shown to lead to distribution shifts: models misrepresenting the true underlying distributions of human data (also called model collapse). However, how human data properties affect such shifts remains poorly understood. In this paper, we provide the first empirical examination of the effect of such properties on the outcome of recursive training. We first confirm that using different human datasets leads to distribution shifts of different magnitudes. Through exhaustive manipulation of dataset properties combined with regression analyses, we then identify a set of properties predicting distribution shift magnitudes. Lexical diversity is found to amplify these shifts, while semantic diversity and data quality mitigate them. Furthermore, we find that these influences are highly modular: data scraped from a given internet domain has little influence on the content generated for another domain. Finally, experiments on political bias reveal that human data properties affect whether the initial bias will be amplified or reduced. Overall, our results portray a novel view, where different parts of the internet may undergo different types of distribution shift.
Submitted 2 July, 2025; v1 submitted 4 April, 2025;
originally announced April 2025.
-
MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces
Authors:
Loris Gaven,
Thomas Carta,
Clément Romac,
Cédric Colas,
Sylvain Lamprier,
Olivier Sigaud,
Pierre-Yves Oudeyer
Abstract:
Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM agents trained with online RL in high-dimensional and evolving goal spaces, a key challenge for LP prediction is modeling one's own competence, a form of metacognitive monitoring. Traditional approaches either require extensive sampling or rely on brittle expert-defined goal groupings. We introduce MAGELLAN, a metacognitive framework that lets LLM agents learn to predict their competence and LP online. By capturing semantic relationships between goals, MAGELLAN enables sample-efficient LP estimation and dynamic adaptation to evolving goal spaces through generalization. In an interactive learning environment, we show that MAGELLAN improves LP prediction efficiency and goal prioritization, being the only method allowing the agent to fully master a large and evolving goal space. These results demonstrate how augmenting LLM agents with a metacognitive ability for LP predictions can effectively scale curriculum learning to open-ended goal spaces.
Submitted 17 June, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology
Authors:
Junior Cedric Tonga,
Benjamin Clement,
Pierre-Yves Oudeyer
Abstract:
The automatic generation of hints by Large Language Models (LLMs) within Intelligent Tutoring Systems (ITSs) has shown potential to enhance student learning. However, generating pedagogically sound hints that address student misconceptions and adhere to specific educational objectives remains challenging. This work explores using LLMs (GPT-4o and Llama-3-8B-Instruct) as teachers to generate effective hints for students simulated through LLMs (GPT-3.5-turbo, Llama-3-8B-Instruct, or Mistral-7B-Instruct-v0.3) tackling math exercises designed for human high-school students, and designed using cognitive science principles. We present here the study of several dimensions: 1) identifying error patterns made by simulated students on secondary-level math exercises; 2) developing various prompts for GPT-4o as a teacher and evaluating their effectiveness in generating hints that enable simulated students to self-correct; and 3) testing the best-performing prompts, based on their ability to produce relevant hints and facilitate error correction, with Llama-3-8B-Instruct as the teacher, allowing for a performance comparison with GPT-4o. The results show that model errors increase with higher temperature settings. Notably, when hints are generated by GPT-4o, the most effective prompts include prompts tailored to specific errors as well as prompts providing general hints based on common mathematical errors. Interestingly, Llama-3-8B-Instruct as a teacher showed better overall performance than GPT-4o. Also, the problem-solving and response revision capabilities of the LLMs as students, particularly GPT-3.5-turbo, improved significantly after receiving hints, especially at lower temperature settings. However, models like Mistral-7B-Instruct demonstrated a decline in performance as the temperature increased.
Submitted 5 November, 2024;
originally announced November 2024.
-
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting
Authors:
Mohamed Salim Aissi,
Clement Romac,
Thomas Carta,
Sylvain Lamprier,
Pierre-Yves Oudeyer,
Olivier Sigaud,
Laure Soulier,
Nicolas Thome
Abstract:
Reinforcement learning (RL) is a promising approach for aligning the knowledge of large language models (LLMs) with sequential decision-making tasks. However, few studies have thoroughly investigated the impact on LLM agents' capabilities of fine-tuning them with RL in a specific environment. In this paper, we propose a novel framework to analyze the sensitivity of LLMs to prompt formulations following RL training in a textual environment. Our findings reveal that the performance of LLMs degrades when faced with prompt formulations different from those used during the RL training phase. Besides, we analyze the source of this sensitivity by examining the model's internal representations and salient tokens. Finally, we propose to use a contrastive loss to mitigate this sensitivity and improve the robustness and generalization capabilities of LLMs.
Submitted 29 October, 2024; v1 submitted 25 October, 2024;
originally announced October 2024.
-
SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling
Authors:
Loris Gaven,
Clement Romac,
Thomas Carta,
Sylvain Lamprier,
Olivier Sigaud,
Pierre-Yves Oudeyer
Abstract:
The past years have seen Large Language Models (LLMs) thrive not only as generative models but also as agents solving textual sequential decision-making tasks. When facing complex environments where their zero-shot abilities are insufficient, recent work showed online Reinforcement Learning (RL) could be used for the LLM agent to discover and learn efficient strategies interactively. However, most prior work sticks to on-policy algorithms, which greatly reduces the scope of methods such agents could use for both exploration and exploitation, such as experience replay and hindsight relabeling. Yet, such methods may be key for LLM learning agents, in particular when designing autonomous intrinsically motivated agents sampling and pursuing their own goals (i.e. autotelic agents). This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents. Our method not only paves the path towards autotelic LLM agents that learn online but can also outperform on-policy methods in more classic multi-goal RL environments.
Submitted 16 October, 2024;
originally announced October 2024.
-
LogProber: Disentangling confidence from contamination in LLM responses
Authors:
Nicolas Yax,
Pierre-Yves Oudeyer,
Stefano Palminteri
Abstract:
In machine learning, contamination refers to situations where testing data leak into the training set. The issue is particularly relevant for the evaluation of the performance of Large Language Models (LLMs), which are generally trained on gargantuan, and generally opaque, corpora of text scraped from the world wide web. Developing tools to detect contamination is therefore crucial to be able to fairly and properly track the evolution of the performance of LLMs. To date, only a few recent studies have attempted to address the issue of quantifying and detecting contamination in short text sequences, such as those commonly found in benchmarks. However, these methods have limitations that can sometimes render them impractical. In the present paper, we introduce LogProber, a novel, efficient algorithm that we show can detect contamination in a black-box setting and that tries to tackle some of these drawbacks by focusing on familiarity with the question rather than the answer. Here, we explore the properties of the proposed method in comparison with concurrent approaches, identify its advantages and limitations, and illustrate how different forms of contamination can go undetected depending on the design of the detection algorithm.
Submitted 20 June, 2025; v1 submitted 26 August, 2024;
originally announced August 2024.
-
Collective Innovation in Groups of Large Language Models
Authors:
Eleni Nisioti,
Sebastian Risi,
Ida Momennejad,
Pierre-Yves Oudeyer,
Clément Moulin-Frier
Abstract:
Human culture relies on collective innovation: our ability to continuously explore how existing elements in our environment can be combined to create new ones. Language is hypothesized to play a key role in human culture, driving individual cognitive capacities and shaping communication. Yet the majority of models of collective innovation assign no cognitive capacities or language abilities to agents. Here, we contribute a computational study of collective innovation where agents are Large Language Models (LLMs) that play Little Alchemy 2, a creative video game originally developed for humans that, as we argue, captures useful aspects of innovation landscapes not present in previous test-beds. We first study an LLM in isolation and discover that it exhibits both useful skills and crucial limitations. We then study groups of LLMs that share information related to their behaviour and focus on the effect of social connectivity on collective performance. In agreement with previous human and computational studies, we observe that groups with dynamic connectivity out-compete fully-connected groups. Our work reveals opportunities and challenges for future studies of collective innovation that are becoming increasingly relevant as Generative Artificial Intelligence algorithms and humans innovate alongside each other.
Submitted 7 July, 2024;
originally announced July 2024.
-
When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings
Authors:
Jérémy Perez,
Grgur Kovač,
Corentin Léger,
Cédric Colas,
Gaia Molinaro,
Maxime Derex,
Pierre-Yves Oudeyer,
Clément Moulin-Frier
Abstract:
As large language models (LLMs) start interacting with each other and generating an increasing amount of text online, it becomes crucial to better understand how information is transformed as it passes from one LLM to the next. While significant research has examined individual LLM behaviors, existing studies have largely overlooked the collective behaviors and information distortions arising from iterated LLM interactions. Small biases, negligible at the single output level, risk being amplified in iterated interactions, potentially leading the content to evolve towards attractor states. In a series of telephone game experiments, we apply a transmission chain design borrowed from the human cultural evolution literature: LLM agents iteratively receive, produce, and transmit texts from the previous to the next agent in the chain. By tracking the evolution of text toxicity, positivity, difficulty, and length across transmission chains, we uncover the existence of biases and attractors, and study their dependence on the initial text, the instructions, language model, and model size. For instance, we find that more open-ended instructions lead to stronger attraction effects compared to more constrained tasks. We also find that different text properties display different sensitivity to attraction effects, with toxicity leading to stronger attractors than length. These findings highlight the importance of accounting for multi-step transmission dynamics and represent a first step towards a more comprehensive understanding of LLM cultural dynamics.
Submitted 2 June, 2025; v1 submitted 5 July, 2024;
originally announced July 2024.
-
PhyloLM : Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks
Authors:
Nicolas Yax,
Pierre-Yves Oudeyer,
Stefano Palminteri
Abstract:
This paper introduces PhyloLM, a method adapting phylogenetic algorithms to Large Language Models (LLMs) to explore whether and how they relate to each other and to predict their performance characteristics. Our method calculates a phylogenetic distance metric based on the similarity of LLMs' output. The resulting metric is then used to construct dendrograms, which satisfactorily capture known relationships across a set of 111 open-source and 45 closed models. Furthermore, our phylogenetic distance predicts performance in standard benchmarks, thus demonstrating its functional validity and paving the way for a time- and cost-effective estimation of LLM capabilities. To sum up, by translating population genetic concepts to machine learning, we propose and validate a tool to evaluate LLM development, relationships and capabilities, even in the absence of transparent training information.
Submitted 16 June, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Cultural evolution in populations of Large Language Models
Authors:
Jérémy Perez,
Corentin Léger,
Marcela Ovando-Tellez,
Chris Foulon,
Joan Dussauld,
Pierre-Yves Oudeyer,
Clément Moulin-Frier
Abstract:
Research in cultural evolution aims at providing causal explanations for the change of culture over time. Over the past decades, this field has generated an important body of knowledge, using experimental, historical, and computational methods. While computational models have been very successful at generating testable hypotheses about the effects of several factors, such as population structure or transmission biases, some phenomena have so far been more complex to capture using agent-based and formal models. This is in particular the case for the effect of the transformations of social information induced by evolved cognitive mechanisms. We here propose that leveraging the capacity of Large Language Models (LLMs) to mimic human behavior may be fruitful to address this gap. On top of being a useful approximation of human cultural dynamics, multi-agent models featuring generative agents are also important to study for their own sake. Indeed, as artificial agents are bound to participate more and more in the evolution of culture, it is crucial to better understand the dynamics of machine-generated cultural evolution. We here present a framework for simulating cultural evolution in populations of LLMs, allowing the manipulation of variables known to be important in cultural evolution, such as network structure, personality, and the way social information is aggregated and transformed. The software we developed for conducting these simulations is open-source and features an intuitive user interface, which we hope will help to build bridges between the fields of cultural evolution and generative artificial intelligence.
Submitted 13 March, 2024;
originally announced March 2024.
-
Interactive environments for training children's curiosity through the practice of metacognitive skills: a pilot study
Authors:
Rania Abdelghani,
Edith Law,
Chloé Desvaux,
Pierre-Yves Oudeyer,
Hélène Sauzéon
Abstract:
Curiosity-driven learning has shown significant positive effects on students' learning experiences and outcomes. But despite this importance, reports show that children lack this skill, especially in formal educational settings. To address this challenge, we propose an 8-session workshop that aims to enhance children's curiosity through training a set of specific metacognitive skills we hypothesize are involved in its process. Our workshop contains animated videos presenting declarative knowledge about curiosity and the said metacognitive skills as well as practice sessions to apply these skills during a reading-comprehension task, using a web platform designed for this study (e.g. expressing uncertainty, formulating questions, etc). We conduct a pilot study with 15 primary school students, aged between 8 and 10. Our first results show a positive impact on children's metacognitive efficiency and their ability to express their curiosity through question-asking behaviors.
Submitted 13 March, 2024;
originally announced March 2024.
-
Stick to your Role! Stability of Personal Values Expressed in Large Language Models
Authors:
Grgur Kovač,
Rémy Portelas,
Masataka Sawayama,
Peter Ford Dominey,
Pierre-Yves Oudeyer
Abstract:
The standard way to study Large Language Models (LLMs) with benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g. multiple choice questions). However, due to LLMs' highly context-dependent nature, conclusions from such minimal-context evaluations may say little about the model's behavior in deployment (where it will be exposed to many new contexts). We argue that context-dependence (specifically, value stability) should be studied as a specific property of LLMs and used as another dimension of LLM comparison (alongside others such as cognitive abilities, knowledge, or model size). We present a case-study on the stability of value expression over different contexts (simulated conversations on different topics) as measured using a standard psychology questionnaire (PVQ) and on behavioral downstream tasks. Reusing methods from psychology, we study Rank-order stability on the population (interpersonal) level, and Ipsative stability on the individual (intrapersonal) level. We consider two settings (with and without instructing LLMs to simulate particular personas), two simulated populations, and three downstream tasks. We observe consistent trends in the stability of models and model families: the Mixtral, Mistral, GPT-3.5 and Qwen families are more stable than LLaMa-2 and Phi. The consistency of these trends implies that some models exhibit higher value stability than others, and that stability can be estimated with the set of introduced methodological tools. When instructed to simulate particular personas, LLMs exhibit low Rank-order stability, which further diminishes with conversation length. This highlights the need for future research on LLMs that coherently simulate different personas. This paper provides a foundational step in that direction, and, to our knowledge, it is the first study of value stability in LLMs.
Submitted 28 August, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Discovering Sensorimotor Agency in Cellular Automata using Diversity Search
Authors:
Gautier Hamon,
Mayalen Etcheverry,
Bert Wang-Chak Chan,
Clément Moulin-Frier,
Pierre-Yves Oudeyer
Abstract:
The research field of Artificial Life studies how life-like phenomena such as autopoiesis, agency, or self-regulation can self-organize in computer simulations. In cellular automata (CA), a key open question has been whether it is possible to find environment rules that self-organize robust "individuals" from an initial state with no prior existence of things like "bodies", "brain", "perception" or "action". In this paper, we leverage recent advances in machine learning, combining algorithms for diversity search, curriculum learning and gradient descent, to automate the search for such "individuals", i.e. localized structures that move around with the ability to react in a coherent manner to external obstacles and maintain their integrity, hence primitive forms of sensorimotor agency. We show that this approach makes it possible to systematically find environmental conditions in CA leading to self-organization of such basic forms of agency. Through multiple experiments, we show that the discovered agents have surprisingly robust capabilities to move, maintain their body integrity and navigate among various obstacles. They also show strong generalization abilities, with robustness to changes of scale, random updates or perturbations from the environment not seen during training. We discuss how this approach opens new perspectives in AI and synthetic bioengineering.
Submitted 14 February, 2024;
originally announced February 2024.
-
Improved Performances and Motivation in Intelligent Tutoring Systems: Combining Machine Learning and Learner Choice
Authors:
Benjamin Clément,
Hélène Sauzéon,
Didier Roy,
Pierre-Yves Oudeyer
Abstract:
Large class sizes challenge personalized learning in schools, prompting the use of educational technologies such as intelligent tutoring systems. To address this, we present an AI-driven personalization system, called ZPDES, based on the Learning Progress Hypothesis - modeling curiosity-driven learning - and multi-armed bandit techniques. It sequences exercises that maximize learning progress for each student. While previous studies demonstrated its efficacy in enhancing learning compared to hand-made curricula, its impact on student motivation remained unexplored. Furthermore, ZPDES previously lacked features allowing student choice, a limitation in agency that conflicts with its foundation on models of curiosity-driven learning. This study investigates how integrating choice, as a gamification element unrelated to exercise difficulty, affects both learning outcomes and motivation. We conducted an extensive field study (265 children aged 7-8, RCT design), comparing ZPDES with and without choice against a hand-designed curriculum. Results show that ZPDES improves both learning performance and the learning experience. Moreover, adding choice to ZPDES enhances intrinsic motivation and further strengthens its learning benefits. In contrast, incorporating choice into a fixed, linear curriculum negatively impacts learning outcomes. These findings highlight that the intrinsic motivation elicited by choice (gamification) is beneficial only when paired with an adaptive personalized learning system. This insight is critical as gamified features become increasingly prevalent in educational technologies.
Submitted 5 March, 2025; v1 submitted 16 January, 2024;
originally announced February 2024.
-
Meta-Diversity Search in Complex Systems, A Recipe for Artificial Open-Endedness ?
Authors:
Mayalen Etcheverry,
Bert Wang-Chak Chan,
Clément Moulin-Frier,
Pierre-Yves Oudeyer
Abstract:
Can we build an artificial system that would be able to generate endless surprises if run "forever" in Minecraft? While there is not a single path toward solving that grand challenge, this article presents what we believe to be some working ingredients for the endless generation of novel, increasingly complex artifacts in Minecraft. Our framework for an open-ended system includes two components: a complex system used to recursively grow and complexify artifacts over time, and a discovery algorithm that leverages the concept of meta-diversity search. Since complex systems have been shown to enable the emergence of considerable complexity from a set of simple rules, we believe them to be great candidates to generate all sorts of artifacts in Minecraft. Yet, the space of possible artifacts that can be generated by these systems is often unknown, challenging to characterize and explore. Therefore, automating the long-term discovery of novel and increasingly complex artifacts in these systems is an exciting research field. To approach these challenges, we formulate the problem of meta-diversity search where an artificial "discovery assistant" incrementally learns a diverse set of representations to characterize behaviors and searches to discover diverse patterns within each of them. A successful discovery assistant should continuously seek novel sources of diversity while being able to quickly specialize the search toward a new unknown type of diversity. To implement those ideas in the Minecraft environment, we simulate an artificial "chemistry" system based on the Lenia continuous cellular automaton for generating artifacts, as well as an artificial "discovery assistant" (called Holmes) for the artifact-discovery process. Holmes incrementally learns a hierarchy of modular representations to characterize divergent sources of diversity and uses a goal-based intrinsically-motivated exploration as the diversity search strategy.
Submitted 1 December, 2023;
originally announced December 2023.
-
Machine Culture
Authors:
Levin Brinkmann,
Fabian Baumann,
Jean-François Bonnefon,
Maxime Derex,
Thomas F. Müller,
Anne-Marie Nussberger,
Agnieszka Czaplicka,
Alberto Acerbi,
Thomas L. Griffiths,
Joseph Henrich,
Joel Z. Leibo,
Richard McElreath,
Pierre-Yves Oudeyer,
Jonathan Stray,
Iyad Rahwan
Abstract:
The ability of humans to create and disseminate culture is often credited as the single most important factor of our success as a species. In this Perspective, we explore the notion of machine culture, culture mediated or generated by machines. We argue that intelligent machines simultaneously transform the cultural evolutionary processes of variation, transmission, and selection. Recommender algorithms are altering social learning dynamics. Chatbots are forming a new mode of cultural transmission, serving as cultural models. Furthermore, intelligent machines are evolving as contributors in generating cultural traits--from game strategies and visual art to scientific results. We provide a conceptual framework for studying the present and anticipated future impact of machines on cultural evolution, and present a research agenda for the study of machine culture.
Submitted 22 November, 2023; v1 submitted 19 November, 2023;
originally announced November 2023.
-
A Definition of Open-Ended Learning Problems for Goal-Conditioned Agents
Authors:
Olivier Sigaud,
Gianluca Baldassarre,
Cedric Colas,
Stephane Doncieux,
Richard Duro,
Pierre-Yves Oudeyer,
Nicolas Perrin-Gilbert,
Vieri Giuliano Santucci
Abstract:
A lot of recent machine learning research papers have "open-ended learning" in their title. But very few of them attempt to define what they mean when using the term. Even worse, when looking more closely there seems to be no consensus on what distinguishes open-ended learning from related concepts such as continual learning, lifelong learning or autotelic learning. In this paper, we contribute to fixing this situation. After illustrating the genealogy of the concept and more recent perspectives about what it truly means, we outline that open-ended learning is generally conceived as a composite notion encompassing a set of diverse properties. In contrast with previous approaches, we propose to isolate a key elementary property of open-ended processes, which is to produce elements from time to time (e.g., observations, options, reward functions, and goals), over an infinite horizon, that are considered novel from an observer's perspective. From there, we build the notion of open-ended learning problems and focus in particular on the subset of open-ended goal-conditioned reinforcement learning problems in which agents can learn a growing repertoire of goal-driven skills. Finally, we highlight the work that remains to be performed to fill the gap between our elementary definition and the more involved notions of open-ended learning that developmental AI researchers may have in mind.
Submitted 7 June, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
ACES: Generating Diverse Programming Puzzles with Autotelic Generative Models
Authors:
Julien Pourcel,
Cédric Colas,
Gaia Molinaro,
Pierre-Yves Oudeyer,
Laetitia Teodorescu
Abstract:
The ability to invent novel and interesting problems is a remarkable feature of human intelligence that drives innovation, art, and science. We propose a method that aims to automate this process by harnessing the power of state-of-the-art generative models to produce a diversity of challenging yet solvable problems, here in the context of Python programming puzzles. Inspired by the intrinsically motivated literature, Autotelic CodE Search (ACES) jointly optimizes for the diversity and difficulty of generated problems. We represent problems in a space of LLM-generated semantic descriptors describing the programming skills required to solve them (e.g. string manipulation, dynamic programming, etc.) and measure their difficulty empirically as a linearly decreasing function of the success rate of Llama-3-70B, a state-of-the-art LLM problem solver. ACES iteratively prompts a large language model to generate difficult problems achieving a diversity of target semantic descriptors (goal-directed exploration) using previously generated problems as in-context examples. ACES generates problems that are more diverse and more challenging than problems produced by baseline methods and three times more challenging than problems found in existing Python programming benchmarks on average across 11 state-of-the-art code LLMs.
Submitted 29 May, 2024; v1 submitted 15 October, 2023;
originally announced October 2023.
-
Generative AI in the Classroom: Can Students Remain Active Learners?
Authors:
Rania Abdelghani,
Hélène Sauzéon,
Pierre-Yves Oudeyer
Abstract:
Generative Artificial Intelligence (GAI) can be seen as a double-edged weapon in education. Indeed, it may provide personalized, interactive and empowering pedagogical sequences that could favor students' intrinsic motivation, active engagement and help them have more control over their learning. But at the same time, other GAI properties such as the lack of uncertainty signalling even in cases of failure (particularly with Large Language Models (LLMs)) could lead to opposite effects, e.g. over-estimation of one's own competencies, passiveness, loss of curious and critical-thinking sense, etc.
These negative effects are due in particular to the lack of a pedagogical stance in these models' behaviors. Indeed, as opposed to standard pedagogical activities, GAI systems are often designed to answer users' inquiries easily and conveniently, without asking them to make an effort, and without focusing on their learning process and/or outcomes.
This article starts by outlining some of these opportunities and challenges surrounding the use of GAI in education, with a focus on the effects on students' active learning strategies and related metacognitive skills. Then, we present a framework for introducing pedagogical transparency in GAI-based educational applications. This framework presents 1) training methods to include pedagogical principles in the models, 2) methods to ensure controlled and pedagogically-relevant interactions when designing activities with GAI and 3) educational methods enabling students to acquire the relevant skills to properly benefit from the use of GAI in their learning activities (meta-cognitive skills, GAI literacy).
Submitted 10 November, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
SBMLtoODEjax: Efficient Simulation and Optimization of Biological Network Models in JAX
Authors:
Mayalen Etcheverry,
Michael Levin,
Clément Moulin-Frier,
Pierre-Yves Oudeyer
Abstract:
Advances in bioengineering and biomedicine demand a deep understanding of the dynamic behavior of biological systems, ranging from protein pathways to complex cellular processes. Biological networks like gene regulatory networks and protein pathways are key drivers of embryogenesis and physiological processes. Comprehending their diverse behaviors is essential for tackling diseases, including cancer, as well as for engineering novel biological constructs. Despite the availability of extensive mathematical models represented in Systems Biology Markup Language (SBML), researchers face significant challenges in exploring the full spectrum of behaviors and optimizing interventions to efficiently shape those behaviors. Existing tools designed for simulation of biological network models are not tailored to facilitate interventions on network dynamics nor to facilitate automated discovery. Leveraging recent developments in machine learning (ML), this paper introduces SBMLtoODEjax, a lightweight library designed to seamlessly integrate SBML models with ML-supported pipelines, powered by JAX. SBMLtoODEjax facilitates the reuse and customization of SBML-based models, harnessing JAX's capabilities for efficient parallel simulations and optimization, with the aim to accelerate research in biological network analysis.
Submitted 29 October, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
The SocialAI School: Insights from Developmental Psychology Towards Artificial Socio-Cultural Agents
Authors:
Grgur Kovač,
Rémy Portelas,
Peter Ford Dominey,
Pierre-Yves Oudeyer
Abstract:
Developmental psychologists have long established the importance of socio-cognitive abilities in human intelligence. These abilities enable us to enter, participate in and benefit from human culture. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should be informed by psychology and should also study the socio-cognitive abilities that enable entering a culture. We discuss the theories of Michael Tomasello and Jerome Bruner to introduce some of their concepts to AI and outline key concepts and socio-cognitive abilities. We present The SocialAI school - a tool including a customizable parameterized suite of procedurally generated environments, which simplifies conducting experiments regarding those concepts. We show examples of such experiments with RL agents and Large Language Models. The main motivation of this work is to engage the AI community around the problem of social intelligence informed by developmental psychology, and to provide a tool to simplify first steps in this direction. Refer to the project website for code and additional information: https://sites.google.com/view/socialai-school.
Submitted 23 November, 2023; v1 submitted 15 July, 2023;
originally announced July 2023.
-
Large Language Models as Superpositions of Cultural Perspectives
Authors:
Grgur Kovač,
Masataka Sawayama,
Rémy Portelas,
Cédric Colas,
Peter Ford Dominey,
Pierre-Yves Oudeyer
Abstract:
Large Language Models (LLMs) are often misleadingly recognized as having a personality or a set of values. We argue that an LLM can be seen as a superposition of perspectives with different values and personality traits. LLMs exhibit context-dependent values and personality traits that change based on the induced perspective (as opposed to humans, who tend to have more coherent values and personality traits across contexts). We introduce the concept of perspective controllability, which refers to a model's affordance to adopt various perspectives with differing values and personality traits. In our experiments, we use questionnaires from psychology (PVQ, VSM, IPIP) to study how exhibited values and personality traits change based on different perspectives. Through qualitative experiments, we show that LLMs express different values when those are (implicitly or explicitly) implied in the prompt, and that LLMs express different values even when those are not obviously implied (demonstrating their context-dependent nature). We then conduct quantitative experiments to study the controllability of different models (GPT-4, GPT-3.5, OpenAssistant, StableVicuna, StableLM), the effectiveness of various methods for inducing perspectives, and the smoothness of the models' drivability. We conclude by examining the broader implications of our work and outline a variety of associated scientific questions. The project website is available at https://sites.google.com/view/llm-superpositions .
Submitted 7 November, 2023; v1 submitted 15 July, 2023;
originally announced July 2023.
-
Augmenting Autotelic Agents with Large Language Models
Authors:
Cédric Colas,
Laetitia Teodorescu,
Pierre-Yves Oudeyer,
Xingdi Yuan,
Marc-Alexandre Côté
Abstract:
Humans learn to master open-ended repertoires of skills by imagining and practicing their own goals. This autotelic learning process, literally the pursuit of self-generated (auto) goals (telos), becomes more and more open-ended as the goals become more diverse, abstract and creative. The resulting exploration of the space of possible skills is supported by an inter-individual exploration: goal representations are culturally evolved and transmitted across individuals, in particular using language. Current artificial agents mostly rely on predefined goal representations corresponding to goal spaces that are either bounded (e.g. list of instructions), or unbounded (e.g. the space of possible visual inputs) but are rarely endowed with the ability to reshape their goal representations, to form new abstractions or to imagine creative goals. In this paper, we introduce a language model augmented autotelic agent (LMA3) that leverages a pretrained language model (LM) to support the representation, generation and learning of diverse, abstract, human-relevant goals. The LM is used as an imperfect model of human cultural transmission; an attempt to capture aspects of humans' common-sense, intuitive physics and overall interests. Specifically, it supports three key components of the autotelic architecture: 1) a relabeler that describes the goals achieved in the agent's trajectories, 2) a goal generator that suggests new high-level goals along with their decomposition into subgoals the agent already masters, and 3) reward functions for each of these goals. Without relying on any hand-coded goal representations, reward functions or curriculum, we show that LMA3 agents learn to master a large diversity of skills in a task-agnostic text-based environment.
Submitted 21 May, 2023;
originally announced May 2023.
-
Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding
Authors:
Ziang Xiao,
Xingdi Yuan,
Q. Vera Liao,
Rania Abdelghani,
Pierre-Yves Oudeyer
Abstract:
Qualitative analysis of textual contents unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools demonstrate utility, researchers may not have readily available AI resources and expertise, let alone be challenged by the limited generalizability of those task-specific models. In this study, we explored the use of large language models (LLMs) in supporting deductive coding, a major category of qualitative analysis where researchers use pre-determined codebooks to label the data into a fixed set of codes. Instead of training task-specific models, a pre-trained LLM could be used directly for various tasks without fine-tuning through prompt learning. Using a curiosity-driven questions coding task as a case study, we found, by combining GPT-3 with expert-drafted codebooks, our proposed approach achieved fair to substantial agreements with expert-coded results. We lay out challenges and opportunities in using LLMs to support qualitative coding and beyond.
Submitted 17 April, 2023;
originally announced April 2023.
-
A Song of Ice and Fire: Analyzing Textual Autotelic Agents in ScienceWorld
Authors:
Laetitia Teodorescu,
Xingdi Yuan,
Marc-Alexandre Côté,
Pierre-Yves Oudeyer
Abstract:
Building open-ended agents that can autonomously discover a diversity of behaviours is one of the long-standing goals of artificial intelligence. This challenge can be studied in the framework of autotelic RL agents, i.e. agents that learn by selecting and pursuing their own goals, self-organizing a learning curriculum. Recent work identified language as a key dimension of autotelic learning, in particular because it enables abstract goal sampling and guidance from social peers for hindsight relabelling. Within this perspective, we study the following open scientific questions: What is the impact of hindsight feedback from a social peer (e.g. selective vs. exhaustive)? How can the agent learn from very rare language goal examples in its experience replay? How can multiple forms of exploration be combined, and take advantage of easier goals as stepping stones to reach harder ones? To address these questions, we use ScienceWorld, a textual environment with rich abstract and combinatorial physics. We show the importance of selectivity from the social peer's feedback; that experience replay needs to over-sample examples of rare goals; and that following self-generated goal sequences where the agent's competence is intermediate leads to significant improvements in final performance.
Submitted 24 February, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Authors:
Thomas Carta,
Clément Romac,
Thomas Wolf,
Sylvain Lamprier,
Olivier Sigaud,
Pierre-Yves Oudeyer
Abstract:
Recent works successfully leveraged Large Language Models' (LLM) abilities to capture abstract knowledge about the world's physics to solve decision-making problems. Yet, the alignment between LLMs' knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, and a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can it boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5.
Submitted 17 October, 2024; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Flow-Lenia: Towards open-ended evolution in cellular automata through mass conservation and parameter localization
Authors:
Erwan Plantec,
Gautier Hamon,
Mayalen Etcheverry,
Pierre-Yves Oudeyer,
Clément Moulin-Frier,
Bert Wang-Chak Chan
Abstract:
The design of complex self-organising systems producing life-like phenomena, such as the open-ended evolution of virtual creatures, is one of the main goals of artificial life. Lenia, a family of cellular automata (CA) generalizing Conway's Game of Life to continuous space, time and states, has attracted a lot of attention because of the wide diversity of self-organizing patterns it can generate. Among those, some spatially localized patterns (SLPs) resemble life-like artificial creatures and display complex behaviors. However, those creatures are found in only a small subspace of the Lenia parameter space and are not trivial to discover, necessitating advanced search algorithms. Furthermore, each of these creatures exists only in worlds governed by specific update rules and thus cannot interact in the same one. This paper proposes a mass-conservative extension of Lenia, called Flow Lenia, that solves both of these issues. We present experiments demonstrating its effectiveness in generating SLPs with complex behaviors and show that the update rule parameters can be optimized to generate SLPs showing behaviors of interest. Finally, we show that Flow Lenia enables the integration of the parameters of the CA update rules within the CA dynamics, making them dynamic and localized, allowing for multi-species simulations, with locally coherent update rules that define properties of the emerging creatures, and that can be mixed with neighbouring rules. We argue that this paves the way for the intrinsic evolution of self-organized artificial life forms within continuous CAs.
Submitted 24 March, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
GPT-3-driven pedagogical agents for training children's curious question-asking skills
Authors:
Rania Abdelghani,
Yen-Hsiang Wang,
Xingdi Yuan,
Tong Wang,
Pauline Lucas,
Hélène Sauzéon,
Pierre-Yves Oudeyer
Abstract:
In order to train children's ability to ask curiosity-driven questions, previous research has explored designing specific exercises relying on providing semantic and linguistic cues to help formulate such questions. But despite showing pedagogical efficiency, this method is still limited as it relies on generating the said cues by hand, which can be a very costly process. In this context, we propose to leverage advances in the natural language processing field (NLP) and investigate the efficiency of using a large language model (LLM) for automating the production of the pedagogical content of a curious question-asking (QA) training. We study generating the said content using the "prompt-based" method that consists of explaining the task to the LLM in natural text. We evaluate the output using human expert annotations and comparisons with hand-generated content. Results indeed suggested the relevance and usefulness of this content. We also conduct a field study in primary school (75 children aged 9-10), where we evaluate children's QA performance when having this training. We compare 3 types of content: 1) hand-generated content that proposes "closed" cues leading to predefined questions; 2) GPT-3-generated content that proposes the same type of cues; 3) GPT-3-generated content that proposes "open" cues leading to several possible questions. We see a similar QA performance between the two "closed" trainings (showing the scalability of the approach using GPT-3), and a better one for participants with the "open" training. These results suggest the efficiency of using LLMs to support children in generating more curious questions, using a natural language prompting approach that affords usability by teachers and other users who are not specialists in AI techniques. Furthermore, results also show that open-ended content may be more suitable for training curious question-asking skills.
Submitted 30 May, 2023; v1 submitted 25 November, 2022;
originally announced November 2022.
-
Contrastive Multimodal Learning for Emergence of Graphical Sensory-Motor Communication
Authors:
Tristan Karch,
Yoann Lemesle,
Romain Laroche,
Clément Moulin-Frier,
Pierre-Yves Oudeyer
Abstract:
In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel. To this end, we introduce the Graphical Referential Game (GREG) where a speaker must produce a graphical utterance to name a visual referent object while a listener has to select the corresponding object among distractor referents, given the delivered message. The utterances are drawing images produced using dynamical motor primitives combined with a sketching library. To tackle GREG we present CURVES: a multimodal contrastive deep learning mechanism that represents the energy (alignment) between named referents and utterances generated through gradient ascent on the learned energy landscape. We demonstrate that CURVES not only succeeds at solving the GREG but also enables agents to self-organize a language that generalizes to feature compositions never seen during training. In addition to evaluating the communication performance of our approach, we also explore the structure of the emerging language. Specifically, we show that the resulting language forms a coherent lexicon shared between agents and that basic compositional rules on the graphical productions could not explain the compositional generalization.
Submitted 14 February, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Selecting Better Samples from Pre-trained LLMs: A Case Study on Question Generation
Authors:
Xingdi Yuan,
Tong Wang,
Yen-Hsiang Wang,
Emery Fine,
Rania Abdelghani,
Pauline Lucas,
Hélène Sauzéon,
Pierre-Yves Oudeyer
Abstract:
Large Language Models (LLMs) have in recent years demonstrated impressive prowess in natural language generation. A common practice to improve generation diversity is to sample multiple outputs from the model. However, a simple and robust way of selecting the best output from these stochastic samples is lacking. As a case study framed in the context of question generation, we propose two prompt-based approaches to selecting high-quality questions from a set of LLM-generated candidates. Our method works under the constraints of 1) a black-box (non-modifiable) question generation model and 2) lack of access to human-annotated references -- both of which are realistic limitations for real-world deployment of LLMs. With automatic as well as human evaluations, we empirically demonstrate that our approach can effectively select questions of higher quality than greedy generation.
Submitted 22 September, 2022;
originally announced September 2022.
-
Automatic Exploration of Textual Environments with Language-Conditioned Autotelic Agents
Authors:
Laetitia Teodorescu,
Eric Yuan,
Marc-Alexandre Côté,
Pierre-Yves Oudeyer
Abstract:
In this extended abstract we discuss the opportunities and challenges of studying intrinsically-motivated agents for exploration in textual environments. We argue that there is important synergy between text environments and autonomous agents. We identify key properties of text worlds that make them suitable for exploration by autonomous agents, namely, depth, breadth, progress niches and the ease of use of language goals; we identify drivers of exploration for such agents that are implementable in text worlds. We discuss the opportunities of using autonomous agents to make progress on text environment benchmarks. Finally, we list some specific challenges that need to be overcome in this area.
Submitted 8 July, 2022;
originally announced July 2022.
-
EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL
Authors:
Thomas Carta,
Pierre-Yves Oudeyer,
Olivier Sigaud,
Sylvain Lamprier
Abstract:
Reinforcement learning (RL) in long horizon and sparse reward tasks is notoriously difficult and requires a lot of training steps. A standard solution to speed up the process is to leverage additional reward signals, shaping it to better guide the learning process. In the context of language-conditioned RL, the abstraction and generalisation properties of the language input provide opportunities for more efficient ways of shaping the reward. In this paper, we leverage this idea and propose an automated reward shaping method where the agent extracts auxiliary objectives from the general language goal. These auxiliary objectives use a question generation (QG) and question answering (QA) system: they consist of questions leading the agent to try to reconstruct partial information about the global goal using its own trajectory. When it succeeds, it receives an intrinsic reward proportional to its confidence in its answer. This incentivizes the agent to generate trajectories which unambiguously explain various aspects of the general language goal. Our experimental study shows that this approach, which does not require engineer intervention to design the auxiliary objectives, improves sample efficiency by effectively directing exploration.
Submitted 13 October, 2022; v1 submitted 20 June, 2022;
originally announced June 2022.
-
Social Network Structure Shapes Innovation: Experience-sharing in RL with SAPIENS
Authors:
Eleni Nisioti,
Mateo Mahaut,
Pierre-Yves Oudeyer,
Ida Momennejad,
Clément Moulin-Frier
Abstract:
Human culture relies on innovation: our ability to continuously explore how existing elements can be combined to create new ones. Innovation is not solitary; it relies on collective search and accumulation. Reinforcement learning (RL) approaches commonly assume that fully-connected groups are best suited for innovation. However, human laboratory and field studies have shown that hierarchical innovation is more robustly achieved by dynamic social network structures. In dynamic settings, humans oscillate between innovating individually or in small clusters, and then sharing outcomes with others. To our knowledge, the role of social network structure on innovation has not been systematically studied in RL. Here, we use a multi-level problem setting (WordCraft), with three different innovation tasks to test the hypothesis that the social network structure affects the performance of distributed RL algorithms. We systematically design networks of DQNs sharing experiences from their replay buffers in varying structures (fully-connected, small world, dynamic, ring) and introduce a set of behavioral and mnemonic metrics that extend the classical reward-focused evaluation framework of RL. Comparing the level of innovation achieved by different social network structures across different tasks shows that, first, consistent with human findings, experience sharing within a dynamic structure achieves the highest level of innovation in tasks with a deceptive nature and large search spaces. Second, experience sharing is not as helpful when there is a single clear path to innovation. Third, the metrics we propose can help understand the success of different social network structures on different tasks, with the diversity of experiences on an individual and group level lending crucial insights.
Submitted 18 November, 2022; v1 submitted 10 June, 2022;
originally announced June 2022.
-
Language and Culture Internalisation for Human-Like Autotelic AI
Authors:
Cédric Colas,
Tristan Karch,
Clément Moulin-Frier,
Pierre-Yves Oudeyer
Abstract:
Building autonomous agents able to grow open-ended repertoires of skills across their lives is a fundamental goal of artificial intelligence (AI). A promising developmental approach recommends the design of intrinsically motivated agents that learn new skills by generating and pursuing their own goals - autotelic agents. But despite recent progress, existing algorithms still show serious limitations in terms of goal diversity, exploration, generalisation or skill composition. This perspective calls for the immersion of autotelic agents into rich socio-cultural worlds, an immensely important attribute of our environment that shapes human cognition but is mostly omitted in modern AI. Inspired by the seminal work of Vygotsky, we propose Vygotskian autotelic agents - agents able to internalise their interactions with others and turn them into cognitive tools. We focus on language and show how its structure and informational content may support the development of new cognitive functions in artificial agents as it does in humans. We justify the approach by uncovering several examples of new artificial cognitive functions emerging from interactions between language and embodiment in recent works at the intersection of deep reinforcement learning and natural language processing. Looking forward, we highlight future opportunities and challenges for Vygotskian Autotelic AI research, including the use of language models as cultural models supporting artificial cognitive development.
Submitted 16 November, 2022; v1 submitted 2 June, 2022;
originally announced June 2022.
-
Asking for Knowledge: Training RL Agents to Query External Knowledge Using Language
Authors:
Iou-Jen Liu,
Xingdi Yuan,
Marc-Alexandre Côté,
Pierre-Yves Oudeyer,
Alexander G. Schwing
Abstract:
To solve difficult tasks, humans ask questions to acquire knowledge from external sources. In contrast, classical reinforcement learning agents lack such an ability and often resort to exploratory behavior. This is exacerbated as few present-day environments support querying for knowledge. In order to study how agents can be taught to query external knowledge via language, we first introduce two new environments: the grid-world-based Q-BabyAI and the text-based Q-TextWorld. In addition to physical interactions, an agent can query an external knowledge source specialized for these environments to gather information. Second, we propose the "Asking for Knowledge" (AFK) agent, which learns to generate language commands to query for meaningful knowledge that helps solve the tasks. AFK leverages a non-parametric memory, a pointer mechanism and an episodic exploration bonus to tackle (1) irrelevant information, (2) a large query language space, (3) delayed reward for making meaningful queries. Extensive experiments demonstrate that the AFK agent outperforms recent baselines on the challenging Q-BabyAI and Q-TextWorld environments.
Submitted 3 July, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Conversational agents for fostering curiosity-driven learning in children
Authors:
Rania Abdelghani,
Pierre-Yves Oudeyer,
Edith Law,
Catherine de Vulpillières,
Hélène Sauzéon
Abstract:
Curiosity is an important factor that favors independent and individualized learning in children. Research suggests that it is also a competence that can be fostered by training specific metacognitive skills and information-searching behaviors. In this light, we develop a conversational agent that helps children generate curiosity-driven questions, and encourages their use to lead autonomous explorations and gain new knowledge. The study was conducted with 51 primary school students who interacted with either a neutral agent or an incentive agent that helped curiosity-driven questioning by offering specific semantic cues. Results showed a significant increase in the number and the quality of the questions generated with the incentive agent. This interaction also resulted in longer explorations and stronger learning progress. Together, our results suggest that the more our agent is able to train children's curiosity-related metacognitive skills, the better they can maintain their information-searching behaviors and the more new knowledge they are likely to acquire.
Submitted 12 April, 2022; v1 submitted 7 April, 2022;
originally announced April 2022.
-
Language-biased image classification: evaluation based on semantic representations
Authors:
Yoann Lemesle,
Masataka Sawayama,
Guillermo Valle-Perez,
Maxime Adolphe,
Hélène Sauzéon,
Pierre-Yves Oudeyer
Abstract:
Humans show language-biased image recognition for a word-embedded image, known as picture-word interference. Such interference depends on hierarchical semantic categories and reflects that human language processing highly interacts with visual processing. Similar to humans, recent artificial models jointly trained on texts and images, e.g., OpenAI CLIP, show language-biased image classification. Exploring whether the bias leads to interference similar to those observed in humans can contribute to understanding how much the model acquires hierarchical semantic representations from joint learning of language and vision. The present study introduces methodological tools from the cognitive science literature to assess the biases of artificial models. Specifically, we introduce a benchmark task to test whether words superimposed on images can distort the image classification across different category levels and, if it can, whether the perturbation is due to the shared semantic representation between language and vision. Our dataset is a set of word-embedded images and consists of a mixture of natural image datasets and hierarchical word labels with superordinate/basic category levels. Using this benchmark test, we evaluate the CLIP model. We show that presenting words distorts the image classification by the model across different category levels, but the effect does not depend on the semantic relationship between images and embedded words. This suggests that the semantic word representation in the CLIP visual processing is not shared with the image representation, although the word representation strongly dominates for word-embedded images.
Submitted 12 March, 2022; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Learning to Guide and to Be Guided in the Architect-Builder Problem
Authors:
Paul Barde,
Tristan Karch,
Derek Nowrouzezahrai,
Clément Moulin-Frier,
Christopher Pal,
Pierre-Yves Oudeyer
Abstract:
We are interested in interactive agents that learn to coordinate, namely, a $builder$ -- which performs actions but ignores the goal of the task, i.e. has no access to rewards -- and an $architect$ which guides the builder towards the goal of the task. We define and explore a formal setting where artificial agents are equipped with mechanisms that allow them to simultaneously learn a task while at the same time evolving a shared communication protocol. Ideally, such learning should only rely on high-level communication priors and be able to handle a large variety of tasks and meanings while deriving communication protocols that can be reused across tasks. We present the Architect-Builder Problem (ABP): an asymmetrical setting in which an architect must learn to guide a builder towards constructing a specific structure. The architect knows the target structure but cannot act in the environment and can only send arbitrary messages to the builder. The builder on the other hand can act in the environment, but receives no rewards nor has any knowledge about the task, and must learn to solve it relying only on the messages sent by the architect. Crucially, the meaning of messages is initially not defined nor shared between the agents but must be negotiated throughout learning. Under these constraints, we propose Architect-Builder Iterated Guiding (ABIG), a solution to ABP where the architect leverages a learned model of the builder to guide it while the builder uses self-imitation learning to reinforce its guided behavior. We analyze the key learning mechanisms of ABIG and test it in 2D tasks involving grasping cubes, placing them at a given location, or building various shapes. ABIG results in a low-level, high-frequency, guiding communication protocol that not only enables an architect-builder pair to solve the task at hand, but that can also generalize to unseen tasks.
Submitted 11 April, 2022; v1 submitted 14 December, 2021;
originally announced December 2021.
-
Identifying Functions and Behaviours of Social Robots during Learning Activities: Teachers' Perspective
Authors:
Jessy Ceha,
Edith Law,
Dana Kulić,
Pierre-Yves Oudeyer,
Didier Roy
Abstract:
With advances in artificial intelligence, research is increasingly exploring the potential functions that social robots can play in education. As teachers are a critical stakeholder in the use and application of educational technologies, we conducted a study to understand teachers' perspectives on how a social robot could support a variety of learning activities in the classroom. Through interviews, robot puppeteering, and group brainstorming sessions with five elementary and middle school teachers from a local school in Canada, we take a socio-technical perspective to conceptualize possible robot functions and behaviours, and the effects they may have on the current way learning activities are designed, planned, and executed. Overall, the teachers responded positively to the idea of introducing a social robot as a technological tool for learning activities, envisioning differences in usage for teacher-robot and student-robot interactions. Further, Engeström's Activity System Model -- a framework for analyzing human needs, tasks, and outcomes -- illustrated a number of tensions associated with learning activities in the classroom. We discuss the fine-grained robot functions and behaviours conceived by teachers, and how they address the current tensions -- providing suggestions for improving the design of social robots for learning activities.
Submitted 30 October, 2021;
originally announced November 2021.
-
SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents
Authors:
Grgur Kovač,
Rémy Portelas,
Katja Hofmann,
Pierre-Yves Oudeyer
Abstract:
Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. Within the Deep Reinforcement Learning (DRL) field, this objective motivated multiple works on embodied language use. However, current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary size and variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. We explain how concepts from cognitive sciences could help AI to draw a roadmap towards human-like intelligence, with a focus on its social dimensions. As a first step, we propose to expand current research to a broader set of core social skills. To do this, we present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents using multiple grid-world environments featuring other (scripted) social agents. We then study the limits of a recent SOTA DRL approach when tested on SocialAI and discuss important next steps towards proficient social agents. Videos and code are available at https://sites.google.com/view/socialai.
Submitted 1 September, 2021; v1 submitted 2 July, 2021;
originally announced July 2021.
-
Causal Reinforcement Learning using Observational and Interventional Data
Authors:
Maxime Gasse,
Damien Grasset,
Guillaume Gaudron,
Pierre-Yves Oudeyer
Abstract:
Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs. We consider here a scenario where the learning agent has the ability to collect online experiences through direct interactions with the environment (interventional data), but also has access to a large collection of offline experiences, obtained by observing another agent interacting with the environment (observational data). A key ingredient that makes this situation non-trivial is that we allow the observed agent to interact with the environment based on hidden information, which is not observed by the learning agent. We then ask the following questions: can the online and offline experiences be safely combined for learning a causal model? And can we expect the offline experiences to improve the agent's performance? To answer these questions, we import ideas from the well-established causal framework of do-calculus, and we express model-based reinforcement learning as a causal inference problem. Then, we propose a general yet simple methodology for leveraging offline data during learning. In a nutshell, the method relies on learning a latent-based causal transition model that explains both the interventional and observational regimes, and then using the recovered latent variable to infer the standard POMDP transition model via deconfounding. We prove our method is correct and efficient in the sense that it attains better generalization guarantees due to the offline data (in the asymptotic case), and we illustrate its effectiveness empirically on synthetic toy problems. Our contribution aims at bridging the gap between the fields of reinforcement learning and causality.
Submitted 28 June, 2021;
originally announced June 2021.
-
Transflower: probabilistic autoregressive dance generation with multimodal attention
Authors:
Guillermo Valle-Pérez,
Gustav Eje Henter,
Jonas Beskow,
André Holzapfel,
Pierre-Yves Oudeyer,
Simon Alexanderson
Abstract:
Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies, and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines, via objective metrics and a user study, and show that both the ability to model a probability distribution, as well as being able to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music.
Submitted 11 June, 2022; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Grounding Spatio-Temporal Language with Transformers
Authors:
Tristan Karch,
Laetitia Teodorescu,
Katja Hofmann,
Clément Moulin-Frier,
Pierre-Yves Oudeyer
Abstract:
Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extended literature studying how machines can learn grounded language, the topic of how to learn spatio-temporal linguistic concepts is still largely uncharted. To make progress in this direction, we here introduce a novel spatio-temporal language grounding task where the goal is to learn the meaning of spatio-temporal descriptions of behavioral traces of an embodied agent. This is achieved by training a truth function that predicts if a description matches a given history of observations. The descriptions involve time-extended predicates in past and present tense as well as spatio-temporal references to objects in the scene. To study the role of architectural biases in this task, we train several models including multimodal Transformer architectures; the latter implement different attention computations between words and objects across space and time. We test models on two classes of generalization: 1) generalization to randomly held-out sentences; 2) generalization to grammar primitives. We observe that maintaining object identity in the attention computation of our Transformers is instrumental to achieving good performance on generalization overall, and that summarizing object traces in a single token has little influence on performance. We then discuss how this opens new perspectives for language-guided autonomous embodied agents. We also release our code under open-source license as well as pretrained models and datasets to encourage the wider community to build upon and extend our work in the future.
Submitted 11 October, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Towards Teachable Autotelic Agents
Authors:
Olivier Sigaud,
Ahmed Akakzia,
Hugo Caselles-Dupré,
Cédric Colas,
Pierre-Yves Oudeyer,
Mohamed Chetouani
Abstract:
Autonomous discovery and direct instruction are two distinct sources of learning in children, but education sciences demonstrate that mixed approaches such as assisted discovery or guided play result in improved skill acquisition. In the field of Artificial Intelligence, these extremes respectively map to autonomous agents learning from their own signals and interactive learning agents fully taught by their teachers. In between should stand teachable autotelic agents (TAA): agents that learn from both internal and teaching signals to benefit from the higher efficiency of assisted discovery. Designing such agents will enable real-world non-expert users to orient the learning trajectories of agents towards their expectations. More fundamentally, this may also be a key step to build agents with human-level intelligence. This paper presents a roadmap towards the design of teachable autonomous agents. Building on developmental psychology and education sciences, we start by identifying key features enabling assisted discovery processes in child-tutor interactions. This leads to the production of a checklist of features that future TAA will need to demonstrate. The checklist allows us to precisely pinpoint the various limitations of current reinforcement learning agents and to identify the promising first steps towards TAA. It also shows the way forward by highlighting key research directions towards the design of autonomous agents that can be taught by ordinary people via natural pedagogy.
Submitted 20 March, 2023; v1 submitted 25 May, 2021;
originally announced May 2021.
-
SocialAI 0.1: Towards a Benchmark to Stimulate Research on Socio-Cognitive Abilities in Deep Reinforcement Learning Agents
Authors:
Grgur Kovač,
Rémy Portelas,
Katja Hofmann,
Pierre-Yves Oudeyer
Abstract:
Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. This problem has motivated many research directions on embodied language use. Current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary size and variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. In this work we explain how concepts from cognitive sciences could help AI to draw a roadmap towards human-like intelligence, with a focus on its social dimensions. We then study the limits of a recent SOTA Deep RL approach when tested on a first grid-world environment from the upcoming SocialAI, a benchmark to assess the social skills of Deep RL agents. Videos and code are available at https://sites.google.com/view/socialai01 .
Submitted 27 April, 2021;
originally announced April 2021.
-
TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL
Authors:
Clément Romac,
Rémy Portelas,
Katja Hofmann,
Pierre-Yves Oudeyer
Abstract:
Training autonomous agents able to generalize to multiple tasks is a key target of Deep Reinforcement Learning (DRL) research. In parallel to improving DRL algorithms themselves, Automatic Curriculum Learning (ACL) studies how teacher algorithms can train DRL agents more efficiently by adapting task selection to their evolving abilities. While multiple standard benchmarks exist to compare DRL agents, there is currently no such benchmark for ACL algorithms. Thus, comparing existing approaches is difficult, as too many experimental parameters differ from paper to paper. In this work, we identify several key challenges faced by ACL algorithms. Based on these, we present TeachMyAgent (TA), a benchmark of current ACL algorithms leveraging procedural task generation. It includes 1) challenge-specific unit tests using variants of a procedural Box2D bipedal walker environment, and 2) a new procedural Parkour environment combining most ACL challenges, making it ideal for global performance assessment. We then use TeachMyAgent to conduct a comparative study of representative existing approaches, showcasing the competitiveness of some ACL algorithms that do not use expert knowledge. We also show that the Parkour environment remains an open problem. We open-source our environments, all studied ACL algorithms (collected from open-source code or re-implemented), and DRL students in a Python package available at https://github.com/flowersteam/TeachMyAgent.
Submitted 9 June, 2021; v1 submitted 17 March, 2021;
originally announced March 2021.