-
AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations
Authors:
Adam Dahlgren Lindström,
Leila Methnani,
Lea Krause,
Petter Ericson,
Íñigo Martínez de Rituerto de Troya,
Dimitri Coelho Mollo,
Roel Dobbe
Abstract:
This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback (RLxF) methods, involving either human feedback (RLHF) or AI feedback (RLAIF). Specifically, we show the shortcomings of the broadly pursued alignment goals of honesty, harmlessness, and helpfulness. Through a multidisciplinary sociotechnical critique, we examine both the theoretical underpinnings and practical implementations of RLxF techniques, revealing significant limitations in their approach to capturing the complexities of human ethics and contributing to AI safety. We highlight tensions and contradictions inherent in the goals of RLxF. In addition, we discuss ethically relevant issues that tend to be neglected in discussions about alignment and RLxF, among them the trade-offs between user-friendliness and deception, flexibility and interpretability, and system safety. We conclude by urging researchers and practitioners alike to critically assess the sociotechnical ramifications of RLxF, advocating for a more nuanced and reflective approach to its application in AI development.
Submitted 26 June, 2024;
originally announced June 2024.
-
A Shift In Artistic Practices through Artificial Intelligence
Authors:
Kıvanç Tatar,
Petter Ericson,
Kelsey Cotton,
Paola Torres Núñez del Prado,
Roser Batlle-Roca,
Beatriz Cabrero-Daniel,
Sara Ljungblad,
Georgios Diapoulis,
Jabbar Hussain
Abstract:
The explosion of content generated by artificial intelligence (AI) models has initiated a cultural shift in arts, music, and media, whereby roles are changing, values are shifting, and conventions are challenged. The vast, readily available dataset of the Internet has created an environment for AI models to be trained on any content on the Web. With AI models shared openly and used by many globally, how does this new paradigm shift challenge the status quo in artistic practices? What kind of changes will AI technology bring to music, arts, and new media?
Submitted 10 April, 2024; v1 submitted 13 June, 2023;
originally announced June 2023.
-
ACROCPoLis: A Descriptive Framework for Making Sense of Fairness
Authors:
Andrea Aler Tubella,
Dimitri Coelho Mollo,
Adam Dahlgren Lindström,
Hannah Devinney,
Virginia Dignum,
Petter Ericson,
Anna Jonsson,
Timotheus Kampik,
Tom Lenaerts,
Julian Alfredo Mendez,
Juan Carlos Nieves
Abstract:
Fairness is central to the ethical and responsible development and use of AI systems, with a large number of frameworks and formal notions of algorithmic fairness being available. However, many of the fairness solutions proposed revolve around technical considerations and not the needs of and consequences for the most impacted communities. We therefore want to take the focus away from definitions and allow for the inclusion of societal and relational aspects to represent how the effects of AI systems impact and are experienced by individuals and social groups. In this paper, we do this by means of proposing the ACROCPoLis framework to represent allocation processes with a modeling emphasis on fairness aspects. The framework provides a shared vocabulary in which the factors relevant to fairness assessments for different situations and procedures are made explicit, as well as their interrelationships. This enables us to compare analogous situations, to highlight the differences in dissimilar situations, and to capture differing interpretations of the same situation by different stakeholders.
Submitted 19 April, 2023;
originally announced April 2023.