-
Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt
Authors:
Joel Z. Leibo,
Alexander Sasha Vezhnevets,
William A. Cunningham,
Sébastien Krier,
Manfred Diaz,
Simon Osindero
Abstract:
Artificial Intelligence (AI) systems are increasingly placed in positions where their decisions have real consequences, e.g., moderating online spaces, conducting research, and advising on policy. Ensuring they operate in a safe and ethically acceptable fashion is thus critical. However, most solutions have been a form of one-size-fits-all "alignment". We are worried that such systems, which overlook enduring moral diversity, will spark resistance, erode trust, and destabilize our institutions. This paper traces the underlying problem to an often-unstated Axiom of Rational Convergence: the idea that under ideal conditions, rational agents will converge in the limit of conversation on a single ethics. Treating that premise as both optional and doubtful, we propose what we call the appropriateness framework: an alternative approach grounded in conflict theory, cultural evolution, multi-agent systems, and institutional economics. The appropriateness framework treats persistent disagreement as the normal case and designs for it by applying four principles: (1) contextual grounding, (2) community customization, (3) continual adaptation, and (4) polycentric governance. We argue here that adopting these design principles is a good way to shift the main alignment metaphor from moral unification to a more productive metaphor of conflict management, and that taking this step is both desirable and urgent.
Submitted 8 May, 2025;
originally announced May 2025.
-
An Approach to Technical AGI Safety and Security
Authors:
Rohin Shah,
Alex Irpan,
Alexander Matt Turner,
Anna Wang,
Arthur Conmy,
David Lindner,
Jonah Brown-Cohen,
Lewis Ho,
Neel Nanda,
Raluca Ada Popa,
Rishub Jain,
Rory Greig,
Samuel Albanie,
Scott Emmons,
Sebastian Farquhar,
Sébastien Krier,
Senthooran Rajamanoharan,
Sophie Bridgers,
Tobi Ijitoye,
Tom Everitt,
Victoria Krakovna,
Vikrant Varma,
Vladimir Mikulik,
Zachary Kenton,
Dave Orr
, et al. (5 additional authors not shown)
Abstract:
Artificial General Intelligence (AGI) promises transformative benefits but also presents significant risks. We develop an approach to address the risk of harms consequential enough to significantly harm humanity. We identify four areas of risk: misuse, misalignment, mistakes, and structural risks. Of these, we focus on technical approaches to misuse and misalignment. For misuse, our strategy aims to prevent threat actors from accessing dangerous capabilities, by proactively identifying dangerous capabilities, and implementing robust security, access restrictions, monitoring, and model safety mitigations. To address misalignment, we outline two lines of defense. First, model-level mitigations such as amplified oversight and robust training can help to build an aligned model. Second, system-level security measures such as monitoring and access control can mitigate harm even if the model is misaligned. Techniques from interpretability, uncertainty estimation, and safer design patterns can enhance the effectiveness of these mitigations. Finally, we briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
Submitted 2 April, 2025;
originally announced April 2025.
-
AGI, Governments, and Free Societies
Authors:
Justin B. Bullock,
Samuel Hammond,
Seb Krier
Abstract:
This paper examines how artificial general intelligence (AGI) could fundamentally reshape the delicate balance between state capacity and individual liberty that sustains free societies. Building on Acemoglu and Robinson's 'narrow corridor' framework, we argue that AGI poses distinct risks of pushing societies toward either a 'despotic Leviathan' through enhanced state surveillance and control, or an 'absent Leviathan' through the erosion of state legitimacy relative to AGI-empowered non-state actors. Drawing on public administration theory and recent advances in AI capabilities, we analyze how these dynamics could unfold through three key channels: the automation of discretionary decision-making within agencies, the evolution of bureaucratic structures toward system-level architectures, and the transformation of democratic feedback mechanisms. Our analysis reveals specific failure modes that could destabilize liberal institutions. Enhanced state capacity through AGI could enable unprecedented surveillance and control, potentially entrenching authoritarian practices. Conversely, rapid diffusion of AGI capabilities to non-state actors could undermine state legitimacy and governability. We examine how these risks manifest differently at the micro level of individual bureaucratic decisions, the meso level of organizational structure, and the macro level of democratic processes. To preserve the narrow corridor of liberty, we propose a governance framework emphasizing robust technical safeguards, hybrid institutional designs that maintain meaningful human oversight, and adaptive regulatory mechanisms.
Submitted 13 March, 2025; v1 submitted 13 February, 2025;
originally announced March 2025.
-
Multi-Agent Risks from Advanced AI
Authors:
Lewis Hammond,
Alan Chan,
Jesse Clifton,
Jason Hoelscher-Obermaier,
Akbir Khan,
Euan McLean,
Chandler Smith,
Wolfram Barfuss,
Jakob Foerster,
Tomáš Gavenčiak,
The Anh Han,
Edward Hughes,
Vojtěch Kovařík,
Jan Kulveit,
Joel Z. Leibo,
Caspar Oesterheld,
Christian Schroeder de Witt,
Nisarg Shah,
Michael Wellman,
Paolo Bova,
Theodor Cimpeanu,
Carson Ezell,
Quentin Feuillade-Montixi,
Matija Franklin,
Esben Kran
, et al. (19 additional authors not shown)
Abstract:
The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, as well as seven key risk factors (information asymmetries, network effects, selection pressures, destabilising dynamics, commitment problems, emergent agency, and multi-agent security) that can underpin them. We highlight several important instances of each risk, as well as promising directions to help mitigate them. By anchoring our analysis in a range of real-world examples and experimental evidence, we illustrate the distinct challenges posed by multi-agent systems and their implications for the safety, governance, and ethics of advanced AI.
Submitted 19 February, 2025;
originally announced February 2025.
-
The Ethics of Advanced AI Assistants
Authors:
Iason Gabriel,
Arianna Manzini,
Geoff Keeling,
Lisa Anne Hendricks,
Verena Rieser,
Hasan Iqbal,
Nenad Tomašev,
Ira Ktena,
Zachary Kenton,
Mikel Rodriguez,
Seliem El-Sayed,
Sasha Brown,
Canfer Akbulut,
Andrew Trask,
Edward Hughes,
A. Stevie Bergman,
Renee Shelby,
Nahema Marchal,
Conor Griffin,
Juan Mateos-Garcia,
Laura Weidinger,
Winnie Street,
Benjamin Lange,
Alex Ingerman,
Alison Lentz
, et al. (32 additional authors not shown)
Abstract:
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.
Submitted 28 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Morphology-assisted galaxy mass-to-light predictions using deep learning
Authors:
Wouter Dobbels,
Serge Krier,
Stephan Pirson,
Sébastien Viaene,
Gert De Geyter,
Samir Salim,
Maarten Baes
Abstract:
One of the most important properties of a galaxy is the total stellar mass, or equivalently the stellar mass-to-light ratio (M/L). It is not directly observable, but can be estimated from stellar population synthesis. Currently, a galaxy's M/L is typically estimated from global fluxes. For example, a single global g - i colour correlates well with the stellar M/L. Spectral energy distribution (SED) fitting can make use of all available fluxes and their errors to make a Bayesian estimate of the M/L. We want to investigate the possibility of using morphology information to assist predictions of M/L. Our first goal is to develop and train a method that only requires a g-band image and redshift as input. This will allow us to study the correlation between M/L and morphology. Next, we can also include the i-band flux, and determine if morphology provides additional constraints compared to a method that only uses g- and i-band fluxes. We used a machine learning pipeline that can be split into two steps. First, we detected morphology features with a convolutional neural network. These are then combined with redshift, pixel size and g-band luminosity features in a gradient boosting machine. Our training target was the M/L acquired from the GALEX-SDSS-WISE Legacy Catalog, which uses global SED fitting and contains galaxies with z ~ 0.1. Morphology is a useful attribute when no colour information is available, but cannot outperform colour methods on its own. When we combine the morphology features with global g- and i-band luminosities, we find an improved estimate compared to a model which does not make use of morphology. While our method was trained to reproduce global SED fitted M/L, galaxy morphology gives us an important additional constraint when using one or two bands. Our framework can be extended to other problems to make use of morphological information.
Submitted 12 March, 2019;
originally announced March 2019.