Safeguarding the safeguards: How best to promote AI alignment in the public interest

Guest, Oliver; Aird, Michael; hÉigeartaigh, Seán Ó

arXiv:2312.08039 (cs)

[Submitted on 13 Dec 2023 (v1), last revised 15 Dec 2023 (this version, v2)]

Authors: Oliver Guest, Michael Aird, Seán Ó hÉigeartaigh

Abstract: AI alignment work is important from both a commercial and a safety lens. With this paper, we aim to help actors who support alignment efforts to make these efforts as effective as possible, and to avoid potential adverse effects. We begin by suggesting that institutions that are trying to act in the public interest (such as governments) should aim to support specifically alignment work that reduces accident or misuse risks. We then describe four problems which might cause alignment efforts to be counterproductive, increasing large-scale AI risks. We suggest mitigations for each problem. Finally, we make a broader recommendation that institutions trying to act in the public interest should think systematically about how to make their alignment efforts as effective, and as likely to be beneficial, as possible.

Comments:	Update Dec-15: Added a missing acknowledgement and fixed minor formatting errors
Subjects:	Computers and Society (cs.CY)
Cite as:	arXiv:2312.08039 [cs.CY]
	(or arXiv:2312.08039v2 [cs.CY] for this version)
	https://doi.org/10.48550/arXiv.2312.08039

Submission history

From: Oliver Guest
[v1] Wed, 13 Dec 2023 10:36:10 UTC (418 KB)
[v2] Fri, 15 Dec 2023 07:55:48 UTC (418 KB)
