CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants

Alberts, Lize; Ellis, Benjamin; Lupu, Andrei; Foerster, Jakob

arXiv:2410.21159 (cs)

[Submitted on 28 Oct 2024 (v1), last revised 30 Jan 2025 (this version, v2)]

Abstract: We introduce a multi-turn benchmark for evaluating personalised alignment in LLM-based AI assistants, focusing on their ability to handle user-provided safety-critical contexts. Our assessment of ten leading models across five scenarios (with 337 use cases each) reveals systematic inconsistencies in maintaining user-specific consideration, with even top-rated "harmless" models making recommendations that should be recognised as obviously harmful to the user given the context provided. Key failure modes include inappropriate weighing of conflicting preferences, sycophancy (prioritising desires above safety), a lack of attentiveness to critical user information within the context window, and inconsistent application of user-specific knowledge. The same systematic biases were observed in OpenAI's o1, suggesting that strong reasoning capacities do not necessarily transfer to this kind of personalised thinking. We find that prompting LLMs to consider safety-critical context significantly improves performance, unlike a generic 'harmless and helpful' instruction. Based on these findings, we propose research directions for embedding self-reflection capabilities, online user modelling, and dynamic risk assessment in AI assistants. Our work emphasises the need for nuanced, context-aware approaches to alignment in systems designed for persistent human interaction, aiding the development of safe and considerate AI assistants.

Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
MSC classes:	68T05
ACM classes:	I.2.0; I.2.7; K.4.2; H.5.2; I.2.6
Cite as:	arXiv:2410.21159 [cs.HC]
	(or arXiv:2410.21159v2 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2410.21159

Submission history

From: Lize Alberts
[v1] Mon, 28 Oct 2024 15:59:31 UTC (717 KB)
[v2] Thu, 30 Jan 2025 01:29:03 UTC (763 KB)
