Evaluating the Prompt Steerability of Large Language Models

Miehling, Erik; Desmond, Michael; Ramamurthy, Karthikeyan Natesan; Daly, Elizabeth M.; Dognin, Pierre; Rios, Jesus; Bouneffouf, Djallel; Liu, Miao

arXiv:2411.12405 (cs)

[Submitted on 19 Nov 2024 (v1), last revised 15 Feb 2025 (this version, v2)]

Abstract: Building pluralistic AI requires designing models that can be shaped to represent a wide range of value systems and cultures. Achieving this first requires being able to evaluate the degree to which a given model is capable of reflecting various personas. To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. Our design is based on a formal definition of prompt steerability, which analyzes the degree to which a model's joint behavioral distribution can be shifted from its baseline. By defining steerability indices and inspecting how these indices change as a function of steering effort, we can estimate the steerability of a model across various persona dimensions and directions. Our benchmark reveals that the steerability of many current models is limited, due both to a skew in their baseline behavior and to an asymmetry in their steerability across many persona dimensions. We release an implementation of our benchmark at this https URL.
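The abstract does not spell out how the steerability indices are computed, so the following is only a hypothetical illustration of the general idea, not the authors' definition. It assumes each persona dimension is scored on a scale from -1 to +1, treats steering effort as an integer (for example, a hypothetical count of steering statements placed in the prompt), and measures distributional shift with the one-dimensional Wasserstein distance on a single dimension's marginal rather than the joint distribution. The function names (`wasserstein_1d`, `steerability_index`, `steerability_curve`) and all numbers are invented for this sketch.

```python
from typing import Dict, Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Empirical 1-Wasserstein distance between two equal-size samples.

    On the real line the optimal transport plan between two equal-size
    empirical distributions pairs their sorted values, so W1 reduces to the
    mean absolute difference of the sorted samples.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def steerability_index(
    baseline: Sequence[float], steered: Sequence[float], target: Sequence[float]
) -> float:
    """Hypothetical index: share of the baseline-to-target distance closed by steering.

    1.0 means the steered distribution reaches the target, 0.0 means no net
    movement toward it, and negative values mean steering moved it away.
    """
    d_baseline = wasserstein_1d(baseline, target)
    if d_baseline == 0.0:
        return 1.0  # baseline already coincides with the target
    return (d_baseline - wasserstein_1d(steered, target)) / d_baseline


def steerability_curve(
    baseline: Sequence[float],
    steered_by_effort: Dict[int, Sequence[float]],
    target: Sequence[float],
) -> Dict[int, float]:
    """Hypothetical index as a function of steering effort, ordered by effort."""
    return {
        k: steerability_index(baseline, samples, target)
        for k, samples in sorted(steered_by_effort.items())
    }


if __name__ == "__main__":
    # Toy persona scores on a [-1, +1] scale for one persona dimension.
    # These numbers are invented for illustration; they are not results from the paper.
    baseline = [0.2, 0.4, 0.4, 0.6]  # baseline skewed toward the positive pole
    positive_target = [1.0] * len(baseline)
    negative_target = [-1.0] * len(baseline)

    # Hypothetical model outputs after steering with effort k in each direction.
    steered_pos = {0: baseline, 2: [0.5, 0.6, 0.7, 0.8], 4: [0.8, 0.9, 1.0, 1.0]}
    steered_neg = {0: baseline, 2: [0.0, 0.2, 0.3, 0.5], 4: [-0.2, 0.0, 0.2, 0.2]}

    pos_curve = steerability_curve(baseline, steered_pos, positive_target)
    neg_curve = steerability_curve(baseline, steered_neg, negative_target)
    for k in pos_curve:
        print(f"effort={k}: positive={pos_curve[k]:.3f} negative={neg_curve[k]:.3f}")
```

With these toy numbers the positive-direction index rises to 0.875 at effort 4 while the negative-direction index only reaches 0.25, which is the kind of directional asymmetry (on top of a skewed baseline) the abstract describes. For samples of unequal size, SciPy's `scipy.stats.wasserstein_distance` computes the same one-dimensional distance without the equal-size restriction.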

Comments:	Short version appeared at the Pluralistic Alignment workshop at NeurIPS 2024; extended version appeared at NAACL 2025
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2411.12405 [cs.CL]
	(or arXiv:2411.12405v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.12405

Submission history

From: Erik Miehling
[v1] Tue, 19 Nov 2024 10:41:54 UTC (4,271 KB)
[v2] Sat, 15 Feb 2025 15:34:26 UTC (5,887 KB)