PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming

Deng, Wesley Hanwen; Kim, Sunnie S. Y.; Jha, Akshita; Holstein, Ken; Eslami, Motahhare; Wilcox, Lauren; Gatys, Leon A

arXiv:2509.03728v2 (cs)

[Submitted on 3 Sep 2025 (v1), revised 5 Sep 2025 (this version, v2), latest version 27 Oct 2025 (v3)]

Abstract: Recent developments in AI governance and safety research have called for red-teaming methods that can effectively surface potential risks posed by AI models. Many of these calls have emphasized how the identities and backgrounds of red-teamers can shape their red-teaming strategies, and thus the kinds of risks they are likely to uncover. While automated red-teaming approaches promise to complement human red-teaming by enabling larger-scale exploration of model behavior, current approaches do not consider the role of identity. As an initial step towards incorporating people's backgrounds and identities in automated red-teaming, we develop and evaluate a novel method, PersonaTeaming, that introduces personas into the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. In particular, we first introduce a methodology for mutating prompts based on either "red-teaming expert" personas or "regular AI user" personas. We then develop a dynamic persona-generating algorithm that automatically generates various persona types adapted to different seed prompts. In addition, we develop a set of new metrics that explicitly measure "mutation distance", complementing existing diversity measurements of adversarial prompts. Our experiments show promising improvements (up to 144.1%) in the attack success rates of adversarial prompts through persona mutation, while maintaining prompt diversity, compared to RainbowPlus, a state-of-the-art automated red-teaming method. We discuss the strengths and limitations of different persona types and mutation methods, shedding light on future opportunities to explore complementarities between automated and human red-teaming approaches.
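The abstract reports results as an attack success rate, a relative improvement over a baseline, and a "mutation distance", but it does not define any of them. The Python sketch below shows one way such quantities could be computed. It is not the paper's implementation: the function names are hypothetical, the percentage is assumed to be a relative (not absolute) change, and the token-level Jaccard distance is only a stand-in for whatever mutation-distance metrics the authors actually use.

```python
def attack_success_rate(outcomes):
    """Fraction of adversarial prompts judged successful.

    `outcomes` is a list of booleans, one per prompt (assumed format).
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(outcomes) / len(outcomes)


def relative_improvement(baseline_asr, new_asr):
    """Percentage change of new_asr relative to baseline_asr."""
    if baseline_asr <= 0:
        raise ValueError("baseline_asr must be positive")
    return (new_asr - baseline_asr) / baseline_asr * 100.0


def token_jaccard_distance(seed_prompt, mutated_prompt):
    """Illustrative lexical distance between a seed prompt and its mutation.

    Computes 1 - |A & B| / |A | B| over lowercase whitespace tokens.
    This is a stand-in, not the metric defined in the paper.
    """
    a = set(seed_prompt.lower().split())
    b = set(mutated_prompt.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical judgments, invented purely for illustration.
    baseline = [True, False, False, False]
    with_persona = [True, True, False, False]
    gain = relative_improvement(
        attack_success_rate(baseline), attack_success_rate(with_persona)
    )
    print(f"relative improvement: {gain:.1f}%")
    print(token_jaccard_distance("a b c", "a b d"))
```

Under this reading, a relative improvement of 144.1% would mean the persona-mutated prompts succeed roughly 2.4 times as often as the baseline prompts in the best case, not that the success rate rises by 144.1 percentage points.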

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Cite as: arXiv:2509.03728 [cs.AI] (or arXiv:2509.03728v2 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2509.03728

Submission history

From: Wesley Deng
[v1] Wed, 3 Sep 2025 21:20:38 UTC (88 KB)
[v2] Fri, 5 Sep 2025 02:25:11 UTC (88 KB)
[v3] Mon, 27 Oct 2025 04:01:19 UTC (88 KB)
