-
In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
Authors:
Shayne Longpre,
Kevin Klyman,
Ruth E. Appel,
Sayash Kapoor,
Rishi Bommasani,
Michelle Sahar,
Sean McGregor,
Avijit Ghosh,
Borhane Blili-Hamelin,
Nathan Butters,
Alondra Nelson,
Amit Elazari,
Andrew Sellars,
Casey John Ellis,
Dane Sherrets,
Dawn Song,
Harley Geiger,
Ilona Cohen,
Lauren McIlvenny,
Madhulika Srikumar,
Mark M. Jaycox,
Markus Anderljung,
Nadine Farid Johnson,
Nicholas Carlini,
Nicolas Miailhe,
et al. (9 additional authors not shown)
Abstract:
The widespread deployment of general-purpose AI (GPAI) systems introduces significant new risks. Yet the infrastructure, practices, and norms for reporting flaws in GPAI systems remain seriously underdeveloped, lagging far behind more established fields like software security. Based on a collaboration between experts from the fields of software security, machine learning, law, social science, and policy, we identify key gaps in the evaluation and reporting of flaws in GPAI systems. We call for three interventions to advance system safety. First, we propose using standardized AI flaw reports and rules of engagement for researchers in order to ease the process of submitting, reproducing, and triaging flaws in GPAI systems. Second, we propose GPAI system providers adopt broadly-scoped flaw disclosure programs, borrowing from bug bounties, with legal safe harbors to protect researchers. Third, we advocate for the development of improved infrastructure to coordinate distribution of flaw reports across the many stakeholders who may be impacted. These interventions are increasingly urgent, as evidenced by the prevalence of jailbreaks and other flaws that can transfer across different providers' GPAI systems. By promoting robust reporting and coordination in the AI ecosystem, these proposals could significantly improve the safety, security, and accountability of GPAI systems.
Submitted 25 March, 2025; v1 submitted 21 March, 2025; originally announced March 2025.
-
Lessons From Red Teaming 100 Generative AI Products
Authors:
Blake Bullwinkel,
Amanda Minnich,
Shiven Chawla,
Gary Lopez,
Martin Pouliot,
Whitney Maxwell,
Joris de Gruyter,
Katherine Pratt,
Saphir Qi,
Nina Chikanov,
Roman Lutz,
Raja Sekhar Rao Dheekonda,
Bolor-Erdene Jagdagdorj,
Eugenia Kim,
Justin Song,
Keegan Hines,
Daniel Jones,
Giorgio Severi,
Richard Lundeen,
Sam Vaughan,
Victoria Westerhoff,
Pete Bryan,
Ram Shankar Siva Kumar,
Yonatan Zunger,
Chang Kawaguchi,
et al. (1 additional author not shown)
Abstract:
In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems. Due to the nascency of the field, there are many open questions about how red teaming operations should be conducted. Based on our experience red teaming over 100 generative AI products at Microsoft, we present our internal threat model ontology and eight main lessons we have learned:
1. Understand what the system can do and where it is applied
2. You don't have to compute gradients to break an AI system
3. AI red teaming is not safety benchmarking
4. Automation can help cover more of the risk landscape
5. The human element of AI red teaming is crucial
6. Responsible AI harms are pervasive but difficult to measure
7. LLMs amplify existing security risks and introduce new ones
8. The work of securing AI systems will never be complete
By sharing these insights alongside case studies from our operations, we offer practical recommendations aimed at aligning red teaming efforts with real-world risks. We also highlight aspects of AI red teaming that we believe are often misunderstood and discuss open questions for the field to consider.
Submitted 13 January, 2025; originally announced January 2025.
-
Control of Spatially Heterogeneous and Time-Varying Cellular Reaction Networks: A New Summation Law
Authors:
Mark A. Peletier,
Hans V. Westerhoff,
Boris N. Kholodenko
Abstract:
A hallmark of a plethora of intracellular signaling pathways is the spatial separation of activation and deactivation processes, which potentially results in precipitous gradients of activated proteins. Classical Metabolic Control Analysis (MCA), which quantifies the influence of an individual process on a system variable as a control coefficient, cannot be applied to spatially separated protein networks. The present paper unravels the principles that govern control over the fluxes and intermediate concentrations in spatially heterogeneous reaction networks. Our main results are two types of control summation theorems. The first type is a non-trivial generalization of the classical theorems to systems with spatially and temporally varying concentrations. In this generalization, the process of diffusion, which enters as the result of spatial concentration gradients, plays a role similar to other processes such as chemical reactions and membrane transport. The second summation theorem is completely novel. It states that the control by membrane transport, the diffusion control coefficient multiplied by two, and a newly introduced control coefficient associated with changes in the spatial size of a system (e.g., a cell) add up to one for the control over flux and to zero for the control over concentration. Using a simple example of a kinase/phosphatase system in a spherical cell, we speculate that, unless active mechanisms of intracellular transport are involved, the threshold cell size is limited by the diffusion control when it begins to exceed the spatial control coefficient significantly.
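The second summation theorem, as worded in the abstract, can be written compactly as follows. This is only a sketch: the symbols are assumed for illustration and are not taken from the paper, with J denoting a flux, X an intermediate concentration, and the subscripts labeling membrane transport, diffusion, and spatial size. The transport term may stand for a sum over several transport processes.

```latex
% Assumed notation (not from the paper):
%   C^{J}_{p}, C^{X}_{p}: control coefficient of process p over flux J / concentration X
%   p = transp (membrane transport), diff (diffusion), size (spatial size of the system)
\begin{align}
  C^{J}_{\mathrm{transp}} + 2\,C^{J}_{\mathrm{diff}} + C^{J}_{\mathrm{size}} &= 1, \\
  C^{X}_{\mathrm{transp}} + 2\,C^{X}_{\mathrm{diff}} + C^{X}_{\mathrm{size}} &= 0.
\end{align}
```

For comparison, the classical MCA summation theorems state that the flux control coefficients of all processes sum to one and the concentration control coefficients sum to zero.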
Submitted 6 November, 2002; originally announced November 2002.