In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
Authors:
Shayne Longpre,
Kevin Klyman,
Ruth E. Appel,
Sayash Kapoor,
Rishi Bommasani,
Michelle Sahar,
Sean McGregor,
Avijit Ghosh,
Borhane Blili-Hamelin,
Nathan Butters,
Alondra Nelson,
Amit Elazari,
Andrew Sellars,
Casey John Ellis,
Dane Sherrets,
Dawn Song,
Harley Geiger,
Ilona Cohen,
Lauren McIlvenny,
Madhulika Srikumar,
Mark M. Jaycox,
Markus Anderljung,
Nadine Farid Johnson,
Nicholas Carlini,
Nicolas Miailhe, et al. (9 additional authors not shown)
Abstract:
The widespread deployment of general-purpose AI (GPAI) systems introduces significant new risks. Yet the infrastructure, practices, and norms for reporting flaws in GPAI systems remain seriously underdeveloped, lagging far behind more established fields like software security. Based on a collaboration between experts from the fields of software security, machine learning, law, social science, and policy, we identify key gaps in the evaluation and reporting of flaws in GPAI systems. We call for three interventions to advance system safety. First, we propose using standardized AI flaw reports and rules of engagement for researchers in order to ease the process of submitting, reproducing, and triaging flaws in GPAI systems. Second, we propose that GPAI system providers adopt broadly scoped flaw disclosure programs, borrowing from bug bounties, with legal safe harbors to protect researchers. Third, we advocate for the development of improved infrastructure to coordinate distribution of flaw reports across the many stakeholders who may be impacted. These interventions are increasingly urgent, as evidenced by the prevalence of jailbreaks and other flaws that can transfer across different providers' GPAI systems. By promoting robust reporting and coordination in the AI ecosystem, these proposals could significantly improve the safety, security, and accountability of GPAI systems.
Submitted 25 March, 2025; v1 submitted 21 March, 2025; originally announced March 2025.
Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation
Authors:
Greg Durrett,
Jonathan K. Kummerfeld,
Taylor Berg-Kirkpatrick,
Rebecca S. Portnoff,
Sadia Afroz,
Damon McCoy,
Kirill Levchenko,
Vern Paxson
Abstract:
One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotate data from four different forums. Each of these forums constitutes its own "fine-grained domain" in that the forums cover different market sectors with different properties, even though all forums are in the broad domain of cybercrime. We characterize these domain differences in the context of a learning-based system: supervised models see decreased accuracy when applied to new forums, and standard techniques for semi-supervised learning and domain adaptation have limited effectiveness on this data, which suggests the need to improve these techniques. We release a dataset of 1,938 annotated posts from across the four forums.
Submitted 31 August, 2017; originally announced August 2017.