-
PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System
Authors:
Gary D. Lopez Munoz,
Amanda J. Minnich,
Roman Lutz,
Richard Lundeen,
Raja Sekhar Rao Dheekonda,
Nina Chikanov,
Bolor-Erdene Jagdagdorj,
Martin Pouliot,
Shiven Chawla,
Whitney Maxwell,
Blake Bullwinkel,
Katherine Pratt,
Joris de Gruyter,
Charlotte Siska,
Pete Bryan,
Tori Westerhoff,
Chang Kawaguchi,
Christian Seifert,
Ram Shankar Siva Kumar,
Yonatan Zunger
Abstract:
Generative Artificial Intelligence (GenAI) is becoming ubiquitous in our daily lives. The increase in computational power and data availability has led to a proliferation of both single- and multi-modal models. As the GenAI ecosystem matures, the need for extensible and model-agnostic risk identification frameworks is growing. To meet this need, we introduce the Python Risk Identification Toolkit (PyRIT), an open-source framework designed to enhance red teaming efforts in GenAI systems. PyRIT is a model- and platform-agnostic tool that enables red teamers to probe for and identify novel harms, risks, and jailbreaks in multimodal generative AI models. Its composable architecture facilitates the reuse of core building blocks and allows for extensibility to future models and modalities. This paper details the challenges specific to red teaming generative AI systems, the development and features of PyRIT, and its practical applications in real-world scenarios.
Submitted 1 October, 2024;
originally announced October 2024.
-
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle
Authors:
Emman Haider,
Daniel Perez-Becker,
Thomas Portet,
Piyush Madan,
Amit Garg,
Atabak Ashfaq,
David Majercak,
Wen Wen,
Dongwoo Kim,
Ziyi Yang,
Jianwen Zhang,
Hiteshi Sharma,
Blake Bullwinkel,
Martin Pouliot,
Amanda Minnich,
Shiven Chawla,
Solianna Herrera,
Shahed Warreth,
Maggie Engler,
Gary Lopez,
Nina Chikanov,
Raja Sekhar Rao Dheekonda,
Bolor-Erdene Jagdagdorj,
Roman Lutz,
Richard Lundeen, et al. (6 additional authors not shown)
Abstract:
Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone. As these models are deployed in an increasing number of domains, it is critical to ensure that they are aligned with human preferences and safety considerations. In this report, we present our methodology for safety aligning the Phi-3 series of language models. We utilized a "break-fix" cycle, performing multiple rounds of dataset curation, safety post-training, benchmarking, red teaming, and vulnerability identification to cover a variety of harm areas in both single and multi-turn scenarios. Our results indicate that this approach iteratively improved the performance of the Phi-3 models across a wide range of responsible AI benchmarks. Finally, we include additional red teaming strategies and evaluations that were used to test the safety behavior of Phi-3.5-mini and Phi-3.5-MoE, which were optimized for multilingual capabilities.
Submitted 22 August, 2024; v1 submitted 18 July, 2024;
originally announced July 2024.
-
The Habitable Exoplanet Observatory (HabEx) Mission Concept Study Final Report
Authors:
B. Scott Gaudi,
Sara Seager,
Bertrand Mennesson,
Alina Kiessling,
Keith Warfield,
Kerri Cahoy,
John T. Clarke,
Shawn Domagal-Goldman,
Lee Feinberg,
Olivier Guyon,
Jeremy Kasdin,
Dimitri Mawet,
Peter Plavchan,
Tyler Robinson,
Leslie Rogers,
Paul Scowen,
Rachel Somerville,
Karl Stapelfeldt,
Christopher Stark,
Daniel Stern,
Margaret Turnbull,
Rashied Amini,
Gary Kuan,
Stefan Martin,
Rhonda Morgan, et al. (161 additional authors not shown)
Abstract:
The Habitable Exoplanet Observatory, or HabEx, has been designed to be the Great Observatory of the 2030s. For the first time in human history, technologies have matured sufficiently to enable an affordable space-based telescope mission capable of discovering and characterizing Earthlike planets orbiting nearby bright sunlike stars in order to search for signs of habitability and biosignatures. Such a mission can also be equipped with instrumentation that will enable broad and exciting general astrophysics and planetary science not possible from current or planned facilities. HabEx is a space telescope with unique imaging and multi-object spectroscopic capabilities at wavelengths ranging from ultraviolet (UV) to near-IR. These capabilities allow for a broad suite of compelling science that cuts across the entire NASA astrophysics portfolio. HabEx has three primary science goals: (1) Seek out nearby worlds and explore their habitability; (2) Map out nearby planetary systems and understand the diversity of the worlds they contain; (3) Enable new explorations of astrophysical systems from our own solar system to external galaxies by extending our reach in the UV through near-IR. This Great Observatory science will be selected through a competed GO program, and will account for about 50% of the HabEx primary mission. The preferred HabEx architecture is a 4m, monolithic, off-axis telescope that is diffraction-limited at 0.4 microns and is in an L2 orbit. HabEx employs two starlight suppression systems: a coronagraph and a starshade, each with their own dedicated instrument.
Submitted 26 January, 2020; v1 submitted 18 January, 2020;
originally announced January 2020.