Search | arXiv e-print repository

AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

Authors: Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, Sean McGregor, Kenneth Fricklas, Mala Kumar, Quentin Feuillade--Montixi, Kurt Bollacker, Felix Friedrich, Ryan Tsang, Bertie Vidgen, Alicia Parrish, Chris Knotz, Eleonora Presani, Jonathan Bennion, Marisa Ferrara Boston, Mike Kuniavsky, Wiebke Hutiri, James Ezick, Malek Ben Salem, Rajat Sahay, Sujata Goswami, et al. (77 additional authors not shown)

Abstract: The rapid advancement and deployment of AI systems have created an urgent need for standard safety-evaluation frameworks. This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. Its development employed an open process that included participants from multiple fields. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories, including violent crimes, nonviolent crimes, sex-related crimes, child sexual exploitation, indiscriminate weapons, suicide and self-harm, intellectual property, privacy, defamation, hate, sexual content, and specialized advice (election, financial, health, legal). Our method incorporates a complete assessment standard, extensive prompt datasets, a novel evaluation framework, a grading and reporting system, and the technical as well as organizational infrastructure for long-term support and evolution. In particular, the benchmark employs an understandable five-tier grading scale (Poor to Excellent) and incorporates an innovative entropy-based system-response evaluation. In addition to unveiling the benchmark, this report also identifies limitations of our method and of building safety benchmarks generally, including evaluator uncertainty and the constraints of single-turn interactions.
This work represents a crucial step toward establishing global standards for AI risk and reliability evaluation while acknowledging the need for continued development in areas such as multiturn interactions, multimodal understanding, coverage of additional languages, and emerging hazard categories. Our findings provide valuable insights for model developers, system integrators, and policymakers working to promote safer AI deployment.

Submitted 18 April, 2025; v1 submitted 19 February, 2025; originally announced March 2025.

Comments: 51 pages, 8 figures and an appendix

arXiv:2501.10057 [pdf, other]

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

Authors: Paul Röttger, Giuseppe Attanasio, Felix Friedrich, Janis Goldzycher, Alicia Parrish, Rishabh Bhardwaj, Chiara Di Bonaventura, Roman Eng, Gaia El Khoury Geagea, Sujata Goswami, Jieun Han, Dirk Hovy, Seogyeong Jeong, Paloma Jeretič, Flor Miriam Plaza-del-Arco, Donya Rooein, Patrick Schramowski, Anastassia Shaitarova, Xudong Shen, Richard Willats, Andrea Zugarini, Bertie Vidgen

Abstract: Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created by multimodal inputs. To address this gap, we introduce MSTS, a Multimodal Safety Test Suite for VLMs. MSTS comprises 400 test prompts across 40 fine-grained hazard categories. Each test prompt consists of a text and an image that only in combination reveal their full unsafe meaning. With MSTS, we find clear safety issues in several open VLMs. We also find some VLMs to be safe by accident, meaning that they are safe because they fail to understand even simple test prompts. We translate MSTS into ten languages, showing non-English prompts to increase the rate of unsafe model responses. We also show models to be safer when tested with text only rather than multimodal prompts. Finally, we explore the automation of VLM safety assessments, finding even the best safety classifiers to be lacking.

Submitted 17 January, 2025; originally announced January 2025.

Comments: under review

arXiv:2212.14105 [pdf, other]

Supercompliers

Authors: Matthew L. Comey, Amanda R. Eng, Pauline Leung, Zhuan Pei

Abstract: In a binary-treatment instrumental variable framework, we define supercompliers as the subpopulation whose treatment take-up positively responds to eligibility and whose outcome positively responds to take-up. Supercompliers are the only subpopulation to benefit from treatment eligibility and, hence, are important for policy. We provide tools to characterize supercompliers under a set of jointly testable assumptions. Specifically, we require standard assumptions from the local average treatment effect literature plus an outcome monotonicity assumption. Estimation and inference can be conducted with instrumental variable regression. In two job-training experiments, we demonstrate our machinery's utility, particularly in incorporating social welfare weights into marginal-value-of-public-funds analysis.

Submitted 20 December, 2024; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: This version substantially revises v2. Pauline Leung has made significant contributions and is now a coauthor. We expand the non-binary outcome case, essential in the new connection to MVPF (Section 3). We replace the original empirical application with two job training experiments (Section 4), add new theoretical results in Remark 5, Appendix A.3, and A.7. References are updated.
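The abstract above says estimation can be done with instrumental variable regression. As a minimal sketch, assuming a binary eligibility instrument z, binary take-up d, and outcome y, the simplest IV estimator is the Wald ratio (reduced form over first stage), which under the standard LATE assumptions estimates the average effect for compliers. This is a generic illustration, not the paper's supercomplier procedure; the function name wald_estimate is hypothetical. Under the outcome-monotonicity assumption and a binary outcome, the ratio can additionally be read as the share of compliers whose outcome rises with take-up, which follows from the definitions in the abstract but may differ from the authors' own estimators.

```python
from statistics import mean


def wald_estimate(z, d, y):
    """Wald (simple IV) estimate with a binary instrument (hypothetical helper).

    z: eligibility indicator (0/1), d: treatment take-up (0/1), y: outcome.
    Returns (mean y | z=1 - mean y | z=0) / (mean d | z=1 - mean d | z=0).
    """
    if not (len(z) == len(d) == len(y)):
        raise ValueError("z, d and y must have the same length")
    eligible = [i for i, zi in enumerate(z) if zi == 1]
    ineligible = [i for i, zi in enumerate(z) if zi == 0]
    if not eligible or not ineligible:
        raise ValueError("both eligibility arms (z=0 and z=1) must be non-empty")

    # Reduced form: effect of eligibility on the outcome.
    reduced_form = mean(y[i] for i in eligible) - mean(y[i] for i in ineligible)
    # First stage: effect of eligibility on take-up (complier share under monotonicity).
    first_stage = mean(d[i] for i in eligible) - mean(d[i] for i in ineligible)
    if first_stage == 0:
        raise ZeroDivisionError("first stage is zero: eligibility does not shift take-up")
    return reduced_form / first_stage


# Toy data (invented for illustration): 3 of 4 eligible units take up treatment.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 0, 0, 0, 0]
y = [1, 1, 0, 0, 0, 0, 0, 0]
print(round(wald_estimate(z, d, y), 3))  # 0.667
```

With a single binary instrument and no covariates, this ratio coincides with the two-stage least squares coefficient, which is why the sketch avoids a regression library.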

arXiv:2001.06683 [pdf]

The Habitable Exoplanet Observatory (HabEx) Mission Concept Study Final Report

Authors: B. Scott Gaudi, Sara Seager, Bertrand Mennesson, Alina Kiessling, Keith Warfield, Kerri Cahoy, John T. Clarke, Shawn Domagal-Goldman, Lee Feinberg, Olivier Guyon, Jeremy Kasdin, Dimitri Mawet, Peter Plavchan, Tyler Robinson, Leslie Rogers, Paul Scowen, Rachel Somerville, Karl Stapelfeldt, Christopher Stark, Daniel Stern, Margaret Turnbull, Rashied Amini, Gary Kuan, Stefan Martin, Rhonda Morgan, et al. (161 additional authors not shown)

Abstract: The Habitable Exoplanet Observatory, or HabEx, has been designed to be the Great Observatory of the 2030s. For the first time in human history, technologies have matured sufficiently to enable an affordable space-based telescope mission capable of discovering and characterizing Earthlike planets orbiting nearby bright sunlike stars in order to search for signs of habitability and biosignatures. Such a mission can also be equipped with instrumentation that will enable broad and exciting general astrophysics and planetary science not possible from current or planned facilities. HabEx is a space telescope with unique imaging and multi-object spectroscopic capabilities at wavelengths ranging from ultraviolet (UV) to near-IR. These capabilities allow for a broad suite of compelling science that cuts across the entire NASA astrophysics portfolio. HabEx has three primary science goals: (1) Seek out nearby worlds and explore their habitability; (2) Map out nearby planetary systems and understand the diversity of the worlds they contain; (3) Enable new explorations of astrophysical systems from our own solar system to external galaxies by extending our reach in the UV through near-IR. This Great Observatory science will be selected through a competed GO program, and will account for about 50% of the HabEx primary mission. The preferred HabEx architecture is a 4m, monolithic, off-axis telescope that is diffraction-limited at 0.4 microns and is in an L2 orbit.
HabEx employs two starlight suppression systems: a coronagraph and a starshade, each with their own dedicated instrument.

Submitted 26 January, 2020; v1 submitted 18 January, 2020; originally announced January 2020.

Comments: Full report: 498 pages. Executive Summary: 14 pages. More information about HabEx can be found here: https://www.jpl.nasa.gov/habex/
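As a back-of-envelope check on the "4m ... diffraction-limited at 0.4 microns" figures in the abstract above, the standard diffraction-limit estimate for a circular aperture gives the following angular scales. This arithmetic is an illustration under the textbook λ/D approximation, not a value quoted from the report.

```latex
\theta \approx \frac{\lambda}{D}
  = \frac{0.4\times10^{-6}\,\mathrm{m}}{4\,\mathrm{m}}
  = 1.0\times10^{-7}\,\mathrm{rad}
  \approx 1.0\times10^{-7}\times 206{,}265'' \approx 0.021'' \;(\approx 21\ \mathrm{mas}),
\qquad
\theta_{\mathrm{Rayleigh}} = 1.22\,\frac{\lambda}{D} \approx 25\ \mathrm{mas}.
```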

arXiv:1809.09674 [pdf]

The Habitable Exoplanet Observatory (HabEx) Mission Concept Study Interim Report

Authors: B. Scott Gaudi, Sara Seager, Bertrand Mennesson, Alina Kiessling, Keith Warfield, Gary Kuan, Kerri Cahoy, John T. Clarke, Shawn Domagal-Goldman, Lee Feinberg, Olivier Guyon, Jeremy Kasdin, Dimitri Mawet, Tyler Robinson, Leslie Rogers, Paul Scowen, Rachel Somerville, Karl Stapelfeldt, Christopher Stark, Daniel Stern, Margaret Turnbull, Stefan Martin, Oscar Alvarez-Salazar, Rashied Amini, William Arnold, et al. (57 additional authors not shown)

Abstract: For the first time in human history, technologies have matured sufficiently to enable a mission capable of discovering and characterizing habitable planets like Earth orbiting sunlike stars other than the Sun. At the same time, such a platform would enable unique science not possible from ground-based facilities. This science is broad and exciting, ranging from new investigations of our own solar system to a full range of astrophysics disciplines. The Habitable Exoplanet Observatory, or HabEx, is one of four studies currently being undertaken by NASA in preparation for the 2020 Astrophysics Decadal Survey. HabEx has been designed to be the Great Observatory of the 2030s, with community involvement through a competed and funded Guest Observer (GO) program. This interim report describes the HabEx baseline concept, which is a space-based 4-meter diameter telescope mission concept with ultraviolet (UV), optical, and near-infrared (near-IR) imaging and spectroscopy capabilities. More information on HabEx can be found at https://www.jpl.nasa.gov/habex

Submitted 25 September, 2018; originally announced September 2018.

Comments: 212 Pages

arXiv:1805.06880 [pdf, other]

It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data

Authors: Matteo Ruggero Ronchi, Oisin Mac Aodha, Robert Eng, Pietro Perona

Abstract: We address the problem of 3D human pose estimation from 2D input images using only weakly supervised training data. Despite showing considerable success for 2D pose estimation, the application of supervised machine learning to 3D pose estimation in real world images is currently hampered by the lack of varied training images with corresponding 3D poses. Most existing 3D pose estimation algorithms train on data that has either been collected in carefully controlled studio settings or has been generated synthetically. Instead, we take a different approach, and propose a 3D human pose estimation algorithm that only requires relative estimates of depth at training time. Such training signal, although noisy, can be easily collected from crowd annotators, and is of sufficient quality for enabling successful training and evaluation of 3D pose algorithms. Our results are competitive with fully supervised regression based approaches on the Human3.6M dataset, despite using significantly weaker training data. Our proposed algorithm opens the door to using existing widespread 2D datasets for 3D pose estimation by allowing fine-tuning with noisy relative constraints, resulting in more accurate 3D poses.

Submitted 27 July, 2018; v1 submitted 17 May, 2018; originally announced May 2018.

Comments: BMVC 2018. Project page available at http://www.vision.caltech.edu/~mronchi/projects/RelativePose
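The abstract above says training needs only relative depth estimates between joints. A minimal sketch of how such pairwise labels can supervise predicted depths, assuming a logistic ranking loss for "closer/farther" pairs and a squared penalty for "about equal" pairs (a form common in relative-depth work, assumed here rather than taken from the paper). The function name relative_depth_loss, the (i, j, r) label encoding, and the "larger depth = farther" convention are all hypothetical.

```python
import math


def relative_depth_loss(depths, pairs):
    """Mean pairwise loss on predicted joint depths (hypothetical helper).

    depths: predicted depth of each joint (assumed: larger = farther from camera).
    pairs: (i, j, r) triples; r = +1 if annotators judged joint i closer than
    joint j, r = -1 if farther, and r = 0 if roughly equal depth.
    """
    if not pairs:
        raise ValueError("need at least one annotated pair")
    total = 0.0
    for i, j, r in pairs:
        diff = depths[i] - depths[j]
        if r == 0:
            # Equal-depth label: penalise any predicted depth gap quadratically.
            total += diff ** 2
        else:
            # Ordered label: numerically stable softplus(r * diff), small when the
            # predicted order agrees with r and growing roughly linearly otherwise.
            x = r * diff
            total += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / len(pairs)


# Joint 0 is predicted closer than joint 1 (depth 1.0 vs 2.0).
agree = relative_depth_loss([1.0, 2.0], [(0, 1, +1)])     # label agrees with prediction
disagree = relative_depth_loss([1.0, 2.0], [(0, 1, -1)])  # label contradicts prediction
print(agree < disagree)  # True
```

Because the loss depends only on depth differences, it constrains the ordering of joints but not absolute depth, which matches the weak, relative nature of crowd-sourced annotations described in the abstract.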

Showing 1–6 of 6 results for author: Eng, R