Showing 1–3 of 3 results for author: Gabrieli, N

  1. arXiv:2410.13042  [pdf, ps, other]

    cs.CY

    How Do AI Companies "Fine-Tune" Policy? Examining Regulatory Capture in AI Governance

    Authors: Kevin Wei, Carson Ezell, Nick Gabrieli, Chinmay Deshpande

    Abstract: Industry actors in the United States have gained extensive influence in conversations about the regulation of general-purpose artificial intelligence (AI) systems. Although industry participation is an important part of the policy process, it can also cause regulatory capture, whereby industry co-opts regulatory regimes to prioritize private over public welfare. Capture of AI policy by AI develope…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 39 pages (14 pages main text), 3 figures, 9 tables. To be published in the Proceedings of the 2024 AAAI/ACM Conference on AI, Ethics, & Society (AIES)

    Journal ref: Proc. AAAI/ACM Conf. AI, Ethics & Soc., 7 (2024) 1539-1555

  2. arXiv:2403.10462  [pdf, other]

    cs.CY cs.AI

    Safety Cases: How to Justify the Safety of Advanced AI Systems

    Authors: Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen

    Abstract: As AI systems become more advanced, companies and regulators will make difficult decisions about whether it is safe to train and deploy them. To prepare for these decisions, we investigate how developers could make a 'safety case,' which is a structured rationale that AI systems are unlikely to cause a catastrophe. We propose a framework for organizing a safety case and discuss four categories of…

    Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  3. arXiv:2312.06681  [pdf, other]

    cs.CL cs.AI cs.LG

    Steering Llama 2 via Contrastive Activation Addition

    Authors: Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner

    Abstract: We introduce Contrastive Activation Addition (CAA), an innovative method for steering language models by modifying their activations during forward passes. CAA computes "steering vectors" by averaging the difference in residual stream activations between pairs of positive and negative examples of a particular behavior, such as factual versus hallucinatory responses. During inference, these steerin…

    Submitted 5 July, 2024; v1 submitted 8 December, 2023; originally announced December 2023.
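
The steering-vector computation described in the third abstract (averaging the difference in activations over pairs of positive and negative examples) can be illustrated with a small sketch. This is not the authors' code: the function names `steering_vector` and `apply_steering`, the `coeff` parameter, and the toy activation values are all hypothetical, and plain Python lists stand in for residual stream activations. How the truncated abstract goes on to apply the vectors at inference is not shown in this listing, so the scaled addition below is only an assumption.

```python
from typing import List, Sequence

Vector = List[float]


def steering_vector(pos_acts: Sequence[Sequence[float]],
                    neg_acts: Sequence[Sequence[float]]) -> Vector:
    """Average the per-pair difference (positive minus negative) of activations.

    Row i of pos_acts and row i of neg_acts are assumed to come from the same
    example pair, recorded at the same (hypothetical) layer.
    """
    if not pos_acts or len(pos_acts) != len(neg_acts):
        raise ValueError("need a non-empty, equal number of paired activations")
    dim = len(pos_acts[0])
    total = [0.0] * dim
    for pos, neg in zip(pos_acts, neg_acts):
        if len(pos) != dim or len(neg) != dim:
            raise ValueError("all activations must have the same dimension")
        for j in range(dim):
            total[j] += pos[j] - neg[j]
    return [t / len(pos_acts) for t in total]


def apply_steering(acts: Sequence[Sequence[float]],
                   vector: Sequence[float],
                   coeff: float = 1.0) -> List[Vector]:
    """Assumption: steer by adding coeff * vector to every activation row."""
    return [[a + coeff * v for a, v in zip(row, vector)] for row in acts]


# Toy data: two example pairs with 2-dimensional activations.
positive = [[1.0, 2.0], [3.0, 4.0]]
negative = [[0.0, 1.0], [1.0, 1.0]]
vec = steering_vector(positive, negative)

# Add the vector, scaled by a hypothetical coefficient, to some activations.
steered = apply_steering([[0.0, 0.0], [1.0, 1.0]], vec, coeff=2.0)
```

In this sketch a negative `coeff` would subtract the vector instead of adding it; whether and how the paper uses such a coefficient should be checked against the full text.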