
Showing 1–5 of 5 results for author: Woodside, T

Searching in archive cs.
  1. arXiv:2404.02675  [pdf, other]

    cs.CY cs.AI

    Responsible Reporting for Frontier AI Development

    Authors: Noam Kolt, Markus Anderljung, Joslyn Barnhart, Asher Brass, Kevin Esvelt, Gillian K. Hadfield, Lennart Heim, Mikel Rodriguez, Jonas B. Sandbrink, Thomas Woodside

    Abstract: Mitigating the risks from frontier AI systems requires up-to-date and reliable information about those systems. Organizations that develop and deploy frontier systems have significant access to such information. By reporting safety-critical information to actors in government, industry, and civil society, these organizations could improve visibility into new and emerging risks posed by frontier sy…

    Submitted 3 April, 2024; originally announced April 2024.

  2. arXiv:2306.12001  [pdf, other]

    cs.CY cs.AI cs.LG

    An Overview of Catastrophic AI Risks

    Authors: Dan Hendrycks, Mantas Mazeika, Thomas Woodside

    Abstract: Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose catastrophic risks. Although numerous risks have been detailed separately, there is a pressing need for a systematic discussion and illustration of the potential dangers to better inform efforts to mitig…

    Submitted 9 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

  3. arXiv:2304.03279  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

    Authors: Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

    Abstract: Artificial agents have traditionally been trained to maximize reward, which may incentivize power-seeking and deception, analogous to how next-token prediction in language models (LMs) may incentivize toxicity. So do agents naturally learn to be Machiavellian? And how do we measure these behaviors in general-purpose models such as GPT-4? Towards answering these questions, we introduce MACHIAVELLI,…

    Submitted 12 June, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Comments: ICML 2023 Oral (camera-ready); 31 pages, 5 figures

  4. arXiv:2303.08721  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Artificial Influence: An Analysis Of AI-Driven Persuasion

    Authors: Matthew Burtell, Thomas Woodside

    Abstract: Persuasion is a key aspect of what it means to be human, and is central to business, politics, and other endeavors. Advancements in artificial intelligence (AI) have produced AI systems that are capable of persuading humans to buy products, watch videos, click on search results, and more. Even systems that are not explicitly designed to persuade may do so in practice. In the future, increasingly a…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 8 pages

  5. arXiv:2301.00876  [pdf, other]

    cs.CL

    MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

    Authors: Steven H. Wang, Antoine Scardigli, Leonard Tang, Wei Chen, Dimitry Levkin, Anya Chen, Spencer Ball, Thomas Woodside, Oliver Zhang, Dan Hendrycks

    Abstract: Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 3…

    Submitted 24 November, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: EMNLP 2023. 5 pages + appendix. Code and dataset are available at https://github.com/TheAtticusProject/maud