Showing 1–3 of 3 results for author: Goodfriend, S

  1. arXiv:2501.18837  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

    Authors: Mrinank Sharma, Meg Tong, Jesse Mu, Jerry Wei, Jorrit Kruthoff, Scott Goodfriend, Euan Ong, Alwin Peng, Raj Agarwal, Cem Anil, Amanda Askell, Nathan Bailey, Joe Benton, Emma Bluemke, Samuel R. Bowman, Eric Christiansen, Hoagy Cunningham, Andy Dau, Anjali Gopal, Rob Gilson, Logan Graham, Logan Howard, Nimit Kalra, Taesung Lee, Kevin Lin, et al. (18 additional authors not shown)

    Abstract: Large language models (LLMs) are vulnerable to universal jailbreaks: prompting strategies that systematically bypass model safeguards and enable users to carry out harmful processes that require many model interactions, like manufacturing illegal substances at scale. To defend against these attacks, we introduce Constitutional Classifiers: safeguards trained on synthetic data, generated by promptin…

    Submitted 30 January, 2025; originally announced January 2025.

  2. A Competition Winning Deep Reinforcement Learning Agent in microRTS

    Authors: Scott Goodfriend

    Abstract: Scripted agents have predominantly won the five previous iterations of the IEEE microRTS ($μ$RTS) competitions hosted at CIG and CoG. Despite Deep Reinforcement Learning (DRL) algorithms making significant strides in real-time strategy (RTS) games, their adoption in this primarily academic competition has been limited due to the considerable training resources required and the complexity inherent…

    Submitted 2 January, 2025; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Best paper award nominee at IEEE Conference on Games 2024. 19 pages, 6 figures. Source code at https://github.com/sgoodfriend/rl-algo-impls

    Journal ref: 2024 IEEE Conference on Games (CoG), Milan, Italy, 2024, pp. 1-8

  3. arXiv:1301.2626  [pdf, other]

    cs.DS cs.CC cs.CG

    Active Self-Assembly of Algorithmic Shapes and Patterns in Polylogarithmic Time

    Authors: Damien Woods, Ho-Lin Chen, Scott Goodfriend, Nadine Dabby, Erik Winfree, Peng Yin

    Abstract: We describe a computational model for studying the complexity of self-assembled structures with active molecular components. Our model captures notions of growth and movement ubiquitous in biological systems. The model is inspired by biology's fantastic ability to assemble biomolecules that form systems with complicated structure and dynamics, from molecular motors that walk on rigid tracks and pr…

    Submitted 11 January, 2013; originally announced January 2013.