Showing 1–2 of 2 results for author: Carlsmith, J

  1. arXiv:2311.08379  [pdf, other]

    cs.CY cs.AI cs.LG

    Scheming AIs: Will AIs fake alignment during training in order to get power?

    Authors: Joe Carlsmith

    Abstract: This report examines whether advanced AIs that perform well in training will be doing so in order to gain power later -- a behavior I call "scheming" (also sometimes called "deceptive alignment"). I conclude that scheming is a disturbingly plausible outcome of using baseline machine learning methods to train goal-directed AIs sophisticated enough to scheme (my subjective probability on such an out…

    Submitted 27 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 127 pages, 8 figures. Revised again to correct typos

  2. arXiv:2206.13353  [pdf, other]

    cs.CY cs.AI cs.LG

    Is Power-Seeking AI an Existential Risk?

    Authors: Joseph Carlsmith

    Abstract: This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives…

    Submitted 13 August, 2024; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: 57 pages, 1 figure. Edited to fix link to audio version, add links to short version and reviews, and fix a typo in section 2.1.2