Showing 1–24 of 24 results for author: Critch, A

  1. arXiv:2408.07892  [pdf, other]

    cs.CY

    Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online

    Authors: Steven Adler, Zoë Hitzig, Shrey Jain, Catherine Brewer, Wayne Chang, Renée DiResta, Eddy Lazzarin, Sean McGregor, Wendy Seltzer, Divya Siddarth, Nouran Soliman, Tobin South, Connor Spelliscy, Manu Sporny, Varya Srivastava, John Bailey, Brian Christian, Andrew Critch, Ronnie Falcon, Heather Flanagan, Kim Hamilton Duffy, Eric Ho, Claire R. Leibowicz, Srikanth Nadhamuni, Alan Z. Rozenshtein, et al. (7 additional authors not shown)

    Abstract: Anonymity is an important principle online. However, malicious actors have long used misleading identities to conduct fraud, spread disinformation, and carry out other deceptive schemes. With the advent of increasingly capable AI, bad actors can amplify the potential scale and effectiveness of their operations, intensifying the challenge of balancing anonymity and trustworthiness online. In this p…

    Submitted 17 January, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: 63 pages, 7 figures, 5 tables; minor additions to acknowledgments and wording changes for clarity; corrected typo; updated email address reference for author

  2. arXiv:2306.06924  [pdf, other]

    cs.AI cs.CR cs.CY cs.LG

    TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI

    Authors: Andrew Critch, Stuart Russell

    Abstract: While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an exhaustive taxonomy of such risks. Many exhaustive taxonomies are possible, and some are useful -- particularly if they reveal new risks or practical approaches to safety. This paper explores a taxonomy based on accountability: whose act…

    Submitted 14 June, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    MSC Class: 68T01 ACM Class: I.2.0

  3. arXiv:2208.07006  [pdf, ps, other]

    cs.GT cs.LO cs.MA

    Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory

    Authors: Andrew Critch, Michael Dennis, Stuart Russell

    Abstract: It is increasingly possible for real-world agents, such as software-based agents or human institutions, to view the internal programming of other such agents that they interact with. For instance, a company can read the bylaws of another company, or one software system can read the source code of another. Game-theoretic equilibria between the designers of such agents are called \emph{program equil…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: 41 pages

    MSC Class: 93A14; 93A16; 91-08; 91A11; 91A35; 91A68; 91A44; 91B06; 91B41; 91B52 ACM Class: F.3.1; F.4.1; I.2.3; J.4

  4. arXiv:2207.10806  [pdf, other]

    cs.CR cs.AI cs.CY

    WordSig: QR streams enabling platform-independent self-identification that's impossible to deepfake

    Authors: Andrew Critch

    Abstract: Deepfakes can degrade the fabric of society by limiting our ability to trust video content from leaders, authorities, and even friends. Cryptographically secure digital signatures may be used by video streaming platforms to endorse content, but these signatures are applied by the content distributor rather than the participants in the video. We introduce WordSig, a simple protocol allowing video p…

    Submitted 15 July, 2022; originally announced July 2022.

    MSC Class: 68P25; 68T01; 94A62 ACM Class: E.3; I.2; K.4

  5. arXiv:2207.03470  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

    Authors: Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell

    Abstract: Although it has been known since the 1970s that a globally optimal strategy profile in a common-payoff game is a Nash equilibrium, global optimality is a strict requirement that limits the result's applicability. In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. Furthermore, we show that this result is robust to perturbations to the comm…

    Submitted 7 July, 2022; originally announced July 2022.

  6. arXiv:2111.06956  [pdf, other]

    cs.LG

    Human irrationality: both bad and good for reward inference

    Authors: Lawrence Chan, Andrew Critch, Anca Dragan

    Abstract: Assuming humans are (approximately) rational enables robots to infer reward functions by observing human behavior. But people exhibit a wide array of irrationalities, and our goal with this work is to better understand the effect they can have on reward inference. The challenge with studying this effect is that there are many types of irrationality, with varying degrees of mathematical formalizati…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: 12 pages, 10 figures

  7. arXiv:2110.08058  [pdf, other]

    cs.LG cs.AI cs.NE

    Quantifying Local Specialization in Deep Neural Networks

    Authors: Shlomi Hod, Daniel Filan, Stephen Casper, Andrew Critch, Stuart Russell

    Abstract: A neural network is locally specialized to the extent that parts of its computational graph (i.e. structure) can be abstractly represented as performing some comprehensible sub-task relevant to the overall task (i.e. functionality). Are modern deep neural networks locally specialized? How can this be quantified? In this paper, we consider the problem of taking a neural network whose neurons are pa…

    Submitted 7 February, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 21 pages, 6 figures. Code is available at https://github.com/thestephencasper/detecting_nn_modularity

  8. arXiv:2103.03386  [pdf, other]

    cs.NE

    Clusterability in Neural Networks

    Authors: Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

    Abstract: The learned weights of a neural network have often been considered devoid of scrutable internal structure. In this paper, however, we look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity. We find that a trained neural network is typically more clusterable than randomly initialized…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: 20 pages, 22 figures. arXiv admin note: text overlap with arXiv:2003.04881

  9. arXiv:2101.10305  [pdf, other]

    cs.MA cs.AI

    Accumulating Risk Capital Through Investing in Cooperation

    Authors: Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell

    Abstract: Recent work on promoting cooperation in multi-agent learning has resulted in many methods which successfully promote cooperation at the cost of becoming more vulnerable to exploitation by malicious actors. We show that this is an unavoidable trade-off and propose an objective which balances these concerns, promoting both safety and long-term cooperation. Moreover, the trade-off between safety and…

    Submitted 20 April, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

  10. arXiv:2012.14536  [pdf, other]

    cs.GT cs.AI

    Multi-Principal Assistance Games: Definition and Collegial Mechanisms

    Authors: Arnaud Fickinger, Simon Zhuang, Andrew Critch, Dylan Hadfield-Menell, Stuart Russell

    Abstract: We introduce the concept of a multi-principal assistance game (MPAG), and circumvent an obstacle in social choice theory, Gibbard's theorem, by using a sufficiently collegial preference inference mechanism. In an MPAG, a single agent assists N human principals who may have widely different preferences. MPAGs generalize assistance games, also known as cooperative inverse reinforcement learning game…

    Submitted 28 December, 2020; originally announced December 2020.

    Comments: arXiv admin note: text overlap with arXiv:2007.09540

  11. arXiv:2012.02096  [pdf, other]

    cs.LG cs.AI cs.MA

    Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

    Authors: Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine

    Abstract: A wide range of reinforcement learning (RL) problems - including robustness, transfer learning, unsupervised RL, and emergent complexity - require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error prone, and takes a significant amount of developer time and effort. We propose Unsupervised Environmen…

    Submitted 3 February, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

  12. arXiv:2011.00401  [pdf, other]

    cs.LG cs.AI

    The MAGICAL Benchmark for Robust Imitation

    Authors: Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell

    Abstract: Imitation Learning (IL) algorithms are typically evaluated in the same environment that was used to create demonstrations. This rewards precise reproduction of demonstrations in one particular environment, but provides little information about how robustly an algorithm can generalise the demonstrator's intent to substantially different deployment settings. This paper presents the MAGICAL benchmark…

    Submitted 31 October, 2020; originally announced November 2020.

    Comments: NeurIPS 2020 conference paper (poster)

  13. arXiv:2008.02275  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Aligning AI With Shared Human Values

    Authors: Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, Jacob Steinhardt

    Abstract: We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgements, a capability that may enable…

    Submitted 17 February, 2023; v1 submitted 5 August, 2020; originally announced August 2020.

    Comments: ICLR 2021; the ETHICS dataset is available at https://github.com/hendrycks/ethics/

  14. arXiv:2006.04948  [pdf, other]

    cs.CY cs.AI cs.LG

    AI Research Considerations for Human Existential Safety (ARCHES)

    Authors: Andrew Critch, David Krueger

    Abstract: Framed in positive terms, this report examines how technical AI research might be steered in a manner that is more attentive to humanity's long-term prospects for survival as a species. In negative terms, we ask what existential risks humanity might face from AI development in the next century, and by what principles contemporary technical research might be directed to address those risks. A key…

    Submitted 29 May, 2020; originally announced June 2020.

    MSC Class: 68T01 ACM Class: I.2.0

  15. arXiv:2003.04881  [pdf, other]

    cs.NE cs.LG

    Pruned Neural Networks are Surprisingly Modular

    Authors: Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

    Abstract: The learned weights of a neural network are often considered devoid of scrutable internal structure. To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate the modular structure of MLPs trained on datasets of small images. Our notion of modularity comes from the graph clustering literature: a "module" is a set of ne…

    Submitted 7 February, 2022; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: 25 pages, 12 figures

  16. arXiv:1912.01683  [pdf, other]

    cs.AI

    Optimal Policies Tend to Seek Power

    Authors: Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

    Abstract: Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers point out that RL agents need not have human-like power-seeking instincts. To clarify this discussion, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decisio…

    Submitted 28 January, 2023; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Accepted to NeurIPS 2021 as spotlight paper. 12 pages, 44 pages with appendices. Since the 2021 acceptance, we updated the paper to point out that optimal policies can be qualitatively divorced from real-world learned policies

  17. arXiv:1711.00363  [pdf, ps, other]

    cs.AI

    Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making

    Authors: Andrew Critch, Stuart Russell

    Abstract: It is often argued that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a Pareto-optimal policy, i.e., a policy that cannot be improved upon for one agent without making sacrifices for another. A famous theorem of Harsanyi shows that, when the principals have a common prior on the outcome distributions of all policies, a Pareto-…

    Submitted 31 October, 2017; originally announced November 2017.

    Comments: 10 pages. arXiv admin note: substantial text overlap with arXiv:1701.01302

  18. A Formal Approach to the Problem of Logical Non-Omniscience

    Authors: Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, Jessica Taylor

    Abstract: We present the logical induction criterion for computable algorithms that assign probabilities to every logical statement in a given formal language, and refine those probabilities over time. The criterion is motivated by a series of stock trading analogies. Roughly speaking, each logical sentence phi is associated with a stock that is worth $1 per share if phi is true and nothing otherwise, and w…

    Submitted 27 July, 2017; originally announced July 2017.

    Comments: In Proceedings TARK 2017, arXiv:1707.08250

    ACM Class: F.4.0; G.3

    Journal ref: EPTCS 251, 2017, pp. 221-235

  19. arXiv:1701.01302  [pdf, ps, other]

    cs.AI cs.GT cs.LG

    Toward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making

    Authors: Andrew Critch

    Abstract: Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine's policy will prioritize each player's interests over…

    Submitted 13 May, 2017; v1 submitted 5 January, 2017; originally announced January 2017.

  20. arXiv:1609.03543  [pdf, ps, other]

    cs.AI cs.LO math.LO math.PR

    Logical Induction

    Authors: Scott Garrabrant, Tsvi Benson-Tilsen, Andrew Critch, Nate Soares, Jessica Taylor

    Abstract: We present a computable algorithm that assigns probabilities to every logical statement in a given formal language, and refines those probabilities over time. For instance, if the language is Peano arithmetic, it assigns probabilities to all arithmetical statements, including claims about the twin prime conjecture, the outputs of long-running computations, and its own probabilities. We show that o…

    Submitted 7 December, 2020; v1 submitted 12 September, 2016; originally announced September 2016.

  21. arXiv:1602.04184  [pdf, ps, other]

    cs.GT cs.LO

    Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents

    Authors: Andrew Critch

    Abstract: Löb's theorem and Gödel's theorems make predictions about the behavior of systems capable of self-reference with unbounded computational resources with which to write and evaluate proofs. However, in the real world, systems capable of self-reference will have limited memory and processing speed, so in this paper we introduce an effective version of Löb's theorem which is applicable given such boun…

    Submitted 24 August, 2016; v1 submitted 12 February, 2016; originally announced February 2016.

    Comments: Corrected typos, added grant acknowledgement, updated citation style to author-year

  22. arXiv:1210.2812  [pdf, other]

    quant-ph math-ph

    Algebraic Geometry of Matrix Product States

    Authors: Andrew Critch, Jason Morton

    Abstract: We quantify the representational power of matrix product states (MPS) for entangled qubit systems by giving polynomial expressions in a pure quantum state's amplitudes which hold if and only if the state is a translation invariant matrix product state or a limit of such states. For systems with few qubits, we give these equations explicitly, considering both periodic and open boundary conditions.…

    Submitted 9 September, 2014; v1 submitted 10 October, 2012; originally announced October 2012.

    MSC Class: 81R05; 81R50; 20C35; 22E70; 13P25; 13A50; 14J70; 14J81; 14L30; 14Q15; 14R20

    Journal ref: SIGMA 10 (2014), 095, 10 pages

  23. arXiv:1206.0500  [pdf, ps, other]

    math.AG stat.ML

    Binary hidden Markov models and varieties

    Authors: Andrew J. Critch

    Abstract: The technological applications of hidden Markov models have been extremely diverse and successful, including natural language processing, gesture recognition, gene sequencing, and Kalman filtering of physical measurements. HMMs are highly non-linear statistical models, and just as linear models are amenable to linear algebraic techniques, non-linear models are amenable to commutative algebra and a…

    Submitted 3 September, 2012; v1 submitted 3 June, 2012; originally announced June 2012.

    MSC Class: 14Q15

  24. arXiv:1203.6431  [pdf, ps, other]

    stat.ME

    A note on the proportionality between some consistency indices in the AHP

    Authors: Matteo Brunelli, Andrew Critch, Michele Fedrizzi

    Abstract: Analyzing the consistency of preferences is an important step in decision making with pairwise comparison matrices, and several indices have been proposed in order to estimate it. In this paper we prove the proportionality between some consistency indices in the framework of the Analytic Hierarchy Process. Knowing such equivalences eliminates redundancy in the consideration of evidence for consist…

    Submitted 29 March, 2012; originally announced March 2012.

    Comments: 9 pages

    MSC Class: 90B50 (Primary) 13P25 (Secondary)