-
Great Models Think Alike and this Undermines AI Oversight
Authors:
Shashwat Goel,
Joschka Struber,
Ilze Amanda Auzina,
Karuna K Chandra,
Ponnurangam Kumaraguru,
Douwe Kiela,
Ameya Prabhu,
Matthias Bethge,
Jonas Geiping
Abstract:
As Language Model (LM) capabilities advance, evaluating and supervising them at scale is getting harder for humans. There is hope that other language models can automate both these tasks, which we refer to as "AI Oversight". We study how model similarity affects both aspects of AI oversight by proposing a probabilistic metric for LM similarity based on overlap in model mistakes. Using this metric,…
▽ More
As Language Model (LM) capabilities advance, evaluating and supervising them at scale is getting harder for humans. There is hope that other language models can automate both these tasks, which we refer to as "AI Oversight". We study how model similarity affects both aspects of AI oversight by proposing a probabilistic metric for LM similarity based on overlap in model mistakes. Using this metric, we first show that LLM-as-a-judge scores favor models similar to the judge, generalizing recent self-preference results. Then, we study training on LM annotations, and find complementary knowledge between the weak supervisor and strong student model plays a crucial role in gains from "weak-to-strong generalization". As model capabilities increase, it becomes harder to find their mistakes, and we might defer more to AI oversight. However, we observe a concerning trend -- model mistakes are becoming more similar with increasing capabilities, pointing to risks from correlated failures. Our work underscores the importance of reporting and correcting for model similarity, especially in the emerging paradigm of AI oversight.
△ Less
Submitted 6 February, 2025;
originally announced February 2025.
-
Modulated Neural ODEs
Authors:
Ilze Amanda Auzina,
Çağatay Yıldız,
Sara Magliacane,
Matthias Bethge,
Efstratios Gavves
Abstract:
Neural ordinary differential equations (NODEs) have been proven useful for learning non-linear dynamics of arbitrary trajectories. However, current NODE methods capture variations across trajectories only via the initial state value or by auto-regressive encoder updates. In this work, we introduce Modulated Neural ODEs (MoNODEs), a novel framework that sets apart dynamics states from underlying st…
▽ More
Neural ordinary differential equations (NODEs) have been proven useful for learning non-linear dynamics of arbitrary trajectories. However, current NODE methods capture variations across trajectories only via the initial state value or by auto-regressive encoder updates. In this work, we introduce Modulated Neural ODEs (MoNODEs), a novel framework that sets apart dynamics states from underlying static factors of variation and improves the existing NODE methods. In particular, we introduce $\textit{time-invariant modulator variables}$ that are learned from the data. We incorporate our proposed framework into four existing NODE variants. We test MoNODE on oscillating systems, videos and human walking trajectories, where each trajectory has trajectory-specific modulation. Our framework consistently improves the existing model ability to generalize to new dynamic parameterizations and to perform far-horizon forecasting. In addition, we verify that the proposed modulator variables are informative of the true unknown factors of variation as measured by $R^2$ scores.
△ Less
Submitted 13 November, 2023; v1 submitted 26 February, 2023;
originally announced February 2023.
-
ABC-Di: Approximate Bayesian Computation for Discrete Data
Authors:
Ilze Amanda Auzina,
Jakub M. Tomczak
Abstract:
Many real-life problems are represented as a black-box, i.e., the internal workings are inaccessible or a closed-form mathematical expression of the likelihood function cannot be defined. For continuous random variables likelihood-free inference problems can be solved by a group of methods under the name of Approximate Bayesian Computation (ABC). However, a similar approach for discrete random var…
▽ More
Many real-life problems are represented as a black-box, i.e., the internal workings are inaccessible or a closed-form mathematical expression of the likelihood function cannot be defined. For continuous random variables likelihood-free inference problems can be solved by a group of methods under the name of Approximate Bayesian Computation (ABC). However, a similar approach for discrete random variables is yet to be formulated. Here, we aim to fill this research gap. We propose to use a population-based MCMC ABC framework. Further, we present a valid Markov kernel, and propose a new kernel that is inspired by Differential Evolution. We assess the proposed approach on a problem with the known likelihood function, namely, discovering the underlying diseases based on a QMR-DT Network, and three likelihood-free inference problems: (i) the QMR-DT Network with the unknown likelihood function, (ii) learning binary neural network, and (iii) Neural Architecture Search. The obtained results indicate the high potential of the proposed framework and the superiority of the new Markov kernel.
△ Less
Submitted 19 October, 2020;
originally announced October 2020.