-
When Can Proxies Improve the Sample Complexity of Preference Learning?
Authors:
Yuchen Zhu,
Daniel Augusto de Souza,
Zhengyan Shi,
Mengyue Yang,
Pasquale Minervini,
Alexander D'Amour,
Matt J. Kusner
Abstract:
We address the problem of reward hacking, where maximising a proxy reward does not necessarily increase the true reward. This is a key concern for Large Language Models (LLMs), as they are often fine-tuned on human preferences that may not accurately reflect a true objective. Existing work uses various tricks such as regularisation, tweaks to the reward model, and reward hacking detectors, to limi…
▽ More
We address the problem of reward hacking, where maximising a proxy reward does not necessarily increase the true reward. This is a key concern for Large Language Models (LLMs), as they are often fine-tuned on human preferences that may not accurately reflect a true objective. Existing work uses various tricks such as regularisation, tweaks to the reward model, and reward hacking detectors, to limit the influence that such proxy preferences have on a model. Luckily, in many contexts such as medicine, education, and law, a sparse amount of expert data is often available. In these cases, it is often unclear whether the addition of proxy data can improve policy learning. We outline a set of sufficient conditions on proxy feedback that, if satisfied, indicate that proxy data can provably improve the sample complexity of learning the ground truth policy. These conditions can inform the data collection process for specific tasks. The result implies a parameterisation for LLMs that achieves this improved sample complexity. We detail how one can adapt existing architectures to yield this improved sample complexity.
△ Less
Submitted 20 December, 2024;
originally announced December 2024.
-
On the Independence Assumption in Neurosymbolic Learning
Authors:
Emile van Krieken,
Pasquale Minervini,
Edoardo M. Ponti,
Antonio Vergari
Abstract:
State-of-the-art neurosymbolic learning systems use probabilistic reasoning to guide neural networks towards predictions that conform to logical constraints over symbols. Many such systems assume that the probabilities of the considered symbols are conditionally independent given the input to simplify learning and reasoning. We study and criticise this assumption, highlighting how it can hinder op…
▽ More
State-of-the-art neurosymbolic learning systems use probabilistic reasoning to guide neural networks towards predictions that conform to logical constraints over symbols. Many such systems assume that the probabilities of the considered symbols are conditionally independent given the input to simplify learning and reasoning. We study and criticise this assumption, highlighting how it can hinder optimisation and prevent uncertainty quantification. We prove that loss functions bias conditionally independent neural networks to become overconfident in their predictions. As a result, they are unable to represent uncertainty over multiple valid options. Furthermore, we prove that these loss functions are difficult to optimise: they are non-convex, and their minima are usually highly disconnected. Our theoretical analysis gives the foundation for replacing the conditional independence assumption and designing more expressive neurosymbolic probabilistic models.
△ Less
Submitted 7 June, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
Learning Discrete Directed Acyclic Graphs via Backpropagation
Authors:
Andrew J. Wren,
Pasquale Minervini,
Luca Franceschi,
Valentina Zantedeschi
Abstract:
Recently continuous relaxations have been proposed in order to learn Directed Acyclic Graphs (DAGs) from data by backpropagation, instead of using combinatorial optimization. However, a number of techniques for fully discrete backpropagation could instead be applied. In this paper, we explore that direction and propose DAG-DB, a framework for learning DAGs by Discrete Backpropagation. Based on the…
▽ More
Recently continuous relaxations have been proposed in order to learn Directed Acyclic Graphs (DAGs) from data by backpropagation, instead of using combinatorial optimization. However, a number of techniques for fully discrete backpropagation could instead be applied. In this paper, we explore that direction and propose DAG-DB, a framework for learning DAGs by Discrete Backpropagation. Based on the architecture of Implicit Maximum Likelihood Estimation [I-MLE, arXiv:2106.01798], DAG-DB adopts a probabilistic approach to the problem, sampling binary adjacency matrices from an implicit probability distribution. DAG-DB learns a parameter for the distribution from the loss incurred by each sample, performing competitively using either of two fully discrete backpropagation techniques, namely I-MLE and Straight-Through Estimation.
△ Less
Submitted 27 October, 2022;
originally announced October 2022.
-
Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training
Authors:
Joe Stacey,
Pasquale Minervini,
Haim Dubossarsky,
Sebastian Riedel,
Tim Rocktäschel
Abstract:
Natural Language Inference (NLI) datasets contain annotation artefacts resulting in spurious correlations between the natural language utterances and their respective entailment classes. These artefacts are exploited by neural networks even when only considering the hypothesis and ignoring the premise, leading to unwanted biases. Belinkov et al. (2019b) proposed tackling this problem via adversari…
▽ More
Natural Language Inference (NLI) datasets contain annotation artefacts resulting in spurious correlations between the natural language utterances and their respective entailment classes. These artefacts are exploited by neural networks even when only considering the hypothesis and ignoring the premise, leading to unwanted biases. Belinkov et al. (2019b) proposed tackling this problem via adversarial training, but this can lead to learned sentence representations that still suffer from the same biases. We show that the bias can be reduced in the sentence representations by using an ensemble of adversaries, encouraging the model to jointly decrease the accuracy of these different adversaries while fitting the data. This approach produces more robust NLI models, outperforming previous de-biasing efforts when generalised to 12 other datasets (Belinkov et al., 2019a; Mahabadi et al., 2020). In addition, we find that the optimal number of adversarial classifiers depends on the dimensionality of the sentence representations, with larger sentence representations being more difficult to de-bias while benefiting from using a greater number of adversaries.
△ Less
Submitted 27 May, 2021; v1 submitted 16 April, 2020;
originally announced April 2020.
-
Undersensitivity in Neural Reading Comprehension
Authors:
Johannes Welbl,
Pasquale Minervini,
Max Bartolo,
Pontus Stenetorp,
Sebastian Riedel
Abstract:
Current reading comprehension models generalise well to in-distribution test sets, yet perform poorly on adversarially selected inputs. Most prior work on adversarial inputs studies oversensitivity: semantically invariant text perturbations that cause a model's prediction to change when it should not. In this work we focus on the complementary problem: excessive prediction undersensitivity, where…
▽ More
Current reading comprehension models generalise well to in-distribution test sets, yet perform poorly on adversarially selected inputs. Most prior work on adversarial inputs studies oversensitivity: semantically invariant text perturbations that cause a model's prediction to change when it should not. In this work we focus on the complementary problem: excessive prediction undersensitivity, where input text is meaningfully changed but the model's prediction does not, even though it should. We formulate a noisy adversarial attack which searches among semantic variations of the question for which a model erroneously predicts the same answer, and with even higher probability. Despite comprising unanswerable questions, both SQuAD2.0 and NewsQA models are vulnerable to this attack. This indicates that although accurate, models tend to rely on spurious patterns and do not fully consider the information specified in a question. We experiment with data augmentation and adversarial training as defences, and find that both substantially decrease vulnerability to attacks on held out data, as well as held out attack spaces. Addressing undersensitivity also improves results on AddSent and AddOneSent, and models furthermore generalise better when facing train/evaluation distribution mismatch: they are less prone to overly rely on predictive cues present only in the training set, and outperform a conventional model by as much as 10.9% F1.
△ Less
Submitted 15 February, 2020;
originally announced March 2020.
-
Neural Variational Inference For Estimating Uncertainty in Knowledge Graph Embeddings
Authors:
Alexander I. Cowen-Rivers,
Pasquale Minervini,
Tim Rocktaschel,
Matko Bosnjak,
Sebastian Riedel,
Jun Wang
Abstract:
Recent advances in Neural Variational Inference allowed for a renaissance in latent variable models in a variety of domains involving high-dimensional data. While traditional variational methods derive an analytical approximation for the intractable distribution over the latent variables, here we construct an inference network conditioned on the symbolic representation of entities and relation typ…
▽ More
Recent advances in Neural Variational Inference allowed for a renaissance in latent variable models in a variety of domains involving high-dimensional data. While traditional variational methods derive an analytical approximation for the intractable distribution over the latent variables, here we construct an inference network conditioned on the symbolic representation of entities and relation types in the Knowledge Graph, to provide the variational distributions. The new framework results in a highly-scalable method. Under a Bernoulli sampling framework, we provide an alternative justification for commonly used techniques in large-scale stochastic variational inference, which drastically reduce training time at a cost of an additional approximation to the variational lower bound. We introduce two models from this highly scalable probabilistic framework, namely the Latent Information and Latent Fact models, for reasoning over knowledge graph-based representations. Our Latent Information and Latent Fact models improve upon baseline performance under certain conditions. We use the learnt embedding variance to estimate predictive uncertainty during link prediction, and discuss the quality of these learnt uncertainty estimates. Our source code and datasets are publicly available online at https://github.com/alexanderimanicowenrivers/Neural-Variational-Knowledge-Graphs.
△ Less
Submitted 18 August, 2019; v1 submitted 12 June, 2019;
originally announced June 2019.
-
Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
Authors:
Pasquale Minervini,
Sebastian Riedel
Abstract:
Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake. They are useful for understanding the shortcomings of machine learning models, interpreting their results, and for regularisation. In NLP, however, most example generation strategies produce input text by using known, pre-specified semantic transformations, requiring significant manual effort…
▽ More
Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake. They are useful for understanding the shortcomings of machine learning models, interpreting their results, and for regularisation. In NLP, however, most example generation strategies produce input text by using known, pre-specified semantic transformations, requiring significant manual effort and in-depth understanding of the problem and domain. In this paper, we investigate the problem of automatically generating adversarial examples that violate a set of given First-Order Logic constraints in Natural Language Inference (NLI). We reduce the problem of identifying such adversarial examples to a combinatorial optimisation problem, by maximising a quantity measuring the degree of violation of such constraints and by using a language model for generating linguistically-plausible examples. Furthermore, we propose a method for adversarially regularising neural NLI models for incorporating background knowledge. Our results show that, while the proposed method does not always improve results on the SNLI and MultiNLI datasets, it significantly and consistently increases the predictive accuracy on adversarially-crafted datasets -- up to a 79.6% relative improvement -- while drastically reducing the number of background knowledge violations. Furthermore, we show that adversarial examples transfer among model architectures, and that the proposed adversarial training procedure improves the robustness of NLI models to adversarial examples.
△ Less
Submitted 26 August, 2018;
originally announced August 2018.
-
Jack the Reader - A Machine Reading Framework
Authors:
Dirk Weissenborn,
Pasquale Minervini,
Tim Dettmers,
Isabelle Augenstein,
Johannes Welbl,
Tim Rocktäschel,
Matko Bošnjak,
Jeff Mitchell,
Thomas Demeester,
Pontus Stenetorp,
Sebastian Riedel
Abstract:
Many Machine Reading and Natural Language Understanding tasks require reading supporting text in order to answer questions. For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions. Providing a set of useful primitives operating in a single framework of relat…
▽ More
Many Machine Reading and Natural Language Understanding tasks require reading supporting text in order to answer questions. For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions. Providing a set of useful primitives operating in a single framework of related tasks would allow for expressive modelling, and easier model comparison and replication. To that end, we present Jack the Reader (Jack), a framework for Machine Reading that allows for quick model prototyping by component reuse, evaluation of new models on existing datasets as well as integrating new datasets and applying them on a growing set of implemented baseline models. Jack is currently supporting (but not limited to) three tasks: Question Answering, Natural Language Inference, and Link Prediction. It is developed with the aim of increasing research efficiency and code reuse.
△ Less
Submitted 19 June, 2018;
originally announced June 2018.