-
Protecting against simultaneous data poisoning attacks
Authors:
Neel Alex,
Shoaib Ahmed Siddiqui,
Amartya Sanyal,
David Krueger
Abstract:
Current backdoor defense methods are evaluated against a single attack at a time. This is unrealistic, as powerful machine learning systems are trained on large datasets scraped from the internet, which may be attacked multiple times by one or more attackers. We demonstrate that simultaneously executed data poisoning attacks can effectively install multiple backdoors in a single model without substantially degrading clean accuracy. Furthermore, we show that existing backdoor defense methods do not effectively prevent attacks in this setting. Finally, we leverage insights into the nature of backdoor attacks to develop a new defense, BaDLoss, that is effective in the multi-attack setting. With minimal clean accuracy degradation, BaDLoss attains an average attack success rate in the multi-attack setting of 7.98% in CIFAR-10 and 10.29% in GTSRB, compared to the average of other defenses at 64.48% and 84.28% respectively.
Submitted 23 August, 2024;
originally announced August 2024.
-
Designing Artificial Intelligence Equipped Social Decentralized Autonomous Organizations for Tackling Sextortion Cases Version 0.7
Authors:
Norta Alex,
Makrygiannis Sotiris
Abstract:
With the rapid diffusion of social networks in combination with mobile phones, a new social threat of sextortion has emerged, in which vulnerable young women are essentially blackmailed with their explicit shared multimedia content. The phenomenon of sextortion is now widely studied by psychologists, sociologists, criminologists, etc. The findings have been translated into scattered help from NGOs, specialized law enforcement units, and therapists, who usually do not coordinate their efforts with one another. This paper addresses the lack of coordination systems that use modern information technologies effectively and efficiently to align the efforts of scattered and non-aligned sextortion help organizations. Consequently, this paper investigates the goals, incentives, and disincentives for designing and developing a system that not only governs diverse cases of sextortion victims effectively and efficiently, but also leverages artificial intelligence in a targeted manner. It explores how AI and, in particular, autonomous cognitive entities can improve victim profile analysis, streamline support mechanisms, and provide intelligent insight into sextortion cases. Furthermore, the paper conceptually studies the extent to which such efforts can be monetized in a sustainable way. Following a novel design methodology for the design of trusted blockchain decentralized applications, the paper presents a set of conceptual requirements and system models based on which it is possible to deduce a best-practice technology stack for rapid implementation deployment.
Submitted 15 January, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
An Empirical Investigation of Representation Learning for Imitation
Authors:
Xin Chen,
Sam Toyer,
Cody Wild,
Scott Emmons,
Ian Fischer,
Kuang-Huei Lee,
Neel Alex,
Steven H Wang,
Ping Luo,
Stuart Russell,
Pieter Abbeel,
Rohin Shah
Abstract:
Imitation learning often needs a large demonstration set in order to handle the full range of situations that an agent might find itself in during deployment. However, collecting expert demonstrations can be expensive. Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data. Our Empirical Investigation of Representation Learning for Imitation (EIRLI) investigates whether similar benefits apply to imitation learning. We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites. In the settings we evaluate, we find that existing algorithms for image-based representation learning provide limited value relative to a well-tuned baseline with image augmentations. To explain this result, we investigate differences between imitation learning and other settings where representation learning has provided significant benefit, such as image classification. Finally, we release a well-documented codebase which both replicates our findings and provides a modular framework for creating new representation learning algorithms out of reusable components.
Submitted 16 May, 2022;
originally announced May 2022.
-
RAFT: A Real-World Few-Shot Text Classification Benchmark
Authors:
Neel Alex,
Eli Lifland,
Lewis Tunstall,
Abhishek Thakur,
Pegah Maham,
C. Jess Riedel,
Emmie Hine,
Carolyn Ashurst,
Paul Sedille,
Alexis Carlier,
Michael Noetel,
Andreas Stuhlmüller
Abstract:
Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. Baseline evaluations on RAFT reveal areas where current techniques struggle: reasoning over long texts and tasks with many classes. Human baselines show that some classification tasks are difficult for non-expert humans, reflecting that real-world value sometimes depends on domain expertise. Yet even non-expert human baseline F1 scores exceed GPT-3 by an average of 0.11. The RAFT datasets and leaderboard will track which model improvements translate into real-world benefits at https://raft.elicit.org.
Submitted 18 January, 2022; v1 submitted 28 September, 2021;
originally announced September 2021.
-
The MineRL BASALT Competition on Learning from Human Feedback
Authors:
Rohin Shah,
Cody Wild,
Steven H. Wang,
Neel Alex,
Brandon Houghton,
William Guss,
Sharada Mohanty,
Anssi Kanervisto,
Stephanie Milani,
Nicholay Topin,
Pieter Abbeel,
Stuart Russell,
Anca Dragan
Abstract:
The last decade has seen a significant increase in interest in deep learning research, with many public successes that have demonstrated its potential. As such, these systems are now being incorporated into commercial products. With this comes an additional challenge: how can we build AI systems that solve tasks where there is not a crisp, well-defined specification? While multiple solutions have been proposed, in this competition we focus on one in particular: learning from human feedback. Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.
The MineRL BASALT competition aims to spur forward research on this important class of techniques. We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions. These tasks are defined by a paragraph of natural language: for example, "create a waterfall and take a scenic picture of it", with additional clarifying details. Participants must train a separate agent for each task, using any method they want. Agents are then evaluated by humans who have read the task description. To help participants get started, we provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline that leverages these demonstrations.
Our hope is that this competition will improve our ability to build AI systems that do what their designers intend them to do, even when the intent cannot be easily formalized. Besides allowing AI to solve more tasks, this can also enable more effective regulation of AI systems, as well as making progress on the value alignment problem.
Submitted 5 July, 2021;
originally announced July 2021.