Minimalistic Attacks: How Little it Takes to Fool a Deep Reinforcement Learning Policy

Qu, Xinghua; Sun, Zhu; Ong, Yew-Soon; Gupta, Abhishek; Wei, Pengfei

arXiv:1911.03849 (cs)

[Submitted on 10 Nov 2019 (v1), last revised 29 Oct 2020 (this version, v5)]

Abstract: Recent studies have revealed that neural network-based policies can be easily fooled by adversarial examples. However, while most prior works analyze the effects of perturbing every pixel of every frame under white-box policy access, in this paper we take a more restrictive view of adversary generation, with the goal of unveiling the limits of a model's vulnerability. In particular, we explore minimalistic attacks by defining three key settings: (1) black-box policy access, where the attacker only has access to the input (state) and output (action probability) of an RL policy; (2) fractional-state adversary, where only a few pixels are perturbed, with the extreme case being a single-pixel adversary; and (3) tactically-chanced attack, where only significant frames are tactically chosen to be attacked. We formulate the adversarial attack to accommodate these three settings and explore its potency on six Atari games by examining four fully trained state-of-the-art policies. In Breakout, for example, we surprisingly find that (i) all policies show significant performance degradation when merely 0.01% of the input state is modified, and (ii) the policy trained by DQN is totally deceived by perturbing only 1% of the frames.
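
The three settings in the abstract can be illustrated with a small sketch. This is not the authors' method: the abstract does not name an optimizer or a frame-selection rule, so plain random search and a confidence-gap threshold are swapped in here as assumptions. The names `make_toy_policy`, `single_pixel_attack`, and `is_significant` are hypothetical, and the toy linear policy only stands in for a trained Atari agent.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_policy(height, width, n_actions, seed=0):
    """Hypothetical stand-in for a trained policy: random linear layer + softmax."""
    rng = random.Random(seed)
    weights = [[[rng.uniform(-1, 1) for _ in range(width)]
                for _ in range(height)]
               for _ in range(n_actions)]

    def policy(state):
        # Black-box interface: state in, action probabilities out.
        scores = [sum(w[r][c] * state[r][c]
                      for r in range(height) for c in range(width))
                  for w in weights]
        return softmax(scores)

    return policy


def single_pixel_attack(policy, state, n_queries=200, seed=0):
    """Random search (an assumed optimizer) over one-pixel perturbations.

    Only policy outputs are queried; no gradients or weights are used.
    Returns the best perturbed state and the resulting probability of
    the action the policy originally preferred.
    """
    rng = random.Random(seed)
    probs = policy(state)
    original_action = probs.index(max(probs))
    best_state, best_prob = state, probs[original_action]
    height, width = len(state), len(state[0])
    for _ in range(n_queries):
        r, c = rng.randrange(height), rng.randrange(width)
        candidate = [row[:] for row in state]   # copy, then change one pixel
        candidate[r][c] = rng.uniform(0.0, 1.0)
        p = policy(candidate)[original_action]
        if p < best_prob:
            best_state, best_prob = candidate, p
    return best_state, best_prob


def is_significant(probs, threshold=0.5):
    """Assumed frame-selection rule: attack only when the policy strongly
    prefers one action (large gap between the top two probabilities)."""
    ranked = sorted(probs, reverse=True)
    return ranked[0] - ranked[1] >= threshold
```

In an episode loop, one would call `is_significant(policy(state))` on each frame and run `single_pixel_attack` only on the frames that pass, which keeps both the fraction of perturbed pixels and the fraction of attacked frames small.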

Comments:	Accepted by IEEE Transactions on Cognitive and Developmental Systems
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:1911.03849 [cs.LG]
	(or arXiv:1911.03849v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1911.03849

Submission history

From: Xinghua Qu
[v1] Sun, 10 Nov 2019 04:39:56 UTC (2,970 KB)
[v2] Fri, 22 Nov 2019 08:28:44 UTC (4,900 KB)
[v3] Fri, 21 Feb 2020 00:51:06 UTC (4,980 KB)
[v4] Fri, 6 Mar 2020 01:46:01 UTC (4,980 KB)
[v5] Thu, 29 Oct 2020 13:40:22 UTC (4,980 KB)
