Adversarial Texts with Gradient Methods

Gong, Zhitao; Wang, Wenlu; Li, Bo; Song, Dawn; Ku, Wei-Shinn

arXiv:1801.07175 (cs)

This paper has been withdrawn by Zhitao Gong

[Submitted on 22 Jan 2018 (v1), last revised 24 Jan 2018 (this version, v2)]

Abstract: Adversarial samples for images have been studied extensively in the literature. Among the many attack methods, gradient-based methods are both effective and easy to compute. In this work, we propose a framework that adapts gradient-based attack methods from the image domain to the text domain. Generating adversarial texts with gradient methods poses two main difficulties: i) the input space is discrete, which makes it difficult to accumulate small noise directly in the inputs, and ii) the quality of adversarial texts is difficult to measure. We tackle the first problem by searching for adversarial samples in the embedding space and then reconstructing the adversarial texts via nearest neighbor search. For the second problem, we employ the Word Mover's Distance (WMD) to quantify the quality of adversarial texts. Through extensive experiments on three datasets (IMDB movie reviews, Reuters-2 newswires, and Reuters-5 newswires), we show that our framework can leverage gradient attack methods to generate very high-quality adversarial texts that differ from the original texts by only a few words. In many cases, changing a single word is enough to alter the label of the whole text. We successfully incorporate FGM and DeepFool into our framework. In addition, we show empirically that WMD is closely related to the quality of adversarial texts.
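
The embedding-space attack and nearest-neighbor reconstruction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything concrete in it is a hypothetical assumption: a five-word vocabulary with 2-D embeddings, a logistic-regression classifier on the mean embedding, the L2-normalized form of FGM (x + eps * g / ||g||_2), Euclidean nearest-neighbor search, and a grid of eps values tried in increasing order. The helper `wmd_uniform` computes WMD only for the special case of two texts with the same number of distinct words and uniform word weights, where an optimal transport plan can be taken to be a one-to-one matching.

```python
import itertools
import math

# Hypothetical toy vocabulary with 2-D embeddings (not from the paper).
EMBEDDINGS = {
    "the": (0.0, 0.0),
    "movie": (0.0, 4.0),
    "was": (0.0, -4.0),
    "good": (2.0, 8.0),
    "bad": (-1.5, 8.0),
}

# Hypothetical logistic-regression classifier applied to the mean embedding.
WEIGHTS = (1.0, 0.0)
BIAS = 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(vectors):
    """Logit of the classifier for a list of embedding vectors."""
    n = len(vectors)
    mean = [sum(v[j] for v in vectors) / n for j in range(len(WEIGHTS))]
    return sum(w * m for w, m in zip(WEIGHTS, mean)) + BIAS


def predict(tokens):
    """Label +1 (positive) or -1 (negative) for a list of words."""
    return 1 if score([EMBEDDINGS[t] for t in tokens]) > 0 else -1


def loss_gradient(vectors, y):
    """Gradient of the logistic loss log(1 + exp(-y*s)) w.r.t. each input vector."""
    n = len(vectors)
    dloss_ds = -y * sigmoid(-y * score(vectors))
    # The logit depends on each vector only through the mean, so ds/dv_i = w / n.
    return [[dloss_ds * w / n for w in WEIGHTS] for _ in vectors]


def fgm(vectors, y, eps):
    """One FGM step in embedding space: x + eps * g / ||g||_2 (L2 variant assumed)."""
    grad = loss_gradient(vectors, y)
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    return [[x + eps * g / norm for x, g in zip(vec, row)]
            for vec, row in zip(vectors, grad)]


def nearest_word(vec):
    """Map a perturbed vector back to the closest vocabulary word (Euclidean)."""
    return min(EMBEDDINGS, key=lambda word: math.dist(vec, EMBEDDINGS[word]))


def attack(tokens, eps_grid):
    """Return the first reconstructed text whose label differs, and its eps."""
    y = predict(tokens)
    vectors = [EMBEDDINGS[t] for t in tokens]
    for eps in eps_grid:
        candidate = [nearest_word(v) for v in fgm(vectors, y, eps)]
        if predict(candidate) != y:
            return candidate, eps
    return None, None


def wmd_uniform(tokens_a, tokens_b):
    """WMD between two texts of n distinct words each, with weights 1/n.

    With uniform weights on both sides, an optimal transport plan can be taken
    to be a permutation, so brute force over matchings is exact (short texts only).
    """
    if not (len(set(tokens_a)) == len(tokens_a) == len(tokens_b) == len(set(tokens_b))):
        raise ValueError("expects equal-length texts without repeated words")
    n = len(tokens_a)
    return min(
        sum(math.dist(EMBEDDINGS[a], EMBEDDINGS[b]) for a, b in zip(tokens_a, perm)) / n
        for perm in itertools.permutations(tokens_b)
    )


original = ["the", "movie", "was", "good"]
adversarial, eps = attack(original, [float(k) for k in range(1, 11)])
print(adversarial, eps)  # ['the', 'movie', 'was', 'bad'] 4.0
print(wmd_uniform(original, adversarial))  # 0.875
```

With these toy values the prediction first flips at eps = 4.0, and the reconstructed text differs from the original by one word. Because the toy model is linear in the mean embedding, every token receives the same normalized gradient; in a real network the per-token gradients generally differ, so the perturbation is not spread uniformly across the text.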

Comments:	This work lacks some crucial details. After careful discussion, we decided to withdraw it temporarily and resubmit a full version later.
Subjects:	Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:1801.07175 [cs.CL]
	(or arXiv:1801.07175v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1801.07175

Submission history

From: Zhitao Gong
[v1] Mon, 22 Jan 2018 16:19:52 UTC (230 KB)
[v2] Wed, 24 Jan 2018 19:54:27 UTC (1 KB) (withdrawn)