Robustness of Explanation Methods for NLP Models

Atmakuri, Shriya; Chheda, Tejas; Kandula, Dinesh; Yadav, Nishant; Lee, Taesung; Tuinhof, Hessel

arXiv:2206.12284 (cs)

[Submitted on 24 Jun 2022]

Abstract: Explanation methods have emerged as an important tool for highlighting the features responsible for the predictions of neural networks. There is mounting evidence that many explanation methods are unreliable and susceptible to malicious manipulation. In this paper, we aim to understand the robustness of explanation methods for the text modality. We provide initial insights and results toward devising a successful adversarial attack against text explanations. To our knowledge, this is the first attempt to evaluate the adversarial robustness of an explanation method. Our experiments show that the explanation method can be largely disturbed for up to 86% of the tested samples with small changes in the input sentence and its semantics.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2206.12284 [cs.CL]
	(or arXiv:2206.12284v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2206.12284

Submission history

From: Taesung Lee
[v1] Fri, 24 Jun 2022 13:34:07 UTC (861 KB)
