Statistically Profiling Biases in Natural Language Reasoning Datasets and Models

Huang, Shanshan; Zhu, Kenny Q.

arXiv:2102.04632 (cs)

[Submitted on 9 Feb 2021]

Abstract: Recent work has indicated that many natural language understanding and reasoning datasets contain statistical cues that NLP models may exploit, so that the models' capabilities may be grossly overestimated. To discover potential weaknesses in the models, some human-designed stress tests have been proposed, but they are expensive to create and do not generalize to arbitrary models. We propose a lightweight and general statistical profiling framework, ICQ (I-See-Cue), which automatically identifies possible biases in any multiple-choice NLU dataset without the need to create additional test cases, and further evaluates, through black-box testing, the extent to which models may exploit these biases.
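
To make the notion of a statistical cue concrete, the sketch below counts how often each token appears in correct versus incorrect answer options of a toy multiple-choice dataset. It is an illustrative assumption only and is not the ICQ framework's actual method: the function name `token_cue_scores`, the `min_count` parameter, the scoring ratio, and the toy data are all hypothetical.

```python
from collections import Counter


def token_cue_scores(examples, min_count=2):
    """Return, per token, the fraction of options containing it that are correct.

    `examples` is a list of (options, correct_index) pairs, where each option
    is a string. With k options per question, a token that carries no bias
    would score about 1/k. Hypothetical metric, for illustration only.
    """
    in_correct = Counter()  # options containing the token that are correct
    in_any = Counter()      # all options containing the token
    for options, correct_index in examples:
        for i, option in enumerate(options):
            # Count each token at most once per option.
            for token in set(option.lower().split()):
                in_any[token] += 1
                if i == correct_index:
                    in_correct[token] += 1
    # Ignore rare tokens, whose ratios are not informative.
    return {
        token: in_correct[token] / total
        for token, total in in_any.items()
        if total >= min_count
    }


# Toy two-option dataset (made up): the correct option always contains "not".
toy = [
    (["the man did not eat", "the man ate"], 0),
    (["she was not late", "she was early"], 0),
    (["he left", "he did not stay"], 1),
]
scores = token_cue_scores(toy)
```

In this toy data, "not" appears only in correct options, so its score is 1.0, while a neutral token such as "the" sits at the chance level of 0.5. A model that simply picked the option containing "not" would answer every toy question correctly without any reasoning, which is the kind of overestimated capability the abstract describes.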

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2102.04632 [cs.CL]
	(or arXiv:2102.04632v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2102.04632

Submission history

From: Shanshan Huang
[v1] Tue, 9 Feb 2021 03:51:53 UTC (7,451 KB)
