Implications of Annotation Artifacts in Edge Probing Test Datasets

Choudhury, Sagnik Ray; Kalra, Jushaan

arXiv:2310.13856 (cs)

[Submitted on 20 Oct 2023]

Abstract: Edge probing (EP) tests are classification tasks that test for grammatical knowledge encoded in token representations coming from contextual encoders such as large language models (LLMs). Many LLM encoders have shown high performance in EP tests, leading to conjectures about their ability to encode linguistic knowledge. However, a large body of research claims that the tests do not necessarily measure the LLM's capacity to encode knowledge, but rather reflect the classifiers' ability to learn the problem. Much of this criticism stems from the fact that the classifiers often reach very similar accuracy whether an LLM encoder or a random encoder is used. Consequently, several modifications to the tests have been suggested, including information-theoretic probes. We show that commonly used edge probing test datasets have various biases, including memorization. When these biases are removed, the LLM encoders do show a significant difference from the random ones, even with the simple non-information-theoretic probes.
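
The memorization bias mentioned in the abstract can be pictured with a toy check: if a test span already appeared in the training data, a lookup table with no encoder at all can label it correctly. The sketch below is only an illustration under that assumption and is not the authors' code; the function names (`split_seen_unseen`, `memorization_baseline`, `accuracy`) and the toy data are hypothetical.

```python
from collections import Counter


def split_seen_unseen(train, test):
    """Partition test examples by whether their span text occurred in training."""
    seen_spans = {span for span, _ in train}
    seen = [(s, y) for s, y in test if s in seen_spans]
    unseen = [(s, y) for s, y in test if s not in seen_spans]
    return seen, unseen


def memorization_baseline(train):
    """Map each training span to its most frequent label (no encoder involved)."""
    counts = {}
    for span, label in train:
        counts.setdefault(span, Counter())[label] += 1
    return {span: c.most_common(1)[0][0] for span, c in counts.items()}


def accuracy(lookup, examples):
    """Fraction of examples whose label matches the lookup table's answer."""
    if not examples:
        return 0.0
    correct = sum(lookup.get(span) == label for span, label in examples)
    return correct / len(examples)


# Hypothetical toy data: (span text, label) pairs.
train = [("dog", "NOUN"), ("runs", "VERB"), ("dog", "NOUN"), ("fast", "ADV")]
test = [("dog", "NOUN"), ("runs", "VERB"), ("cat", "NOUN"), ("slowly", "ADV")]

seen, unseen = split_seen_unseen(train, test)
lookup = memorization_baseline(train)
seen_acc = accuracy(lookup, seen)      # spans the lookup table has memorized
unseen_acc = accuracy(lookup, unseen)  # spans it has never encountered
```

Reporting probe accuracy separately on the seen and unseen subsets is one simple way to tell whether a high score reflects the encoder or mere recall of training spans.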

Comments:	Accepted CoNLL 2023, code: this https URL
Subjects:	Computation and Language (cs.CL)
ACM classes:	I.2.7
Cite as:	arXiv:2310.13856 [cs.CL]
	(or arXiv:2310.13856v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.13856

Submission history

From: Sagnik Ray Choudhury
[v1] Fri, 20 Oct 2023 23:19:35 UTC (187 KB)
