Unsupervised Pretraining for Fact Verification by Language Model Distillation

Bazaga, Adrián; Liò, Pietro; Micklem, Gos


arXiv:2309.16540 (cs)

[Submitted on 28 Sep 2023 (v1), last revised 6 Mar 2024 (this version, v3)]



Abstract: Fact verification aims to verify a claim using evidence from a trustworthy knowledge base. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on FB15k-237 (+5.3% Hits@1) and FEVER (+8% accuracy) with linear evaluation.
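The abstract mentions two ideas that a small example may make more concrete: a contrastive objective that pulls a claim towards its matching fact, and the Hits@1 ranking metric. The sketch below is illustrative only. It uses a generic InfoNCE-style loss over cosine similarities, which is an assumption and not the SFAVEL loss itself (the abstract does not give its form). The function names `info_nce` and `hits_at_k`, the temperature value, and the toy 2-d embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors (zero vectors not handled)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(claim, facts, positive_idx, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's loss).

    Returns the negative log-softmax of the positive fact's similarity
    to the claim, with all other facts acting as negatives.
    """
    logits = [cosine(claim, f) / temperature for f in facts]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[positive_idx]

def hits_at_k(claims, facts, gold, k=1):
    """Fraction of claims whose gold fact is ranked in the top k by similarity."""
    hits = 0
    for claim, gold_idx in zip(claims, gold):
        scores = [cosine(claim, f) for f in facts]
        ranked = sorted(range(len(facts)), key=lambda i: scores[i], reverse=True)
        hits += gold_idx in ranked[:k]
    return hits / len(claims)

# Hypothetical toy embeddings: three facts and two claims in 2-d.
facts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
claims = [[0.9, 0.1], [0.1, 0.9]]
gold = [0, 1]  # index of the matching fact for each claim

aligned_loss = info_nce(claims[0], facts, positive_idx=0)
misaligned_loss = info_nce(claims[0], facts, positive_idx=2)
score = hits_at_k(claims, facts, gold, k=1)
```

In this toy setup the loss is small when the claim sits close to its matching fact and large when the designated positive points the opposite way, which is the behaviour a contrastive alignment objective is meant to reward. Hits@1 then simply counts how often the nearest fact is the correct one.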

Comments:	ICLR 2024 Camera Ready
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2309.16540 [cs.CL]
	(or arXiv:2309.16540v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.16540

Submission history

From: Adrián Bazaga [view email]
[v1] Thu, 28 Sep 2023 15:53:44 UTC (606 KB)
[v2] Tue, 16 Jan 2024 15:36:40 UTC (609 KB)
[v3] Wed, 6 Mar 2024 20:12:01 UTC (609 KB)
