Understanding tables with intermediate pre-training

Eisenschlos, Julian Martin; Krichene, Syrine; Müller, Thomas

arXiv:2010.00571 (cs)

[Submitted on 1 Oct 2020 (v1), last revised 5 Oct 2020 (this version, v2)]

Abstract: Table entailment, the binary classification task of finding if a sentence is supported or refuted by the content of a table, requires parsing language and table structure as well as numerical and discrete reasoning. While there is extensive work on textual entailment, table entailment is less well studied. We adapt TAPAS (Herzig et al., 2020), a table-based BERT model, to recognize entailment. Motivated by the benefits of data augmentation, we create a balanced dataset of millions of automatically created training examples which are learned in an intermediate step prior to fine-tuning. This new data is not only useful for table entailment, but also for SQA (Iyyer et al., 2017), a sequential table QA task. To be able to use long examples as input of BERT models, we evaluate table pruning techniques as a pre-processing step to drastically improve the training and prediction efficiency at a moderate drop in accuracy. The different methods set the new state-of-the-art on the TabFact (Chen et al., 2020) and SQA datasets.
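
To make the idea of table pruning as a pre-processing step concrete, the following is a minimal sketch and not the authors' method. It assumes a simple token-overlap heuristic: each column is scored by how many statement tokens appear in its header or cells, and only the best-scoring columns are kept so the linearized table fits a fixed input budget. The names tokenize, prune_columns, and max_columns are hypothetical and chosen for illustration only.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def prune_columns(statement, header, rows, max_columns):
    """Keep at most max_columns columns, ranked by token overlap with the statement.

    Assumption: token overlap is a stand-in relevance score, used here only
    to illustrate pruning; it is not taken from the paper.
    """
    query = tokenize(statement)
    scores = []
    for idx, name in enumerate(header):
        # Collect every token that occurs in the column header or its cells.
        column_tokens = tokenize(name)
        for row in rows:
            column_tokens |= tokenize(row[idx])
        scores.append((len(query & column_tokens), idx))

    # Highest overlap first; ties are broken in favor of the leftmost column.
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    # Restore the original left-to-right order of the surviving columns.
    keep = sorted(idx for _, idx in ranked[:max_columns])

    pruned_header = [header[i] for i in keep]
    pruned_rows = [[row[i] for i in keep] for row in rows]
    return pruned_header, pruned_rows


if __name__ == "__main__":
    header = ["Player", "Country", "Score", "Year"]
    rows = [
        ["Alice Smith", "France", "12", "2019"],
        ["Bob Jones", "Spain", "9", "2020"],
    ]
    print(prune_columns("Alice Smith scored 12 points", header, rows, max_columns=2))
```

Pruning of this kind trades accuracy for shorter inputs: a column that matters for the statement but shares no tokens with it would be dropped, which is one way a moderate drop in accuracy can arise.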

Comments:	Accepted to EMNLP Findings 2020
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2010.00571 [cs.CL]
	(or arXiv:2010.00571v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.00571

Submission history

From: Julian Eisenschlos
[v1] Thu, 1 Oct 2020 17:43:27 UTC (449 KB)
[v2] Mon, 5 Oct 2020 12:26:40 UTC (305 KB)
