Large Language Models for Document-Level Event-Argument Data Augmentation for Challenging Role Types

Gatto, Joseph; Seegmiller, Parker; Sharif, Omar; Preum, Sarah M.

arXiv:2403.03304 (cs)

[Submitted on 5 Mar 2024 (v1), last revised 12 Jun 2024 (this version, v2)]

Abstract: Event Argument Extraction (EAE) is an extremely difficult information extraction problem, with significant limitations in few-shot cross-domain (FSCD) settings. A common solution to FSCD modeling is data augmentation. Unfortunately, existing augmentation methods are not well suited to a variety of real-world EAE contexts, including (i) the need to model long documents (10+ sentences) and (ii) the need to model zero- and few-shot roles (i.e., event roles with little to no training representation). In this work, we introduce two novel LLM-powered data augmentation frameworks for synthesizing extractive document-level EAE samples using zero in-domain training data. Our highest-performing methods provide a 16-point increase in F1 score on extraction of zero-shot role types.
To better facilitate analysis of cross-domain EAE, we additionally introduce a new metric, Role-Depth F1 (RDF1), which uses statistical depth to identify roles in the target domain that are semantic outliers with respect to roles observed in the source domain. Our experiments show that LLM-based augmentation can boost RDF1 performance by up to 11 F1 points compared to baseline methods.
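
The abstract does not define how RDF1 is computed, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes each role is represented by an embedding vector, uses mean cosine similarity to the source-domain roles as a stand-in for a statistical depth function, and computes micro F1 only over the target roles that score lowest. The function names, the role names, the toy vectors, and the cutoff `k` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def role_depth(target_vec, source_vecs):
    """Stand-in depth score (an assumption, not the paper's depth function):
    mean cosine similarity to the source roles. Lower means more of an outlier."""
    return sum(cosine(target_vec, s) for s in source_vecs) / len(source_vecs)

def outlier_roles(target_roles, source_vecs, k):
    """Return the k target roles with the lowest depth score."""
    ranked = sorted(target_roles, key=lambda r: role_depth(target_roles[r], source_vecs))
    return set(ranked[:k])

def f1_on_roles(gold, pred, roles):
    """Micro F1 over (doc_id, role, span) triples, restricted to `roles`."""
    g = {t for t in gold if t[1] in roles}
    p = {t for t in pred if t[1] in roles}
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)

# Toy data: two source-role vectors and two hypothetical target roles.
source_vecs = [[1.0, 0.0], [0.9, 0.1]]
target_roles = {"attacker": [1.0, 0.1], "symptom": [0.0, 1.0]}
outliers = outlier_roles(target_roles, source_vecs, k=1)

gold = {("d1", "symptom", "chest pain"), ("d1", "attacker", "rebels"), ("d2", "symptom", "fever")}
pred = {("d1", "symptom", "chest pain"), ("d1", "attacker", "rebels"), ("d2", "symptom", "cough")}
score = f1_on_roles(gold, pred, outliers)
```

In this toy setup the role far from the source vectors is the one selected, and the score reflects only predictions for that role, which is the sense in which a depth-restricted F1 focuses evaluation on semantic outliers.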

Comments:	Paper in submission (8 pages)
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2403.03304 [cs.CL]
	(or arXiv:2403.03304v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.03304

Submission history

From: Joseph Gatto
[v1] Tue, 5 Mar 2024 20:07:42 UTC (9,775 KB)
[v2] Wed, 12 Jun 2024 19:21:33 UTC (9,182 KB)
