TweetNERD -- End to End Entity Linking Benchmark for Tweets

Mishra, Shubhanshu; Saini, Aman; Makki, Raheleh; Mehta, Sneha; Haghighi, Aria; Mollahosseini, Ali

Computer Science > Computation and Language

arXiv:2210.08129 (cs)

[Submitted on 14 Oct 2022]

Abstract: Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K+ Tweets spanning 2010-2021, for benchmarking NERD systems on Tweets. It is the largest and most temporally diverse open-source benchmark dataset for NERD on Tweets and can be used to facilitate research in this area. We describe the evaluation setup with TweetNERD for three NERD tasks: Named Entity Recognition (NER), Entity Linking with True Spans (EL), and End to End Entity Linking (End2End), and we report the performance of existing publicly available methods on specific TweetNERD splits. TweetNERD is available at: this https URL under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. More details are available at this https URL.
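The abstract names three tasks but does not state how each is scored. The sketch below only illustrates how the tasks differ in what counts as a correct prediction. It assumes exact-match, micro-averaged precision/recall/F1 over mentions written as (start, end, entity_id) character-offset tuples; the tuple layout, the `score` function, and the IDs "E1" to "E4" are hypothetical and are not TweetNERD's file format or official evaluation script.

```python
from typing import Iterable, Set, Tuple

# Hypothetical mention format (an assumption, not TweetNERD's schema):
# (start, end, entity_id) with character offsets into the Tweet text.
Mention = Tuple[int, int, str]


def prf(gold: Set[tuple], pred: Set[tuple]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 under exact matching of set elements."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def score(gold: Iterable[Mention], pred: Iterable[tuple], task: str) -> Tuple[float, float, float]:
    gold_set = set(gold)
    pred_set = set(pred)
    if task == "ner":
        # NER: only mention boundaries count; entity IDs, if present, are ignored.
        return prf({m[:2] for m in gold_set}, {m[:2] for m in pred_set})
    if task == "el":
        # EL with true spans: the system is handed the gold spans, so only
        # predictions on those spans are scored, and each needs the right ID.
        gold_spans = {m[:2] for m in gold_set}
        return prf(gold_set, {m for m in pred_set if m[:2] in gold_spans})
    if task == "end2end":
        # End2End: a prediction counts only if both span and entity ID match.
        return prf(gold_set, pred_set)
    raise ValueError(f"unknown task: {task!r}")


# Hypothetical example: one correct link, one wrong ID, one spurious mention.
gold = [(0, 5, "E1"), (10, 17, "E2")]
pred = [(0, 5, "E1"), (10, 17, "E3"), (20, 25, "E4")]
for task in ("ner", "el", "end2end"):
    print(task, score(gold, pred, task))
```

Under these assumptions the same predictions give F1 of 0.8 for NER, 0.5 for EL, and 0.4 for End2End: the wrong entity on a correct span still counts for NER, while the spurious third mention is ignored in the EL setting (the system only sees gold spans) but lowers End2End precision.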

Comments:	19 pages, 2 figures. Accepted to Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track 2022. Data available at: this https URL under Creative Commons Attribution 4.0 International (CC BY 4.0) license. Check out more details at this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
MSC classes:	68T50, 68T07
ACM classes:	I.2.7
Cite as:	arXiv:2210.08129 [cs.CL]
	(or arXiv:2210.08129v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.08129

Submission history

From: Shubhanshu Mishra
[v1] Fri, 14 Oct 2022 21:55:07 UTC (56 KB)