Showing 1–34 of 34 results for author: Barrón-Cedeño, A

  1. arXiv:2505.20215 [pdf, ps, other]

    cs.CL

    Dependency Parsing is More Parameter-Efficient with Normalization

    Authors: Paolo Gajo, Domenic Rosati, Hassan Sajjad, Alberto Barrón-Cedeño

    Abstract: Dependency parsing is the task of inferring natural language structure, often approached by modeling word interactions via attention through biaffine scoring. This mechanism works like self-attention in Transformers, where scores are calculated for every pair of words in a sentence. However, unlike Transformer attention, biaffine scoring does not use normalization prior to taking the softmax of th…

    Submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2412.06144 [pdf, other]

    cs.CL

    Hate Speech According to the Law: An Analysis for Effective Detection

    Authors: Katerina Korre, John Pavlopoulos, Paolo Gajo, Alberto Barrón-Cedeño

    Abstract: The issue of hate speech extends beyond the confines of the online realm. It is a problem with real-life repercussions, prompting most nations to formulate legal frameworks that classify hate speech as a punishable offence. These legal frameworks differ from one country to another, contributing to the big chaos that online platforms have to face when addressing reported instances of hate speech. W…

    Submitted 8 December, 2024; originally announced December 2024.

  3. arXiv:2411.07417 [pdf, other]

    cs.CL

    Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains

    Authors: Katerina Korre, Arianna Muti, Federico Ruggeri, Alberto Barrón-Cedeño

    Abstract: Hate speech relies heavily on cultural influences, leading to varying individual interpretations. For that reason, we propose a Semantic Componential Analysis (SCA) framework for a cross-cultural and cross-domain analysis of hate speech definitions. We create the first dataset of hate speech definitions encompassing 493 definitions from more than 100 cultures, drawn from five key domains: online d…

    Submitted 20 May, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

  4. arXiv:2409.02519 [pdf, other]

    cs.CL cs.SI

    Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts

    Authors: Arianna Muti, Federico Ruggeri, Khalid Al-Khatib, Alberto Barrón-Cedeño, Tommaso Caselli

    Abstract: We propose misogyny detection as an Argumentative Reasoning task and we investigate the capacity of large language models (LLMs) to understand the implicit reasoning used to convey misogyny in both Italian and English. The central aim is to generate the missing reasoning link between a message and the implied meanings encoding the misogyny. Our study uses argumentation theory as a foundation to fo…

    Submitted 4 September, 2024; originally announced September 2024.

  5. arXiv:2406.14099 [pdf, other]

    cs.CL

    Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology

    Authors: Federico Ruggeri, Eleonora Misino, Arianna Muti, Katerina Korre, Paolo Torroni, Alberto Barrón-Cedeño

    Abstract: We introduce the Guideline-Centered Annotation Methodology (GCAM), a novel data annotation methodology designed to report the annotation guidelines associated with each data sample. Our approach addresses three key limitations of the standard prescriptive annotation methodology by reducing the information loss during annotation and ensuring adherence to guidelines. Furthermore, GCAM enables the ef…

    Submitted 10 December, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.12399 [pdf, other]

    cs.CL cs.AI cs.CY

    QueerBench: Quantifying Discrimination in Language Models Toward Queer Identities

    Authors: Mae Sosto, Alberto Barrón-Cedeño

    Abstract: With the increasing role of Natural Language Processing (NLP) in various applications, challenges concerning bias and stereotype perpetuation are accentuated, which often leads to hate speech and harm. Despite existing studies on sexism and misogyny, issues like homophobia and transphobia remain underexplored and often adopt binary perspectives, putting the safety of LGBTQIA+ individuals at high r…

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2404.02681 [pdf, other]

    cs.CL cs.AI

    PejorativITy: Disambiguating Pejorative Epithets to Improve Misogyny Detection in Italian Tweets

    Authors: Arianna Muti, Federico Ruggeri, Cagri Toraman, Lorenzo Musetti, Samuel Algherini, Silvia Ronchi, Gianmarco Saretto, Caterina Zapparoli, Alberto Barrón-Cedeño

    Abstract: Misogyny is often expressed through figurative language. Some neutral words can assume a negative connotation when functioning as pejorative epithets. Disambiguating the meaning of such terms might help the detection of misogyny. In order to address such task, we present PejorativITy, a novel corpus of 1,200 manually annotated Italian tweets for pejorative language at the word level and misogyny a…

    Submitted 3 April, 2024; originally announced April 2024.

  8. arXiv:2305.18034 [pdf]

    cs.CL

    A Corpus for Sentence-level Subjectivity Detection on English News Articles

    Authors: Francesco Antici, Andrea Galassi, Federico Ruggeri, Katerina Korre, Arianna Muti, Alessandra Bardi, Alice Fedotova, Alberto Barrón-Cedeño

    Abstract: We develop novel annotation guidelines for sentence-level subjectivity detection, which are not limited to language-specific cues. We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English and across other languages without relying o…

    Submitted 24 May, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: LREC-COLING 2024, pages 273-285

  9. arXiv:2109.15118 [pdf, other]

    cs.CL cs.AI cs.IR cs.LG cs.SI

    Overview of the CLEF-2019 CheckThat!: Automatic Identification and Verification of Claims

    Authors: Tamer Elsayed, Preslav Nakov, Alberto Barrón-Cedeño, Maram Hasanain, Reem Suwaileh, Giovanni Da San Martino, Pepa Atanasova

    Abstract: We present an overview of the second edition of the CheckThat! Lab at CLEF 2019. The lab featured two tasks in two different languages: English and Arabic. Task 1 (English) challenged the participating systems to predict which claims in a political debate or speech should be prioritized for fact-checking. Task 2 (Arabic) asked to (A) rank a given set of Web pages with respect to a check-worthy cla…

    Submitted 25 September, 2021; originally announced September 2021.

    Comments: Check-worthiness Estimation, Fact-Checking, Veracity, Evidence-based Verification, Fake News Detection, Computational Journalism, Disinformation, Misinformation. arXiv admin note: text overlap with arXiv:2012.09263 by other authors

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: CLEF-2019

  10. arXiv:2109.12987 [pdf, other]

    cs.CL cs.IR cs.LG cs.SI

    Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News

    Authors: Preslav Nakov, Giovanni Da San Martino, Tamer Elsayed, Alberto Barrón-Cedeño, Rubén Míguez, Shaden Shaar, Firoj Alam, Fatima Haouari, Maram Hasanain, Watheq Mansour, Bayan Hamdan, Zien Sheikh Ali, Nikolay Babulkov, Alex Nikolov, Gautam Kishore Shahi, Julia Maria Struß, Thomas Mandl, Mucahid Kutlu, Yavuz Selim Kartal

    Abstract: We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 a…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: Check-Worthiness Estimation, Fact-Checking, Veracity, Evidence-based Verification, Detecting Previously Fact-Checked Claims, Social Media Verification, Computational Journalism, COVID-19

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: CLEF-2021

  11. arXiv:2103.07769 [pdf, other]

    cs.AI cs.CL cs.CR cs.IR cs.LG

    Automated Fact-Checking for Assisting Human Fact-Checkers

    Authors: Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, Giovanni Da San Martino

    Abstract: The reporting and the analysis of current events around the globe has expanded from professional, editor-led journalism all the way to citizen journalism. Nowadays, politicians and other key players enjoy direct access to their audiences through social media, bypassing the filters of official cables or traditional media. However, the multiple advantages of free speech and direct communication are…

    Submitted 22 May, 2021; v1 submitted 13 March, 2021; originally announced March 2021.

    Comments: fact-checking, fact-checkers, check-worthiness, detecting previously fact-checked claims, evidence retrieval

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: IJCAI-2021

  12. arXiv:2009.02696 [pdf, other]

    cs.CL cs.CY

    SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles

    Authors: G. Da San Martino, A. Barrón-Cedeño, H. Wachsmuth, R. Petrov, P. Nakov

    Abstract: We present the results and the main findings of SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles. The task featured two subtasks. Subtask SI is about Span Identification: given a plain-text document, spot the specific text fragments containing propaganda. Subtask TC is about Technique Classification: given a specific text fragment, in the context of a full document, dete…

    Submitted 6 September, 2020; originally announced September 2020.

    Comments: 37 pages, to be published in Proceedings of the 14th International Workshop on Semantic Evaluation

  13. arXiv:2007.08024 [pdf, other]

    cs.CL cs.IR cs.LG

    A Survey on Computational Propaganda Detection

    Authors: Giovanni Da San Martino, Stefano Cresci, Alberto Barron-Cedeno, Seunghak Yu, Roberto Di Pietro, Preslav Nakov

    Abstract: Propaganda campaigns aim at influencing people's mindset with the purpose of advancing a specific agenda. They exploit the anonymity of the Internet, the micro-profiling ability of social networks, and the ease of automatically creating and managing coordinated networks of accounts, to reach millions of social network users with persuasive messages, specifically targeted to topics each individual…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: propaganda detection, disinformation, misinformation, fake news, media bias

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: IJCAI-2020

  14. arXiv:2007.07997 [pdf, other]

    cs.CL cs.IR cs.LG

    Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media

    Authors: Alberto Barron-Cedeno, Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Maram Hasanain, Reem Suwaileh, Fatima Haouari, Nikolay Babulkov, Bayan Hamdan, Alex Nikolov, Shaden Shaar, Zien Sheikh Ali

    Abstract: We present an overview of the third edition of the CheckThat! Lab at CLEF 2020. The lab featured five tasks in two different languages: English and Arabic. The first four tasks compose the full pipeline of claim verification in social media: Task 1 on check-worthiness estimation, Task 2 on retrieving previously fact-checked claims, Task 3 on evidence retrieval, and Task 4 on claim verification. Th…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: Check-Worthiness Estimation, Fact-Checking, Veracity, Evidence-based Verification, Detecting Previously Fact-Checked Claims, Social Media Verification, Computational Journalism, COVID-19

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: CLEF-2020

  15. arXiv:2005.05854 [pdf, other]

    cs.CL cs.IR cs.LG cs.NE

    Prta: A System to Support the Analysis of Propaganda Techniques in the News

    Authors: Giovanni Da San Martino, Shaden Shaar, Yifan Zhang, Seunghak Yu, Alberto Barrón-Cedeño, Preslav Nakov

    Abstract: Recent events, such as the 2016 US Presidential Campaign, Brexit and the COVID-19 "infodemic", have brought into the spotlight the dangers of online disinformation. There has been a lot of research focusing on fact-checking and disinformation detection. However, little attention has been paid to the specific rhetorical and psychological techniques used to convey propaganda messages. Revealing the…

    Submitted 12 May, 2020; originally announced May 2020.

    Comments: propaganda, disinformation, fake news, media bias, COVID-19

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: ACL-2020

  16. arXiv:2005.01177 [pdf, other]

    cs.CL cs.IR

    Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction

    Authors: Cristina España-Bonet, Alberto Barrón-Cedeño, Lluís Màrquez

    Abstract: We propose an automatic language-independent graph-based method to build à-la-carte article collections on user-defined domains from the Wikipedia. The core model is based on the exploration of the encyclopaedia's category graph and can produce both monolingual and multilingual comparable collections. We run thorough experiments to assess the quality of the obtained corpora in 10 languages and 743…

    Submitted 3 May, 2020; originally announced May 2020.

    Comments: 26 pages, 8 figures, 6 tables

  17. arXiv:2001.08546 [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media

    Authors: Alberto Barron-Cedeno, Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Maram Hasanain, Reem Suwaileh, Fatima Haouari

    Abstract: We describe the third edition of the CheckThat! Lab, which is part of the 2020 Cross-Language Evaluation Forum (CLEF). CheckThat! proposes four complementary tasks and a related task from previous lab editions, offered in English, Arabic, and Spanish. Task 1 asks to predict which tweets in a Twitter stream are worth fact-checking. Task 2 asks to determine whether a claim posted in a tweet can be v…

    Submitted 21 January, 2020; originally announced January 2020.

    Comments: Computational journalism, Check-worthiness, Fact-checking, Veracity, CLEF-2020 CheckThat! Lab

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: CLEF-2018 ECIR-2020

  18. arXiv:1912.08084 [pdf, other]

    cs.CL cs.IR cs.LG

    A Context-Aware Approach for Detecting Check-Worthy Claims in Political Debates

    Authors: Pepa Gencheva, Ivan Koychev, Lluís Màrquez, Alberto Barrón-Cedeño, Preslav Nakov

    Abstract: In the context of investigative journalism, we address the problem of automatically identifying which claims in a given document are most worthy and should be prioritized for fact-checking. Despite its importance, this is a relatively understudied problem. Thus, we create a new dataset of political debates, containing statements that have been fact-checked by nine reputable sources, and we train m…

    Submitted 14 December, 2019; originally announced December 2019.

    Comments: Check-worthiness; Fact-Checking; Veracity; Neural Networks. arXiv admin note: substantial text overlap with arXiv:1908.01328

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: RANLP-2017

  19. arXiv:1912.06810 [pdf, other]

    cs.CL cs.IR cs.LG

    Proppy: A System to Unmask Propaganda in Online News

    Authors: Alberto Barrón-Cedeño, Giovanni Da San Martino, Israa Jaradat, Preslav Nakov

    Abstract: We present proppy, the first publicly available real-world, real-time propaganda detection system for online news, which aims at raising awareness, thus potentially limiting the impact of propaganda and helping fight disinformation. The system constantly monitors a number of news sources, deduplicates and clusters the news into events, and organizes the articles about an event on the basis of the…

    Submitted 14 December, 2019; originally announced December 2019.

    Comments: propaganda, disinformation, fake news

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-2019)

  20. arXiv:1911.08755 [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LO

    Global Thread-Level Inference for Comment Classification in Community Question Answering

    Authors: Shafiq Joty, Alberto Barrón-Cedeño, Giovanni Da San Martino, Simone Filice, Lluís Màrquez, Alessandro Moschitti, Preslav Nakov

    Abstract: Community question answering, a recent evolution of question answering in the Web context, allows a user to quickly consult the opinion of a number of people on a particular topic, thus taking advantage of the wisdom of the crowd. Here we try to help the user by deciding automatically which answers are good and which are bad for a given question. In particular, we focus on exploiting the output st…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: community question answering, thread-level inference, graph-cut, inductive logic programming

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: EMNLP-2015

  21. arXiv:1910.09982 [pdf, other]

    cs.CL cs.SI

    Findings of the NLP4IF-2019 Shared Task on Fine-Grained Propaganda Detection

    Authors: Giovanni Da San Martino, Alberto Barrón-Cedeño, Preslav Nakov

    Abstract: We present the shared task on Fine-Grained Propaganda Detection, which was organized as part of the NLP4IF workshop at EMNLP-IJCNLP 2019. There were two subtasks. FLC is a fragment-level task that asks for the identification of propagandist text fragments in a news article and also for the prediction of the specific propaganda technique used in each such fragment (18-way classification task). SLC…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: propaganda, disinformation, fake news. arXiv admin note: text overlap with arXiv:1910.02517

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: NLP4IF@EMNLP-2019

  22. arXiv:1910.02517 [pdf, other]

    cs.CL cs.AI cs.IR

    Fine-Grained Analysis of Propaganda in News Articles

    Authors: Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, Preslav Nakov

    Abstract: Propaganda aims at influencing people's mindset with the purpose of advancing a specific agenda. Previous work has addressed propaganda detection at the document level, typically labelling all articles from a propagandistic news outlet as propaganda. Such noisy gold labels inevitably affect the quality of any learning system trained on them. A further issue with most existing systems is the lack o…

    Submitted 6 October, 2019; originally announced October 2019.

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: EMNLP-2019

  23. arXiv:1910.02028 [pdf, other]

    cs.CL cs.IR

    Tanbih: Get To Know What You Are Reading

    Authors: Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov

    Abstract: We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understand what's behind a news story. Our system displays news grouped into events and generates media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, and stance with respect to various c…

    Submitted 4 October, 2019; originally announced October 2019.

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: EMNLP-2019

  24. arXiv:1908.07912 [pdf, other]

    cs.CL cs.AI

    It Takes Nine to Smell a Rat: Neural Multi-Task Learning for Check-Worthiness Prediction

    Authors: Slavena Vasileva, Pepa Atanasova, Lluís Màrquez, Alberto Barrón-Cedeño, Preslav Nakov

    Abstract: We propose a multi-task deep-learning approach for estimating the check-worthiness of claims in political debates. Given a political debate, such as the 2016 US Presidential and Vice-Presidential ones, the task is to predict which statements in the debate should be prioritized for fact-checking. While different fact-checking organizations would naturally make different choices when analyzing the s…

    Submitted 19 August, 2019; originally announced August 2019.

    Comments: Check-worthiness; Fact-Checking; Veracity; Multi-task Learning; Neural Networks. arXiv admin note: text overlap with arXiv:1908.01328

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: RANLP-2019

  25. arXiv:1908.01328 [pdf, other]

    cs.CL cs.AI

    Automatic Fact-Checking Using Context and Discourse Information

    Authors: Pepa Atanasova, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Georgi Karadzhov, Tsvetomila Mihaylova, Mitra Mohtarami, James Glass

    Abstract: We study the problem of automatic fact-checking, paying special attention to the impact of contextual and discourse information. We address two related tasks: (i) detecting check-worthy claims, and (ii) fact-checking claims. We develop supervised systems based on neural networks, kernel-based support vector machines, and combinations thereof, which make use of rich input representations in terms o…

    Submitted 4 August, 2019; originally announced August 2019.

    Comments: JDIQ, Special Issue on Combating Digital Misinformation and Disinformation

    Journal ref: J. Data and Information Quality, Volume 11 Issue 3, July 2019, Article No. 12

  26. arXiv:1904.03513 [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection

    Authors: Abdelrhman Saleh, Ramy Baly, Alberto Barrón-Cedeño, Giovanni Da San Martino, Mitra Mohtarami, Preslav Nakov, James Glass

    Abstract: In this paper, we describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection. Our system relies on a variety of engineered features originally used to detect propaganda. This is based on the assumption that biased messages are propagandistic in the sense that they promote a particular political cause or viewpoint. We trained a logistic regression model with features ranging fro…

    Submitted 6 April, 2019; originally announced April 2019.

    Comments: Hyperpartisanship, propaganda, news media, fake news, SemEval-2018

  27. arXiv:1809.03891 [pdf, other]

    cs.CL

    Studying the History of the Arabic Language: Language Technology and a Large-Scale Historical Corpus

    Authors: Yonatan Belinkov, Alexander Magidow, Alberto Barrón-Cedeño, Avi Shmidman, Maxim Romanov

    Abstract: Arabic is a widely-spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties. Therefore, studying the history of the language has so far been mostly limited to manual analyses on a small scale. In this work, we present a large-scale historical corpus of the written Arabic language, spanning 1400 years. We describe our…

    Submitted 11 September, 2018; originally announced September 2018.

    ACM Class: I.2.7

  28. arXiv:1808.05542 [pdf, other]

    cs.CL

    Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 1: Check-Worthiness

    Authors: Pepa Atanasova, Alberto Barron-Cedeno, Tamer Elsayed, Reem Suwaileh, Wajdi Zaghouani, Spas Kyuchukov, Giovanni Da San Martino, Preslav Nakov

    Abstract: We present an overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims, with focus on Task 1: Check-Worthiness. The task asks to predict which claims in a political debate should be prioritized for fact-checking. In particular, given a debate or a political speech, the goal was to produce a ranked list of its sentences based on their worthiness for…

    Submitted 8 August, 2018; originally announced August 2018.

    Comments: Computational journalism, Check-worthiness, Fact-checking, Veracity

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: CLEF-2018

  29. arXiv:1804.07587 [pdf, other]

    cs.CL

    ClaimRank: Detecting Check-Worthy Claims in Arabic and English

    Authors: Israa Jaradat, Pepa Gencheva, Alberto Barron-Cedeno, Lluis Marquez, Preslav Nakov

    Abstract: We present ClaimRank, an online system for detecting check-worthy claims. While originally trained on political debates, the system can work for any kind of text, e.g., interviews or regular news articles. Its aim is to facilitate manual fact-checking efforts by prioritizing the claims that fact-checkers should consider first. ClaimRank supports both Arabic and English, it is trained on actual ann…

    Submitted 20 April, 2018; originally announced April 2018.

    Comments: Check-worthiness; Fact-Checking; Veracity; Community-Question Answering; Neural Networks; Arabic; English

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: NAACL-2018

  30. arXiv:1803.03178 [pdf, ps, other]

    cs.CL

    Fact Checking in Community Forums

    Authors: Tsvetomila Mihaylova, Preslav Nakov, Lluis Marquez, Alberto Barron-Cedeno, Mitra Mohtarami, Georgi Karadzhov, James Glass

    Abstract: Community Question Answering (cQA) forums are very popular nowadays, as they represent effective means for communities around particular topics to share information. Unfortunately, this information is not always factual. Thus, here we explore a new dimension in the context of cQA, which has been ignored so far: checking the veracity of answers to particular questions in cQA forums. As this is a ne…

    Submitted 8 March, 2018; originally announced March 2018.

    Comments: AAAI-2018; Fact-Checking; Veracity; Community-Question Answering; Neural Networks; Distributed Representations

    MSC Class: 68T50 ACM Class: I.2.7

  31. arXiv:1710.01487 [pdf, other]

    cs.CL

    Cross-Language Question Re-Ranking

    Authors: Giovanni Da San Martino, Salvatore Romeo, Alberto Barron-Cedeno, Shafiq Joty, Lluis Marquez, Alessandro Moschitti, Preslav Nakov

    Abstract: We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual…

    Submitted 4 October, 2017; originally announced October 2017.

    Comments: SIGIR-2017; Community Question Answering; Cross-language Approaches; Question Retrieval; Kernel-based Methods; Neural Networks; Distributed Representations

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: SIGIR 2017: 1145-1148

  32. arXiv:1710.00341 [pdf, other]

    cs.CL

    Fully Automated Fact Checking Using External Sources

    Authors: Georgi Karadzhov, Preslav Nakov, Lluis Marquez, Alberto Barron-Cedeno, Ivan Koychev

    Abstract: Given the constantly growing proliferation of false claims online in recent years, there has also been a growing research interest in automatically distinguishing false rumors from factually true claims. Here, we propose a general-purpose framework for fully-automatic fact checking using external sources, tapping the potential of the entire Web as a knowledge source to confirm or reject a claim. O…

    Submitted 1 October, 2017; originally announced October 2017.

    Comments: RANLP-2017

    MSC Class: 68T50 ACM Class: I.2.7

  33. An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

    Authors: Cristina España-Bonet, Ádám Csaba Varga, Alberto Barrón-Cedeño, Josef van Genabith

    Abstract: End-to-end neural machine translation has overtaken statistical machine translation in terms of translation quality for some language pairs, especially those with large amounts of parallel data. Besides this palpable improvement, neural networks provide several new properties. A single system can be trained to translate between many languages at almost no additional cost other than training time. F…

    Submitted 15 November, 2017; v1 submitted 18 April, 2017; originally announced April 2017.

    Comments: 11 pages, 4 figures

    Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1340-1350, December 2017

  34. arXiv:1610.05522 [pdf, other]

    cs.CL

    Addressing Community Question Answering in English and Arabic

    Authors: Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Alessandro Moschitti, Shafiq Joty, Fahad A. Al Obaidli, Kateryna Tymoshenko, Antonio Uva

    Abstract: This paper studies the impact of different types of features applied to learning to re-rank questions in community Question Answering. We tested our models on two datasets released in SemEval-2016 Task 3 on "Community Question Answering". Task 3 targeted real-life Web fora both in English and Arabic. Our models include bag-of-words features (BoW), syntactic tree kernels (TKs), rank features, embed…

    Submitted 18 October, 2016; originally announced October 2016.

    Comments: presented at Second WebQA workshop, SIGIR2016 (http://plg2.cs.uwaterloo.ca/~avtyurin/WebQA2016/)

    ACM Class: I.2.7; H.3.4