Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study
Authors:
Serra Sinem Tekiroglu,
Helena Bonaldi,
Margherita Fanton,
Marco Guerini
Abstract:
In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs. Findings show that…
▽ More
In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs. Findings show that autoregressive models combined with stochastic decodings are the most promising. We then investigate how an LM performs in generating a CN with regard to an unseen target of hate. We find out that a key element for successful `out of target' experiments is not an overall similarity with the training data but the presence of a specific subset of training data, i.e. a target that shares some commonalities with the test target that can be defined a-priori. We finally introduce the idea of a pipeline based on the addition of an automatic post-editing step to refine generated CNs.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech
Authors:
Margherita Fanton,
Helena Bonaldi,
Serra Sinem Tekiroglu,
Marco Guerini
Abstract:
Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation,…
▽ More
Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation, they fall short in reaching either high-quality and/or high-quantity. In this paper, we propose a novel human-in-the-loop data collection methodology in which a generative language model is refined iteratively by using its own data from the previous loops to generate new training samples that experts review and/or post-edit. Our experiments comprised several loops including dynamic variations. Results show that the methodology is scalable and facilitates diverse, novel, and cost-effective data collection. To our knowledge, the resulting dataset is the only expert-based multi-target HS/CN dataset available to the community.
△ Less
Submitted 19 July, 2021;
originally announced July 2021.
A New Open-Access Platform for Measuring and Sharing mTBI Data
Authors:
August G. Domel,
Samuel J. Raymond,
Chiara Giordano,
Yuzhe Liu,
Seyed Abdolmajid Yousefsani,
Michael Fanton,
Ileana Pirozzi,
Ali Kight,
Brett Avery,
Athanasia Boumis,
Tyler Fetters,
Simran Jandu,
William M Mehring,
Sam Monga,
Nicole Mouchawar,
India Rangel,
Eli Rice,
Pritha Roy,
Sohrab Sami,
Heer Singh,
Lyndia Wu,
Calvin Kuo,
Michael Zeineh,
Gerald Grant,
David B. Camarillo
Abstract:
Despite numerous research efforts, the precise mechanisms of concussion have yet to be fully uncovered. Clinical studies on high-risk populations, such as contact sports athletes, have become more common and give insight on the link between impact severity and brain injury risk through the use of wearable sensors and neurological testing. However, as the number of institutions operating these stud…
▽ More
Despite numerous research efforts, the precise mechanisms of concussion have yet to be fully uncovered. Clinical studies on high-risk populations, such as contact sports athletes, have become more common and give insight on the link between impact severity and brain injury risk through the use of wearable sensors and neurological testing. However, as the number of institutions operating these studies grows, there is a growing need for a platform to share these data to facilitate our understanding of concussion mechanisms and aid in the development of suitable diagnostic tools. To that end, this paper puts forth two contributions: 1) a centralized, open-source platform for storing and sharing head impact data, in collaboration with the Federal Interagency Traumatic Brain Injury Research informatics system (FITBIR), and 2) a deep learning impact detection algorithm (MiGNet) to differentiate between true head impacts and false positives for the previously biomechanically validated instrumented mouthguard sensor (MiG2.0), all of which easily interfaces with FITBIR. We report 96% accuracy using MiGNet, based on a neural network model, improving on previous work based on Support Vector Machines achieving 91% accuracy, on an out of sample dataset of high school and collegiate football head impacts. The integrated MiG2.0 and FITBIR system serve as a collaborative research tool to be disseminated across multiple institutions towards creating a standardized dataset for furthering the knowledge of concussion biomechanics.
△ Less
Submitted 16 October, 2020;
originally announced October 2020.