Showing 1–22 of 22 results for author: Dementieva, D

  1. arXiv:2507.04350

    cs.CL

    HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation

    Authors: Naquee Rizwan, Seid Muhie Yimam, Daryna Dementieva, Florian Skupin, Tim Fischer, Daniil Moskovskiy, Aarushi Ajay Borkar, Robert Geislinger, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee

    Abstract: Despite regulations imposed by nations and social media platforms, e.g. (Government of India, 2021; European Parliament and Council of the European Union, 2022), inter alia, hateful content persists as a significant challenge. Existing approaches primarily rely on reactive measures such as blocking or suspending offensive messages, with emerging strategies focusing on proactive measurements like d…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.19543

  2. arXiv:2506.22623

    cs.CL cs.AI

    Temperature Matters: Enhancing Watermark Robustness Against Paraphrasing Attacks

    Authors: Badr Youbi Idrissi, Monica Millunzi, Amelia Sorrenti, Lorenzo Baraldi, Daryna Dementieva

    Abstract: In the present-day scenario, Large Language Models (LLMs) are establishing their presence as powerful instruments permeating various sectors of society. While their utility offers valuable support to individuals, there are multiple concerns over potential misuse. Consequently, some academic endeavors have sought to introduce watermarking techniques, characterized by the inclusion of markers within…

    Submitted 27 June, 2025; originally announced June 2025.

  3. arXiv:2505.23297

    cs.CL

    EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian

    Authors: Daryna Dementieva, Nikolay Babakov, Alexander Fraser

    Abstract: While Ukrainian NLP has seen progress in many text processing tasks, emotion classification remains an underexplored area with no publicly available benchmark to date. In this work, we introduce EmoBench-UA, the first annotated dataset for emotion detection in Ukrainian texts. Our annotation schema is adapted from the previous English-centric works on emotion detection (Mohammad et al., 2018; Moh…

    Submitted 29 May, 2025; originally announced May 2025.

  4. arXiv:2505.22327

    cs.CL cs.CY

    NLP for Social Good: A Survey of Challenges, Opportunities, and Responsible Deployment

    Authors: Antonia Karamolegkou, Angana Borah, Eunjung Cho, Sagnik Ray Choudhury, Martina Galletti, Rajarshi Ghosh, Pranav Gupta, Oana Ignat, Priyanka Kargupta, Neema Kotonya, Hemank Lamba, Sun-Joo Lee, Arushi Mangla, Ishani Mondal, Deniz Nazarova, Poli Nemkova, Dina Pisarevskaya, Naquee Rizwan, Nazanin Sabri, Dominik Stammbach, Anna Steinberg, David Tomás, Steven R Wilson, Bowen Yi, Jessica H Zhu, et al. (7 additional authors not shown)

    Abstract: Recent advancements in large language models (LLMs) have unlocked unprecedented possibilities across a range of applications. However, as a community, we believe that the field of Natural Language Processing (NLP) has a growing need to approach deployment with greater intentionality and responsibility. In alignment with the broader vision of AI for Social Good (Tomašev et al., 2020), this paper ex…

    Submitted 28 May, 2025; originally announced May 2025.

  5. arXiv:2505.14272

    cs.CL cs.CY cs.MM

    Data-Efficient Hate Speech Detection via Cross-Lingual Nearest Neighbor Retrieval with Limited Labeled Data

    Authors: Faeze Ghorbanpour, Daryna Dementieva, Alexander Fraser

    Abstract: Considering the importance of detecting hateful language, labeled hate speech data is expensive and time-consuming to collect, particularly for low-resource languages. Prior work has demonstrated the effectiveness of cross-lingual transfer learning and data augmentation in improving performance on tasks with limited labeled data. To develop an efficient and scalable cross-lingual transfer learning…

    Submitted 24 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  6. arXiv:2505.06149

    cs.CL cs.CY cs.MM

    Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study

    Authors: Faeze Ghorbanpour, Daryna Dementieva, Alexander Fraser

    Abstract: Despite growing interest in automated hate speech detection, most existing approaches overlook the linguistic diversity of online content. Multilingual instruction-tuned large language models such as LLaMA, Aya, Qwen, and BloomZ offer promising capabilities across languages, but their effectiveness in identifying hate speech through zero-shot and few-shot prompting remains underexplored. This work…

    Submitted 24 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

  7. arXiv:2502.11926

    cs.CL

    BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages

    Authors: Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, Jan Philip Wahle, Terry Ruas, Meriem Beloucif, Christine de Kock, Nirmal Surange, Daniela Teodorescu, Ibrahim Said Ahmad, David Ifeoluwa Adelani, Alham Fikri Aji, Felermino D. M. A. Ali, Ilseyar Alimova, Vladimir Araujo, Nikolay Babakov, Naomi Baes, Ana-Maria Bucur, Andiswa Bukula, Guanqun Cao, Rodrigo Tufino Cardenas, Rendi Chevi, Chiamaka Ijeoma Chukwuneke, Alexandra Ciobotaru, Daryna Dementieva, et al. (23 additional authors not shown)

    Abstract: People worldwide use language in subtle and complex ways to express emotions. Although emotion recognition--an umbrella term for several NLP tasks--impacts various applications within NLP and beyond, most work in this area has focused on high-resource languages. This has led to significant disparities in research efforts and proposed solutions, particularly for under-resourced languages, which oft…

    Submitted 29 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Accepted at ACL2025 (Main)

  8. arXiv:2412.11691

    cs.CL cs.AI

    Multilingual and Explainable Text Detoxification with Parallel Corpora

    Authors: Daryna Dementieva, Nikolay Babakov, Amit Ronen, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Daniil Moskovskiy, Elisei Stakovskii, Eran Kaufman, Ashraf Elnagar, Animesh Mukherjee, Alexander Panchenko

    Abstract: Even with various regulations in place across countries and social media platforms (Government of India, 2021; European Parliament and Council of the European Union, 2022), digital abusive speech remains a significant issue. One potential approach to address this challenge is automatic text detoxification, a text style transfer (TST) approach that transforms toxic language into a more neutral or no…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: COLING 2025, main conference, long

  9. arXiv:2408.10724

    cs.CL

    Crafting Tomorrow's Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian

    Authors: Cem Üyük, Danica Rovó, Shaghayegh Kolli, Rabia Varol, Georg Groh, Daryna Dementieva

    Abstract: In the era dominated by information overload and its facilitation with Large Language Models (LLMs), the prevalence of misinformation poses a significant threat to public discourse and societal well-being. A critical concern at present involves the identification of machine-generated news. In this work, we take a significant step by introducing a benchmark dataset designed for neural news detectio…

    Submitted 4 November, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: EMNLP 2024 NLP4PI Workshop

  10. arXiv:2406.19543

    cs.CL cs.SI

    Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management

    Authors: Seid Muhie Yimam, Daryna Dementieva, Tim Fischer, Daniil Moskovskiy, Naquee Rizwan, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee

    Abstract: Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcat…

    Submitted 27 June, 2024; originally announced June 2024.

  11. arXiv:2404.17841

    cs.CL

    Toxicity Classification in Ukrainian

    Authors: Daryna Dementieva, Valeriia Khylenko, Nikolay Babakov, Georg Groh

    Abstract: The task of toxicity detection is still a relevant task, especially in the context of safe and fair LMs development. Nevertheless, labeled binary toxicity classification corpora are not available for all languages, which is understandable given the resource-intensive nature of the annotation process. Ukrainian, in particular, is among the languages lacking such resources. To our knowledge, there h…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted to WOAH, NAACL, 2024. arXiv admin note: text overlap with arXiv:2404.02043

  12. arXiv:2404.02043

    cs.CL cs.AI

    Cross-lingual Text Classification Transfer: The Case of Ukrainian

    Authors: Daryna Dementieva, Valeriia Khylenko, Georg Groh

    Abstract: Despite the extensive amount of labeled datasets in the NLP text classification field, the persistent imbalance in data availability across various languages remains evident. To support further fair development of NLP models, exploring the possibilities of effective knowledge transfer to new languages is crucial. Ukrainian, in particular, stands as a language that still can benefit from the contin…

    Submitted 4 February, 2025; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: COLING2025, main, short

  13. arXiv:2404.02037

    cs.CL cs.AI

    MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages

    Authors: Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

    Abstract: Text detoxification is a textual style transfer (TST) task where a text is paraphrased from a toxic surface form, e.g. featuring rude words, to the neutral register. Recently, text detoxification methods found their applications in various tasks such as detoxification of Large Language Models (LLMs) (Leong et al., 2023; He et al., 2024; Tang et al., 2023) and toxic speech combating in social networ…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL2024

  14. arXiv:2311.13937

    cs.CL

    Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

    Authors: Daryna Dementieva, Daniil Moskovskiy, David Dale, Alexander Panchenko

    Abstract: Text detoxification is the task of transferring the style of text from toxic to neutral. While there are approaches yielding promising results in monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for this task remains a challenging open problem (Moskovskiy et al., 2022). In this work, we present a large-scale study of strategies for cross-lingual text detox…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: AACL 2023, main conference, long paper

  15. arXiv:2305.08636

    cs.CL cs.AI

    AdamR at SemEval-2023 Task 10: Solving the Class Imbalance Problem in Sexism Detection with Ensemble Learning

    Authors: Adam Rydelek, Daryna Dementieva, Georg Groh

    Abstract: The Explainable Detection of Online Sexism task presents the problem of explainable sexism detection through fine-grained categorisation of sexist cases with three subtasks. Our team experimented with different ways to combat class imbalance throughout the tasks using data augmentation and loss alteration techniques. We tackled the challenge by utilising ensembles of Transformer models trained on…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: One of the top solutions at the SemEval-2023 task "The Explainable Detection of Online Sexism"

  16. arXiv:2305.08625

    cs.CL cs.AI

    Adam-Smith at SemEval-2023 Task 4: Discovering Human Values in Arguments with Ensembles of Transformer-based Models

    Authors: Daniel Schroter, Daryna Dementieva, Georg Groh

    Abstract: This paper presents the best-performing approach alias "Adam Smith" for the SemEval-2023 Task 4: "Identification of Human Values behind Arguments". The goal of the task was to create systems that automatically identify the values within textual arguments. We train transformer-based models until they reach their loss minimum or f1-score maximum. Ensembling the models by selecting one global decisio…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: The winner of SemEval-2023 Task 4: "Identification of Human Values behind Arguments"

  17. arXiv:2303.03124

    cs.CL cs.AI

    IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models

    Authors: Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Yutong Zhou, Georg Groh

    Abstract: Interpretability and human oversight are fundamental pillars of deploying complex NLP models into real-world applications. However, applying explainability and human-in-the-loop methods requires technical proficiency. Despite existing toolkits for model understanding and analysis, options to integrate human feedback are still limited. We propose IFAN, a framework for real-time explanation-based in…

    Submitted 2 October, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted to AACL 2023 Demonstration systems Track

  18. arXiv:2211.14279

    cs.CL cs.IR

    Multiverse: Multilingual Evidence for Fake News Detection

    Authors: Daryna Dementieva, Mikhail Kuimov, Alexander Panchenko

    Abstract: Misleading information spreads on the Internet at an incredible speed, which can lead to irreparable consequences in some cases. It is becoming essential to develop fake news detection technologies. While substantial work has been done in this direction, one of the limitations of the current approaches is that these models are focused only on one language and do not use multilingual information. I…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: 24 pages, 10 figures, extended version of ACL SRW 2021 paper

  19. arXiv:2206.02252

    cs.CL

    Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

    Authors: Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko

    Abstract: Detoxification is a task of generating text in polite style while preserving meaning and fluency of the original toxic text. Existing detoxification methods are designed to work in one exact language. This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models like in this setting. Unlike previous works we aim to make large language models abl…

    Submitted 5 June, 2022; originally announced June 2022.

  20. arXiv:2204.08975

    cs.CL

    Detecting Text Formality: A Study of Text Classification Approaches

    Authors: Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

    Abstract: Formality is one of the important characteristics of text documents. The automatic detection of the formality level of a text is potentially beneficial for various natural language processing tasks. Before, two large-scale datasets were introduced for multiple languages featuring formality annotation -- GYAFC and X-FORMAL. However, they were primarily used for the training of style transfer models…

    Submitted 8 September, 2023; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Published at RANLP2023

  21. arXiv:2109.08914

    cs.CL cs.LG

    Text Detoxification using Large Pre-trained Neural Models

    Authors: David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

    Abstract: We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and remove toxicity. Our second…

    Submitted 3 November, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

    Comments: Accepted to the EMNLP 2021 conference

  22. arXiv:2105.09052

    cs.CL cs.LG

    Methods for Detoxification of Texts for the Russian Language

    Authors: Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko

    Abstract: We introduce the first study of automatic detoxification of Russian texts to combat offensive language. Such a kind of textual style transfer can be used, for instance, for processing toxic content in social media. While much work has been done for the English language in this field, it has never been solved for the Russian language yet. We test two types of models - unsupervised approach based on…

    Submitted 19 May, 2021; originally announced May 2021.