Showing 1–4 of 4 results for author: Tufa, W

  1. arXiv:2405.13754  [pdf, other]

    cs.CL

    Grounding Toxicity in Real-World Events across Languages

    Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

    Abstract: Social media conversations frequently suffer from toxicity, creating significant issues for users, moderators, and entire communities. Events in the real world, like elections or conflicts, can initiate and escalate toxic behavior online. Our study investigates how real-world events influence the origin and spread of toxicity in online discussions across various languages and regions. We gathered…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Paper accepted at the 29th International Conference on Natural Language & Information Systems (NLDB 2024)

  2. arXiv:2404.18810  [pdf, other]

    cs.CL

    Unknown Script: Impact of Script on Cross-Lingual Transfer

    Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

    Abstract: Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolin…

    Submitted 7 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Paper accepted to NAACL Student Research Workshop (SRW) 2024

  3. arXiv:2404.18726  [pdf, other]

    cs.CL

    The Constant in HATE: Analyzing Toxicity in Reddit across Topics and Languages

    Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

    Abstract: Toxic language remains an ongoing challenge on social media platforms, presenting significant issues for users and communities. This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations. We collect 1.5 million comment threads from 481 communities in six languages: English, German, Spanish, Turkish, Arabic, and Dutch, covering 80 topics such as Culture, Politic…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted to TRAC 2024

  4. arXiv:2306.09642  [pdf, ps, other]

    cs.CL cs.LG

    Cross-Domain Toxic Spans Detection

    Authors: Stefan F. Schouten, Baran Barbarestani, Wondimagegnhue Tufa, Piek Vossen, Ilia Markov

    Abstract: Given the dynamic nature of toxic language use, automated methods for detecting toxic spans are likely to encounter distributional shift. To explore this phenomenon, we evaluate three approaches for detecting toxic spans under cross-domain conditions: lexicon-based, rationale extraction, and fine-tuned language models. Our findings indicate that a simple method using off-the-shelf lexicons perform…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: NLDB 2023