Showing 1–7 of 7 results for author: Goldzycher, J

  1. arXiv:2501.10057  [pdf, other]

    cs.CL

    MSTS: A Multimodal Safety Test Suite for Vision-Language Models

    Authors: Paul Röttger, Giuseppe Attanasio, Felix Friedrich, Janis Goldzycher, Alicia Parrish, Rishabh Bhardwaj, Chiara Di Bonaventura, Roman Eng, Gaia El Khoury Geagea, Sujata Goswami, Jieun Han, Dirk Hovy, Seogyeong Jeong, Paloma Jeretič, Flor Miriam Plaza-del-Arco, Donya Rooein, Patrick Schramowski, Anastassia Shaitarova, Xudong Shen, Richard Willats, Andrea Zugarini, Bertie Vidgen

    Abstract: Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created b…

    Submitted 17 January, 2025; originally announced January 2025.

    Comments: under review

  2. arXiv:2406.08080  [pdf, other]

    cs.CL cs.AI cs.CY

    AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection

    Authors: Pia Pachinger, Janis Goldzycher, Anna Maria Planitzer, Wojciech Kusa, Allan Hanbury, Julia Neidhardt

    Abstract: Model interpretability in toxicity detection greatly profits from token-level annotations. However, currently such annotations are only available in English. We introduce a dataset annotated for offensive language detection sourced from a news forum, notable for its incorporation of the Austrian German dialect, comprising 4,562 user comments. In addition to binary offensiveness classification, we…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to Findings of the Association for Computational Linguistics: ACL 2024

    ACM Class: I.2.7

  3. arXiv:2406.03393  [pdf, other]

    econ.GN

    Censorship in Democracy

    Authors: Marcel Caesmann, Janis Goldzycher, Matteo Grigoletto, Lorenz Gschwent

    Abstract: The spread of propaganda, misinformation, and biased narratives from autocratic regimes, especially on social media, is a growing concern in many democracies. Can censorship be an effective tool to curb the spread of such slanted narratives? In this paper, we study the European Union's ban on Russian state-led news outlets after the 2022 Russian invasion of Ukraine. We analyze 775,616 tweets from…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 25 pages, 8 figures, 5 tables

  4. arXiv:2403.19559  [pdf, other]

    cs.CL

    Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset

    Authors: Janis Goldzycher, Paul Röttger, Gerold Schneider

    Abstract: Hate speech detection models are only as good as the data they are trained on. Datasets sourced from social media suffer from systematic gaps and biases, leading to unreliable models with simplistic decision boundaries. Adversarial datasets, collected by exploiting model weaknesses, promise to fix this problem. However, adversarial data collection can be slow and costly, and individual annotators…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted at NAACL 2024 (main conference)

  5. arXiv:2306.03907  [pdf, other]

    cs.CL cs.CY

    CL-UZH at SemEval-2023 Task 10: Sexism Detection through Incremental Fine-Tuning and Multi-Task Learning with Label Descriptions

    Authors: Janis Goldzycher

    Abstract: The widespread popularity of social media has led to an increase in hateful, abusive, and sexist language, motivating methods for the automatic detection of such phenomena. The goal of the SemEval shared task "Towards Explainable Detection of Online Sexism" (EDOS 2023) is to detect sexism in English social media posts (subtask A), and to categorize such posts into four coarse-grained sexism…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: 11 pages, 4 figures, Accepted at The 17th International Workshop on Semantic Evaluation, ACL 2023

  6. arXiv:2306.03722  [pdf, other]

    cs.CL cs.CY

    Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data

    Authors: Janis Goldzycher, Moritz Preisig, Chantal Amrhein, Gerold Schneider

    Abstract: Most research on hate speech detection has focused on English where a sizeable amount of labeled training data is available. However, to expand hate speech detection into more languages, approaches that require minimal training data are needed. In this paper, we test whether natural language inference (NLI) models which perform well in zero- and few-shot settings can benefit hate speech detection…

    Submitted 10 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 15 pages, 7 figures, Accepted at the 7th Workshop on Online Abuse and Harms (WOAH), ACL 2023

  7. arXiv:2210.00910  [pdf, other]

    cs.CL

    Hypothesis Engineering for Zero-Shot Hate Speech Detection

    Authors: Janis Goldzycher, Gerold Schneider

    Abstract: Standard approaches to hate speech detection rely on sufficient available hate speech annotations. Extending previous work that repurposes natural language inference (NLI) models for zero-shot text classification, we propose a simple approach that combines multiple hypotheses to improve English NLI-based zero-shot hate speech detection. We first conduct an error analysis for vanilla NLI-based zero…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: Third Workshop on Threat, Aggression and Cyberbullying (COLING 2022)