Showing 1–10 of 10 results for author: Semmann, M

  1. arXiv:2506.12071  [pdf, ps, other]

    cs.IR

    T$^2$-RAGBench: Text-and-Table Benchmark for Evaluating Retrieval-Augmented Generation

    Authors: Jan Strich, Enes Kutay Isgorur, Maximilian Trescher, Chris Biemann, Martin Semmann

    Abstract: While most financial documents contain a combination of textual and tabular information, robust Retrieval-Augmented Generation (RAG) systems are essential for effectively accessing and reasoning over such content to perform complex numerical tasks. This paper introduces T$^2$-RAGBench, a benchmark comprising 32,908 question-context-answer triples, designed to evaluate RAG methods on real-world fin…

    Submitted 4 June, 2025; originally announced June 2025.

  2. arXiv:2506.10789  [pdf, ps, other]

    cs.CY cs.CL

    FASCIST-O-METER: Classifier for Neo-fascist Discourse Online

    Authors: Rudy Alexandro Garrido Veliz, Martin Semmann, Chris Biemann, Seid Muhie Yimam

    Abstract: Neo-fascism is a political and societal ideology that has been having remarkable growth in the last decade in the United States of America (USA), as well as in other Western societies. It poses a grave danger to democracy and the minorities it targets, and it requires active actions against it to avoid escalation. This work presents the first-of-its-kind neo-fascist coding scheme for digital disco…

    Submitted 12 June, 2025; originally announced June 2025.

  3. arXiv:2506.09947  [pdf, ps, other]

    cs.CY cs.SI

    KI4Demokratie: An AI-Based Platform for Monitoring and Fostering Democratic Discourse

    Authors: Rudy Alexandro Garrido Veliz, Till Nikolaus Schaland, Simon Bergmoser, Florian Horwege, Somya Bansal, Ritesh Nahar, Martin Semmann, Jörg Forthmann, Seid Muhie Yimam

    Abstract: Social media increasingly fuel extremism, especially right-wing extremism, and enable the rapid spread of antidemocratic narratives. Although AI and data science are often leveraged to manipulate political opinion, there is a critical need for tools that support effective monitoring without infringing on freedom of expression. We present KI4Demokratie, an AI-based platform that assists journalists…

    Submitted 16 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

  4. arXiv:2505.20624  [pdf, ps, other]

    cs.CL

    POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization

    Authors: Usman Naseem, Juan Ren, Saba Anwar, Sarah Kohail, Rudy Alexandro Garrido Veliz, Robert Geislinger, Aisha Jabr, Idris Abdulmumin, Laiba Qureshi, Aarushi Ajay Borkar, Maryam Ibrahim Mukhtar, Abinew Ali Ayele, Ibrahim Said Ahmad, Adem Ali, Martin Semmann, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam

    Abstract: Online polarization poses a growing challenge for democratic discourse, yet most computational social science research remains monolingual, culturally narrow, or event-specific. We introduce POLAR, a multilingual, multicultural, and multievent dataset with over 23k instances in seven languages from diverse online platforms and real-world events. Polarization is annotated along three axes: presence…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Preprint

  5. arXiv:2504.07643  [pdf, other]

    cs.IR cs.CL cs.CV

    CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections

    Authors: Florian Schneider, Narges Baba Ahmadi, Niloufar Baba Ahmadi, Iris Vogel, Martin Semmann, Chris Biemann

    Abstract: In this paper, we introduce CollEx, an innovative multimodal agentic Retrieval-Augmented Generation (RAG) system designed to enhance interactive exploration of extensive scientific collections. Given the overwhelming volume and inherent complexity of scientific collections, conventional search systems often lack necessary intuitiveness and interactivity, presenting substantial barriers for learner…

    Submitted 10 April, 2025; originally announced April 2025.

  6. arXiv:2503.18485  [pdf, other]

    cs.CL cs.LG

    Whispering in Amharic: Fine-tuning Whisper for Low-resource Language

    Authors: Dawit Ketema Gete, Bedru Yimam Ahmed, Tadesse Destaw Belay, Yohannes Ayana Ejigu, Sukairaj Hafiz Imam, Alemu Belay Tessema, Mohammed Oumer Adem, Tadesse Amare Belay, Robert Geislinger, Umma Aliyu Musa, Martin Semmann, Shamsuddeen Hassan Muhammad, Henning Schreiber, Seid Muhie Yimam

    Abstract: This work explores fine-tuning OpenAI's Whisper automatic speech recognition (ASR) model for Amharic, a low-resource language, to improve transcription accuracy. While the foundational Whisper model struggles with Amharic due to limited representation in its training data, we fine-tune it using datasets like Mozilla Common Voice, FLEURS, and the BDU-speech dataset. The best-performing model, Whisp…

    Submitted 28 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

  7. arXiv:2412.01549  [pdf, other]

    cs.CY cs.SI

    Silenced Voices: Exploring Social Media Polarization and Women's Participation in Peacebuilding in Ethiopia

    Authors: Adem Chanie Ali, Seid Muhie Yimam, Martin Semmann, Abinew Ali Ayele, Chris Biemann

    Abstract: This exploratory study highlights the significant threats of social media polarization and weaponization in Ethiopia, analyzing the Northern Ethiopia (Tigray) War (November 2020 to November 2022) as a case study. It further uncovers the lack of effective digital peacebuilding initiatives. These issues particularly impact women, who bear a disproportionate burden in the armed conflict. These reperc…

    Submitted 2 December, 2024; originally announced December 2024.

  8. arXiv:2408.02854  [pdf, other]

    cs.IR

    Creating a Taxonomy for Retrieval Augmented Generation Applications

    Authors: Irina Nikishina, Özge Sevgili, Mahei Manhai Li, Chris Biemann, Martin Semmann

    Abstract: In this research, we develop a taxonomy to conceptualize a comprehensive overview of the constituting characteristics that define retrieval augmented generation (RAG) applications, facilitating the adoption of this technology for different application domains. To the best of our knowledge, no holistic RAG application taxonomies have been developed so far. We employ the method foreign to ACL and th…

    Submitted 18 February, 2025; v1 submitted 5 August, 2024; originally announced August 2024.

  9. arXiv:2406.19543  [pdf, other]

    cs.CL cs.SI

    Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management

    Authors: Seid Muhie Yimam, Daryna Dementieva, Tim Fischer, Daniil Moskovskiy, Naquee Rizwan, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee

    Abstract: Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcat…

    Submitted 27 June, 2024; originally announced June 2024.

  10. arXiv:2312.07878  [pdf]

    econ.GN

    New Kids on the Block: On the impact of information retrieval on contextual resource integration patterns

    Authors: Martin Semmann, Mahei Manhei Li

    Abstract: The rise of new modes of interaction with AI skyrocketed the popularity, applicability, and amount of use cases. Despite this evolution, conceptual integration is falling behind. Studies suggest that there is hardly a systematization in using AI in organizations. Thus, by taking a service-dominant logic perspective, specifically, the concept of resource integration patterns, the most potent applic…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 5 pages, 5 figures