-
T$^2$-RAGBench: Text-and-Table Benchmark for Evaluating Retrieval-Augmented Generation
Authors:
Jan Strich,
Enes Kutay Isgorur,
Maximilian Trescher,
Chris Biemann,
Martin Semmann
Abstract:
Because most financial documents contain a combination of textual and tabular information, robust Retrieval-Augmented Generation (RAG) systems are essential for effectively accessing and reasoning over such content to perform complex numerical tasks. This paper introduces T$^2$-RAGBench, a benchmark comprising 32,908 question-context-answer triples, designed to evaluate RAG methods on real-world financial data. Unlike typical QA datasets that operate under Oracle-context settings, where the relevant context is explicitly provided, T$^2$-RAGBench challenges models to first retrieve the correct context before conducting numerical reasoning. Existing QA datasets involving text and tables typically contain context-dependent questions, which may yield multiple correct answers depending on the provided context. To address this, we transform these datasets into a context-independent format, enabling reliable RAG evaluation. We conduct a comprehensive evaluation of popular RAG methods. Our analysis identifies Hybrid BM25, a technique that combines dense and sparse vectors, as the most effective approach for text-and-table data. However, results demonstrate that T$^2$-RAGBench remains challenging even for SOTA LLMs and RAG methods. Further ablation studies examine the impact of embedding models and corpus size on retrieval performance. T$^2$-RAGBench provides a realistic and rigorous benchmark for existing RAG methods on text-and-table data. Code and dataset are available online.
Submitted 4 June, 2025;
originally announced June 2025.
-
FASCIST-O-METER: Classifier for Neo-fascist Discourse Online
Authors:
Rudy Alexandro Garrido Veliz,
Martin Semmann,
Chris Biemann,
Seid Muhie Yimam
Abstract:
Neo-fascism is a political and societal ideology that has grown remarkably over the last decade in the United States of America (USA), as well as in other Western societies. It poses a grave danger to democracy and the minorities it targets, and it requires active measures against it to avoid escalation. This work presents the first-of-its-kind neo-fascist coding scheme for digital discourse in the USA societal context, overseen by political science researchers. Our work bridges the gap between Natural Language Processing (NLP) and political science against this phenomenon. Furthermore, to test the coding scheme, we collect a tremendous amount of internet activity from notable neo-fascist groups (the forums of Iron March and Stormfront.org), and the guidelines are applied to a subset of the collected posts. Through crowdsourcing, we annotate a total of a thousand posts that are labeled as neo-fascist or non-neo-fascist. With this labeled data set, we fine-tune and test both Small Language Models (SLMs) and Large Language Models (LLMs), obtaining the very first classification models for neo-fascist discourse. We find that neo-fascist rhetoric is ever-present in this kind of forum, making such forums a good target for future research. The societal context is a key consideration for neo-fascist speech when conducting NLP research. Finally, the work against this kind of political movement must be pressed upon and continued for the well-being of a democratic society. Disclaimer: This study focuses on detecting neo-fascist content in text, similar to other hate speech analyses, without labeling individuals or organizations.
Submitted 12 June, 2025;
originally announced June 2025.
-
KI4Demokratie: An AI-Based Platform for Monitoring and Fostering Democratic Discourse
Authors:
Rudy Alexandro Garrido Veliz,
Till Nikolaus Schaland,
Simon Bergmoser,
Florian Horwege,
Somya Bansal,
Ritesh Nahar,
Martin Semmann,
Jörg Forthmann,
Seid Muhie Yimam
Abstract:
Social media increasingly fuel extremism, especially right-wing extremism, and enable the rapid spread of antidemocratic narratives. Although AI and data science are often leveraged to manipulate political opinion, there is a critical need for tools that support effective monitoring without infringing on freedom of expression. We present KI4Demokratie, an AI-based platform that assists journalists, researchers, and policymakers in monitoring right-wing discourse that may undermine democratic values. KI4Demokratie applies machine learning models to large-scale German online data gathered on a daily basis, providing a comprehensive view of trends in the German digital sphere. Early analysis reveals both the complexity of tracking organized extremist behavior and the promise of our integrated approach, especially during key events.
Submitted 16 June, 2025; v1 submitted 11 June, 2025;
originally announced June 2025.
-
POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization
Authors:
Usman Naseem,
Juan Ren,
Saba Anwar,
Sarah Kohail,
Rudy Alexandro Garrido Veliz,
Robert Geislinger,
Aisha Jabr,
Idris Abdulmumin,
Laiba Qureshi,
Aarushi Ajay Borkar,
Maryam Ibrahim Mukhtar,
Abinew Ali Ayele,
Ibrahim Said Ahmad,
Adem Ali,
Martin Semmann,
Shamsuddeen Hassan Muhammad,
Seid Muhie Yimam
Abstract:
Online polarization poses a growing challenge for democratic discourse, yet most computational social science research remains monolingual, culturally narrow, or event-specific. We introduce POLAR, a multilingual, multicultural, and multievent dataset with over 23k instances in seven languages from diverse online platforms and real-world events. Polarization is annotated along three axes: presence, type, and manifestation, using a variety of annotation platforms adapted to each cultural context. We conduct two main experiments: (1) we fine-tune six multilingual pretrained language models in both monolingual and cross-lingual setups; and (2) we evaluate a range of open and closed large language models (LLMs) in few-shot and zero-shot scenarios. Results show that while most models perform well on binary polarization detection, they achieve substantially lower scores when predicting polarization types and manifestations. These findings highlight the complex, highly contextual nature of polarization and the need for robust, adaptable approaches in NLP and computational social science. All resources will be released to support further research and effective mitigation of digital polarization globally.
Submitted 26 May, 2025;
originally announced May 2025.
-
CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections
Authors:
Florian Schneider,
Narges Baba Ahmadi,
Niloufar Baba Ahmadi,
Iris Vogel,
Martin Semmann,
Chris Biemann
Abstract:
In this paper, we introduce CollEx, an innovative multimodal agentic Retrieval-Augmented Generation (RAG) system designed to enhance interactive exploration of extensive scientific collections. Given the overwhelming volume and inherent complexity of scientific collections, conventional search systems often lack necessary intuitiveness and interactivity, presenting substantial barriers for learners, educators, and researchers. CollEx addresses these limitations by employing state-of-the-art Large Vision-Language Models (LVLMs) as multimodal agents accessible through an intuitive chat interface. By abstracting complex interactions via specialized agents equipped with advanced tools, CollEx facilitates curiosity-driven exploration, significantly simplifying access to diverse scientific collections and records therein. Our system integrates textual and visual modalities, supporting educational scenarios that are helpful for teachers, pupils, students, and researchers by fostering independent exploration as well as scientific excitement and curiosity. Furthermore, CollEx serves the research community by discovering interdisciplinary connections and complementing visual data. We illustrate the effectiveness of our system through a proof-of-concept application containing over 64,000 unique records across 32 collections from a local scientific collection from a public university.
Submitted 10 April, 2025;
originally announced April 2025.
-
Whispering in Amharic: Fine-tuning Whisper for Low-resource Language
Authors:
Dawit Ketema Gete,
Bedru Yimam Ahmed,
Tadesse Destaw Belay,
Yohannes Ayana Ejigu,
Sukairaj Hafiz Imam,
Alemu Belay Tessema,
Mohammed Oumer Adem,
Tadesse Amare Belay,
Robert Geislinger,
Umma Aliyu Musa,
Martin Semmann,
Shamsuddeen Hassan Muhammad,
Henning Schreiber,
Seid Muhie Yimam
Abstract:
This work explores fine-tuning OpenAI's Whisper automatic speech recognition (ASR) model for Amharic, a low-resource language, to improve transcription accuracy. While the foundational Whisper model struggles with Amharic due to limited representation in its training data, we fine-tune it using datasets like Mozilla Common Voice, FLEURS, and the BDU-speech dataset. The best-performing model, Whispersmall-am, significantly improves when fine-tuned on a mix of existing FLEURS data and new, unseen Amharic datasets. Training solely on new data leads to poor performance, but combining it with FLEURS data reinforces the model, enabling better specialization in Amharic. We also demonstrate that normalizing Amharic homophones significantly improves Word Error Rate (WER) and Bilingual Evaluation Understudy (BLEU) scores. This study underscores the importance of fine-tuning strategies and dataset composition for improving ASR in low-resource languages, providing insights for future Amharic speech recognition research.
Submitted 28 March, 2025; v1 submitted 24 March, 2025;
originally announced March 2025.
-
Silenced Voices: Exploring Social Media Polarization and Women's Participation in Peacebuilding in Ethiopia
Authors:
Adem Chanie Ali,
Seid Muhie Yimam,
Martin Semmann,
Abinew Ali Ayele,
Chris Biemann
Abstract:
This exploratory study highlights the significant threats of social media polarization and weaponization in Ethiopia, analyzing the Northern Ethiopia (Tigray) War (November 2020 to November 2022) as a case study. It further uncovers the lack of effective digital peacebuilding initiatives. These issues particularly impact women, who bear a disproportionate burden in the armed conflict. These repercussions extend beyond the digital sphere, affecting women's socio-economic conditions, safety, and well-being. This reality was starkly evident during the war, when women faced gender-based and sexual violence. The research findings disclose the interface between social media polarization, conflict, and gender-based violence. They also reveal the marginalization of women's voices in peacebuilding initiatives. This marginalization in peacebuilding efforts can be attributed to hostile online environments, the digital divide, cultural and societal norms, as well as top-down peace initiatives. The study highlights substantial gaps in leveraging digital media for sustainable peace and empowering women's participation. The unregulated landscape of social media in Ethiopia exacerbates these problems, necessitating heightened demands for accountability, especially from major social media platforms. The study recommends that enhanced moderation and ethical considerations in algorithmic design gain traction, underlining the urgency for transparent and responsible social media frameworks. It is also recommended that digital peacebuilding initiatives adopt a gender-sensitive and inclusive approach to address these complexities effectively and sustainably.
Submitted 2 December, 2024;
originally announced December 2024.
-
Creating a Taxonomy for Retrieval Augmented Generation Applications
Authors:
Irina Nikishina,
Özge Sevgili,
Mahei Manhai Li,
Chris Biemann,
Martin Semmann
Abstract:
In this research, we develop a taxonomy to conceptualize a comprehensive overview of the constituting characteristics that define retrieval augmented generation (RAG) applications, facilitating the adoption of this technology for different application domains. To the best of our knowledge, no holistic RAG application taxonomies have been developed so far. We employ a method foreign to ACL and thus contribute to the set of methods for taxonomy creation. It comprises four iterative phases designed to refine and enhance our understanding and presentation of RAG's core dimensions. We have developed a total of five meta-dimensions and sixteen dimensions to comprehensively capture the concept of RAG applications. Thus, the taxonomy can be used to better understand RAG applications and to derive design knowledge for future solutions in specific application domains.
Submitted 18 February, 2025; v1 submitted 5 August, 2024;
originally announced August 2024.
-
Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management
Authors:
Seid Muhie Yimam,
Daryna Dementieva,
Tim Fischer,
Daniil Moskovskiy,
Naquee Rizwan,
Punyajoy Saha,
Sarthak Roy,
Martin Semmann,
Alexander Panchenko,
Chris Biemann,
Animesh Mukherjee
Abstract:
Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcation, which scores abusive speech based on four aspects -- (i) severity scale; (ii) presence of a target; (iii) context scale; (iv) legal scale -- and suggests more options for action, such as detoxification, counter speech generation, blocking, or, as a final measure, human intervention. Through a thorough analysis of abusive speech regulations across diverse jurisdictions, platforms, and research papers, we highlight the gap in preventive measures and advocate for tailored proactive steps to combat its multifaceted manifestations. Our work aims to inform future strategies for effectively addressing abusive speech online.
Submitted 27 June, 2024;
originally announced June 2024.
-
New Kids on the Block: On the impact of information retrieval on contextual resource integration patterns
Authors:
Martin Semmann,
Mahei Manhai Li
Abstract:
The rise of new modes of interaction with AI has caused its popularity, applicability, and number of use cases to skyrocket. Despite this evolution, conceptual integration is falling behind. Studies suggest that there is hardly any systematization in the use of AI in organizations. Thus, by taking a service-dominant logic (SDL) perspective, specifically the concept of resource integration patterns, the most potent application of AI for organizational use, namely information retrieval, is analyzed. In doing so, we propose a systematization that can be applied to deepen understanding of core technical concepts, further investigate AI in contexts, and help explore research directions guided by SDL.
Submitted 12 December, 2023;
originally announced December 2023.